Spatial Patterns of Urban Innovation and Productivity
Radu Stancut
Foundations of Urban Science
Assignment #5
Final Paper
The purpose of creating a science of cities is to bring fact-based rigor and standardization to a
critical human subject: the way we live with others and the planet. To the extent that we can
fruitfully observe our urban environment, capture accurate readings, and allow for hypothesis
testing, we are obliged to do so. Affecting our surroundings in an intentional and predictive
manner, ideally for the mutual benefit of our civilization and the environment, grants us greater
control over our long-term success as a species.
Many practices may be brought to bear when developing this new science, and we should be
opportunistic in taking what works in other fields and applying their techniques. Jane Jacobs
famously tackled the question of what kind of problem a city is.1 Whether or not we come to agree
with her assessment, that cities are problems of organized complexity, we should follow her
rationale: identify the features and functions of the urban environment, see what analogous
problems we have tackled in other areas, most especially the sciences, and apply similar
approaches and methods, modified appropriately for the urban field and that messiest of
subject matters: people.
Recent increases in technological capabilities, such as storage, computational power, and easy
access to data, coupled with a belief that there are valuable and actionable insights to be found in
data, have ushered in the concept of a science of cities. This paper takes the notion of a “science
of cities” to mean that urban environments may now be considered objects of study within a
scientific framework, where the structure and behavior of cities may be systematically studied
via observation and experiment.2
1 Jacobs, J. 1961. The Death and Life of Great American Cities. New York: Random House, Inc.
Lit Review
Any science of cities approach would appear to require delving into big data. The availability of new
forms and sources of data is opening up the possibility of taking measurements at a speed never
previously available in human history.3 The belief would seem to be that with enough data we will be
able to identify patterns and delve deeper,4 perhaps identifying underlying principles and laws.
Big data is certainly a social phenomenon,5 but its effectiveness will depend on how it is used and the
principles put in place. We have, for instance, the following challenges to consider:6
• Exponential data growth
• New types of data
• Privacy and access
• Institutional barriers
• Use and relevance
This paper deals primarily with the last item and uses data to attempt to extract insights on urban
behavior and outcomes through a modest analysis of GDP and patent information. The goal is to pick up
on potential power laws and see if they hold and can tell us something about how a city behaves.7
2 http://www.oxforddictionaries.com/us/definition/american_english/science
3 Koonin, S. 2012. Big data and city living - what can it do for us? The Royal Statistical Society.
4 http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
5 boyd, d. & Crawford, K. 2012. Critical Questions for Big Data. Information, Communication & Society,
15:5, 662-679. DOI: 10.1080/1369118X.2012.678878
6 Koonin, S. 2012. Big data and city living - what can it do for us? The Royal Statistical Society.
7 Bettencourt, L.M.A. & West, G. 2011. Bigger Cities Do More with Less. Scientific American.
Materials and Methods
In exploring cities for patterns and regularities this paper focused on economic, population, and
innovation features of Metropolitan Statistical Areas (MSAs). The unit of analysis for all research below
was the MSA, unless specified otherwise. The three main sections below correspond to the following
questions, and will be expanded on during the analysis in each respective portion:
1. What is the relationship between patenting performance and economic performance?
2. What is the technological profile of the New York MSA? How does this profile compare and contrast
with those of the Boston, Houston, and San Jose MSAs?
3. How diverse are the metropolitan patenting portfolios and what does the resulting pattern
reveal about patenting across metropolitan areas?
Data was collected from three main sources: the Bureau of Economic Analysis (BEA), which provided
GDP per capita numbers by MSAs and broke down the technology classes found within MSAs; the
Census, for population numbers; and the U.S. Patent Office (USPTO), where patents by both technology
class and MSAs could be found.
The variables for each section are described below, as part of the methods of data manipulation and
analysis. The table below outlines the breakdown of data by section and may be used as a reference
guide.
Section | Unit of Analysis | Variables | Sources
I. Patents and Economic Development | MSA | 1) per capita real GDP; 2) population; 3) patent intensity | BEA; Census; USPTO
II. Technological Profiles of Metropolitan Areas | MSA | 1) patents by MSA by technology class | USPTO
III. Technological Heat Maps of Metropolitan Areas | MSA | 1) Tally of technology class patents by MSA | BEA; USPTO
I. Patents and Economic Development
Per capita real GDP was compared against patent intensity, within each MSA, in order to better
understand the relationship between patenting performance and economic performance. Per capita
real GDP was acquired directly from the BEA, while patent intensity had to be constructed from MSA
patents (USPTO) and population (Census). Patent intensity was defined as (MSA patents / MSA
population) x 100,000 (since the numbers tend to be small for some locations).
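The definition above can be written as a small helper. This is only a sketch of the formula as stated in the text; the function name and the example figures are made up for illustration and are not taken from the data.

```python
def patent_intensity(msa_patents, msa_population):
    """Patents per 100,000 residents, following the definition in the text."""
    if msa_population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so small per-person ratios are not rounded early.
    return 100_000 * msa_patents / msa_population


# Hypothetical MSA: 500 patents and 1,250,000 residents (illustrative numbers only).
example = patent_intensity(500, 1_250_000)
print(example)
```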
Data was collected from the sources mentioned above, loaded into Python, and the resulting disparate
data tables matched on MSA ID codes/FIPS. Having merged the data sets, we now had in one table
information on population, per capita GDP, and patent counts, spanning 2001 through 2012,
inclusive. As suggested in the assignment document, per capita GDP and patents were
averaged over a five-year window. The five-year time frame was used to smooth the numbers and help
minimize distortions. Additionally, different time frames were used for the patents (2001-2005) and GDP
(2008-2012) to account for the time delay in patents “coming on-line” and to set up the analysis for
possible causality, with patents leading to greater economic activity and not the other way around. With
the data merged and the averages calculated, it was now possible to construct the patent intensity
variable (formula above) and generate plots.
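The merge-and-average step might look roughly like the sketch below. It is not the code used for this paper: the input layout ({msa_id: {year: value}}), the function names, and the use of a single population figure per MSA are all assumptions made for illustration, and the sample values are invented.

```python
from statistics import mean

PATENT_YEARS = range(2001, 2006)  # 2001-2005 inclusive
GDP_YEARS = range(2008, 2013)     # 2008-2012 inclusive


def window_mean(series, years):
    """Average a {year: value} mapping over the given years."""
    return mean(series[y] for y in years)


def build_table(patents, gdp, population):
    """Join three {msa_id: ...} mappings on MSA ID, keeping IDs present in all three."""
    table = {}
    for msa_id in patents.keys() & gdp.keys() & population.keys():
        avg_patents = window_mean(patents[msa_id], PATENT_YEARS)
        avg_gdp = window_mean(gdp[msa_id], GDP_YEARS)
        table[msa_id] = {
            "avg_patents": avg_patents,
            "avg_gdp_per_capita": avg_gdp,
            # Assumption: one population figure per MSA.
            "patent_intensity": 100_000 * avg_patents / population[msa_id],
        }
    return table


# Invented sample data; MSA "B" lacks GDP and population, so it is dropped by the join.
patents = {
    "A": {2001: 10, 2002: 20, 2003: 30, 2004: 40, 2005: 50},
    "B": {2001: 1, 2002: 1, 2003: 1, 2004: 1, 2005: 1},
}
gdp = {"A": {2008: 40000, 2009: 41000, 2010: 42000, 2011: 43000, 2012: 44000}}
population = {"A": 600_000}
table = build_table(patents, gdp, population)
```

Matching only on IDs present in every source mirrors an inner join; MSAs missing from any one source would need separate handling.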
Plotting the logs of GDP and patent intensity against each other shows a positive correlation
(coefficient of 3.45; R-squared: 0.875; see Appendix). Subsequent plots show additional positive
correlations between population and patent intensity (Appendix: ‘Population Influence on Patent
Intensity’) and between population and average GDP (Appendix: 'Population and MSA GDP'). All three
plots provide evidence that size matters: the larger the MSA, the more patent activity it is likely
to have and the higher its GDP is likely to be.
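For reference, a log-log fit of this kind can be sketched with ordinary least squares on the logged values. The function name is hypothetical and the data below are synthetic (generated from an exact power law), so the sketch shows only the mechanics and not the results reported above.

```python
import math


def loglog_fit(x, y):
    """OLS of log10(y) on log10(x); returns (slope, intercept, r_squared)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    syy = sum((b - my) ** 2 for b in ly)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic check: y = 3 * x**1.2, so the fitted slope should recover the exponent 1.2.
xs = [1e4, 1e5, 1e6, 1e7]
ys = [3 * v ** 1.2 for v in xs]
slope, intercept, r2 = loglog_fit(xs, ys)
```

The slope of such a fit is the exponent of the underlying power law, which is why the log-log form is convenient for testing scaling relationships.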
II. Technological Profiles of Metropolitan Areas
Below I describe the technological profile of the New York MSA and compare it with three other
metropolitan areas: Boston, Houston, and San Jose. In each instance, the variables analyzed were
counts of patents by technology class within each MSA. All data for this section was acquired from the
USPTO and loaded into Python, where numbers were tabulated and plots/graphs generated. The focus
was on the top 10 technologies of each MSA and what could be ascertained from this information.
New York
The top 10 technologies of New York account for nearly a third (32%, Appendix) of all its patent
activity. This is in line with what we will see from the other three MSAs below. As for each MSA, an
index was created on the top 10 technologies, pegged against the top technology, and we see a marked
drop off of ~40% from the top technology (Drug, Bio-Affecting and Body Treating Compositions) to the
technology in second place (Multiplex Communications). A drop off of this size is also not uncommon
among the selected MSAs, with one exception (San Jose).
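The index described above amounts to dividing each class count by the count of the leading class. A minimal sketch follows; the function name is hypothetical, and the two counts are the New York figures listed in the Appendix.

```python
def index_against_top(counts):
    """Divide every count by the largest one, so the top class has index 1.0."""
    top = max(counts.values())
    return {name: n / top for name, n in counts.items()}


# The two leading New York classes, counts as listed in the Appendix table.
new_york = {
    "Drug, Bio-Affecting and Body Treating Compositions": 5212,
    "Multiplex Communications": 3138,
}
idx = index_against_top(new_york)
drop_off = 1 - idx["Multiplex Communications"]
print(round(drop_off, 3))  # 0.398
```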
The final exploratory step was to plot the top 10 technologies for each city through the years
collected (2000-2011) to get a better idea of patent activity over time. Plots for each of the subsequent
cities may be found in the Appendix; the New York ones were presented here for convenience and as a
reference.
Boston
Of the three additionally selected MSAs, Boston is the one most in line with New York. Boston’s top ten
technologies account for a third of all patent activity and there is a similar drop off from technology #1
(Drug, Bio-Affecting and Body Treating Compositions) to technology #2 (Chemistry: Molecular Biology
and Microbiology).
Houston
Houston introduces our first differences in the MSA comparison: it is more “top heavy” than the two
northeast MSAs, with the top 10 technologies accounting for 43% of all patent activity and the drop off
from the first place patent class (Wells [shafts or deep borings in the earth, e.g., for oil and gas]) to the
second (Synthetic Resins or Natural Rubbers) is over 60%. Houston would appear to be the least diverse
innovatively, with respect to patent activity over the past decade, of the four MSAs highlighted in this
report.
San Jose
San Jose is also unlike the northeast MSAs but in a different way than Houston. San Jose would appear
to be both more concentrated and more diverse than New York, a paradox revealed by the numbers.
San Jose’s top 10 technologies account for nearly 40% of all patent activity, but within this group the
patents are more evenly distributed: four of the San Jose technology classes are within 40% of the lead
patent category (Semiconductor Device Manufacturing: Process) and a fifth falls just outside that band
(index of 0.594, Appendix), while New York and Boston only have one such class each within their MSA.
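The "within 40% of the lead category" comparison can be checked directly against the Appendix counts. The helper below is a hypothetical sketch; with a strict 40% cutoff it counts four San Jose classes, and the next class (index 0.594 in the Appendix) is included only once the band is widened to 41%.

```python
def classes_near_top(counts, max_gap=0.40):
    """Count runner-up classes whose total is within max_gap of the top class."""
    ranked = sorted(counts, reverse=True)
    top = ranked[0]
    return sum(1 for n in ranked[1:] if n / top >= 1 - max_gap)


# Top-10 patent totals as listed in the Appendix tables.
san_jose = [5418, 4785, 3695, 3466, 3420, 3219, 2789, 2578, 2416, 2221]
new_york = [5212, 3138, 1848, 1589, 1509, 1473, 1416, 1323, 1295, 1139]
boston = [3326, 2143, 1397, 1136, 1089, 1004, 994, 871, 847, 631]

print(classes_near_top(san_jose))        # 4
print(classes_near_top(san_jose, 0.41))  # 5
print(classes_near_top(new_york))        # 1
print(classes_near_top(boston))          # 1
```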
Lastly, the plotting of patents by year shows that 2010 and 2011 were exceptional for all four MSAs in
the following technological areas, something that would require additional research to explain:
• New York: Multiplex Communications; DP: Financial, Business Practice, Management, or Cost/Price Determination (Data Processing)
• Boston: Multiplex Communications; Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems)
• Houston: Boring or Penetrating the Earth
• San Jose: Multiplex Communications; Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems); DP: Database and File Management or Data Structures (Data Processing)
III. Technological Heat Maps of Metropolitan Areas
Here again we take a global look at MSAs and through the use of a different visualization, a heat map,
attempt to glean a better understanding of urban innovation by comparing tallies of technology class
patents by MSAs. Two variables were mapped against one another, patent technology classes on the
vertical axis and MSAs on the horizontal axis, both from the USPTO. This resulted in a large grid, a 481
(patent technology classes) x 367 (MSAs) matrix. A ‘for’ loop was implemented in Python to read each
instance of a technology class per MSA and where a match was found a “Y” was placed in that
respective patent/MSA cell. Following the completion of the ‘for’ loop the “Y” instances were summed
by MSA and the grid was sorted along the horizontal axis (MSAs) from least “Y’s” to most. Due to the
density of the matrix, “Y” cells were further highlighted in green to provide a clearer visual
representation.
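The grid construction described above could be sketched as follows. This is an illustrative reconstruction and not the original script: the record layout (one (technology class, MSA) pair per patent record), the function names, and the tiny sample are assumptions.

```python
def presence_grid(records, classes, msas):
    """Build {class: {msa: "Y" or ""}} from (class, msa) patent records."""
    grid = {c: {m: "" for m in msas} for c in classes}
    for tech_class, msa in records:
        grid[tech_class][msa] = "Y"  # repeated records leave the cell at "Y"
    return grid


def msas_by_coverage(grid, msas):
    """Sort MSAs from fewest to most classes marked "Y"; also return the tallies."""
    totals = {m: sum(1 for row in grid.values() if row[m] == "Y") for m in msas}
    return sorted(msas, key=lambda m: totals[m]), totals


# Invented sample: three technology classes and three placeholder MSAs.
classes = ["424", "370", "438"]
msas = ["A", "B", "C"]
records = [
    ("424", "A"), ("370", "A"), ("438", "A"),
    ("424", "B"), ("424", "B"),
    ("370", "C"), ("438", "C"),
]
grid = presence_grid(records, classes, msas)
order, totals = msas_by_coverage(grid, msas)
print(order)  # ['B', 'C', 'A']
```

Sorting the columns by tally is what makes the filled cells thicken from left to right in the resulting image.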
Above we can see the green areas, instances of patent activity by MSAs within technology areas, picking
up or becoming denser as we scan from left to right. This is the expected and uninteresting part; what is
non-trivial, however, is the set of gaps, or black areas, shown above. Based on the image above and
the corresponding data, we can report that MSAs are lagging in several patent areas (listed in Appendix).
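One way such gaps could be pulled out of the grid is to list the technology classes whose row contains no "Y" at all. The exact criterion used for the Appendix list is not stated, so the "no MSA at all" rule below is an assumption, and the grid values are invented for illustration.

```python
def empty_classes(grid):
    """Return technology classes whose row holds no "Y" for any MSA."""
    return [c for c, row in grid.items() if not any(v == "Y" for v in row.values())]


# Invented grid: class IDs are used only as labels; the cell values are made up.
grid = {
    "424": {"A": "Y", "B": "Y"},
    "901": {"A": "", "B": ""},
    "977": {"A": "", "B": "Y"},
}
print(empty_classes(grid))  # ['901']
```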
Conclusion
Based on the plots and numbers presented, I would tentatively argue that MSAs, at least in the United
States, show a consistent and super-linear effect in the relationships between population and GDP per
capita and between population and patent intensity. Across the MSAs examined we saw increases above a
ratio of 1.0, suggesting that greater populations lead to greater returns, in this case on wealth and
innovation as measured by our proxy statistics.
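One way to make "above the ratio of 1.0" concrete is the power-law form used in the urban scaling literature cited earlier (Bettencourt and West). The symbols below (Y for an urban output, N for population, Y_0 for a constant, β for the exponent) are introduced here for illustration and are not notation from the analysis itself:

```latex
Y = Y_0 \, N^{\beta}
\quad\Longleftrightarrow\quad
\log Y = \log Y_0 + \beta \log N ,
\qquad
\frac{Y}{N} = Y_0 \, N^{\beta - 1}
```

Under this form, β > 1 is the super-linear case: the total grows faster than population, and the per-capita quantity Y/N rises with N.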
Additional data could be collected to investigate the topics pointed out in the Materials and Methods
section more thoroughly. So far, what has been shown is correlation; it would be interesting to test for
causality and see in which direction the effect is more pronounced: GDP to patent intensity or vice
versa.
Population was investigated on an MSA level but not taken into consideration by land area, in other
words by density. Digging into population density could be helpful in identifying if there is an optimal
MSA for innovation.
Patents, and specifically each MSA's top 10 patent classes, could be examined further: by comparing
performance against industry payroll and C-level employees (to account for the outsourcing of
industry), and by reviewing changes in each MSA's top 10 patent classes over time to track shifts in
innovation and economic drivers over decades.
New York
Class | Class Title | Total | Class % | Class IDX of Top
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 5212 | 8.462824947 | 1
370 | Multiplex Communications | 3138 | 5.095231136 | 0.602072141
705 | DP: Financial, Business Practice, Management, or Cost/Price Determination (Data Processing) | 1848 | 3.000633251 | 0.354566385
455 | Telecommunications | 1589 | 2.580089954 | 0.304873369
435 | Chemistry: Molecular Biology and Microbiology | 1509 | 2.450192411 | 0.289524175
532 | Organic Compounds (includes Classes 532-570) | 1473 | 2.391738516 | 0.282617038
375 | Pulse or Digital Communications | 1416 | 2.299186517 | 0.271680737
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 1323 | 2.148180623 | 0.253837299
438 | Semiconductor Device Manufacturing: Process | 1295 | 2.102716482 | 0.248465081
707 | DP: Database and File Management or Data Structures (Data Processing) | 1139 | 1.849416273 | 0.218534152
Top 10 % of total: 32.38021011
Boston
Class | Class Title | Total | Class % | Class IDX of Top
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 3326 | 8.274661027 | 1
435 | Chemistry: Molecular Biology and Microbiology | 2143 | 5.331508894 | 0.644317498
370 | Multiplex Communications | 1397 | 3.475556661 | 0.420024053
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 1136 | 2.826222167 | 0.341551413
128 | Surgery (includes Class 600) | 1089 | 2.709292201 | 0.327420325
250 | Radiant Energy | 1004 | 2.497823112 | 0.301864101
707 | DP: Database and File Management or Data Structures (Data Processing) | 994 | 2.472944396 | 0.298857486
606 | Surgery (instruments) | 871 | 2.166936186 | 0.261876127
532 | Organic Compounds (includes Classes 532-570) | 847 | 2.107227267 | 0.254660253
382 | Image Analysis | 631 | 1.569846996 | 0.189717378
Top 10 % of total: 33.43201891
Houston
Class | Class Title | Total | Class % | Class IDX of Top
166 | Wells (shafts or deep borings in the earth, e.g., for oil and gas) | 3259 | 15.49322558 | 1
520 | Synthetic Resins or Natural Rubbers (includes Classes 520-528) | 1272 | 6.047064416 | 0.390303774
175 | Boring or Penetrating the Earth | 1049 | 4.986926551 | 0.321877877
702 | DP: Measuring, Calibrating, or Testing (Data Processing) | 636 | 3.023532208 | 0.195151887
424 | Drug, Bio-Affecting and Body Treating Compositions (includes Class 514) | 551 | 2.619443784 | 0.169070267
324 | Electricity: Measuring and Testing | 537 | 2.552888044 | 0.164774471
585 | Chemistry of Hydrocarbon Compounds | 502 | 2.386498693 | 0.15403498
532 | Organic Compounds (includes Classes 532-570) | 479 | 2.277157119 | 0.1469776
73 | Measuring and Testing | 468 | 2.224863323 | 0.143602332
507 | Earth Boring, Well Treating, and Oil Field Chemistry | 391 | 1.858806751 | 0.119975453
Top 10 % of total: 43.47040647
San Jose
Class | Class Title | Total | Class % | Class IDX of Top
438 | Semiconductor Device Manufacturing: Process | 5418 | 6.050453952 | 1
370 | Multiplex Communications | 4785 | 5.343562598 | 0.88316722
257 | Active Solid-State Devices (e.g., Transistors, Solid-State Diodes) | 3695 | 4.126324723 | 0.681985973
365 | Static Information Storage and Retrieval | 3466 | 3.870593096 | 0.639719454
709 | Multicomputer Data Transferring (Electrical Computers and Digital Processing Systems) | 3420 | 3.819223425 | 0.631229236
707 | DP: Database and File Management or Data Structures (Data Processing) | 3219 | 3.594760293 | 0.594130676
360 | Dynamic Magnetic Information Storage or Retrieval | 2789 | 3.114565535 | 0.514765596
711 | Memory (Electrical Computers and Digital Processing Systems) | 2578 | 2.878935084 | 0.475821336
345 | Computer Graphics Processing and Selective Visual Display Systems | 2416 | 2.698024501 | 0.445921004
714 | Error Detection/Correction and Fault Detection/Recovery | 2221 | 2.480261762 | 0.409929863
Top 10 % of total: 37.97670497
Gaps in Patent Activity
901 Robots
902 Electronic funds transfer
903 Hybrid electric vehicles (hevs)
930 Peptide or protein sequence
968 Horology
976 Nuclear technology
977 Nanotechnology
984 Musical instruments
987 Organic compounds containing a Bi, Sb, As, or P atom or containing a metal atom of the 6th to 8th group of the periodic system
D01 Edible products
D02 Apparel and haberdashery
D03 Travel goods and personal belongings
D04 Brushware
D05 Textile or paper yard goods; sheet material
D06 Furnishings
D07 Equipment for preparing or serving food or drink not elsewhere specified
D08 Tools and hardware
D09 Packages and containers for goods
D10 Measuring, testing, or signalling instruments
D11 Jewelry, symbolic insignia, and ornaments
D12 Transportation
D13 Equipment for production, distribution, or transformation of energy
D14 Recording, communication, or information retrieval equipment
D15 Machines not elsewhere specified
D16 Photography and optical equipment
D17 Musical instruments
D18 Printing and office machinery
D19 Office supplies; artists and teachers materials
D20 Sales and advertising equipment
D21 Games, toys, and sports goods
D22 Arms, pyrotechnics, hunting and fishing equipment
D23 Environmental heating and cooling; fluid handling and sanitary equipment
D24 Medical and laboratory equipment
D25 Building units and construction elements
D26 Lighting
D27 Tobacco and smokers' supplies
D28 Cosmetic products and toilet articles
D29 Equipment for safety, protection, and rescue
D30 Animal husbandry
D32 Washing, cleaning, or drying machine
D34 Material or article handling equipment
D99 Miscellaneous
G9B Information storage based on relative movement between record carrier and transducer
PLT Plants