We could cluster the city into three distinct regions.
If we use 7 clusters we start to see the different towns across the city becoming the focus of the clusters.
12 clusters proves to be the ideal number of clusters to represent the geographic spread of businesses around the area.
So, we know where these businesses are centered across the city. What else can we find out?
This is unfiltered; it’s just what’s taken out of the data straight away.
This removes some stop words to do with companies etc. The bag-of-words analysis doesn’t yield anything obvious – lots of words used quite infrequently across the sample.
Leeds clusters report
IS THERE A WAY TO…
• Map the economic structures of a city?
• Audit the communities present in a city and how well
connected they are?
• Identify strengths and weaknesses of a city?
• Understand the areas which need civic support?
• Analyse the impact of policy decisions in real time?
1. Identifies all organisations registered or
holding offices in a location
2. An iterative process of automated Google
searches and use of web crawlers, finds
information on the organisations such as their
website addresses, social profiles, how they
describe what they do on their websites etc.
3. As organisations
is returned from
other data sources
4. A definition of a sector such as ‘digital and
technology’ is then run as a query against the
This analysis is based on:
• 3,339 businesses identified as “digital & technology”
• 22.7 million tweets, from 350,000 people, collected in
The 3,339 businesses are
located across the Leeds
area. The majority are
located in the city centre
but there are clusters of
businesses in Ilkley,
Wetherby/Boston Spa and
3 GEOGRAPHIC CLUSTERS
Using a clustering
algorithm the first level of
useful clustering splits the
city into three groups.
The algorithm tells us how
many clusters make
“sense” so as to group the
businesses into useful
groups. The three clusters
partition the businesses
into East / West groups
but there isn’t enough
definition in the centre of
the City to distinguish
between those in the “city
centre” compared with
those situated towards
the Ring Roads.
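
The source does not name the clustering algorithm it used. As a minimal sketch only, assuming plain k-means on planar coordinates, the comparison of 1, 3 and 7 clusters could look like the following. The points and centres are synthetic and hypothetical, not the Leeds data, and `kmeans` and `inertia` are illustrative helper names.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on (x, y) tuples; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        # Move each centre to the mean of the points assigned to it.
        new_centres = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return centres, labels

def inertia(points, centres, labels):
    """Sum of squared distances from each point to its assigned centre."""
    return sum(math.dist(p, centres[label]) ** 2
               for p, label in zip(points, labels))

# Hypothetical data: 30 points scattered around each of three made-up centres.
rng = random.Random(42)
true_centres = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in true_centres for _ in range(30)]

results = {}
for k in (1, 3, 7):
    centres, labels = kmeans(points, k)
    results[k] = inertia(points, centres, labels)
    print(k, round(results[k], 2))
```

The within-cluster spread falls as clusters are added, so a lower number alone does not say which count is "useful"; that judgement still depends on whether the groups map onto recognisable places.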
7 GEOGRAPHIC CLUSTERS
If we partition the
businesses into 7 clusters
we start to see the towns
of the Leeds area
However, with only 7
clusters we don’t get
between the centre of
Leeds City Centre and the
wedge from the City
towards Roundhay and
Chapel Allerton. We still
need some additional
12 GEOGRAPHIC CLUSTERS
12 clusters is the
optimal number of
clusters to help us
see the different
we have in the city
area. Each cluster
has a “useful”
centre that maps
onto Leeds City
helps us tell the
story of how and
Wetherby / Boston Spa
Seacroft / NE Leeds
Leeds City Centre
INDUSTRY & GEOGRAPHY
[Chart: axis from 0 to 1000; surviving labels: “4: Boston Spa”, “6: Leeds City”, “Audio & Visual”]
This word cloud is
generated from the text
collected for each
business. It is not
filtered and so could be
a little misleading. What
does “none” refer to, for
example? To learn
something useful about
the businesses in our
area we need to filter
the words, removing
some and up-weighting
others.
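
A minimal sketch of that kind of filtering, assuming a simple word count with a hand-picked stop-word list. The descriptions and the `STOP_WORDS` set are hypothetical and are not the list used in the analysis; up-weighting is not shown.

```python
import re
from collections import Counter

# Hypothetical stop words: common English words, company boilerplate,
# and placeholder values such as "none".
STOP_WORDS = {"the", "and", "of", "a", "to", "we", "in", "for",
              "ltd", "limited", "company", "none"}

def top_words(descriptions, stop_words=STOP_WORDS, n=5):
    """Count words across all descriptions, skipping stop words."""
    counts = Counter()
    for text in descriptions:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)

# Hypothetical business descriptions.
descriptions = [
    "We design and build software for the web",
    "None",
    "Data consultancy and software development company",
    "Brand design and advertising Ltd",
]

print(top_words(descriptions, n=3))
```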
These are the words that
Leeds digital businesses use
to describe themselves.
Whilst this is interesting
because we can see useful
descriptions, we are lacking
some additional context to
help us understand how
these businesses are
organised across the city.
The previous diagram shows us how the different
disciplines are related. Each dot is a Leeds
business and the businesses are connected
together by the things that they do. The distance
on this diagram is significant, so disciplines that are
closer together are more closely related.
This technique could be used to map the city’s
businesses each year and the patterns we see will
be different. Each year we could see how the
disciplines are merging or separating and what is
the most important part of the landscape.
Data is at the heart of the digital structure and this
tells us that this is a very important part of the
landscape. Publishing is a fragmented community
and it will be interesting to see whether it gets more
or less important next year. Social stands out as
connecting creative/brand, development and
consultancy groups and it is interesting to see that
“design” is seen as the linkage between the
creative/advertising/brand communities and the
internet/software development groups.
SOCIAL DATA SET
• 22.7 million tweets from 350,000 individuals from
• The individuals were included because their Twitter
biographies tell us they live in one of 10 cities across the
• 864,981 (3.8%) tweets from people who live in Leeds
from 23,274 (6.6%) people.
Using the social data we can uncover
the types of community that exist in
each city. Whilst they look quite
similar, there are some significant
differences and these need analysing
to understand what causes the
difference in each city.
We see communities emerge around:
• The University (the student population)
• Sports clubs (Leeds United, Bristol City)
• Night life
Is it possible to build up a matrix of “things” that a city has and in what
proportion? Can we link a “successful” city to the presence or absence of
these “things”? Does this lead us to having an always-on, real-time view of
success in a city that we can use in our decision-making process?
LEEDS VS BRISTOL
               Leeds     Bristol
Total          23,274    14,311
“University”   33.1%     10.0%
“Student”      6.3%      4.0%
“Fan”          5.3%      2.6%
“Love”         5.9%      4.2%
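
The source does not say exactly how these percentages were computed. As a sketch, assuming each figure is the share of people in a city whose Twitter biography contains the keyword, the calculation could look like this. The biographies and the `keyword_share` helper are hypothetical.

```python
def keyword_share(bios, keyword):
    """Percentage of biographies containing the keyword (case-insensitive)."""
    if not bios:
        return 0.0
    hits = sum(1 for bio in bios if keyword.lower() in bio.lower())
    return 100.0 * hits / len(bios)

# Hypothetical biographies for one city.
bios = [
    "Student at the University of Leeds",
    "Leeds United fan",
    "Love coffee and code",
    "University lecturer",
]

for keyword in ("University", "Student", "Fan", "Love"):
    print(f"{keyword}: {keyword_share(bios, keyword):.1f}%")
```

Running the same function over each city's biographies would give one column of the table per city.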
We could go
If we analyse mentions of
“Leeds” on two separate
days, we find similar trends
but there is a significant
difference in volume on
If we are to find a stable
metric that is capable of
analysing a whole city, we
need a metric which
doesn’t change too much
day by day.
Our method must remove
noise. We do this through
the application of novel
If we remove the noise from
our network visualisations,
we find the “stable”
structures which don’t
change too much over time.
This helps us see the
communities which matter
and need further study.
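
The noise-removal technique itself is not described here. Purely as an illustrative stand-in, and not the method used in this report, one simple way to find structures that "don't change too much over time" is to keep only the connections that appear in most daily snapshots of the network. The snapshots, the `stable_edges` helper and the `min_fraction` threshold are all hypothetical.

```python
from collections import Counter

def stable_edges(snapshots, min_fraction=0.8):
    """Keep undirected edges present in at least min_fraction of snapshots."""
    counts = Counter()
    for edges in snapshots:
        # Normalise each edge so (a, b) and (b, a) count as the same edge,
        # and count each edge at most once per snapshot.
        counts.update({tuple(sorted(edge)) for edge in edges})
    threshold = min_fraction * len(snapshots)
    return {edge for edge, n in counts.items() if n >= threshold}

# Hypothetical daily snapshots of connections between communities.
day1 = [("students", "university"), ("digital", "health"), ("sport", "viral_meme")]
day2 = [("university", "students"), ("digital", "health"), ("digital", "sport")]
day3 = [("students", "university"), ("health", "digital")]

print(sorted(stable_edges([day1, day2, day3])))
```

Connections that appear on only one or two days are dropped, leaving the links that persist across the whole period.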
Our method is capable of
producing a “barcode” for a
city, capable of expressing
fine detail in a unique way
which helps us understand
how communities in a city
The new method allows us to clearly see the
communities which exist in a city. This helps
us understand those communities which
make the city “tick” and those communities
who are isolated from the main group of
In Leeds, we find that the student community
is isolated from the day-to-day communities
we see focused on digital & tech, health and
sport. What does this mean for the long
term economic growth plans for our city?
This can help decision makers understand
which communities to focus on when
considering policy changes. It can help us
plot a course to build a city which has been
optimised for growth.
These diagrams show the significant communities of these 6 cities once we
remove noise from the data we’ve collected.