Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK
Cartography and Geographic Information Science
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/tcag20
Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr
Linna Li (a), Michael F. Goodchild (a) & Bo Xu (b)
(a) Department of Geography, Center for Spatial Studies, University of California, Santa Barbara, CA, USA
(b) Department of Geography and Environmental Studies, California State University, San Bernardino, CA, USA
Version of record first published: 19 Apr 2013.
To cite this article: Linna Li , Michael F. Goodchild & Bo Xu (2013): Spatial, temporal, and socioeconomic patterns in the use
of Twitter and Flickr, Cartography and Geographic Information Science, 40:2, 61-77
To link to this article: http://dx.doi.org/10.1080/15230406.2013.777139
Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr
Linna Li (a)*, Michael F. Goodchild (a) and Bo Xu (b)
(a) Department of Geography, Center for Spatial Studies, University of California, Santa Barbara, CA, USA; (b) Department of Geography and Environmental Studies, California State University, San Bernardino, CA, USA
(Received 11 September 2012; accepted 27 January 2013)
Online social networking and information sharing services have generated large volumes of spatio-temporal footprints,
which are potentially a valuable source of knowledge about the physical environment and social phenomena. However, it is
critical to take into consideration the uneven distribution of the data generated in social media in order to understand the
nature of such data and to use them appropriately. The distribution of footprints and the characteristics of contributors
indicate the quantity, quality, and type of the data. Using georeferenced tweets and photos collected from Twitter and Flickr,
this research presents the spatial and temporal patterns of such crowd-sourced geographic data in the contiguous United
States and explores the socioeconomic characteristics of geographic data creators by investigating the relationships between
tweet and photo densities and the characteristics of local people using California as a case study. Correlations between
dependent and independent variables in partial least squares regression suggest that well-educated people in the occupations
of management, business, science, and arts are more likely to be involved in the generation of georeferenced tweets and
photos. Further research is required to explain why some people tend to produce and spread information over the Internet
using social media from the perspectives of psychology and sociology. This study would be informative to sociologists who
study the behaviors of social media users, geographers who are interested in the spatial and temporal distribution of social
media users, marketing agencies who intend to understand the influence of social media, and other scientists who use social
media data in their research.
Keywords: spatio-temporal footprints; socioeconomic; Flickr; Twitter; georeference
Introduction
There has been a rapid expansion in the use of social
media and data sharing services recently. Data generated
by these sources have been widely used to study social
networks (Huberman, Romero, and Wu 2008; Lerman and
Ghosh 2010) and behavioral trends (Sakaki, Okazaki, and
Matsuo 2010; Bollen, Mao, and Zeng 2011). However, it
is critical to understand the uneven distribution of such
data sources in order to evaluate their validity, accuracy,
representativeness, and uncertainty when they are used to
imply the social and behavioral characteristics of the
users. This article, using Twitter and Flickr as two exam-
ples, explores spatiotemporal patterns of geographic data
generated in social media, within the bounding box of the
contiguous United States and further infers the character-
istics of the users by examining the relationships between
geographic data densities and the socioeconomic charac-
teristics of local residents at the county level using
California as a case study.
Twitter is a popular social media site that is widely
used for daily chatter, conversations, sharing information,
and reporting news (Java et al. 2007). Flickr is an online
photo management service that allows uploading and shar-
ing photos within and outside of groups. Tweeting and
photo-taking behaviors, specifically the distribution of
tweets and photos, may suggest the diverse characteristics
of different users. To understand spatiotemporal patterns
of tweets and photos, we study two cases: georeferenced
tweet messages in Twitter and georeferenced photos in
Flickr, which are used as proxies for the spatio-temporal
footprints of their creators. In the past few years, the two
data sources have been widely used to investigate research
questions in different disciplines. For example, the loca-
tion and spatial boundary of places may be delineated
based on the aggregation of geotagged photos in Flickr
(Hollenstein and Purves 2010; Li and Goodchild 2012).
Representative photos at different locations and tourist
paths can also be extracted by analyzing spatial, temporal,
and visual information associated with Flickr photos
(Crandall et al. 2009). Both Antoniou, Morley, and
Haklay (2010) and Purves, Edwardes, and Wood (2011)
carried out comparative studies of Flickr and other image
collections and have investigated issues in such collec-
tions, including bias. Ames and Naaman (2007) have
studied motivations in tagging. The similarity of spatial
and temporal information associated with photos provided
by different contributors may even indicate the probability
of a social tie between them (Crandall et al. 2010). Twitter,
the other rich data source, has been used to study people’s
response to emergencies (Goodchild and Glennon 2010;
*Corresponding author. Email: linna@geog.ucsb.edu
Cartography and Geographic Information Science, 2013
Vol. 40, No. 2, 61–77, http://dx.doi.org/10.1080/15230406.2013.777139
© 2013 Cartography and Geographic Information Society
Sakaki, Okazaki, and Matsuo 2010), to detect local events
automatically (Lee and Sumiya 2010), and to predict
election results based on the sentiments expressed in
tweets (Tumasjan et al. 2010). Furthermore, check-ins
collected from location-sharing services were used to
study human mobility patterns (Cheng et al. 2011).
Although data collected from social media, such as
Twitter, have been increasingly used to study geographic
landscapes and human behaviors (Li and Goodchild,
2013), it is difficult to estimate the representativeness of
such data. Despite these studies, there has so far been no
research on the socio-demographic characteristics of users.
Such research would be of great value, since georeferenced
data from Twitter and Flickr reflect the characteristics
of places as well as of local residents.
However, research has been done on socio-demo-
graphic characteristics of Internet users using surveys in
many countries. For example, Soule, Shell, and Kleen
(2003) found that gender is not a significant variable in
explaining heavy Internet usage, but education is, based
on the data from the Tenth Graphic, Visualizations, and
Usability Center (GVU) Survey conducted on the Web. A
study in the Philippines showed that younger, more afflu-
ent, and well-educated people in places with better infra-
structure are more capable of using Information and
Communications Technology (ICT, Alampay 2006).
Different Internet usage patterns of people from different
socio-economic groups were identified in central
Queensland (Taylor et al. 2003). As demonstrated in
these studies, the characteristics of Internet users are cru-
cial for understanding a range of relevant phenomena,
such as Internet addiction, social opportunities through
the access to ICT, and behavioral patterns in using such
technologies. Because conducting surveys is time-consuming
and labor-intensive, these studies, which collect data
primarily through questionnaires, can rely on only a small
number of participants. In our study, we use geographic
location as a link to associate social media usage and
characteristics of local residents based on the data auto-
matically collected using social media APIs and the aux-
iliary census data.
This study provides an exploratory analysis of a subset
of Twitter and Flickr users, those who provide locational
information for tweets and photos, in terms of their demo-
graphic and socioeconomic properties at the county level
in California. Georeferenced tweets and photos indicate
the presence of their creators at that location. There are
three major reasons why people are present at a particular
location: location of residence, location of work, or loca-
tion of tourist attractions. In this article, we select geor-
eferenced tweets and photos contributed by local residents
to explore the demographic and socioeconomic character-
istics of these users. A user is considered a local resident
in a county only when the time interval between two
tweets or photos produced in that specific county by the
user is longer than 10 days.
The remainder of the article is structured as follows.
The section “Twitter and Flickr data collection and pre-
processing” describes the collection and pre-processing of
georeferenced data from Twitter and Flickr. The section
“The spatial distribution of georeferenced tweets and
photos” presents the spatial distributions of georeferenced
tweets and photos over the contiguous United States,
followed by a discussion of the temporal patterns of geor-
eferenced tweets and photos in the section “Temporal
patterns of tweets and photos.” We propose two descrip-
tive models in the section “Descriptive models of tweet
and photo densities in California” to illustrate the relation-
ships between the tweet and photo densities and the char-
acteristics of people in different counties of California.
The article concludes with a discussion of implications
and future research directions.
Twitter and Flickr data collection and pre-processing
Tweets and photo metadata were collected using Twitter
and Flickr’s public APIs and stored in a MySQL database.
We collected data from 21 January to 7 March 2011; these
dates were chosen to avoid major events that might cause
unusual patterns. In total, there are 19,758,954 records for
Twitter and 4,263,227 records for Flickr within the bound-
ing box of the contiguous United States. Location asso-
ciated with each tweet is in a variety of forms with
different precision levels. It may be automatically captured
by built-in Global Positioning System (GPS) receivers in
mobile devices like smart phones, calculated according to
the relative position of the user’s equipment in a cellular
network, or manually selected by a user from a set of
place names provided by Twitter. In the first case, location
is in the form of latitude and longitude, while in other
cases location is usually recorded as a neighborhood, a
city, or even a country. Other than coordinates, Twitter
takes the estimated location of a user’s device or an
Internet Protocol (IP) address of a computer and reverse
geocodes it to a few possible places provided to the user
for selection. The positional accuracy varies from one
method to another. For location recorded by GPS, it is
usually at the magnitude of several meters. For location
determined by triangulation in a cellular network, accu-
racy ranges from 30 to 3000 m, depending on the spatial
distribution of cells (Zandbergen 2009). For IP address,
the positional accuracy of georeference depends on the
method used to convert IP addresses to geographic coor-
dinates, usually at the level of ZIP code, city, state, or even
country. For example, Maxmind’s free GeoLite City data-
base claims that the spatial accuracy of georeference is
“over 99.5% on a country level and 78% on a city level
for the U.S. within a 40 kilometer radius.” Finally, the
accuracy of a place name depends on the spatial extent of
the place. Information about the tweets in the database
contains tweet ID, tweet text, time, location, and user ID.
In Flickr, photos were either georeferenced by built-in
GPS in cameras or manually georeferenced by a user who
identified photo location on a map. The location could
either be the place where a photo was taken or be the
location of an object in the photo. Automatic recording by
a GPS receiver is always the former case, while manually
georeferenced photos could be either way. One typical
error in location of photos occurs when a user uploads a
group of photos that involve several places to the same
location. Photo metadata contain information about photo
ID, photo title, description, tags, upload time, time when a
photo was taken, location, and owner ID.
For both tweets and photos, the locations are resolved
to five decimal places of latitude and longitude (approxi-
mately 1 m), but we should expect that the accuracy of
location is dependent on the accuracy of GPS in mobile
devices (which could be several meters) or the map scale
when a user specifies a photo location. Because the objec-
tive of this article is to study the spatial and temporal
patterns of tweets and photos, only data that have point
locational information with relatively high precision are
used, and those that are not georeferenced are excluded. It
is estimated that the percentage of georeferenced tweets is
less than 1% and geotagged photos around 3.33%.1
However, the total numbers of tweets and photos are
very large, so we can still obtain great volumes of geor-
eferenced data. In addition, we must be aware that these
data were contributed by users who are willing to share
their locations and not by everyone who uses the two
services. Therefore, the data are a subset of the entire
datasets of Twitter and Flickr given spatial and temporal
constraints, and the users are a subset of the entire user
groups. Like other data created by volunteers, there is bias
in terms of contributions made by different users, because
most contributions come from a very small percentage of
the total number of contributors. For instance, “In most
online communities, 90% of users are lurkers who never
contribute, 9% of users contribute a little, and 1% of users
account for almost all the action” (Nielsen 2006). Haklay
(2010) showed that most of the data for England were
contributed by only a few users and the difference of road
data coverage between wealthy areas and poor areas is
about 8% in OpenStreetMap (OSM). Contribution bias is
also present in our datasets. The 300 heaviest contributors
of local Twitter and Flickr users who share geographic
footprints are represented in Figure 1a and 1b, showing
the long tail effect: a large number of tweets and photos
are created by the first few hundred contributors.
When examining the relationships between georefer-
enced data densities and socioeconomic characteristics of
residents in California, we verify that the data were
produced by local users. First, we chose county as the
data aggregation level, because a person is more likely to
live in one census tract and work in another. Therefore, it
is difficult to tell whether a location is a user’s home or
work place at a finer spatial scale. By contrast, people are
more likely to live and work in the same county.
According to the 2000 Census Bureau county-to-county
commuting data for California, the percentage of resi-
dents who commute within the same county is as high as
83%. Second, we estimated how long a user stays in a
county from the time interval between two tweets or
photos produced in that county by the same user.
A user is regarded as local only when this interval is
greater than 10 days, and only data created by local
users are retained for further analysis.
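
The local-user rule described above can be sketched as follows. This is a minimal illustration and not the authors' code: the record layout `(user_id, county, timestamp)` and the function names are hypothetical, and the sketch assumes that "the time interval between two tweets or photos" means the gap between a user's earliest and latest post in the same county.

```python
from datetime import datetime, timedelta

# Threshold stated in the text; the record layout below is an assumption.
MIN_STAY = timedelta(days=10)


def local_user_counties(records, min_stay=MIN_STAY):
    """Return the set of (user_id, county) pairs treated as local.

    Assumption: a pair qualifies when the gap between the user's earliest
    and latest post in that county is longer than min_stay.
    """
    first_last = {}
    for user_id, county, ts in records:
        key = (user_id, county)
        if key in first_last:
            lo, hi = first_last[key]
            first_last[key] = (min(lo, ts), max(hi, ts))
        else:
            first_last[key] = (ts, ts)
    return {key for key, (lo, hi) in first_last.items() if hi - lo > min_stay}


def keep_local(records, min_stay=MIN_STAY):
    """Keep only records posted by a user in a county where that user is local."""
    local = local_user_counties(records, min_stay)
    return [r for r in records if (r[0], r[1]) in local]


# Made-up records for illustration only.
records = [
    ("u1", "Santa Barbara", datetime(2011, 1, 22, 9, 0)),
    ("u1", "Santa Barbara", datetime(2011, 2, 15, 18, 30)),  # 24 days later
    ("u2", "Mariposa", datetime(2011, 2, 1, 10, 0)),
    ("u2", "Mariposa", datetime(2011, 2, 3, 16, 0)),          # 2 days later
    ("u1", "Los Angeles", datetime(2011, 2, 20, 12, 0)),      # single post
]
local_records = keep_local(records)
```

Under this reading, a user who posts only once in a county, or only within a short visit, is excluded, which is the intended effect of filtering out likely tourists.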
Correlations between tweet and photo densities and
contributors’ properties were calculated at the county
level. Ideally, socioeconomic characteristics of users
would be determined at the individual level, but that
type of data is not available for obvious reasons, so loca-
tions were used to link the data densities and the residents.
This type of correlation based on group data rather than
individual data is called ecological correlation (Robinson,
1950). Ecological correlations between tweet and photo
densities and the socioeconomic characteristics of people
suggest that certain people with specific characteristics are
more involved in the generation of georeferenced tweets
and photos. However, it would be fallacious to infer
individual behaviors from data aggregated to geographic
areas (Openshaw 1984; Piantadosi, Byar, and Green 1988;
King 1997). For example, correlation between the number
of tweets from a place and the number of Native
Americans present in that place does not imply that
Native Americans are more likely to tweet. This study is
a first step toward an understanding of the relationships
between georeferenced tweets and photos and population;
the results suggest that it would be valuable to further
investigate these relationships.
The spatial distribution of georeferenced tweets and
photos
We plotted the locations of georeferenced tweets on a
map. As demonstrated in Figure 2, tweet locations roughly
describe the administrative boundary of the United States
and major roads at a very good resolution, which is similar
to the representation of Flickr photos in other research
(Crandall et al. 2009). Figure 3 shows georeferenced
tweets in part of Los Angeles. At this scale, the blocks
and local roads are delineated by tweet locations. For
instance, tweet locations are well aligned with the location
and shape of freeways, such as Interstate 405, as well as
some local roads. High density along major roads might
indicate people tweeting from vehicles, and perhaps from
locations adjacent to major roads such as hotels and gas
stations as well.
Flickr photos have similar spatial patterns to tweet
locations. However, the number of photos is substantially
smaller than that of tweets during the same time period. It
takes more effort to take and upload photos than it does to
generate tweets. Despite a smaller number of photos than
tweets, some places are associated with more photos.
Intensive tweets are usually generated at places with
high population density, such as big metropolitan areas;
[Figure 1 plot. Panel (a): number of georeferenced tweets (y-axis) against ranked user, top 300 users generating most tweets (x-axis). Panel (b): number of georeferenced photos (y-axis) against ranked user, top 300 users generating most photos (x-axis).]
Figure 1. (a) The number of georeferenced tweets generated by the top 300 contributors (highest: left; lowest: right). (b) The number of
georeferenced photos generated by the top 300 contributors (highest: left; lowest: right).
Figure 2. Georeferenced tweets within the bounding box of the contiguous United States.
Figure 3. A close-up of georeferenced tweets in part of Los Angeles.
however, many photos are also taken at places with low
population density, such as Yosemite National Park.
To estimate the number of tweet and photo occur-
rences per unit area, we performed a kernel density ana-
lysis of the national data using tweet and photo locations.
Kernel density is a way of estimating the intensity of
points by creating a smooth surface using a bivariate
probability density function (Bailey and Gatrell 1995).
The kernel estimator is defined as
\[
f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right) \tag{1}
\]
where n is the total number of points, h is the bandwidth
that determines the amount of smoothing, K is the kernel
function, x is the location of estimation, and xi is known
point location. The kernel function K could have differ-
ent forms, such as a Gaussian distribution, negative
exponential, or a simple binary function (it is constant
within the bandwidth and zero otherwise). The quadratic
function we used in the analysis is given below
(Silverman 1986):
\[
K(c) =
\begin{cases}
\dfrac{3}{\pi} \left( 1 - c^{T} c \right)^{2} & \text{if } c^{T} c \le 1 \\
0 & \text{otherwise}
\end{cases} \tag{2}
\]
There are two parameters in kernel density estimation:
kernel bandwidth and cell size. The kernel bandwidth was 100 km
and the cell size was 1 km given the size of the region.
The kernel bandwidth of 100 km is a compromise between
a map that is too smooth to interpret and one that is too
noisy to interpret. The cell size of 1 km was used to show
fine detail. As shown in Figures 4 and 5, both tweets and
photos tend to cluster in major cities with high population
density. For example, Seattle, Portland, San Francisco, and
Los Angeles on the west coast and Boston, New York
City, Baltimore, and Washington DC on the east coast are
clusters of both tweets and photos. We can identify almost
all major cities with significant economic, political, and
social influence in the United States from these two maps.
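
As an illustration of Equations 1 and 2, the following sketch evaluates the quadratic kernel estimator on a regular grid. It is not the software used in the study: the function names, the toy coordinates, and the small bandwidth are assumptions chosen to keep the example short, and the coordinates are assumed to be projected (in kilometers) rather than latitude and longitude.

```python
import math


def quadratic_kernel(cx, cy):
    """Equation 2: K(c) = (3/pi) * (1 - c'c)^2 inside the unit disc, else 0."""
    cc = cx * cx + cy * cy
    return (3.0 / math.pi) * (1.0 - cc) ** 2 if cc <= 1.0 else 0.0


def kernel_density(x, y, points, h):
    """Equation 1 evaluated at (x, y) with bandwidth h (same units as points)."""
    total = sum(quadratic_kernel((x - px) / h, (y - py) / h) for px, py in points)
    # Normalization follows Equation 1 as printed; a planar density that
    # integrates to one would divide by n * h**2 instead.
    return total / (len(points) * h)


def density_grid(points, h, cell, xmin, ymin, ncols, nrows):
    """Evaluate the estimator at the center of every cell of a regular grid."""
    return [
        [kernel_density(xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell, points, h)
         for c in range(ncols)]
        for r in range(nrows)
    ]


# Toy projected coordinates in km (made up): a tight cluster and one outlier.
points = [(10.0, 10.0), (11.0, 10.5), (10.5, 9.5), (40.0, 40.0)]
grid = density_grid(points, h=5.0, cell=1.0, xmin=0.0, ymin=0.0, ncols=50, nrows=50)
```

A larger bandwidth spreads each point over more cells and smooths the surface, while the cell size only controls how finely the surface is sampled, which is the trade-off described above for the 100 km bandwidth and 1 km cells.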
Although there are consistent patterns of tweets and
photos occurring at cities with high population density,
there are some differences, too. We calculated the normal-
ized density difference as follows:
\[
D_d = \frac{D_p}{\max(D_p)} - \frac{D_t}{\max(D_t)} \tag{3}
\]
where Dd measures the relative difference between tweet
density and photo density, Dp and Dt are photo density and
tweet density at a location, respectively, and max (Dp) and
max (Dt) are the maximum photo and tweet density within
the study area. To account for the total amount of differ-
ence between the two sources, we normalized the density
value by the maximum density in each source, so the
range of density for both sources is between 0 and 1.
This allows us to compare, for each source, the density at a location
relative to other locations. As shown in Figure 6, some
locations stand out in the map of density difference as
places with high photo density, such as Lake Tahoe and
Yosemite National Park in California, Charleston in South
Carolina, and Orlando in Florida – which are popular
tourist attractions. The normalized photo density for
these places is substantially higher than the normalized
tweet density. On the other hand, Atlanta in Georgia,
Figure 4. Tweet density within the bounding box of the contiguous United States.
Cincinnati and Columbus in Ohio, and Detroit in
Michigan have significantly higher normalized tweet den-
sity. Furthermore, there are many tweets in the city of
Denver but a considerable number of photos in the
Rockies west of Denver.
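
Equation 3 can be illustrated with a short sketch that operates on two density grids of the same shape. The function name and the toy values are hypothetical; positive results mark cells where photos are relatively denser, and negative results mark cells where tweets are relatively denser.

```python
def normalized_difference(photo_density, tweet_density):
    """Equation 3 applied cell by cell: D_p / max(D_p) - D_t / max(D_t)."""
    max_p = max(max(row) for row in photo_density)
    max_t = max(max(row) for row in tweet_density)
    if max_p <= 0 or max_t <= 0:
        raise ValueError("each density surface needs a positive maximum")
    return [
        [p / max_p - t / max_t for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(photo_density, tweet_density)
    ]


# Made-up 2 x 2 density surfaces in arbitrary units.
photo_density = [[2.0, 8.0], [1.0, 4.0]]
tweet_density = [[50.0, 20.0], [5.0, 100.0]]
difference = normalized_difference(photo_density, tweet_density)
```

Because each surface is divided by its own maximum, every value of the difference lies between -1 and 1 regardless of how many more tweets than photos were collected.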
At a finer scale, we generated a tweet density surface in
Los Angeles using a kernel bandwidth of 10 km and a cell size of
100 m. As shown in Figure 7a, downtown Los Angeles and
Beverly Hills have the highest tweet density and it gradu-
ally decreases in the surrounding areas. The photo density
surface in Los Angeles is demonstrated in Figure 7b, with
three major clusters in downtown Los Angeles, Pasadena,
and Santa Monica. In these two figures, density estimation
does not stop at the coast and the values are not zero in the
ocean; however, a spatial constraint clearly could be applied
in the density calculation.
Figure 5. Flickr photo density within the bounding box of the contiguous United States.
Figure 6. Normalized density difference between Flickr photos and tweets.
Figure 7. (a) Tweet density in Los Angeles County. (b) Flickr photo density in Los Angeles County.
Temporal patterns of tweets and photos
The density of tweets varies from place to place and also
through time. The hourly number of georeferenced tweets
in Los Angeles within a week is shown in Figure 8. The
highest rates of tweeting occurred between 8:00 in the
morning and midnight. There are generally two tweet
peaks: one around 13:00–14:00 in the afternoon and the
other around 20:00–21:00 in the evening. The lowest rate
of tweeting is around 4:00–5:00 in the morning when most
people are sleeping. This trend is relatively consistent in
each day of the week and represents the activity pattern of
georeferenced tweets. A comparison of temporal patterns
of tweets and photos is shown in Figure 9a and 9b. In
contrast to the temporal pattern of tweets, Flickr users are
substantially more active during weekends and the rate of
photo-taking is highest during the afternoon hours.
However, temporal uncertainty should be considered
when interpreting the results. The time when a photo
was taken is provided by a camera, but not all photogra-
phers consistently keep the right time setting.
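
Temporal profiles of this kind can be derived from post timestamps with a simple aggregation. The sketch below is illustrative only: the function names and sample timestamps are hypothetical, and the timestamps are assumed to be already expressed in local time.

```python
from collections import Counter
from datetime import datetime


def weekly_profile(timestamps):
    """Count posts per (weekday, hour); weekday 0 is Monday."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def mean_by_hour(timestamps):
    """Average number of posts in each hour of the day over the days observed."""
    if not timestamps:
        return [0.0] * 24
    n_days = len({ts.date() for ts in timestamps})
    per_hour = Counter(ts.hour for ts in timestamps)
    return [per_hour.get(hour, 0) / n_days for hour in range(24)]


# Made-up timestamps: 24 January 2011 was a Monday.
timestamps = [
    datetime(2011, 1, 24, 13, 5),
    datetime(2011, 1, 24, 13, 40),
    datetime(2011, 1, 24, 20, 15),
    datetime(2011, 1, 25, 13, 10),
    datetime(2011, 1, 25, 4, 30),
]
profile = weekly_profile(timestamps)
hourly_mean = mean_by_hour(timestamps)
peak_hour = max(range(24), key=lambda hour: hourly_mean[hour])
```

For photos, the same aggregation would be applied to the time a photo was taken, which carries the camera-clock uncertainty noted above.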
Descriptive models of tweet and photo densities in
California
In this section, we infer the characteristics of georefer-
enced tweet and photo users by studying the relationships
between tweet and photo densities and the socioeconomic
characteristics of people in different counties of California.
The hypothesis is that areas with high tweet or photo
density tend to have residents with specific characteristics,
which may relate to age, race, educational attainment,
type of occupation, and household income. The tweet
dataset contains 602,371 tweets in California that were
georeferenced by GPS, created by 44,097 users. Because
the study uses socioeconomic data of local residents only,
the raw data were preprocessed to exclude data that were
likely to be generated by tourists. As mentioned above, a
user is regarded as a local resident if he or she stays in a
county for a relatively long period of time (i.e., 10 days),
which is verified by the time interval between two tweets
or photos generated by the same user. As a result, there are
432,475 georeferenced tweets generated by 18,315 local
users, which represent about 71.80% of all georeferenced
tweets.
Data on distributions of age, race, educational attain-
ment, occupation, and household income were obtained
from the American Community Survey (ACS) 2006–
2010. These data made up the set of explanatory variables.
To create spatially intensive variables, all variables were
normalized by the total number of people in each county.
For instance, tweet density was calculated as the number
of tweets divided by the total population of a county. Hence, the
tweet density in the model is different from the tweet
density represented as a kernel density surface in the
section “The spatial distribution of georeferenced tweets
and photos”: It is the number of tweets per person in a
Figure 8. The average number of tweets per hour in Los Angeles County.
Figure 9. (a) Time chart for georeferenced tweets. (b) Time chart for georeferenced photos.
county, rather than the number of tweets per land area
unit. The explanatory variables consist of the percentage
of people who fall into each of the categories (e.g., there
are 23 age groups, ranging from “under 5 years” to “85
years and over,” so there are 23 variables for the percen-
tage of people in all age groups and they add up to 1).
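
The normalization described above can be sketched as follows. The county figures and category counts are invented for illustration and are not taken from the ACS; the function names are hypothetical.

```python
def per_capita_density(n_posts, population):
    """Posts per resident, the dependent variable used in the county model."""
    if population <= 0:
        raise ValueError("population must be positive")
    return n_posts / population


def category_shares(counts):
    """Convert raw counts per category into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return {name: value / total for name, value in counts.items()}


# Invented example county (not real census values).
tweet_density = per_capita_density(n_posts=1200, population=400000)
education_shares = category_shares({
    "Less than 9th grade": 20000,
    "High school graduate": 90000,
    "Bachelor's degree": 60000,
    "Graduate or professional degree": 30000,
})
```

Because the shares within one category type sum to 1, they are linearly dependent by construction, which is one source of the collinearity among the explanatory variables.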
Since there are many categories in each of these types
of data, the number of explanatory variables is large
compared to the number of observations, and some
explanatory variables are correlated with each other.
Multiple linear regression is therefore not appropriate, because it
requires the absence of multicollinearity. Partial least squares
regression (PLSR), on the other hand, is a method parti-
cularly useful for describing the correlation between a
dependent variable and a set of strongly collinear inde-
pendent variables. It aims to reduce the set of variables to
a smaller number of uncorrelated components that char-
acterize most of the covariance between the dependent
variable and independent variables. PLSR was introduced
by Wold (1966) in the social sciences, and was later
widely adopted in chemometrics (Wold, Sjöström, and
Eriksson 2001). PLSR is related to principal component
regression (PCR): Both extract components from original
independent variables for regression modeling; however,
they differ in several ways. The major difference is that
principal components in PCR are solely determined by the
variance of independent variables, while those in PLSR
are determined by the covariance between dependent and
independent variables (Garthwaite 1994). Therefore, the
methods for constructing components in PCR and PLSR
are different, and the latter has the capability to capture
most of the information in independent variables that
explains the dependent variable by avoiding the problem
in PCR of discarding important principal components with
a low variance (Jolliffe 1982).
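
To make the construction of PLSR components concrete, the sketch below implements a textbook NIPALS-style algorithm for a single dependent variable (often called PLS1) in plain Python. It is an illustration under stated assumptions, not the software or settings used in this study: the function names and the toy data are hypothetical, and the second predictor is deliberately an exact multiple of the first to mimic strong collinearity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center_columns(rows):
    """Subtract the column mean from every entry."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def pls1(X, y, n_components):
    """Return weights, scores, and the share of y variance per component."""
    X = center_columns(X)
    y_mean = sum(y) / len(y)
    y = [v - y_mean for v in y]
    total_ss = dot(y, y)
    weights, scores, explained = [], [], []
    for _ in range(n_components):
        cols = list(zip(*X))
        # Weight of each predictor is its covariance with y (up to a constant).
        w = [dot(col, y) for col in cols]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break  # nothing left in X that co-varies with y
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]          # component scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in cols]  # X loadings
        q = dot(y, t) / tt                      # regression of y on the scores
        # Deflate X and y so the next component is uncorrelated with this one.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        weights.append(w)
        scores.append(t)
        explained.append(q * q * tt / total_ss)
    return weights, scores, explained


# Toy data: column 2 is exactly twice column 1 (perfect collinearity).
X = [[1, 2, 5], [2, 4, 3], [3, 6, 8], [4, 8, 1], [5, 10, 7], [6, 12, 2]]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
weights, scores, explained = pls1(X, y, n_components=2)
```

Ordinary least squares has no unique solution for these data because two predictors are perfectly collinear, whereas the PLS components remain well defined. The first weight vector is driven by the covariance between each predictor and the dependent variable, which is the contrast with PCR drawn above.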
The 58 explanatory variables in the model can be
grouped into five categories: age, race, educational attainment,
income, and occupation. Applying PLSR to the data
yielded five components that together explain most of
the variance in tweet density (70.81%) and in the original
58 independent variables (82.89%). Table 1 lists the
percentages of variance in the dependent and independent
variables explained by each component, and Table 2
gives a sample loading matrix for the five components
obtained from the original variables (see Appendix 1 for
the entire loading matrix for PLS components in the tweet
density model). The loading measures the importance of
each variable in accounting for the variance of a compo-
nent. A high loading value means that a specific variable
accounts for much variance in a component. Table 3 gives
a brief description of the meaning of the five components
based on the loading values. The first component accounts
for 37.94% of the variation in tweet density and 28.59% of
the variation in the independent variables. It is positively
highly loaded on the occupation variable of management,
business, science, and arts, the education variables of
bachelor’s degree and graduate or professional degree,
and the household income variables of $200,000 or more
Table 1. The percentage of variances explained by components in the Twitter model.
                                               Component
                                               1        2        3        4        5
Explained variance in independent variables    0.2859   0.0987   0.0352   0.3199   0.0893
Explained variance in dependent variable       0.3794   0.1200   0.1381   0.0159   0.0548
Note: Independent variables are percentages of people falling into different subcategories of age, race, educational attainment, occupation, and household
income, respectively, obtained from ACS (2006–2010), and the dependent variable is tweet density.
Table 2. Sample loading matrix for PLS components in the Twitter model.
Component
Explanatory variables 1 2 3 4 5
Bachelor’s degree 0.401932 −0.0772 0.042405 0.024567 0.009941
Graduate or professional degree 0.29582 −0.03287 0.007762 0.025411 0.000464
$150,000 to $199,999 0.174187 0.013374 −0.04549 −0.02102 −0.0099
$200,000 or more 0.245107 0.042515 −0.02049 −0.00515 −0.02389
Management, business, science, and arts occupations: 0.49972 −0.17736 0.009559 0.054147 −0.02251
Service occupations: −0.14328 −0.05453 0.092773 0.047814 −0.03232
Sales and office occupations: 0.021245 0.022759 0.069738 −0.03638 −0.02886
Natural resources, construction, and maintenance occupations: −0.25131 0.09103 −0.11676 −0.03511 0.029064
Production, transportation, and material moving occupations: −0.12638 0.118096 −0.05531 −0.03048 0.054624
Cartography and Geographic Information Science 71
Downloadedby[BallStateUniversity]at11:2423April2013
and $150,000 to $199,999. We may broadly call it a well-
educated people component. The second component
explains 12% of the variation in tweet density and
9.87% of the variance in the independent variables. It
has high positive loadings on low level of education
(i.e., less than 9th grade and 9th to 12th, no diploma)
and occupations in transportation and material moving.
This is a component for less-educated people. The third
component represents other race people and accounts for
13.81% of the dependent variable but only 3.52% of the
independent variables. The last two components both have
low explanatory powers for tweet density and are not
considered important in the model. Interestingly, there is
no obvious difference between male and female in the
behavior of generating georeferenced tweets, so sex was
not included in the final model. In simple correlations,
tweet density is also highly correlated with the percentage
of people between the ages of 25 and 44 years, but age is
correlated with income in this dataset, so variables of age
do not show up as highly loaded predictors on the
components.
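The reading of the first component can be checked against the sample loadings in Table 2. The sketch below simply sorts the component 1 loadings copied from that table; the dictionary layout and the function name are illustrative assumptions, and only the nine sample variables are included rather than all 58.

```python
# Component 1 loadings copied from Table 2 (tweet density model).
component1_loadings = {
    "Bachelor's degree": 0.401932,
    "Graduate or professional degree": 0.29582,
    "$150,000 to $199,999": 0.174187,
    "$200,000 or more": 0.245107,
    "Management, business, science, and arts occupations": 0.49972,
    "Service occupations": -0.14328,
    "Sales and office occupations": 0.021245,
    "Natural resources, construction, and maintenance occupations": -0.25131,
    "Production, transportation, and material moving occupations": -0.12638,
}


def rank_by_loading(loadings):
    """Return (variable, loading) pairs sorted from most positive to most negative."""
    return sorted(loadings.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_loading(component1_loadings)
for name, value in ranked:
    print(f"{value:+.4f}  {name}")

# The five most positive loadings among the sample variables.
top_five = [name for name, _ in ranked[:5]]
```

Among these sample variables, the five most positive loadings are the management occupations, the two degree variables, and the two highest income brackets, which matches the description of the first component in the text.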
The scores on each component may be mapped, as
demonstrated in Figure 10 for the first component. There
are five shades of color classified by natural breaks from
the darkest for the highest positive scores (the maximum:
0.36) to the lightest for the negative scores (the minimum:
–0.16). The San Francisco Bay area is described by high
positive scores, shown as the darkest area in the map. The
first component characterizes the percentage of people
with high education and salary, and associates this combi-
nation of characteristics with a high rate of tweeting. Take
San Francisco and Santa Clara Counties as an example.
These are places where many people work in high-tech
jobs with an advanced degree and where tweet density is
high. In contrast, northern and central California has a
dominance of negative scores, suggesting that the percen-
tage of well-educated people and tweet density are low in
these areas.
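As a rough illustration of how mapped scores fall into the five shades, the sketch below assigns a component score to a class using the class limits printed in the Figure 10 legend. It does not recompute natural breaks; it reuses the published limits. The county names and scores are hypothetical placeholders, not values from the study, and the function name is an assumption.

```python
from bisect import bisect_left

# Upper bounds of the first four classes, read from the Figure 10 legend.
# The fifth class covers everything above the last bound (up to 0.355314).
CLASS_UPPER_BOUNDS = [-0.122556, -0.074462, -0.013799, 0.184898]


def score_class(score):
    """Return the class index for a score: 0 is the lightest shade, 4 the darkest."""
    return bisect_left(CLASS_UPPER_BOUNDS, score)


# Hypothetical county scores for illustration only (not values from the study).
hypothetical_scores = {"County A": 0.355, "County B": -0.05, "County C": -0.159}
classes = {name: score_class(s) for name, s in hypothetical_scores.items()}
print(classes)
```

Because `bisect_left` treats each upper bound as inclusive, a score equal to a class limit stays in the lower class, which is consistent with the way the legend ranges are written.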
Figure 10. First component scores for tweet density: linear combinations of the independent variables. Map legend (component scores, five classes): −0.159129 to −0.122556; −0.122555 to −0.074462; −0.074461 to −0.013799; −0.013798 to 0.184898; 0.184899 to 0.355314. Scale bar: 0 to 400 km.
Table 3. Description of the PLS components in the Twitter
model.
Component Description
1 Well-educated people
2 Less-educated people
3 Other race people
4 White people
5 Asian people
The same procedure was applied to photo density. A
total of 752,176 georeferenced photos created by 19,594
users in California were collected from Flickr. Similarly,
only photos contributed by local residents were retained
for further analysis, resulting in 440,026 georeferenced
photos created by 7216 local users. Five components
constructed by PLSR capture 47.34% of the variation in
photo density and 81.49% of the variance in the original
independent variables (see Table 4). The entire loading
matrix for PLS components in the photo density model is
provided in Appendix 2. The explanatory power of this
model is not as high as the tweet density model for several
reasons. Although the total number of photos is about the
same as that of tweets, the number of unique photo con-
tributors (7216) is smaller than that of tweet creators
(18,315); therefore, photos were contributed by a much
smaller number of users compared to the tweet dataset. In
addition, the time at which a Flickr photo was taken
may be uncertain, leading to errors of judgment
when the time interval is used to infer whether a user
is a local resident or a tourist. The first component
explains 10.97% of the variance in the dependent variable
and 33.16% in the independent variables. The second
component captures only 7.33% of variation in the depen-
dent variable and 21.35% in the independent variables.
This contrast demonstrates the use of the covariance
between dependent and independent variables to construct
components in PLSR, rather than the use of only variance
of independent variables in PCA. The explanations of the
five components are listed in Table 5. The first component
is highly loaded on occupations of management, business,
science, and arts, bachelor’s degree, and graduate or pro-
fessional degree, and generally describes the percentage of
well-educated white people. The second component is
positively highly loaded on Asian people with bachelor’s
degree in the occupation of management, business,
science, and arts and is interpreted as well-educated
Asian people. The third component accounts for 9.52%
of variance in the dependent variable and 17.29% of
variance in the independent variables. It has positive high
loadings on white people, high school graduate, General
Educational Development (GED), or alternative, and
service occupations, which represents moderately educated
white people. The last two components explain
10.87% and 8.67% of the variance in the dependent variable,
but their explanatory power is very low for the independent
variables, so they are not regarded as significant in the
model. Similar to the model of tweet density, gender
does not seem to make a difference in the interpretation
of photo density.
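The contrast between PLSR and PCA noted in the preceding paragraph can be written compactly for the first component. The notation is an assumption introduced here for illustration and does not appear in the article: $X$ is the centered matrix of independent variables, $y$ the centered dependent variable (tweet or photo density), and $w$ a unit-length weight vector. This is the standard textbook form for a single response, not a statement of the exact algorithm used by the authors.

```latex
% Assumed notation: X = centered independent variables, y = centered
% dependent variable, w = unit-length weight vector of the first component.
\begin{align}
  w_{\mathrm{PCA}} &= \arg\max_{\|w\| = 1} \operatorname{Var}(Xw), \\
  w_{\mathrm{PLS}} &= \arg\max_{\|w\| = 1} \operatorname{Cov}(Xw,\, y)^{2}, \\
  \operatorname{Cov}(Xw,\, y)^{2} &= \operatorname{Var}(Xw)\,
      \operatorname{Corr}(Xw,\, y)^{2}\, \operatorname{Var}(y).
\end{align}
```

The correlation factor in the last line is what separates the two criteria: PCA ranks directions by the variance of $Xw$ alone, while PLSR also rewards directions that are correlated with the dependent variable.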
A straightforward interpretation of the models would
be the relationship between tweet and photo densities and
the demographic and socioeconomic characteristics of
people in these places. As the raw data were preprocessed
to retain only tweets and photos generated by local resi-
dents, socioeconomic properties of people who contribute
to these data may be inferred from this relationship, such
as race, education, occupation, and income. A distinction
based on the time intervals of tweets and photos indicates that
71.80% of georeferenced tweets are generated by local
people, while only 58.80% of georeferenced photos are
uploaded by local residents. Therefore, the tweet density
model may be more accurate in terms of inference about
properties of Twitter users from their spatial footprints.
Although at a coarse scale, these two models provide a big
picture of the properties of local people who contribute
georeferenced tweets and photos, and offer an exploratory
analysis on the representativeness of a subset of Twitter
and Flickr users.
Table 4. The percentage of variances explained by components in the Flickr model.

                                               Component
                                               1        2        3        4        5
Explained variance in independent variables    0.3316   0.2135   0.1729   0.0640   0.0359
Explained variance in dependent variable       0.1097   0.0733   0.0952   0.1087   0.0867

Note: Independent variables are percentages of people falling into different subcategories of age, race, educational attainment, occupation, and household
income, respectively, obtained from ACS (2006–2010), and the dependent variable is photo density.

Table 5. Description of the PLS components in the Flickr
model.

Component   Description
1           Well-educated white people
2           Well-educated Asian people
3           Moderately educated white people
4           Less-educated people
5           Other race people

Conclusion
The growing popularity of social networking and social
media services has attracted researchers from various
disciplines, and this new form of geographic data has been
used in a variety of applications. However, many questions
must still be answered in order to use these data
more appropriately. For example, who uses these services?
Why do people use them? How can we take advantage of
this new source of information that may be potentially
used for any possible topic but with uncertainty and
bias? Understanding the spatial and temporal distribution
of georeferenced data would provide insight into these
questions. This article visualizes the spatial and temporal
patterns of georeferenced tweets and Flickr photos col-
lected within the contiguous United States. The tweets
collected within only a few weeks delineate the adminis-
trative boundaries of the United States and the major roads
at a very good resolution, especially in areas with high
population density. Flickr photos have similar spatial pat-
terns, although the total number of photos taken during the
same period of time is substantially smaller than that of
tweets. However, some places have considerably higher
normalized photo density than tweet density – a character-
istic of tourist attractions, such as Yosemite National Park.
The temporal patterns of tweets are relatively consistent
each day of the week, with two major peaks around
13:00–14:00 and 20:00–21:00 hours, but there are sub-
stantially more photos taken over weekends.
Two descriptive models using PLSR were con-
structed to explain the variation of tweet and photo
densities from place to place in California, using demo-
graphic and socioeconomic variables of people in each
county. According to the first model, tweet density is
highly dependent on the percentage of well-educated
people with an advanced degree and a good salary who
work in the areas of management, business, science, and
arts. The second model suggests that high photo density
is correlated with a high percentage of white and Asian
people with an advanced degree in the areas of manage-
ment, business, science, and arts. This study would be
informative to sociologists who study the behaviors of
social media users, geographers who are interested in the
spatial and temporal distribution of social media users,
marketing agencies who intend to understand the influ-
ence of social media, as well as other scientists who use
social media data in their research.
This research provides an exploratory analysis of the
characteristics of the contributors of georeferenced data,
so we may be aware of the representativeness of such
specific groups of people in the total population when
using the data. Two major sources of bias may be
reduced in the future: the bias caused by people’s move-
ment and the bias due to ecological correlation. Finally,
further research from the perspectives of psychology and
sociology is required to explain why people with some
specific social and demographic properties are more
involved in creating georeferenced tweets and photos.
Note
1. http://code.flickr.com/blog/2009/02/04/100000000-geo-
tagged-photos-plus/
Acknowledgments
The research was supported by the US National Science
Foundation, award 0849910, and by the U.S. Army Research
Office, award W911NF-09-1-0302.
References
Alampay, E. 2006. “Analysing Socio-Demographic Differences
in the Access and Use of ICTs in the Philippines Using the
Capability Approach.” The Electronic Journal of
Information Systems in Developing Countries 27 (5): 1–39.
Ames, M., and M. Naaman. 2007. “Why We Tag: Motivations
for Annotation in Mobile and Online Media.” In
Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems, April 28–May 3, San Jose, CA,
971–980. New York: ACM.
Antoniou, B., J. Morley, and M. Haklay. 2010. “Web 2.0
Geotagged Photos: Assessing the Spatial Dimension of the
Phenomenon.” Geomatica 64 (1): 99–110.
Bailey, T. C., and A. C. Gatrell. 1995. Interactive Spatial Data
Analysis. London: Longman.
Bollen, J., H. Mao, and X. Zeng. 2011. “Twitter Mood Predicts
the Stock Market.” Journal of Computational Science 2 (1):
1–8.
Cheng, Z., J. Caverlee, K. Lee, and D. Z. Sui. 2011. “Exploring
Millions of Footprints in Location Sharing Services.” In
Proceedings of the Fifth International AAAI Conference on
Weblogs and Social Media (ICWSM), July 2011, Barcelona,
81–88. Palo Alto, CA: AAAI press.
Crandall, D. J., L. Backstrom, D. Cosley, S. Suri, D.
Huttenlocher, and J. Kleinberg. 2010. “Inferring Social Ties
from Geographic Coincidences.” Proceedings of the
National Academy of Sciences 107 (52): 22436–22441.
Crandall, D. J., L. Backstrom, D. Huttenlocher, and J. Kleinberg.
2009. “Mapping the World’s Photos.” In Proceedings of the
18th International Conference on World wide web, April 20–
24, Madrid. New York: ACM.
Garthwaite, P. H. 1994. “An Interpretation of Partial Least
Squares.” Journal of the American Statistical Association
89: 122–127.
Goodchild, M. F., and J. A. Glennon. 2010. “Crowdsourcing
Geographic Information for Disaster Response: A Research
Frontier.” International Journal of Digital Earth 3 (3):
231–241.
Haklay, M. 2010. “How Good Is Volunteered Geographical
Information? A Comparative Study of OpenStreetMap and
Ordnance Survey Datasets.” Environment and Planning B,
Planning & Design 37 (4): 682–703.
Hollenstein, L., and R. Purves. 2010. “Exploring Place Through
User-Generated Content: Using Flickr to Describe City
Cores.” Journal of Spatial Information Science 1 (1):
21–48.
Huberman, B., D. Romero, and F. Wu. 2008. Social Networks
that Matter: Twitter under the Microscope. Accessed March
6, 2013. http://ssrn.com/abstract=1313405.
Java, A., X. Song, T. Finin, and B. Tseng. 2007. “Why We
Twitter: Understanding Microblogging Usage and
Communities.” In Proceedings of the 9th WebKDD and 1st
SNA-KDD Workshop on Web Mining and Social Network
Analysis, August 12, San Jose, CA, 56–65. New York:
ACM.
Jolliffe, I. T. 1982. “A Note on the Use of Principal Components
in Regression.” Applied Statistics 31: 300–303.
King, G. 1997. A Solution to the Ecological Inference Problem:
Reconstructing Individual Behavior from Aggregate Data.
Princeton, NJ: Princeton University Press.
Lee, R., and K. Sumiya. 2010. “Measuring Geographical
Regularities of Crowd Behaviors for Twitter-Based Geo-
Social Event Detection.” In Proceedings of the 2nd
ACM SIGSPATIAL International Workshop on Location
Based Social Networks (LBSN2010), 1–10. New York:
ACM.
Lerman, K., and R. Ghosh. 2010. “Information Contagion: An
Empirical Study of the Spread of News on Digg and Twitter
Social Networks.” In Proceedings of 4th International
Conference on Weblogs and Social Media (ICWSM),
Washington, DC, May 23–26, Menlo Park, CA: AAAI Press.
Li, L., and M. F. Goodchild. 2012. “Constructing Places from
Spatial Footprints.” In Proceedings of the 1st ACM
SIGSPATIAL International Workshop on Crowdsourced
and Volunteered Geographic Information, edited by M. F.
Goodchild, D. Pfoser, and D. Sui, November 6, Redondo
Beach, CA. New York: ACM.
Li, L., and M. F. Goodchild. 2013. “Spatio-Temporal Footprints
in Social Networks.” Encyclopedia of Social Networks and
Mining, edited by R. S. Alhajj, and J. G. Rokne, Springer.
Nielsen, J. 2006. “Participation Inequality: Encouraging More
Users to Contribute.” Jakob Nielsen’s Alertbox 9: 2006.
Openshaw, S. 1984. “Ecological Fallacies and the Analysis of
Areal Census Data.” Environment and Planning A 16:
17–31.
Piantadosi, S., D. P. Byar, and S. B. Green. 1988. “The
Ecological Fallacy.” American Journal of Epidemiology
127: 893–904.
Purves, R., A. Edwardes, and J. Wood. 2011. “Describing Place
through User Generated Content.” First Monday 16: 9–5.
Robinson, W. S. 1950. “Ecological Correlations and the
Behavior of Individuals.” American Sociological Review 15
(3): 351–357.
Sakaki, T., M. Okazaki, and Y. Matsuo. 2010. “Earthquake
Shakes Twitter Users: Real-Time Event Detection by Social
Sensors.” In Proceedings of the 19th International
Conference on World wide web, April 2010, Raleigh, NC,
851–860. New York: ACM.
Silverman, B. W. 1986. Density Estimation for Statistics and
Data Analysis. London: Chapman and Hall.
Soule, L. C., L. W. Shell, and B. A. Kleen. 2003. “Exploring
Internet Addiction: Demographic Characteristics and
Stereotypes of Heavy Internet Users.” Journal of Computer
Information Systems 44 (1): 64–73.
Taylor, W. J., G. X. Zhu, J. Dekkers, and S. Marshall. 2003.
“Socio-Economic Factors Affecting Home Internet Usage
Patterns in Central Queensland.” Informing Science 6:
233–246.
Tumasjan, A., T. O. Sprenger, P. G. Sandner, and I. M. Welpe.
2010. “Predicting Elections with Twitter: What 140
Characters Reveal about Political Sentiment.” Fourth
International AAAI Conference on Weblogs and Social
Media, May 23–26, Washington, DC.
Wold, H. 1966. “Estimation of Principal Components and
Related Models by Iterative Least Squares.” In Multivariate
Analysis, edited by P. R. Krishnaiah, 391–420. New York:
Academic Press.
Wold, S., M. Sjöström, and L. Eriksson. 2001. “PLS-Regression:
A Basic Tool of Chemometrics.” Chemometrics and
Intelligent Laboratory Systems 58: 109–130.
Zandbergen, P. A. 2009. “Accuracy of iPhone Locations: A
Comparison of Assisted GPS, WiFi and Cellular
Positioning.” Transactions in GIS 13 (s1): 5–25.
Appendix 1. Loading matrix for PLS components in the tweet density model
Component
Explanatory variables 1 2 3 4 5
Under 5 years −0.01474 0.077405 −0.04545 −0.01982 −0.01143
5–9 years −0.02025 0.057787 −0.04379 −0.01927 −0.01
10–14 years −0.03172 0.058438 −0.02302 −0.02861 −0.01334
15–17 years −0.02153 0.022426 −0.02407 −0.01397 −0.00691
18 and 19 years 0.00196 0.016601 −0.01363 −0.00271 −0.00794
20 years 0.000168 0.006627 −0.00405 0.002784 −0.00437
21 years 0.005339 0.013494 −0.00895 −0.00116 −0.00942
22–24 years 0.008545 0.042466 0.018536 0.001079 −0.01888
25–29 years 0.022469 0.091848 0.036858 0.005309 0.002426
30–34 years 0.026894 0.080128 0.02624 0.005096 −0.00324
35–39 years 0.0304 0.042774 0.018096 0.00483 0.00986
40–44 years 0.039194 0.004091 −0.02617 −0.00161 −0.00537
45–49 years 0.017316 −0.03679 −0.00857 0.002174 0.001503
50–54 years 0.00373 −0.0826 −0.00635 0.011897 0.008826
55–59 years −0.01004 −0.10516 0.011584 0.019274 0.027798
60 and 61 years −0.00336 −0.03628 0.013421 0.002809 0.002329
62–64 years −0.01243 −0.05618 0.011172 0.010629 0.009505
65 and 66 years −0.00818 −0.03081 0.005977 0.003103 0.008006
67–69 years −0.01376 −0.04628 0.009472 0.004551 0.003888
70–74 years −0.01464 −0.04668 0.014862 0.005115 0.006863
75–79 years −0.00534 −0.03874 0.011525 0.009119 0.010758
80–84 years −0.00167 −0.01497 0.012574 −0.0014 −0.00086
85 years and over 0.001647 −0.01959 0.013749 0.00079 7.76E-06
White alone −0.04727 0.041288 −0.10895 0.855233 −0.33653
Black or African American alone 0.019714 0.001025 0.043088 −0.15585 0.061714
American Indian and Alaska Native alone 0.039311 −0.03672 −0.049 0.040688 0.018066
Asian alone 0.02731 0.032401 −0.03394 −0.3589 0.364497
Native Hawaiian and Other Pacific Islander alone 0.003158 −1.33E-05 0.000767 −0.0057 0.00913
Some other race alone −0.04648 −0.03246 0.129032 −0.3555 −0.13824
Two or more races: 0.004259 −0.00552 0.019004 −0.01998 0.021366
Less than 9th grade −0.08597 0.331632 −0.10665 −0.02253 −0.00902
9th–12th grade, no diploma −0.16101 0.136008 0.007883 0.003644 −0.01523
High school graduate, GED, or alternative −0.302 −0.11136 0.021482 0.009998 0.058959
Some college, no degree −0.13348 −0.18998 0.02426 −0.03015 −0.04115
Associate’s degree −0.01529 −0.05624 0.002854 −0.01094 −0.00396
Bachelor’s degree 0.401932 −0.0772 0.042405 0.024567 0.009941
Graduate or professional degree 0.29582 −0.03287 0.007762 0.025411 0.000464
Less than $10,000 −0.05965 0.007578 0.041237 0.033263 0.014456
$10,000–$14,999 −0.11884 −0.0168 0.083771 0.034953 0.007335
$15,000–$19,999 −0.0885 −0.00257 0.045553 0.019674 −0.00656
$20,000–$24,999 −0.08984 0.020739 0.051756 0.005561 −0.03398
$25,000–$29,999 −0.06982 −0.00308 −0.00224 0.006647 0.015886
$30,000–$34,999 −0.05892 −0.01624 0.004458 −0.00272 −0.01196
$35,000–$39,999 −0.06398 −0.00129 0.025394 −0.00427 0.006125
$40,000–$44,999 −0.03699 −0.02119 0.001722 −0.00392 0.000198
$45,000–$49,999 −0.03612 −0.01511 −0.01185 0.011192 0.017964
$50,000–$59,999 −0.04508 0.019009 −0.0025 −0.01275 −0.00141
$60,000–$74,999 −0.01536 −0.03518 −0.03475 −0.0045 0.019347
$75,000–$99,999 0.047511 −0.00607 −0.04302 −0.01023 0.023652
$100,000–$124,999 0.112712 0.001461 −0.05712 −0.02623 −0.01345
$125,000–$149,999 0.10356 0.012866 −0.03643 −0.02051 −0.00383
$150,000–$199,999 0.174187 0.013374 −0.04549 −0.02102 −0.0099
$200,000 or more 0.245107 0.042515 −0.02049 −0.00515 −0.02389
Management, business, science, and arts occupations: 0.49972 −0.17736 0.009559 0.054147 −0.02251
Service occupations: −0.14328 −0.05453 0.092773 0.047814 −0.03232
Sales and office occupations: 0.021245 0.022759 0.069738 −0.03638 −0.02886
Natural resources, construction, and maintenance occupations: −0.25131 0.09103 −0.11676 −0.03511 0.029064
Production, transportation, and material moving occupations: −0.12638 0.118096 −0.05531 −0.03048 0.054624
Appendix 2. Loading matrix for PLS components in the photo density model.
Component
Explanatory variables 1 2 3 4 5
Under 5 years −0.03985 −0.06968 −0.04157 0.035035 0.009508
5–9 years −0.03863 −0.06033 −0.03379 0.018745 0.005705
10–14 years −0.04948 −0.04034 −0.00729 0.030802 −0.01749
15–17 years −0.02739 −0.0298 −0.0105 0.001409 0.003521
18 and 19 years −0.00368 −0.02068 −0.01886 −0.00211 −0.01066
20 years 0.001142 −0.00986 −0.00504 −0.00208 −0.00873
21 years 0.001123 −0.01489 −0.01485 0.000999 −0.00771
22–24 years 0.001153 −0.01669 −0.00644 0.024757 −0.02822
25–29 years −0.00304 −0.02076 −0.00858 0.078787 −0.03802
30–34 years 0.00358 −0.02021 −0.01408 0.070264 −0.02725
35–39 years 0.010636 0.000369 −0.01345 0.047709 −0.01351
40–44 years 0.022184 −0.00871 −0.04678 0.000559 0.014569
45–49 years 0.020005 0.019069 −0.00946 −0.02387 0.013253
50–54 years 0.027076 0.038446 0.014259 −0.05507 0.028873
55–59 years 0.024689 0.058246 0.040607 −0.06575 0.022882
60 and 61 years 0.008444 0.027431 0.020304 −0.01986 −0.00012
62–64 years 0.011037 0.032313 0.032863 −0.03226 0.014433
65 and 66 years 0.002372 0.017243 0.015113 −0.01997 0.009119
67–69 years 0.004825 0.025543 0.026478 −0.02984 0.011136
70–74 years 0.004335 0.029439 0.030894 −0.02749 0.012094
75–79 years 0.009549 0.023947 0.020369 −0.02343 0.006073
80–84 years 0.002783 0.018798 0.015552 −0.00183 −2.78E-05
85 years and over 0.007138 0.02111 0.01425 −0.00551 5.71E-04
White alone 0.64126 −0.54537 0.384937 −0.04463 −0.05466
Black or African American alone −0.11371 0.098182 −0.08834 −0.0114 −0.04436
American Indian and Alaska Native alone 0.051597 −0.02843 −0.05185 −0.03109 0.121559
Asian alone −0.33726 0.269738 −0.17645 0.099653 −0.19034
Native Hawaiian and Other Pacific Islander alone −0.00434 7.60E-03 −0.00337 0.003228 −0.00463
Some other race alone −0.22297 0.165412 −0.06804 −0.02706 0.172535
Two or more races: −0.01458 0.032869 0.003109 0.011299 −0.0001
Less than 9th grade −0.14529 −0.25007 −0.05677 0.209129 −0.00659
9th–12th grade, no diploma −0.12839 −0.14642 0.06076 0.02307 −0.03606
High school graduate, GED, or alternative −0.17274 −0.03592 0.212701 −0.14922 0.089943
Some college, no degree −0.05122 0.058598 0.100471 −0.2111 −0.0332
Associate’s degree −0.00306 0.032506 0.01773 −0.05174 −0.03535
Bachelor’s degree 0.288224 0.213345 −0.18656 0.095178 0.006815
Graduate or professional degree 0.212482 0.127951 −0.14833 0.084679 0.014442
Less than $10,000 −0.02021 −0.02446 0.057107 −0.01381 0.001242
$10,000–$14,999 −0.04735 −0.01043 0.123375 −0.03851 −0.00763
$15,000–$19,999 −0.0397 −0.02443 0.07639 −0.03061 −0.00025
$20,000–$24,999 −0.04602 −0.02101 0.090227 0.001759 −0.01519
$25,000–$29,999 −0.04342 −0.0292 0.038342 −0.02533 0.014401
$30,000–$34,999 −0.0333 −0.0215 0.028601 −0.04265 0.013107
$35,000–$39,999 −0.0432 −0.00371 0.053541 −0.00873 0.003684
$40,000–$44,999 −0.02043 0.00364 0.030614 −0.01959 0.008193
$45,000–$49,999 −0.01826 −0.02243 0.008614 −0.03616 0.011423
$50,000–$59,999 −0.04024 −0.01621 0.026354 0.008339 −0.00548
$60,000–$74,999 −0.01251 −0.00814 −0.01824 −0.0494 0.013102
$75,000–$99,999 0.017281 0.015004 −0.0481 0.006423 −0.03166
$100,000–$124,999 0.056167 0.027812 −0.09461 0.03855 0.021724
$125,000–$149,999 0.048939 0.026345 −0.0844 0.03963 −0.00837
$150,000–$199,999 0.094066 0.044338 −0.13391 0.058537 0.004909
$200,000 or more 0.148192 0.064382 −0.15391 0.111557 −0.02321
Management, business, science, and arts occupations: 0.393605 0.217233 −0.29509 −0.0328 0.032189
Service occupations: −0.03144 0.015353 0.190332 −0.04594 −0.02918
Sales and office occupations: −0.00541 0.056445 0.020075 0.03938 −0.06822
Natural resources, construction, and maintenance occupations: −0.21088 −0.16301 0.090635 0.009872 0.037683
Production, transportation, and material moving occupations: −0.14587 −0.12602 −0.00595 0.029492 0.027525
Visually Exploring Social Participation in Encyclopedia of Life
 
Big data analytics: from threatening privacy to challenging democracy
Big data analytics: from threatening privacy to challenging democracyBig data analytics: from threatening privacy to challenging democracy
Big data analytics: from threatening privacy to challenging democracy
 

Similar to 15230406.2013.777139

Running&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docx
Running&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docxRunning&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docx
Running&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docxanhlodge
 
Zook making sense of geosocial media-final
Zook making sense of geosocial media-finalZook making sense of geosocial media-final
Zook making sense of geosocial media-finaloiisdp
 
Is the Age of privacy over? Facebook, Privacy and Qualitative Research
Is the Age of privacy over?  Facebook, Privacy and Qualitative ResearchIs the Age of privacy over?  Facebook, Privacy and Qualitative Research
Is the Age of privacy over? Facebook, Privacy and Qualitative ResearchLisa Blenkinsop
 
Juventud y Redes Sociales: Motivaciones y usos frecuentes
Juventud y Redes Sociales: Motivaciones y usos frecuentesJuventud y Redes Sociales: Motivaciones y usos frecuentes
Juventud y Redes Sociales: Motivaciones y usos frecuentesMaría Janeth Ríos C.
 
Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)Han Woo PARK
 
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...Susan Campos
 
Usage of YouTube Content among Chennai Urban Women.pdf
Usage of YouTube Content among Chennai Urban Women.pdfUsage of YouTube Content among Chennai Urban Women.pdf
Usage of YouTube Content among Chennai Urban Women.pdfPugalendhiR
 
Article.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docx
Article.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docxArticle.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docx
Article.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docxfredharris32
 
HISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY A
HISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY AHISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY A
HISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY ASusanaFurman449
 
Suazo, martínez & elgueta english version
Suazo, martínez & elgueta english versionSuazo, martínez & elgueta english version
Suazo, martínez & elgueta english version2011990
 
Examining the Ability of Extroversion
Examining the Ability of ExtroversionExamining the Ability of Extroversion
Examining the Ability of ExtroversionJulia Chapman
 
Social surveillance aoife shona
Social surveillance aoife shonaSocial surveillance aoife shona
Social surveillance aoife shonaAoife Brown
 
palen-crisisinformatics
palen-crisisinformaticspalen-crisisinformatics
palen-crisisinformaticsSophia B Liu
 
The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...
The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...
The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...journal ijrtem
 
Social surveillance
Social surveillanceSocial surveillance
Social surveillancerooneys27
 
Authoritative and Volunteered Geographical Information in a Developing Countr...
Authoritative and Volunteered Geographical Information in a Developing Countr...Authoritative and Volunteered Geographical Information in a Developing Countr...
Authoritative and Volunteered Geographical Information in a Developing Countr...rsmahabir
 
Social Media Networking Site Usage Demographics Stats
Social Media Networking Site Usage Demographics StatsSocial Media Networking Site Usage Demographics Stats
Social Media Networking Site Usage Demographics Statsrishibajaj8
 
The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...
The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...
The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...musadoto
 
(January 2012) ALISE 2012
(January 2012) ALISE 2012(January 2012) ALISE 2012
(January 2012) ALISE 2012Carolyn Hank
 

Similar to 15230406.2013.777139 (20)

Running&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docx
Running&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docxRunning&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docx
Running&head&YIK&YAK&AS&AN&ETHNOGRAPHIC&TOOL& &&&&&& 1&.docx
 
Zook making sense of geosocial media-final
Zook making sense of geosocial media-finalZook making sense of geosocial media-final
Zook making sense of geosocial media-final
 
Is the Age of privacy over? Facebook, Privacy and Qualitative Research
Is the Age of privacy over?  Facebook, Privacy and Qualitative ResearchIs the Age of privacy over?  Facebook, Privacy and Qualitative Research
Is the Age of privacy over? Facebook, Privacy and Qualitative Research
 
Juventud y Redes Sociales: Motivaciones y usos frecuentes
Juventud y Redes Sociales: Motivaciones y usos frecuentesJuventud y Redes Sociales: Motivaciones y usos frecuentes
Juventud y Redes Sociales: Motivaciones y usos frecuentes
 
Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)Tfsc disc 2014 si proposal (30 june2014)
Tfsc disc 2014 si proposal (30 june2014)
 
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...
A LITERATURE ANALYSIS ABOUT SOCIAL INFORMATION CONTRIBUTION AND CONSUMPTION O...
 
Usage of YouTube Content among Chennai Urban Women.pdf
Usage of YouTube Content among Chennai Urban Women.pdfUsage of YouTube Content among Chennai Urban Women.pdf
Usage of YouTube Content among Chennai Urban Women.pdf
 
Article.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docx
Article.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docxArticle.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docx
Article.pdf● Zeynep Turan, Hasan Tinmaz and Yuksel Goktas.docx
 
HISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY A
HISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY AHISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY A
HISTORY AND HUMANITIES LENS ON MOBILE DEVICES 1 HISTORY A
 
Suazo, martínez & elgueta english version
Suazo, martínez & elgueta english versionSuazo, martínez & elgueta english version
Suazo, martínez & elgueta english version
 
Examining the Ability of Extroversion
Examining the Ability of ExtroversionExamining the Ability of Extroversion
Examining the Ability of Extroversion
 
Social surveillance aoife shona
Social surveillance aoife shonaSocial surveillance aoife shona
Social surveillance aoife shona
 
palen-crisisinformatics
palen-crisisinformaticspalen-crisisinformatics
palen-crisisinformatics
 
The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...
The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...
The Impact of Social Media (Facebook/YouTube) on the Politically Interest of ...
 
Social surveillance
Social surveillanceSocial surveillance
Social surveillance
 
Authoritative and Volunteered Geographical Information in a Developing Countr...
Authoritative and Volunteered Geographical Information in a Developing Countr...Authoritative and Volunteered Geographical Information in a Developing Countr...
Authoritative and Volunteered Geographical Information in a Developing Countr...
 
24.pdf
24.pdf24.pdf
24.pdf
 
Social Media Networking Site Usage Demographics Stats
Social Media Networking Site Usage Demographics StatsSocial Media Networking Site Usage Demographics Stats
Social Media Networking Site Usage Demographics Stats
 
The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...
The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...
The Use Of Social Networking Sites among the Undergraduate Students of Sokoin...
 
(January 2012) ALISE 2012
(January 2012) ALISE 2012(January 2012) ALISE 2012
(January 2012) ALISE 2012
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

15230406.2013.777139

Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr

Linna Li^a*, Michael F. Goodchild^a and Bo Xu^b

^a Department of Geography, Center for Spatial Studies, University of California, Santa Barbara, CA, USA; ^b Department of Geography and Environmental Studies, California State University, San Bernardino, CA, USA

(Received 11 September 2012; accepted 27 January 2013)

Online social networking and information sharing services have generated large volumes of spatio-temporal footprints, which are potentially a valuable source of knowledge about the physical environment and social phenomena. However, it is critical to take into consideration the uneven distribution of the data generated in social media in order to understand the nature of such data and to use them appropriately. The distribution of footprints and the characteristics of contributors indicate the quantity, quality, and type of the data. Using georeferenced tweets and photos collected from Twitter and Flickr, this research presents the spatial and temporal patterns of such crowd-sourced geographic data in the contiguous United States and explores the socioeconomic characteristics of geographic data creators by investigating the relationships between tweet and photo densities and the characteristics of local people using California as a case study. Correlations between dependent and independent variables in partial least squares regression suggest that well-educated people in the occupations of management, business, science, and arts are more likely to be involved in the generation of georeferenced tweets and photos. Further research is required to explain why some people tend to produce and spread information over the Internet using social media from the perspectives of psychology and sociology. This study would be informative to sociologists who study the behaviors of social media users, geographers who are interested in the spatial and temporal distribution of social media users, marketing agencies who intend to understand the influence of social media, and other scientists who use social media data in their research.

Keywords: spatio-temporal footprints; socioeconomic; Flickr; Twitter; georeference

Introduction

There has been a rapid expansion in the use of social media and data sharing services recently. Data generated by these sources have been widely used to study social networks (Huberman, Romero, and Wu 2008; Lerman and Ghosh 2010) and behavioral trends (Sakaki, Okazaki, and Matsuo 2010; Bollen, Mao, and Zeng 2011). However, it is critical to understand the uneven distribution of such data sources in order to evaluate their validity, accuracy, representativeness, and uncertainty when they are used to imply the social and behavioral characteristics of the users.

This article, using Twitter and Flickr as two examples, explores spatiotemporal patterns of geographic data generated in social media within the bounding box of the contiguous United States and further infers the characteristics of the users by examining the relationships between geographic data densities and the socioeconomic characteristics of local residents at the county level using California as a case study.

Twitter is a popular social media site that is widely used for daily chatter, conversations, sharing information, and reporting news (Java et al. 2007). Flickr is an online photo management service that allows uploading and sharing photos within and outside of groups. Tweeting and photo-taking behaviors, specifically the distribution of tweets and photos, may suggest the diverse characteristics of different users.
To understand spatiotemporal patterns of tweets and photos, we study two cases: georeferenced tweet messages in Twitter and georeferenced photos in Flickr, which are used as proxies for the spatio-temporal footprints of their creators. In the past few years, the two data sources have been widely used to investigate research questions in different disciplines. For example, the location and spatial boundary of places may be delineated based on the aggregation of geotagged photos in Flickr (Hollenstein and Purves 2010; Li and Goodchild 2012). Representative photos at different locations and tourist paths can also be extracted by analyzing spatial, temporal, and visual information associated with Flickr photos (Crandall et al. 2009). Both Antoniou, Morley, and Haklay (2010) and Purves, Edwardes, and Wood (2011) carried out comparative studies of Flickr and other image collections and investigated issues in such collections, including bias. Ames and Naaman (2007) studied motivations in tagging. The similarity of spatial and temporal information associated with photos provided by different contributors may even indicate the probability of a social tie between them (Crandall et al. 2010). Twitter, the other rich data source, has been used to study people’s responses to emergencies (Goodchild and Glennon 2010; Sakaki, Okazaki, and Matsuo 2010), to detect local events automatically (Lee and Sumiya 2010), and to predict election results based on the sentiments expressed in tweets (Tumasjan et al. 2010). Furthermore, check-ins collected from location-sharing services were used to study human mobility patterns (Cheng et al. 2011).

*Corresponding author. Email: linna@geog.ucsb.edu
Cartography and Geographic Information Science, 2013, Vol. 40, No. 2, 61–77, http://dx.doi.org/10.1080/15230406.2013.777139
© 2013 Cartography and Geographic Information Society

Although data collected from social media, such as Twitter, have been increasingly used to study geographic landscapes and human behaviors (Li and Goodchild 2013), it is difficult to estimate the representativeness of such data. Despite the various studies, thus far there is no research on the socio-demographic characteristics of users, which would be of great value because georeferenced data from Twitter and Flickr imply the characteristics of places as well as of local residents.

However, research has been done on the socio-demographic characteristics of Internet users using surveys in many countries. For example, Soule, Shell, and Kleen (2003) found that gender is not a significant variable in explaining heavy Internet usage, but education is, based on the data from the Tenth Graphic, Visualizations, and Usability Center (GVU) Survey conducted on the Web. A study in the Philippines showed that younger, more affluent, and well-educated people in places with better infrastructure are more capable of using Information and Communications Technology (ICT; Alampay 2006). Different Internet usage patterns of people from different socio-economic groups were identified in central Queensland (Taylor et al. 2003). As demonstrated in these studies, the characteristics of Internet users are crucial for understanding a range of relevant phenomena, such as Internet addiction, social opportunities through the access to ICT, and behavioral patterns in using such technologies.
Because these studies collect data primarily through questionnaires, and conducting surveys is time-consuming and labor-intensive, they can rely on only a small number of participants. In our study, we use geographic location as a link to associate social media usage with the characteristics of local residents, based on data automatically collected using social media APIs and on auxiliary census data.

This study provides an exploratory analysis of a subset of Twitter and Flickr users, those who provide locational information for tweets and photos, in terms of their demographic and socioeconomic properties at the county level in California. Georeferenced tweets and photos indicate the presence of their creators at that location. People are present at a particular location for three major reasons: it is where they live, where they work, or a tourist attraction they are visiting. In this article, we select georeferenced tweets and photos contributed by local residents to explore the demographic and socioeconomic characteristics of these users. A user is considered a local resident in a county only when the time interval between two tweets or photos produced in that specific county by the user is longer than 10 days.

The remainder of the article is structured as follows. The section “Twitter and Flickr data collection and pre-processing” describes the collection and pre-processing of georeferenced data from Twitter and Flickr.
The section “The spatial distribution of georeferenced tweets and photos” presents the spatial distributions of georeferenced tweets and photos over the contiguous United States, followed by a discussion of the temporal patterns of georeferenced tweets and photos in the section “Temporal patterns of tweets and photos.” We propose two descriptive models in the section “Descriptive models of tweet and photo densities in California” to illustrate the relationships between the tweet and photo densities and the characteristics of people in different counties of California. The article concludes with a discussion of implications and future research directions.

Twitter and Flickr data collection and pre-processing

Tweets and photo metadata were collected using the public APIs of Twitter and Flickr and stored in a MySQL database. We collected data from 21 January to 7 March 2011; these dates were chosen to avoid major events that might cause unusual patterns. In total, there are 19,758,954 records for Twitter and 4,263,227 records for Flickr within the bounding box of the contiguous United States.

The location associated with each tweet takes a variety of forms with different levels of precision. It may be automatically captured by built-in Global Positioning System (GPS) receivers in mobile devices such as smartphones, calculated according to the relative position of the user’s equipment in a cellular network, or manually selected by a user from a set of place names provided by Twitter. In the first case, location is in the form of latitude and longitude, while in the other cases location is usually recorded as a neighborhood, a city, or even a country. When coordinates are not available, Twitter takes the estimated location of a user’s device or the Internet Protocol (IP) address of a computer and reverse geocodes it to a few possible places, which are offered to the user for selection. The positional accuracy varies from one method to another.
For locations recorded by GPS, it is usually on the order of several meters. For locations determined by triangulation in a cellular network, accuracy ranges from 30 to 3000 m, depending on the spatial distribution of cells (Zandbergen 2009). For IP addresses, the positional accuracy of the georeference depends on the method used to convert IP addresses to geographic coordinates, and is usually at the level of ZIP code, city, state, or even country. For example, Maxmind’s free GeoLite City database claims that the spatial accuracy of georeference is “over 99.5% on a country level and 78% on a city level for the U.S. within a 40 kilometer radius.” Finally, the accuracy of a place name depends on the spatial extent of the place. Each tweet record in the database contains the tweet ID, tweet text, time, location, and user ID.

In Flickr, photos were georeferenced either by the built-in GPS in a camera or manually by a user who identified the photo location on a map. The location could be either the place where a photo was taken or the location of an object in the photo. Automatic recording by a GPS receiver is always the former case, while manually georeferenced photos could be either. One typical error in photo location occurs when a user uploads a group of photos that involve several places and assigns them all to the same location. Photo metadata contain the photo ID, photo title, description, tags, upload time, time when the photo was taken, location, and owner ID.

For both tweets and photos, locations are recorded to five decimal places of latitude and longitude (approximately 1 m), but we should expect the accuracy of a location to depend on the accuracy of GPS in mobile devices (which could be several meters) or on the map scale when a user specifies a photo location. Because the objective of this article is to study the spatial and temporal patterns of tweets and photos, only data that have point locations with relatively high precision are used, and data that are not georeferenced are excluded. It is estimated that the percentage of georeferenced tweets is less than 1% and that of geotagged photos is around 3.33% (see Note 1). However, the total numbers of tweets and photos are very large, so we can still obtain great volumes of georeferenced data. In addition, we must be aware that these data were contributed by users who are willing to share their locations and not by everyone who uses the two services. Therefore, the data are a subset of the entire Twitter and Flickr datasets given spatial and temporal constraints, and the users are a subset of the entire user groups.
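As a rough check on the "five decimal places, approximately 1 m" figure quoted above, the ground distance spanned by one unit in the fifth decimal place can be worked out directly. The sketch below is only illustrative: it assumes a spherical Earth with a mean radius of 6,371 km, and the function name `meters_per_step` and the sample latitude are hypothetical choices, not part of the study.

```python
import math

# Mean Earth radius in metres (spherical approximation; an assumption of this sketch).
EARTH_RADIUS_M = 6_371_000.0


def meters_per_step(lat_deg, decimals=5):
    """Ground distance covered by one unit in the last recorded decimal place.

    Returns (north_south_m, east_west_m) at latitude `lat_deg`.
    """
    step_rad = math.radians(10 ** -decimals)
    north_south = EARTH_RADIUS_M * step_rad
    # Meridians converge toward the poles, so east-west distance shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns, ew = meters_per_step(34.05)  # roughly the latitude of Los Angeles
print(f"north-south: {ns:.2f} m, east-west: {ew:.2f} m")
```

At this latitude both values come out close to 1 m, which is well below the several-meter accuracy expected from consumer GPS, so the recorded resolution is not the limiting factor.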
Like other data created by volunteers, these data are biased in terms of the contributions made by different users, because most contributions come from a very small percentage of the total number of contributors. For instance, “In most online communities, 90% of users are lurkers who never contribute, 9% of users contribute a little, and 1% of users account for almost all the action” (Nielsen 2006). Haklay (2010) showed that most of the OpenStreetMap (OSM) data for England were contributed by only a few users, and that the difference in road data coverage between wealthy and poor areas is about 8%. Contribution bias is also present in our datasets. The 300 heaviest contributors among local Twitter and Flickr users who share geographic footprints are shown in Figure 1a and 1b, which display the long-tail effect: a large number of tweets and photos are created by the first few hundred contributors.

When examining the relationships between georeferenced data densities and the socioeconomic characteristics of residents in California, we verify that the data were produced by local users. First, we chose the county as the data aggregation level, because a person is quite likely to live in one census tract and work in another, so at a finer spatial scale it is difficult to tell whether a location is a user’s home or workplace. By contrast, people are more likely to live and work in the same county. According to the 2000 Census Bureau county-to-county commuting data for California, the percentage of residents who commute within the same county is as high as 83%. Second, we calculated the time a user stays in a county by comparing the time interval between two tweets or photos produced by the same user. A user is regarded as local only when this time interval is greater than 10 days, and only data created by local users are retained for further analysis. Correlations between tweet and photo densities and contributors’ properties were calculated at the county level.
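The local-resident rule described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the record layout, the function names, and the sample data are hypothetical, and the "time interval between two tweets or photos" is interpreted here as the span between a user's earliest and latest post in the same county.

```python
from datetime import datetime, timedelta


def local_pairs(records, min_span=timedelta(days=10)):
    """Return the (user_id, county) pairs whose posts span more than `min_span`.

    `records` is an iterable of (user_id, county, timestamp) tuples.
    Assumption: the interval is measured between the first and last post
    a user made in a given county.
    """
    first_last = {}
    for user_id, county, ts in records:
        key = (user_id, county)
        if key in first_last:
            lo, hi = first_last[key]
            first_last[key] = (min(lo, ts), max(hi, ts))
        else:
            first_last[key] = (ts, ts)
    return {key for key, (lo, hi) in first_last.items() if hi - lo > min_span}


def keep_local(records, min_span=timedelta(days=10)):
    """Keep only the records posted by users classified as local in that county."""
    records = list(records)
    local = local_pairs(records, min_span)
    return [r for r in records if (r[0], r[1]) in local]


# Hypothetical sample: user_a posts in Los Angeles 29 days apart (kept);
# user_b posts in Mariposa 3 days apart (treated as a visitor and dropped).
records = [
    ("user_a", "Los Angeles", datetime(2011, 1, 22, 9, 0)),
    ("user_a", "Los Angeles", datetime(2011, 2, 20, 18, 30)),
    ("user_a", "Mariposa", datetime(2011, 2, 5, 12, 0)),
    ("user_b", "Mariposa", datetime(2011, 2, 4, 10, 0)),
    ("user_b", "Mariposa", datetime(2011, 2, 7, 16, 0)),
]
kept = keep_local(records)
print(len(kept), "of", len(records), "records kept")
```

Note that under this reading a user can be local in one county and a visitor in another, as `user_a` is in the sample; a stricter variant might require consecutive posts more than 10 days apart.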
Ideally, the socioeconomic characteristics of users would be determined at the individual level, but that type of data is not available for obvious reasons, so locations were used to link the data densities and the residents. This type of correlation, based on group data rather than individual data, is called ecological correlation (Robinson 1950). Ecological correlations between tweet and photo densities and the socioeconomic characteristics of people suggest that people with specific characteristics are more involved in the generation of georeferenced tweets and photos. However, it would be fallacious to infer individual behaviors from data aggregated to geographic areas (Openshaw 1984; Piantadosi, Byar, and Green 1988; King 1997). For example, a correlation between the number of tweets from a place and the number of Native Americans present in that place does not imply that Native Americans are more likely to tweet. This study is a first step toward an understanding of the relationships between georeferenced tweets and photos and population; the results suggest that it would be valuable to investigate these relationships further.

The spatial distribution of georeferenced tweets and photos

We plotted the locations of georeferenced tweets on a map. As demonstrated in Figure 2, tweet locations roughly trace the administrative boundary of the United States and its major roads at a very good resolution, which is similar to the representation of Flickr photos in other research (Crandall et al. 2009). Figure 3 shows georeferenced tweets in part of Los Angeles. At this scale, blocks and local roads are delineated by tweet locations. For instance, tweet locations are well aligned with the location and shape of freeways, such as Interstate 405, as well as some local roads. High density along major roads might indicate people tweeting from vehicles, and perhaps also from locations adjacent to major roads such as hotels and gas stations.
Flickr photos have spatial patterns similar to those of tweet locations. However, the number of photos is substantially smaller than that of tweets during the same time period, because it takes more effort to take and upload photos than to generate tweets. Despite the smaller number of photos, some places are associated with more photos than tweets. Intensive tweeting usually occurs at places with high population density, such as big metropolitan areas; however, many photos are also taken at places with low population density, such as Yosemite National Park.

Figure 1. (a) The number of georeferenced tweets generated by the top 300 contributors (highest: left; lowest: right). (b) The number of georeferenced photos generated by the top 300 contributors (highest: left; lowest: right). Horizontal axes: ranked user (top 300 users generating most tweets or photos); vertical axes: number of georeferenced tweets or photos.

Figure 2. Georeferenced tweets within the bounding box of the contiguous United States.

Figure 3. A close-up of georeferenced tweets in part of Los Angeles.

To estimate the number of tweet and photo occurrences per unit area, we performed a kernel density analysis of the national data using tweet and photo locations. Kernel density estimation measures the intensity of points by creating a smooth surface using a bivariate probability density function (Bailey and Gatrell 1995). The kernel estimator is defined as

$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{1}$$

where n is the total number of points, h is the bandwidth that determines the amount of smoothing, K is the kernel function, x is the location of estimation, and x_i is a known point location. The kernel function K could take different forms, such as a Gaussian distribution, a negative exponential, or a simple binary function (constant within the bandwidth and zero otherwise). The quadratic function we used in the analysis is given below (Silverman 1986):

$$K(c) = \begin{cases} \dfrac{3}{\pi}\left(1 - c^{T}c\right)^{2} & \text{if } c^{T}c \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

There are two parameters in kernel density estimation: kernel bandwidth and cell size. Given the size of the region, the kernel bandwidth was 100 km and the cell size was 1 km. The bandwidth of 100 km is a compromise between a map that is too smooth to interpret and one that is too noisy to interpret. The cell size of 1 km was used to show fine detail.

As shown in Figures 4 and 5, both tweets and photos tend to cluster in major cities with high population density. For example, Seattle, Portland, San Francisco, and Los Angeles on the west coast and Boston, New York City, Baltimore, and Washington DC on the east coast are clusters of both tweets and photos. We can identify almost all major cities with significant economic, political, and social influence in the United States from these two maps. Although tweets and photos consistently occur in cities with high population density, there are some differences, too.
We calculated the normalized density difference as follows:

$$D_d = \frac{D_p}{\max(D_p)} - \frac{D_t}{\max(D_t)} \tag{3}$$

where D_d measures the relative difference between photo density and tweet density, D_p and D_t are the photo density and tweet density at a location, respectively, and max(D_p) and max(D_t) are the maximum photo and tweet densities within the study area. To account for the difference in total volume between the two sources, we normalized each density value by the maximum density in its source, so the density for both sources ranges between 0 and 1. This allows us to compare the two densities at each location relative to other locations.

As shown in Figure 6, some locations stand out in the map of density difference as places with high photo density, such as Lake Tahoe and Yosemite National Park in California, Charleston in South Carolina, and Orlando in Florida, all of which are popular tourist attractions. The normalized photo density for these places is substantially higher than the normalized tweet density. On the other hand, Atlanta in Georgia, Cincinnati and Columbus in Ohio, and Detroit in Michigan have significantly higher normalized tweet density. Furthermore, there are many tweets in the city of Denver but a considerable number of photos in the Rockies west of Denver.

At a finer scale, we generated a tweet density surface in Los Angeles using a kernel bandwidth of 10 km and a cell size of 100 m. As shown in Figure 7a, downtown Los Angeles and Beverly Hills have the highest tweet density, which gradually decreases in the surrounding areas. The photo density surface in Los Angeles is shown in Figure 7b, with three major clusters in downtown Los Angeles, Pasadena, and Santa Monica. In these two figures, density estimation does not stop at the coast and the values are not zero in the ocean; however, a spatial constraint clearly could be applied in the density calculation.

Figure 4. Tweet density within the bounding box of the contiguous United States.

Figure 5. Flickr photo density within the bounding box of the contiguous United States.

Figure 6. Normalized density difference between Flickr photos and tweets.

Figure 7. (a) Tweet density in Los Angeles County. (b) Flickr photo density in Los Angeles County.
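The kernel density estimate of Equations (1) and (2) and the normalized difference of Equation (3) can be illustrated with a small pure-Python sketch. It is not the software used in the study, which the article does not name. Two assumptions are made and flagged in the comments: the sum is scaled by 1/(n h^2), the usual two-dimensional normalisation, and the maxima in Equation (3) are taken over the evaluated cells only. The point coordinates (in km) are invented for illustration.

```python
import math


def quartic_kernel(u2):
    """Equation (2): K(c) = (3/pi) * (1 - c'c)^2 if c'c <= 1, else 0; u2 is c'c."""
    return (3.0 / math.pi) * (1.0 - u2) ** 2 if u2 <= 1.0 else 0.0


def kernel_density(x, y, points, h):
    """Density at (x, y) from `points` [(xi, yi), ...] with bandwidth h.

    Assumption: scaled by 1 / (n * h**2) so the surface integrates to one
    in two dimensions.
    """
    total = 0.0
    for xi, yi in points:
        u2 = ((x - xi) ** 2 + (y - yi) ** 2) / (h * h)
        total += quartic_kernel(u2)
    return total / (len(points) * h * h)


def normalized_difference(photo_density, tweet_density):
    """Equation (3), cell by cell, for two equally long lists of densities."""
    max_p = max(photo_density)
    max_t = max(tweet_density)
    return [p / max_p - t / max_t for p, t in zip(photo_density, tweet_density)]


# Hypothetical points: tweets cluster near (0, 0), photos near (50, 50).
tweets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (50.0, 50.0)]
photos = [(50.0, 50.0), (51.0, 50.0), (0.0, 0.0)]
cells = [(0.0, 0.0), (50.0, 50.0)]
h = 10.0  # bandwidth in km

d_t = [kernel_density(x, y, tweets, h) for x, y in cells]
d_p = [kernel_density(x, y, photos, h) for x, y in cells]
d_d = normalized_difference(d_p, d_t)
print([round(v, 3) for v in d_d])
```

In this toy setting the first cell has a negative difference (relatively more tweets) and the second a positive one (relatively more photos), which mirrors how tourist destinations stand out in Figure 6.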
Temporal patterns of tweets and photos

The density of tweets varies from place to place and also through time. The hourly number of georeferenced tweets in Los Angeles within a week is shown in Figure 8. The highest rates of tweeting occurred between 8:00 in the morning and midnight. There are generally two tweet peaks: one around 13:00–14:00 in the afternoon and the other around 20:00–21:00 in the evening. The lowest rate of tweeting is around 4:00–5:00 in the morning, when most people are sleeping. This trend is relatively consistent on each day of the week and represents the activity pattern of georeferenced tweets. A comparison of the temporal patterns of tweets and photos is shown in Figure 9a and 9b. In contrast to the temporal pattern of tweets, Flickr users are substantially more active during weekends, and the rate of photo-taking is highest during the afternoon hours. However, temporal uncertainty should be considered when interpreting the results. The time when a photo was taken is provided by the camera, but not all photographers consistently keep the right time setting.

Descriptive models of tweet and photo densities in California

In this section, we infer the characteristics of users of georeferenced tweets and photos by studying the relationships between tweet and photo densities and the socioeconomic characteristics of people in different counties of California. The hypothesis is that areas with high tweet or photo density tend to have residents with specific characteristics, which may relate to age, race, educational attainment, type of occupation, and household income. The tweet dataset contains 602,371 tweets in California that were georeferenced by GPS, created by 44,097 users. Because the study uses socioeconomic data of local residents only, the raw data were preprocessed to exclude data that were likely to be generated by tourists.
As mentioned above, a user is regarded as a local resident if he or she stays in a county for a relatively long period of time (i.e., 10 days), which is verified by the time interval between two tweets or photos generated by the same user. As a result, there are 432,475 georeferenced tweets generated by 18,315 local users, which represent about 71.80% of all georeferenced tweets.

Data on distributions of age, race, educational attainment, occupation, and household income were obtained from the American Community Survey (ACS) 2006–2010. These data made up the set of explanatory variables. To create spatially intensive variables, all variables were normalized by the total number of people in each county. For instance, tweet density was calculated as the number of tweets divided by the total population of a county. Hence, the tweet density in the model is different from the tweet density represented as a kernel density surface in the section “The spatial distribution of georeferenced tweets and photos”: it is the number of tweets per person in a county, rather than the number of tweets per unit of land area. The explanatory variables consist of the percentage of people who fall into each of the categories (e.g., there are 23 age groups, ranging from “under 5 years” to “85 years and over,” so there are 23 variables for the percentage of people in the age groups, and they add up to 1).

Figure 8. The average number of tweets per hour in Los Angeles County.

Figure 9. (a) Time chart for georeferenced tweets. (b) Time chart for georeferenced photos.

Since there are many categories in each of these types of data, the number of explanatory variables is large compared to the number of observations, and some explanatory variables are correlated with each other. Multiple linear regression is therefore not appropriate, because it requires the absence of multicollinearity. Partial least squares regression (PLSR), on the other hand, is a method particularly useful for describing the correlation between a dependent variable and a set of strongly collinear independent variables. It aims to reduce the set of variables to a smaller number of uncorrelated components that characterize most of the covariance between the dependent variable and the independent variables. PLSR was introduced by Wold (1966) in the social sciences and was later widely adopted in chemometrics (Wold, Sjöström, and Eriksson 2001). PLSR is related to principal component regression (PCR): both extract components from the original independent variables for regression modeling; however, they differ in several ways. The major difference is that principal components in PCR are determined solely by the variance of the independent variables, while those in PLSR are determined by the covariance between the dependent and independent variables (Garthwaite 1994). The methods for constructing components in PCR and PLSR are therefore different, and the latter can capture most of the information in the independent variables that explains the dependent variable, avoiding the problem in PCR of discarding important principal components with a low variance (Jolliffe 1982).
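To make the idea of covariance-driven components concrete, the sketch below implements single-response PLS with the standard NIPALS iteration in plain Python and reports, for each component, the share of variance it explains in the predictors and in the response. It is an illustration under stated assumptions, not the authors' code: the article does not say which software or preprocessing was used, so the variables here are only mean-centred, and the function name `pls1` and the toy data (with two deliberately collinear predictors) are hypothetical.

```python
import math


def pls1(X, y, n_components):
    """Single-response PLS (NIPALS). Returns [(share_of_X_variance, share_of_y_variance), ...].

    X is a list of rows, y a list of responses. Columns are mean-centred only
    (an assumption; scaling to unit variance is another common choice).
    """
    n, m = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - col_means[j] for j in range(m)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    ss_x = sum(v * v for row in E for v in row)
    ss_y = sum(v * v for v in f)
    shares = []
    for _ in range(n_components):
        # Weights: the direction in predictor space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        shares.append((tt * sum(v * v for v in p) / ss_x, tt * q * q / ss_y))
        # Deflation: remove what this component explains before extracting the next.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
    return shares


# Toy data: the second predictor is roughly twice the first (strong collinearity).
X = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.1], [3.0, 6.2, 0.4],
     [4.0, 7.8, 0.9], [5.0, 10.1, 0.3], [6.0, 12.0, 0.7]]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
shares = pls1(X, y, 2)
for k, (sx, sy) in enumerate(shares, start=1):
    print(f"component {k}: X variance {sx:.3f}, y variance {sy:.3f}")
```

Because each component is chosen for its covariance with the response, a component can explain a large share of the response while explaining little of the predictors, which is the pattern reported for some components in the models that follow.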
The fifty-eight explanatory variables in the model can be grouped into five categories: age, race, educational attainment, income, and occupation. Applying PLSR to the data resulted in five components that explain most of the variance in tweet density (70.81%) and in the original 58 independent variables (82.89%). Table 1 lists the percentages of variance in the dependent and independent variables explained by each component, and Table 2 gives a sample loading matrix for the five components obtained from the original variables (see Appendix 1 for the entire loading matrix for PLS components in the tweet density model). The loading measures the importance of each variable in accounting for the variance of a component. A high loading value means that a specific variable accounts for much of the variance in a component. Table 3 gives a brief description of the meaning of the five components based on the loading values.

Table 1. The percentage of variance explained by components in the Twitter model.

Component                                      1        2        3        4        5
Explained variance in independent variables    0.2859   0.0987   0.0352   0.3199   0.0893
Explained variance in dependent variable       0.3794   0.1200   0.1381   0.0159   0.0548

Note: Independent variables are percentages of people falling into different subcategories of age, race, educational attainment, occupation, and household income, respectively, obtained from ACS (2006–2010), and the dependent variable is tweet density.

Table 2. Sample loading matrix for PLS components in the Twitter model.

Explanatory variable                                            Component 1   Component 2   Component 3   Component 4   Component 5
Bachelor’s degree                                               0.401932      −0.0772       0.042405      0.024567      0.009941
Graduate or professional degree                                 0.29582       −0.03287      0.007762      0.025411      0.000464
$150,000 to $199,999                                            0.174187      0.013374      −0.04549      −0.02102      −0.0099
$200,000 or more                                                0.245107      0.042515      −0.02049      −0.00515      −0.02389
Management, business, science, and arts occupations             0.49972       −0.17736      0.009559      0.054147      −0.02251
Service occupations                                             −0.14328      −0.05453      0.092773      0.047814      −0.03232
Sales and office occupations                                    0.021245      0.022759      0.069738      −0.03638      −0.02886
Natural resources, construction, and maintenance occupations    −0.25131      0.09103       −0.11676      −0.03511      0.029064
Production, transportation, and material moving occupations     −0.12638      0.118096      −0.05531      −0.03048      0.054624

The first component accounts for 37.94% of the variation in tweet density and 28.59% of the variation in the independent variables. It has high positive loadings on the occupation variable of management, business, science, and arts, the education variables of bachelor’s degree and graduate or professional degree, and the household income variables of $200,000 or more and $150,000 to $199,999. We may broadly call it a well-educated people component. The second component explains 12% of the variation in tweet density and 9.87% of the variance in the independent variables. It has high positive loadings on low levels of education (i.e., less than 9th grade, and 9th to 12th grade with no diploma) and on occupations in transportation and material moving. This is a component for less-educated people. The third component represents people of other races and accounts for 13.81% of the variance in the dependent variable but only 3.52% of the variance in the independent variables. The last two components both have low explanatory power for tweet density and are not considered important in the model. Interestingly, there is no obvious difference between males and females in the generation of georeferenced tweets, so sex was not included in the final model. In simple correlations, tweet density is also highly correlated with the percentage of people between the ages of 25 and 44 years, but age is correlated with income in this dataset, so the age variables do not show up as highly loaded predictors on the components.

The scores on each component may be mapped, as demonstrated in Figure 10 for the first component. There are five shades of color classified by natural breaks, from the darkest for the highest positive scores (maximum: 0.36) to the lightest for the negative scores (minimum: –0.16). The San Francisco Bay area has high positive scores, shown as the darkest area in the map. The first component characterizes the percentage of people with high education and salary, and associates this combination of characteristics with a high rate of tweeting. Take San Francisco and Santa Clara Counties as an example. These are places where many people with advanced degrees work in high-tech jobs and where tweet density is high.
In contrast, northern and central California has a dominance of negative scores, suggesting that the percentage of well-educated people and tweet density are low in these areas.

Figure 10. First component scores for tweet density: linear combinations of the independent variables. (Map legend: component scores in five natural-breaks classes, from about –0.16 to 0.36; scale bar 0 to 400 km.)

Table 3. Description of the PLS components in the Twitter model.

Component   Description
1           Well-educated people
2           Less-educated people
3           Other race people
4           White people
5           Asian people
The same procedure was applied to photo density. A total of 752,176 georeferenced photos created by 19,594 users in California were collected from Flickr. As with tweets, only photos contributed by local residents were retained for further analysis, resulting in 440,026 georeferenced photos created by 7216 local users. Five components constructed by PLSR capture 47.34% of the variation in photo density and 81.49% of the variance in the original independent variables (see Table 4). The entire loading matrix for PLS components in the photo density model is provided in Appendix 2. The explanatory power of this model is not as high as that of the tweet density model, for several reasons. Although the total number of photos is about the same as that of tweets, the number of unique photo contributors (7216) is smaller than that of tweet creators (18,315); therefore, photos were contributed by a much smaller number of users compared to the tweet dataset. In addition, Flickr photos may carry uncertainty in the time when a photo was taken, leading to errors of judgment when the time interval is used to infer whether a user is a local resident or a tourist. The first component explains 10.97% of the variance in the dependent variable and 33.16% in the independent variables. The second component captures only 7.33% of the variation in the dependent variable and 21.35% in the independent variables. This contrast demonstrates that PLSR constructs components from the covariance between the dependent and independent variables, rather than from the variance of the independent variables alone, as in principal component analysis (PCA). The explanations of the five components are listed in Table 5. The first component is highly loaded on occupations in management, business, science, and arts, on bachelor’s degree, and on graduate or professional degree, and generally describes the percentage of well-educated white people.
The second component has high positive loadings on Asian people with a bachelor’s degree in occupations in management, business, science, and arts, and is interpreted as well-educated Asian people. The third component accounts for 9.52% of the variance in the dependent variable and 17.29% of the variance in the independent variables. It has high positive loadings on white people, on high school graduate, General Educational Development (GED), or alternative, and on service occupations, and so represents moderately educated white people. The last two components explain 10.87% and 8.67% of the variance in the dependent variable, but their explanatory power is very low for the independent variables, so they are not regarded as significant in the model. As in the model of tweet density, gender does not seem to make a difference in the interpretation of photo density.

A straightforward interpretation of the models is the relationship between tweet and photo densities and the demographic and socioeconomic characteristics of people in these places. Because the raw data were preprocessed to retain only tweets and photos generated by local residents, socioeconomic properties of the people who contribute these data, such as race, education, occupation, and income, may be inferred from this relationship. A comparison of the time intervals of tweets and photos indicates that 71.80% of georeferenced tweets are generated by local people, while only 58.80% of georeferenced photos are uploaded by local residents. Therefore, the tweet density model may be more accurate in terms of inference about the properties of Twitter users from their spatial footprints. Although at a coarse scale, these two models provide a big picture of the properties of local people who contribute georeferenced tweets and photos, and offer an exploratory analysis of the representativeness of a subset of Twitter and Flickr users.
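The shares of locally generated content can be recomputed from the record counts reported in this section. The short sketch below only repeats that arithmetic; the dictionary layout is a hypothetical convenience. With the counts as printed, the tweet share reproduces the stated 71.80%, while the photo share comes out at about 58.5%, slightly below the 58.80% quoted in the text; the article does not explain the difference, which may reflect rounding or a different base count.

```python
# Record counts as reported in the text (all georeferenced vs. retained as local).
counts = {
    "tweets": {"all": 602_371, "local": 432_475},
    "photos": {"all": 752_176, "local": 440_026},
}

# Percentage of georeferenced records attributed to local residents.
shares = {name: 100.0 * c["local"] / c["all"] for name, c in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2f}% from local users")
```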
Table 4. The percentage of variance explained by components in the Flickr model.

Component                                      1        2        3        4        5
Explained variance in independent variables    0.3316   0.2135   0.1729   0.0640   0.0359
Explained variance in dependent variable       0.1097   0.0733   0.0952   0.1087   0.0867

Note: Independent variables are percentages of people falling into different subcategories of age, race, educational attainment, occupation, and household income, respectively, obtained from ACS (2006–2010), and the dependent variable is photo density.

Table 5. Description of the PLS components in the Flickr model.

Component   Description
1           Well-educated white people
2           Well-educated Asian people
3           Moderately educated white people
4           Less-educated people
5           Other race people

Conclusion

The growing popularity of social networking and social media services has attracted researchers from various disciplines, and this new form of geographic data has been used in a variety of applications. However, many questions must still be answered in order to use these data more appropriately. For example, who uses these services? Why do people use them? How can we take advantage of this new source of information, which could potentially be used for almost any topic but carries uncertainty and bias? Understanding the spatial and temporal distribution of georeferenced data would provide insight into these questions.

This article visualizes the spatial and temporal patterns of georeferenced tweets and Flickr photos collected within the contiguous United States. The tweets collected within only a few weeks delineate the administrative boundaries of the United States and the major roads at a very good resolution, especially in areas with high population density. Flickr photos have similar spatial patterns, although the total number of photos taken during the same period is substantially smaller than that of tweets. However, some places have considerably higher normalized photo density than tweet density, a characteristic of tourist attractions such as Yosemite National Park. The temporal patterns of tweets are relatively consistent on each day of the week, with two major peaks around 13:00–14:00 and 20:00–21:00, but substantially more photos are taken over weekends.

Two descriptive models using PLSR were constructed to explain the variation of tweet and photo densities from place to place in California, using demographic and socioeconomic variables of people in each county. According to the first model, tweet density is highly dependent on the percentage of well-educated people with an advanced degree and a good salary who work in the areas of management, business, science, and arts. The second model suggests that high photo density is correlated with a high percentage of white and Asian people with an advanced degree in the areas of management, business, science, and arts.
This study would be informative to sociologists who study the behaviors of social media users, geographers who are interested in the spatial and temporal distribution of social media users, marketing agencies who intend to understand the influence of social media, as well as other scientists who use social media data in their research.

This research provides an exploratory analysis of the characteristics of the contributors of georeferenced data, so we may be aware of the representativeness of such specific groups of people in the total population when using the data. Two major sources of bias may be reduced in the future: the bias caused by people’s movement and the bias due to ecological correlation. Finally, further research from the perspectives of psychology and sociology is required to explain why people with some specific social and demographic properties are more involved in creating georeferenced tweets and photos.

Note

1. http://code.flickr.com/blog/2009/02/04/100000000-geotagged-photos-plus/

Acknowledgments

The research was supported by the US National Science Foundation, award 0849910, and by the U.S. Army Research Office, award W911NF-09-1-0302.

References

Alampay, E. 2006. “Analysing Socio-Demographic Differences in the Access and Use of ICTs in the Philippines Using the Capability Approach.” The Electronic Journal of Information Systems in Developing Countries 27 (5): 1–39.
Ames, M., and M. Naaman. 2007. “Why We Tag: Motivations for Annotation in Mobile and Online Media.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 28–May 3, San Jose, CA, 971–980. New York: ACM.
Antoniou, B., J. Morley, and M. Haklay. 2010. “Web 2.0 Geotagged Photos: Assessing the Spatial Dimension of the Phenomenon.” Geomatica 64 (1): 99–110.
Bailey, T. C., and A. C. Gatrell. 1995. Interactive Spatial Data Analysis. London: Longman.
Bollen, J., H. Mao, and X. Zeng. 2011. “Twitter Mood Predicts the Stock Market.” Journal of Computational Science 2 (1): 1–8.
Cheng, Z., J. Caverlee, K. Lee, and D. Z. Sui. 2011. “Exploring Millions of Footprints in Location Sharing Services.” In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM), July 2011, Barcelona, 81–88. Palo Alto, CA: AAAI Press.
Crandall, D. J., L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. 2010. “Inferring Social Ties from Geographic Coincidences.” Proceedings of the National Academy of Sciences 107 (52): 22436–22441.
Crandall, D. J., L. Backstrom, D. Huttenlocher, and J. Kleinberg. 2009. “Mapping the World’s Photos.” In Proceedings of the 18th International Conference on World Wide Web, April 20–24, Madrid. New York: ACM.
Garthwaite, P. H. 1994. “An Interpretation of Partial Least Squares.” Journal of the American Statistical Association 89: 122–127.
Goodchild, M. F., and J. A. Glennon. 2010. “Crowdsourcing Geographic Information for Disaster Response: A Research Frontier.” International Journal of Digital Earth 3 (3): 231–241.
Haklay, M. 2010. “How Good Is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets.” Environment and Planning B: Planning and Design 37 (4): 682–703.
Hollenstein, L., and R. Purves. 2010. “Exploring Place Through User-Generated Content: Using Flickr to Describe City Cores.” Journal of Spatial Information Science 1 (1): 21–48.
Huberman, B., D. Romero, and F. Wu. 2008. Social Networks that Matter: Twitter under the Microscope. Accessed March 6, 2013. http://ssrn.com/abstract=1313405.
Java, A., X. Song, T. Finin, and B. Tseng. 2007. “Why We Twitter: Understanding Microblogging Usage and Communities.” In Proceedings of the 9th WebKDD and 1st SNA-KDD Workshop on Web Mining and Social Network Analysis, August 12, San Jose, CA, 56–65. New York: ACM.
Jolliffe, I. T. 1982. “A Note on the Use of Principal Components in Regression.” Applied Statistics 31: 300–303.
King, G. 1997. A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data. Princeton, NJ: Princeton University Press.
Lee, R., and K. Sumiya. 2010. “Measuring Geographical Regularities of Crowd Behaviors for Twitter-Based Geo-Social Event Detection.” In Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks (LBSN2010), 1–10. New York: ACM.
Lerman, K., and R. Ghosh. 2010. “Information Contagion: An Empirical Study of the Spread of News on Digg and Twitter Social Networks.” In Proceedings of 4th International Conference on Weblogs and Social Media (ICWSM), Washington, DC, May 23–26. Menlo Park, CA: AAAI Press.
Li, L., and M. F. Goodchild. 2012. “Constructing Places from Spatial Footprints.” In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, edited by M. F. Goodchild, D. Pfoser, and D. Sui, November 6, Redondo Beach, CA. New York: ACM.
Li, L., and M. F. Goodchild. 2013. “Spatio-Temporal Footprints in Social Networks.” Encyclopedia of Social Networks and Mining, edited by R. S. Alhajj, and J. G. Rokne, Springer.
Nielsen, J. 2006. “Participation Inequality: Encouraging More Users to Contribute.” Jakob Nielsen’s Alertbox 9: 2006.
Openshaw, S. 1984. “Ecological Fallacies and the Analysis of Areal Census Data.” Environment and Planning A 16: 17–31.
Piantadosi, S., D. P. Byar, and S. B. Green. 1988. “The Ecological Fallacy.” American Journal of Epidemiology 127: 893–904.
Purves, R., A. Edwardes, and J. Wood. 2011. “Describing Place through User Generated Content.” First Monday 16: 9–5.
Robinson, W. S. 1950. “Ecological Correlations and the Behavior of Individuals.” American Sociological Review 15 (3): 351–357.
Sakaki, T., M. Okazaki, and Y. Matsuo. 2010. “Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors.” In Proceedings of the 19th International Conference on World Wide Web, April 2010, Raleigh, NC, 851–860. New York: ACM.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Soule, L. C., L. W. Shell, and B. A. Kleen. 2003. “Exploring Internet Addiction: Demographic Characteristics and Stereotypes of Heavy Internet Users.” Journal of Computer Information Systems 44 (1): 64–73.
Taylor, W. J., G. X. Zhu, J. Dekkers, and S. Marshall. 2003. “Socio-Economic Factors Affecting Home Internet Usage Patterns in Central Queensland.” Informing Science 6: 233–246.
Tumasjan, A., T. O. Sprenger, P. G. Sandner, and I. M. Welpe. 2010. “Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment.” Fourth International AAAI Conference on Weblogs and Social Media, May 23–26, Washington, DC.
Wold, H. 1966. “Estimation of Principal Components and Related Models by Iterative Least Squares.” In Multivariate Analysis, edited by P. R. Krishnaiah, 391–420. New York: Academic Press.
Wold, S., M. Sjöström, and L. Eriksson. 2001. “PLS-Regression: A Basic Tool of Chemometrics.” Chemometrics and Intelligent Laboratory Systems 58: 109–130.
Zandbergen, P. A. 2009. “Accuracy of iPhone Locations: A Comparison of Assisted GPS, WiFi and Cellular Positioning.” Transactions in GIS 13 (s1): 5–25.
Appendix 1. Loading matrix for PLS components in the tweet density model.

| Explanatory variable | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 |
|---|---|---|---|---|---|
| Under 5 years | −0.01474 | 0.077405 | −0.04545 | −0.01982 | −0.01143 |
| 5–9 years | −0.02025 | 0.057787 | −0.04379 | −0.01927 | −0.01 |
| 10–14 years | −0.03172 | 0.058438 | −0.02302 | −0.02861 | −0.01334 |
| 15–17 years | −0.02153 | 0.022426 | −0.02407 | −0.01397 | −0.00691 |
| 18 and 19 years | 0.00196 | 0.016601 | −0.01363 | −0.00271 | −0.00794 |
| 20 years | 0.000168 | 0.006627 | −0.00405 | 0.002784 | −0.00437 |
| 21 years | 0.005339 | 0.013494 | −0.00895 | −0.00116 | −0.00942 |
| 22–24 years | 0.008545 | 0.042466 | 0.018536 | 0.001079 | −0.01888 |
| 25–29 years | 0.022469 | 0.091848 | 0.036858 | 0.005309 | 0.002426 |
| 30–34 years | 0.026894 | 0.080128 | 0.02624 | 0.005096 | −0.00324 |
| 35–39 years | 0.0304 | 0.042774 | 0.018096 | 0.00483 | 0.00986 |
| 40–44 years | 0.039194 | 0.004091 | −0.02617 | −0.00161 | −0.00537 |
| 45–49 years | 0.017316 | −0.03679 | −0.00857 | 0.002174 | 0.001503 |
| 50–54 years | 0.00373 | −0.0826 | −0.00635 | 0.011897 | 0.008826 |
| 55–59 years | −0.01004 | −0.10516 | 0.011584 | 0.019274 | 0.027798 |
| 60 and 61 years | −0.00336 | −0.03628 | 0.013421 | 0.002809 | 0.002329 |
| 62–64 years | −0.01243 | −0.05618 | 0.011172 | 0.010629 | 0.009505 |
| 65 and 66 years | −0.00818 | −0.03081 | 0.005977 | 0.003103 | 0.008006 |
| 67–69 years | −0.01376 | −0.04628 | 0.009472 | 0.004551 | 0.003888 |
| 70–74 years | −0.01464 | −0.04668 | 0.014862 | 0.005115 | 0.006863 |
| 75–79 years | −0.00534 | −0.03874 | 0.011525 | 0.009119 | 0.010758 |
| 80–84 years | −0.00167 | −0.01497 | 0.012574 | −0.0014 | −0.00086 |
| 85 years and over | 0.001647 | −0.01959 | 0.013749 | 0.00079 | 7.76E-06 |
| White alone | −0.04727 | 0.041288 | −0.10895 | 0.855233 | −0.33653 |
| Black or African American alone | 0.019714 | 0.001025 | 0.043088 | −0.15585 | 0.061714 |
| American Indian and Alaska Native alone | 0.039311 | −0.03672 | −0.049 | 0.040688 | 0.018066 |
| Asian alone | 0.02731 | 0.032401 | −0.03394 | −0.3589 | 0.364497 |
| Native Hawaiian and Other Pacific Islander alone | 0.003158 | −1.33E-05 | 0.000767 | −0.0057 | 0.00913 |
| Some other race alone | −0.04648 | −0.03246 | 0.129032 | −0.3555 | −0.13824 |
| Two or more races | 0.004259 | −0.00552 | 0.019004 | −0.01998 | 0.021366 |
| Less than 9th grade | −0.08597 | 0.331632 | −0.10665 | −0.02253 | −0.00902 |
| 9th–12th grade, no diploma | −0.16101 | 0.136008 | 0.007883 | 0.003644 | −0.01523 |
| High school graduate, GED, or alternative | −0.302 | −0.11136 | 0.021482 | 0.009998 | 0.058959 |
| Some college, no degree | −0.13348 | −0.18998 | 0.02426 | −0.03015 | −0.04115 |
| Associate’s degree | −0.01529 | −0.05624 | 0.002854 | −0.01094 | −0.00396 |
| Bachelor’s degree | 0.401932 | −0.0772 | 0.042405 | 0.024567 | 0.009941 |
| Graduate or professional degree | 0.29582 | −0.03287 | 0.007762 | 0.025411 | 0.000464 |
| Less than $10,000 | −0.05965 | 0.007578 | 0.041237 | 0.033263 | 0.014456 |
| $10,000–$14,999 | −0.11884 | −0.0168 | 0.083771 | 0.034953 | 0.007335 |
| $15,000–$19,999 | −0.0885 | −0.00257 | 0.045553 | 0.019674 | −0.00656 |
| $20,000–$24,999 | −0.08984 | 0.020739 | 0.051756 | 0.005561 | −0.03398 |
| $25,000–$29,999 | −0.06982 | −0.00308 | −0.00224 | 0.006647 | 0.015886 |
| $30,000–$34,999 | −0.05892 | −0.01624 | 0.004458 | −0.00272 | −0.01196 |
| $35,000–$39,999 | −0.06398 | −0.00129 | 0.025394 | −0.00427 | 0.006125 |
| $40,000–$44,999 | −0.03699 | −0.02119 | 0.001722 | −0.00392 | 0.000198 |
| $45,000–$49,999 | −0.03612 | −0.01511 | −0.01185 | 0.011192 | 0.017964 |
| $50,000–$59,999 | −0.04508 | 0.019009 | −0.0025 | −0.01275 | −0.00141 |
| $60,000–$74,999 | −0.01536 | −0.03518 | −0.03475 | −0.0045 | 0.019347 |
| $75,000–$99,999 | 0.047511 | −0.00607 | −0.04302 | −0.01023 | 0.023652 |
| $100,000–$124,999 | 0.112712 | 0.001461 | −0.05712 | −0.02623 | −0.01345 |
| $125,000–$149,999 | 0.10356 | 0.012866 | −0.03643 | −0.02051 | −0.00383 |
| $150,000–$199,999 | 0.174187 | 0.013374 | −0.04549 | −0.02102 | −0.0099 |
| $200,000 or more | 0.245107 | 0.042515 | −0.02049 | −0.00515 | −0.02389 |
| Management, business, science, and arts occupations | 0.49972 | −0.17736 | 0.009559 | 0.054147 | −0.02251 |
| Service occupations | −0.14328 | −0.05453 | 0.092773 | 0.047814 | −0.03232 |
| Sales and office occupations | 0.021245 | 0.022759 | 0.069738 | −0.03638 | −0.02886 |
| Natural resources, construction, and maintenance occupations | −0.25131 | 0.09103 | −0.11676 | −0.03511 | 0.029064 |
| Production, transportation, and material moving occupations | −0.12638 | 0.118096 | −0.05531 | −0.03048 | 0.054624 |
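One way to read a loading matrix such as Appendix 1 is to rank the explanatory variables by the absolute value of their loading on a given component. The sketch below does this for Component 1 of the tweet density model, using a handful of rows copied from the appendix; the dictionary and the helper name `strongest_loadings` are illustrative assumptions, not part of the original analysis.

```python
# Component 1 loadings for selected rows, copied from Appendix 1
# (tweet density model). Only a subset of rows is included here.
component1_loadings = {
    "Management, business, science, and arts occupations": 0.49972,
    "Bachelor's degree": 0.401932,
    "Graduate or professional degree": 0.29582,
    "$200,000 or more": 0.245107,
    "25–29 years": 0.022469,
    "White alone": -0.04727,
    "Natural resources, construction, and maintenance occupations": -0.25131,
    "High school graduate, GED, or alternative": -0.302,
}


def strongest_loadings(loadings, top_n=3):
    """Return the top_n (variable, loading) pairs ranked by absolute loading."""
    ranked = sorted(loadings.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


top = strongest_loadings(component1_loadings)
for name, value in top:
    # The sign shows the direction in which the variable loads on the component.
    print(f"{value:+.6f}  {name}")
```

Among the rows included, the three largest absolute loadings belong to management occupations and bachelor's degrees (positive) and to high school graduates (negative).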
Appendix 2. Loading matrix for PLS components in the photo density model.

| Explanatory variable | Component 1 | Component 2 | Component 3 | Component 4 | Component 5 |
|---|---|---|---|---|---|
| Under 5 years | −0.03985 | −0.06968 | −0.04157 | 0.035035 | 0.009508 |
| 5–9 years | −0.03863 | −0.06033 | −0.03379 | 0.018745 | 0.005705 |
| 10–14 years | −0.04948 | −0.04034 | −0.00729 | 0.030802 | −0.01749 |
| 15–17 years | −0.02739 | −0.0298 | −0.0105 | 0.001409 | 0.003521 |
| 18 and 19 years | −0.00368 | −0.02068 | −0.01886 | −0.00211 | −0.01066 |
| 20 years | 0.001142 | −0.00986 | −0.00504 | −0.00208 | −0.00873 |
| 21 years | 0.001123 | −0.01489 | −0.01485 | 0.000999 | −0.00771 |
| 22–24 years | 0.001153 | −0.01669 | −0.00644 | 0.024757 | −0.02822 |
| 25–29 years | −0.00304 | −0.02076 | −0.00858 | 0.078787 | −0.03802 |
| 30–34 years | 0.00358 | −0.02021 | −0.01408 | 0.070264 | −0.02725 |
| 35–39 years | 0.010636 | 0.000369 | −0.01345 | 0.047709 | −0.01351 |
| 40–44 years | 0.022184 | −0.00871 | −0.04678 | 0.000559 | 0.014569 |
| 45–49 years | 0.020005 | 0.019069 | −0.00946 | −0.02387 | 0.013253 |
| 50–54 years | 0.027076 | 0.038446 | 0.014259 | −0.05507 | 0.028873 |
| 55–59 years | 0.024689 | 0.058246 | 0.040607 | −0.06575 | 0.022882 |
| 60 and 61 years | 0.008444 | 0.027431 | 0.020304 | −0.01986 | −0.00012 |
| 62–64 years | 0.011037 | 0.032313 | 0.032863 | −0.03226 | 0.014433 |
| 65 and 66 years | 0.002372 | 0.017243 | 0.015113 | −0.01997 | 0.009119 |
| 67–69 years | 0.004825 | 0.025543 | 0.026478 | −0.02984 | 0.011136 |
| 70–74 years | 0.004335 | 0.029439 | 0.030894 | −0.02749 | 0.012094 |
| 75–79 years | 0.009549 | 0.023947 | 0.020369 | −0.02343 | 0.006073 |
| 80–84 years | 0.002783 | 0.018798 | 0.015552 | −0.00183 | −2.78E-05 |
| 85 years and over | 0.007138 | 0.02111 | 0.01425 | −0.00551 | 5.71E-04 |
| White alone | 0.64126 | −0.54537 | 0.384937 | −0.04463 | −0.05466 |
| Black or African American alone | −0.11371 | 0.098182 | −0.08834 | −0.0114 | −0.04436 |
| American Indian and Alaska Native alone | 0.051597 | −0.02843 | −0.05185 | −0.03109 | 0.121559 |
| Asian alone | −0.33726 | 0.269738 | −0.17645 | 0.099653 | −0.19034 |
| Native Hawaiian and Other Pacific Islander alone | −0.00434 | 7.60E-03 | −0.00337 | 0.003228 | −0.00463 |
| Some other race alone | −0.22297 | 0.165412 | −0.06804 | −0.02706 | 0.172535 |
| Two or more races | −0.01458 | 0.032869 | 0.003109 | 0.011299 | −0.0001 |
| Less than 9th grade | −0.14529 | −0.25007 | −0.05677 | 0.209129 | −0.00659 |
| 9th–12th grade, no diploma | −0.12839 | −0.14642 | 0.06076 | 0.02307 | −0.03606 |
| High school graduate, GED, or alternative | −0.17274 | −0.03592 | 0.212701 | −0.14922 | 0.089943 |
| Some college, no degree | −0.05122 | 0.058598 | 0.100471 | −0.2111 | −0.0332 |
| Associate’s degree | −0.00306 | 0.032506 | 0.01773 | −0.05174 | −0.03535 |
| Bachelor’s degree | 0.288224 | 0.213345 | −0.18656 | 0.095178 | 0.006815 |
| Graduate or professional degree | 0.212482 | 0.127951 | −0.14833 | 0.084679 | 0.014442 |
| Less than $10,000 | −0.02021 | −0.02446 | 0.057107 | −0.01381 | 0.001242 |
| $10,000–$14,999 | −0.04735 | −0.01043 | 0.123375 | −0.03851 | −0.00763 |
| $15,000–$19,999 | −0.0397 | −0.02443 | 0.07639 | −0.03061 | −0.00025 |
| $20,000–$24,999 | −0.04602 | −0.02101 | 0.090227 | 0.001759 | −0.01519 |
| $25,000–$29,999 | −0.04342 | −0.0292 | 0.038342 | −0.02533 | 0.014401 |
| $30,000–$34,999 | −0.0333 | −0.0215 | 0.028601 | −0.04265 | 0.013107 |
| $35,000–$39,999 | −0.0432 | −0.00371 | 0.053541 | −0.00873 | 0.003684 |
| $40,000–$44,999 | −0.02043 | 0.00364 | 0.030614 | −0.01959 | 0.008193 |
| $45,000–$49,999 | −0.01826 | −0.02243 | 0.008614 | −0.03616 | 0.011423 |
| $50,000–$59,999 | −0.04024 | −0.01621 | 0.026354 | 0.008339 | −0.00548 |
| $60,000–$74,999 | −0.01251 | −0.00814 | −0.01824 | −0.0494 | 0.013102 |
| $75,000–$99,999 | 0.017281 | 0.015004 | −0.0481 | 0.006423 | −0.03166 |
| $100,000–$124,999 | 0.056167 | 0.027812 | −0.09461 | 0.03855 | 0.021724 |
| $125,000–$149,999 | 0.048939 | 0.026345 | −0.0844 | 0.03963 | −0.00837 |
| $150,000–$199,999 | 0.094066 | 0.044338 | −0.13391 | 0.058537 | 0.004909 |
| $200,000 or more | 0.148192 | 0.064382 | −0.15391 | 0.111557 | −0.02321 |
| Management, business, science, and arts occupations | 0.393605 | 0.217233 | −0.29509 | −0.0328 | 0.032189 |
| Service occupations | −0.03144 | 0.015353 | 0.190332 | −0.04594 | −0.02918 |
| Sales and office occupations | −0.00541 | 0.056445 | 0.020075 | 0.03938 | −0.06822 |
| Natural resources, construction, and maintenance occupations | −0.21088 | −0.16301 | 0.090635 | 0.009872 | 0.037683 |
| Production, transportation, and material moving occupations | −0.14587 | −0.12602 | −0.00595 | 0.029492 | 0.027525 |
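The appendices do not state which software or PLS variant produced these loadings. As a rough illustration of what a loading vector is, the sketch below runs the textbook NIPALS algorithm for a single response (PLS1) on made-up toy data. The function names, the toy numbers, and the choice of mean-centering without variance scaling are all assumptions for illustration, so the output is not comparable to the values tabulated above.

```python
def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_nipals(X, y, n_components):
    """Return (loadings, scores) for single-response PLS via NIPALS.

    X is a list of observation rows and y a list of responses. Both are
    mean-centered here; no variance scaling is applied (an assumption).
    """
    n, m = len(X), len(X[0])
    # Center each column of X, then rebuild the row layout as a fresh copy.
    columns = [center([row[j] for row in X]) for j in range(m)]
    X = [[columns[j][i] for j in range(m)] for i in range(n)]
    y = center(y)

    loadings, scores = [], []
    for _ in range(n_components):
        # Weights: unit-length direction in X with maximal covariance with y.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        # Scores: projection of each observation onto the weight vector.
        t = [sum(X[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loadings: regression coefficients of each X column on the scores.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(y[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component explains what is left.
        for i in range(n):
            for j in range(m):
                X[i][j] -= t[i] * p[j]
            y[i] -= q * t[i]
        loadings.append(p)
        scores.append(t)
    return loadings, scores


# Hypothetical toy data: six observations, three explanatory variables.
toy_X = [[1, 2, 0], [2, 1, 1], [3, 4, 0], [4, 3, 2], [5, 6, 1], [6, 5, 2]]
toy_y = [1, 3, 2, 5, 4, 7]
loadings, scores = pls1_nipals(toy_X, toy_y, n_components=2)
for k, p in enumerate(loadings, start=1):
    print(f"component {k}:", [round(v, 4) for v in p])
```

Each returned loading vector has one entry per explanatory variable, which corresponds to one column of a loading matrix laid out like the appendices. Because X is deflated after every component, the score vectors of successive components are orthogonal.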