American Clusters Geodemographic Classification
METHODOLOGY
Geodemographics
Birds of a feather flock together (English proverb).
Geodemographics is the analysis of people by where they
live (Sleight, 1996).
Geodemographic classification categorizes neighborhoods
based on their socio-economic and lifestyle characteristics.
Applications of Geodemographic Classifications
Commercial
• Market Research
• Site Selection
• Trade Area Analysis
• Direct Marketing
• Advertising
• Management
• Media Analysis
• Elections

Not-for-profit
• Public Sector
• Health Care
• Education
• Local Authority
• Policing
• Academic
• Poverty Prevention
• Charities

Commercial: people who live in an area where there is...
• A high probability of buying a particular type of newspaper.
• A high application rate for consolidation loans.
• A high consumption of fashion goods.

Not-for-profit: people who live in an area where there is...
• Low grades in secondary school exams.
• High risk of developing diabetes.
• High fear of crime, but low crime levels.

Source: adapted from:
• Harris et al. 2005
• Bolton Council Planning Research
• OAC
Open Geodemographics
The term “Open Geodemographics” was first used and applied by Dr. Dan Vickers,
the author of “Multi-level Integrated Classification Based on the 2001
Census”. In this paper Dr. Vickers describes the methods and techniques used to
build the National Classification of Census Output Areas (UK).

Dr. Vickers has made the methodology and results of his work available online for
review and free download.

To find out more about Open Geodemographics, please watch this
presentation.
Project Objectives


To create an open and free geodemographic classification of
the USA using Census 2000 results, following the methodology
developed by Dr. Dan Vickers.
Project Methodology
1. Selection of cluster objects (operational taxonomic units)
2. Variables selection
3. Variables standardization
4. Clustering method selection
5. Identification of cluster number
6. Interpretation, testing and mapping of clusters

The classification building process was largely based on the methodology elaborated by Daniel Vickers in “Multi-level Integrated Classification Based on the 2001 Census” (2006). Other
methodologies were also considered, such as the methodology of building MOSAIC described in “Geodemographics, GIS and Neighbourhood Targeting”, R. Harris, P. Sleight, R. Webber, Wiley
(2005).
Classification Inputs

 208,000 Census block groups of all 50 states
 281,000,000 Overall population
 106,000,000 Households
Data and Variables
   Data: US Census 2000 is the major source of data for the classification.

   Variables: Some researchers, such as Harris and Webber, suggest including
   as many variables as possible to create “more meaningful clusters” (Harris et
   al., 2005).

   Others, such as Everitt and Vickers, argue that the minimum number of
   variables should be used in order to prevent data redundancy and
   collinearity (Vickers, 2006).

   In our classification we try to use only “necessary” variables to avoid data
   redundancy and collinearity.
Variables Selection
      List of initial 77 variables
1.    Male
2.    Female
3.    Urban
4.    Rural
5.    White
6.    Black
7.    Native
8.    Asian
9.    Hawaiian
10.   Some other race
11.   Two or more races
12.   Aged 0-4
13.   Aged 5-14
14.   Aged 15-24
15.   Aged 25-44
16.   Aged 45-64
17.   Aged 65+
18.   Foreign born
19.   Not a citizen
20.   Spanish language households
21.   Other European language households
22.   Asian language households
23.   One person households (under 65)
24.   One person households (over 65)
25.   One parent households
26.   Family households with no children
27.   Non-family households
28.   Married
29.   Divorced
30.   Widowed
31.   Never married
32.   Occupied by renters
33.   Median rent
34.   Avg. house size
35.   One bedroom
36.   Four bedrooms
37.   Gas
38.   Bottled gas, kerosene
39.   Wood
40.   No fuel
41.   Median year house built
42.   Median house value
43.   Occupied by more than 1.5 persons per room
44.   Occupied by less than 0.5 persons per room
45.   Lacking complete plumbing facilities
46.   Lacking complete kitchen facilities
47.   Households without a mortgage
48.   Second mortgage or home equity
49.   Owner cost with mortgage
50.   Owner cost without mortgage
51.   Percent of monthly cost (with mortgage) exceeds 40
52.   Percent of monthly cost (with mortgage) does not exceed 10
53.   Percent of monthly cost (without mortgage) exceeds 25
54.   High school degree
55.   Higher education attainment
56.   No car households
57.   2+ car households
58.   Work at home
59.   Car to work
60.   Carpool to work
61.   Public transport to work
62.   Bicycle or walk to work
63.   Long commuters (60+ min to work)
64.   Short commuters (less than 15 min to work)
65.   To work from 12 to 5 am
66.   To work from 8 to 10 am
67.   To work from 4 pm to 11:59 pm
68.   Standardized Disability Ratio
69.   Unemployed
70.   Working part-time
71.   People living below poverty line
72.   Retirement income
73.   Public assistance income
74.   Supplemental Security Income
75.   Social Security income for HH
76.   Interest, dividends or net rental income for HH
77.   Median HH income



               some of them had to be removed...
Variables Selection
   The initial list of variables was reviewed and reduced by applying three analytical
   tools:

• Principal Component Analysis (PCA)/Factor analysis: variables with high factor
     loadings were selected

• Correlation Matrix: pairs of variables strongly correlated with each other were
     examined and only one within each pair was left for analysis

• Standard Deviation Evaluation: variables with low SD were not included
     because they vary little between block groups and do not bring value to the
     clustering process.
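
The correlation and standard deviation checks above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual procedure: the thresholds (min_sd, max_abs_r), the variable names and the toy values are all assumptions made up for the example, and the PCA/factor analysis step is left out.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def screen_variables(data, min_sd=0.05, max_abs_r=0.8):
    """Return the variable names kept after SD and correlation screening.

    data maps a variable name to its values (one value per block group).
    Both thresholds are illustrative assumptions, not project values.
    """
    # Step 1: drop variables that vary little between block groups.
    kept = [name for name, vals in data.items()
            if statistics.pstdev(vals) >= min_sd]
    # Step 2: of each strongly correlated pair, keep only the first variable.
    selected = []
    for name in kept:
        if all(abs(pearson(data[name], data[other])) < max_abs_r
               for other in selected):
            selected.append(name)
    return selected

# Toy data (hypothetical shares for five block groups).
toy = {
    "pct_renters":  [0.10, 0.40, 0.70, 0.90, 0.20],
    "pct_owners":   [0.90, 0.60, 0.30, 0.10, 0.80],  # mirrors pct_renters
    "pct_hawaiian": [0.00, 0.01, 0.00, 0.01, 0.00],  # almost constant
    "pct_aged_65":  [0.30, 0.10, 0.25, 0.05, 0.15],
}
selected = screen_variables(toy)
print(selected)
```

In this toy case the near-constant variable is dropped by the SD check and the mirrored variable by the correlation check.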
Variables Selection
The list was reduced from 77 to 47 variables:
   1. Urban population
   2. Black
   3. Asian
   4. Aged under 5
   5. Aged 5-14
   6. Aged 15-24
   7. Aged 25-44
   8. Aged 45-64
   9. Aged 65+
   10. One person households (under 65)
   11. One person households (over 65)
   12. One parent households
   13. Households with no children
   14. Foreign born population
   15. Spanish language households
   16. Other European language households
   17. Never married
   18. Occupied by renters
   19. Median rent
   20. One bedroom
   21. Four bedrooms
   22. Bottled gas, oil, kerosene as fuel
   23. Median house value
   24. Occupied by more than 1.5 persons per room
   25. Households without a mortgage
   26. Second mortgage or home equity
   27. Monthly cost (with mortgage) less than 10%
   28. Monthly cost (with mortgage) more than 40%
   29. Higher education
   30. Two cars households
   31. Car to work
   32. Public transport to work
   33. Bicycle or walk to work
   34. Long commuters (60+ min to work)
   35. To work from 8 to 10 am
   36. Standardized Disability Ratio
   37. Unemployed
   38. Working part-time
   39. People living below poverty line
   40. Retirement income
   41. Public assistance income
   42. Social security income
   43. Interest, dividends or net rental income
   44. Median household income
   45. Government workers
   46. Self-employed
   47. Agriculture, forestry, fishing and hunting and mining
Variable Transformation
 Variable transformation and standardization made it possible to include
 different types of data (e.g. percent of black population, median number of
 rooms, median household income).

Log transformation

 Transforming the data to a logarithmic scale greatly reduced the problem of
 very high-value outliers, because differences between values at the
 extremes of the data set shrink by more than differences between more
 typical values (Methods for Area classification for output areas,
 National Statistics, UK).
 http://www.statistics.gov.uk/about/methodology_by_theme/area_classification/oa/methodology.asp
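
A small sketch of the effect, assuming a log(x + 1) form (the slides do not say which exact log transform was used) and made-up income values:

```python
import math

def log_transform(values):
    """Apply log(x + 1); the +1 keeps zero values defined (an assumption)."""
    return [math.log1p(v) for v in values]

# Hypothetical incomes: three typical values and one extreme outlier.
incomes = [20_000, 40_000, 60_000, 1_000_000]
logged = log_transform(incomes)

# How far the outlier sits from the minimum, relative to a typical spread.
raw_ratio = (incomes[-1] - incomes[0]) / (incomes[2] - incomes[0])
log_ratio = (logged[-1] - logged[0]) / (logged[2] - logged[0])
print(raw_ratio, log_ratio)
```

On the raw scale the outlier lies many typical spreads away from the rest; on the log scale that gap is much smaller, which is the reduction described above.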
Variables Transformation
Z-score standardization

 Z-score standardization is the most popular method of data standardization,
 but in our project it showed poor results because highly skewed variables
 were given too much weight. This could lead to poorly formed clusters
 in the clustering process.

Range standardization
 To reduce the effect of highly skewed variables, range standardization
 was chosen. This method rescales each variable to the range between
 0 (minimum value) and 1 (maximum value).
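
The two standardizations can be compared on a skewed variable with a short sketch. The data are invented for illustration; the function names are not from the project.

```python
import statistics

def z_scores(values):
    """Z-score standardization: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def range_standardize(values):
    """Range standardization: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant variable: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical highly skewed variable: many small values and one extreme.
skewed = [1, 2, 3, 4, 5] * 10 + [500]

z = z_scores(skewed)
r = range_standardize(skewed)
print("largest z-score:", round(max(z), 2))
print("largest range-standardized value:", max(r))
```

With z-scores the extreme observation is unbounded and can dominate distance calculations, while range standardization caps every variable at 1.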
Clustering Process
K-means

To create the classification, the k-means method (performed in SPSS) was
chosen because it works well on large datasets (Vickers, 2006).
The major issue with k-means is that the number of clusters has to be
specified beforehand.
To determine the number of clusters, two tests were performed:
 Test 1. Evaluation of the average distance to the cluster center
 Test 2. Cluster size range assessment
Test 1. Evaluation of average distance to cluster center

Vickers (2006) suggested that the most useful number of clusters for a classification is around 6,
so the target number of k-means clusters was sought between 4 and 8. We looked for the solution with
the largest jump in average distance from the cluster center between two consecutive solutions. The
largest jump was found at the 6-cluster solution.
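
The project ran k-means in SPSS. As a rough stand-in, the sketch below implements plain Lloyd's k-means in Python and reports the average distance to the cluster center for 4 to 8 clusters, together with the change between consecutive solutions. The data are random synthetic points, and the seeding and iteration limit are assumptions, so the numbers say nothing about the real classification.

```python
import math
import random

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # k observations picked as initial centers
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[c])
        if new_centers == centers:    # converged
            break
        centers = new_centers
    return centers, assign(points, centers)

def average_distance(points, centers, labels):
    """Mean Euclidean distance from each point to its cluster center."""
    return sum(math.dist(p, centers[lab])
               for p, lab in zip(points, labels)) / len(points)

# Synthetic stand-in for range-standardized block-group data.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random(), data_rng.random())
          for _ in range(300)]

avg_dist = {}
for k in range(4, 9):
    centers, labels = kmeans(points, k)
    avg_dist[k] = average_distance(points, centers, labels)

# Change in average distance between consecutive solutions (k-1 versus k).
jumps = {k: avg_dist[k - 1] - avg_dist[k] for k in range(5, 9)}
for k in range(4, 9):
    print(k, round(avg_dist[k], 4), jumps.get(k))
```

The solution where the change between neighbours is largest is the candidate this test points to.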
Test 2. Cluster size range

Another important factor considered was the size of the clusters: the more similar the clusters are in
number of members, the better.
The measure used was the mean difference between the optimal cluster size (all clusters have an equal number
of members) and the actual cluster sizes for a given solution: the lower the value, the better.
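
One way to read this measure is as the mean absolute difference between each cluster's size and the equal-size ideal n / k. The sketch below follows that reading, which is an interpretation of the slide rather than a confirmed formula; the label lists are toy examples.

```python
from collections import Counter

def mean_size_deviation(labels, k):
    """Mean absolute difference between actual cluster sizes and n / k."""
    ideal = len(labels) / k           # size if all clusters were equal
    counts = Counter(labels)
    return sum(abs(counts.get(c, 0) - ideal) for c in range(k)) / k

balanced = [0, 0, 1, 1, 2, 2]         # three clusters of two members each
unbalanced = [0, 0, 0, 0, 1, 2]       # one large cluster, two small ones

print(mean_size_deviation(balanced, 3))
print(mean_size_deviation(unbalanced, 3))
```

A perfectly balanced solution scores 0, and the score grows as cluster sizes drift apart.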
Number of Clusters

The 7-cluster solution was chosen.
The 4-cluster solution performed well in both tests, but was
outperformed by the 5- and 7-cluster solutions, which did better
in the second test. The 6-cluster solution showed poor results in
the second test, while the 8-cluster solution did not do well in
the first test.
Group of Clusters
These 7 clusters form the highest level of the classification
hierarchy. They represent the Groups of Clusters.
To build the second level, each resulting cluster was
split into a number of smaller clusters using the
same methods applied before.
As a result, we obtained 18 distinct clusters for
the American Clusters Classification.
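
The bookkeeping for the second level can be sketched as follows: each first-level group is split separately and every member receives a "group.subcluster" code such as 5.1. The median_split function is a hypothetical stand-in for the real clustering step (k-means in the project), used only to keep the example short, and the rows are invented.

```python
from collections import defaultdict

def split_into_subclusters(rows, group_labels, split_fn):
    """Return one 'group.subcluster' code per row, e.g. '5.1'."""
    by_group = defaultdict(list)
    for idx, group in enumerate(group_labels):
        by_group[group].append(idx)
    codes = [None] * len(rows)
    for group, idxs in by_group.items():
        # Cluster only the members of this first-level group.
        sub_labels = split_fn([rows[i] for i in idxs])
        for i, sub in zip(idxs, sub_labels):
            codes[i] = f"{group}.{sub + 1}"
    return codes

def median_split(rows):
    """Hypothetical stand-in for k-means: split on the median first value."""
    cut = sorted(r[0] for r in rows)[len(rows) // 2]
    return [0 if r[0] < cut else 1 for r in rows]

rows = [(1.0,), (2.0,), (8.0,), (9.0,), (3.0,), (7.0,)]
groups = [1, 1, 2, 2, 1, 2]           # first-level group of each row
codes = split_into_subclusters(rows, groups, median_split)
print(codes)
```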
Clusters Hierarchy

1st level of classification hierarchy: Group of Clusters
2nd level of classification hierarchy: 18 clusters
Analyzing Clusters
The 18 identified clusters are now ready to be analyzed.
All cluster mean values were compared with the dataset mean values.
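
A minimal sketch of such a comparison, expressing each cluster mean as a ratio of the dataset mean (1.0 means the cluster equals the overall average). The variable names, cluster codes and values are invented for illustration, and the ratio form is an assumption since the slides do not state how the comparison was expressed.

```python
from collections import defaultdict
import statistics

def cluster_profiles(rows, labels, names):
    """Per cluster, the ratio of the cluster mean to the dataset mean."""
    # Assumes every dataset mean is non-zero.
    overall = [statistics.fmean(col) for col in zip(*rows)]
    members = defaultdict(list)
    for row, lab in zip(rows, labels):
        members[lab].append(row)
    profiles = {}
    for lab, group in members.items():
        means = [statistics.fmean(col) for col in zip(*group)]
        profiles[lab] = {n: m / o for n, m, o in zip(names, means, overall)}
    return profiles

names = ["median_income", "pct_self_employed"]
rows = [
    (90_000, 0.2), (90_000, 0.2),                               # cluster "5.1"
    (30_000, 0.1), (30_000, 0.1), (15_000, 0.1), (15_000, 0.1),  # cluster "1.1"
]
labels = ["5.1", "5.1", "1.1", "1.1", "1.1", "1.1"]

profiles = cluster_profiles(rows, labels, names)
for lab, profile in profiles.items():
    print(lab, {n: round(v, 2) for n, v in profile.items()})
```

Ratios well above or below 1.0 are the features that end up in a cluster's written description.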
Describing Clusters
Based on the comparison, clusters were analyzed and described.
Then clusters were mapped and named.

Cluster 5.1 - Upscale Couples
 “Upscale Couples” is a cluster of rich people with a large
 share of men and women between 45 and 65 who live in
 suburban areas in the vicinity of large US cities.
• “A significant share of this group's members are self-employed.”
• “Their incomes are two times higher than the US average.”
• “Those who work prefer to use a personal vehicle to get to work, and the majority of them
  leave home between 8 and 10 am.”

Image sources:
http://www.flickr.com/photos/loungerie/3029049309/
http://www.flickr.com/photos/whartz/1066907283/
Mapping Clusters
Clusters were mapped using free tools from Google.
Final Clusters
Clusters Description
The 18 clusters are:
• Depressed Blocks
• Low Income Families
• Established Suburbs
• Settled Achievers
• Suburban Middles
• Rural Despair
• Rural Communities
• Unfortunate Countryside
• Retired Citizens
• Satellite City Young Families
• Small Town Communities
• Upscale Couples
• Prosperous B Boomers
• Successful Families
• Country Life
• Farmers' Land
• Multicultural Communities
• Hispanic Families
References
   Callingham, M. (2005), From areal classification to geodemographics, paper presented at the Demographic User Group Conference, Royal Society, London, 10th November 2005.
   Debenham, J. E. (2002), Understanding Geodemographic Classification: Creating The Building Blocks For An Extension, Working Paper 02/1, School of Geography, University of Leeds [online] http://www.geog.leeds.ac.uk/wpapers/02-1.pdf
   Harris, R., Sleight, P. and Webber, R. (2005), Geodemographics, GIS and Neighbourhood Targeting, London, Wiley.
   Longley, P. A. (2005), Geographical Information Systems: a renaissance of geodemographics for public service delivery, Progress in Human Geography, 29(1).
   Sleight, P. (2004), Targeting Customers: How to Use Geodemographic and Lifestyle Data in Your Business, Henley-on-Thames, World Advertising Research Centre.
   Vickers, D. (2006), Multi-level Integrated Classification Based on the 2001 Census, The University of Leeds.
   Webber, R. and Farr, M. (2001), MOSAIC: From an area classification system to household classification, Journal of Targeting, Measurement and Analysis for Marketing, 10(1).
To find out more, please visit
 World Clusters.org

 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

American Clusters Classification Methodology

  • 1. American Clusters Geodemographic Classification METHODOLOGY
  • 2. Geodemographics: "Birds of a feather flock together" (English proverb). Geodemographics is the analysis of people by where they live (Sleight, 1996). Geodemographic classification categorizes neighborhoods based on their socio-economic and lifestyle characteristics.
  • 3. Applications of Geodemographic Classifications. Commercial: Market Research, Site Selection, Trade Area Analysis, Direct Marketing, Advertising, Management. Not-for-Profit: Public Sector, Health Care, Education, Local Authority, Policing, Academic. Also listed: Media Analysis, Poverty Prevention, Elections, Charities, Planning, Research. Commercial example: people who live in an area where there is (a) a high probability of buying a particular type of newspaper, (b) a high application rate for consolidation loans, (c) a high consumption of fashion goods. Not-for-profit example: people who live in an area where there are (a) low grades in secondary school exams, (b) a high risk of developing diabetes, (c) a high fear of crime, but low crime levels. Source: adapted from Bolton Council; Harris et al. 2005; OAC.
  • 4. Open Geodemographics. The term "Open Geodemographics" was first used and applied by Dr. Dan Vickers, the author of "Multi-level Integrated Classification Based on the 2001 Census". In this paper Dr. Vickers describes the methods and techniques used to build the National Classification of Census Output Areas (UK). Dr. Vickers has published the methodology and results of his work online for review and free download. To find out more about Open Geodemographics please watch this presentation.
  • 5. Project Objectives. To create an open and free geodemographic classification of the USA using Census 2000 results, following the methodology developed by Dr. Dan Vickers.
  • 6. Project Methodology. Steps: (1) selection of cluster objects (operational taxonomic units); (2) variables selection; (3) variables standardization; (4) clustering method selection; (5) identification of cluster number; (6) interpretation, testing and mapping of clusters. The classification building process was largely based on the methodology elaborated by Daniel Vickers in "Multi-level Integrated Classification Based on the 2001 Census" (2006). Other methodologies were also considered, such as the methodology of building MOSAIC described in "Geodemographics, GIS and Neighbourhood Targeting" by R. Harris, P. Sleight and R. Webber, Wiley (2005).
  • 7. Classification Inputs. 208,000 Census block groups across all 50 states; overall population of 281,000,000; 106,000,000 households.
  • 8. Data and Variables. Data: US Census 2000 is the major source of data for the classification. Variables: some researchers, such as Harris and Webber, suggest including as many variables as possible to create "more meaningful clusters" (Harris et al., 2005). Others, such as Everitt and Vickers, argue that the minimum number of variables should be used in order to prevent data redundancy and collinearity (Vickers, 2006). In our classification we try to use only "necessary" variables to avoid data redundancy and collinearity.
  • 9. Variables Selection. List of the initial 77 variables: 1. Male; 2. Female; 3. Urban; 4. Rural; 5. White; 6. Black; 7. Native; 8. Asian; 9. Hawaiian; 10. Some other race; 11. Two or more races; 12. Aged 0-4; 13. Aged 5-14; 14. Aged 15-24; 15. Aged 25-44; 16. Aged 45-64; 17. Aged 65+; 18. Foreign born; 19. Not a citizen; 20. Spanish language households; 21. Other Eurp languages households; 22. Asian language households; 23. One person households (under 65); 24. One person households (over 65); 25. One parent households; 26. Family households with no children; 27. Non family households; 28. Married; 29. Divorced; 30. Widowed; 31. Never Married; 32. Occupied by renters; 33. Median Rent; 34. Avg. house size; 35. One bedroom; 36. Four bedrooms; 37. Gas; 38. Bottled Gas Kerosene; 39. Wood; 40. No fuel; 41. Median year house built; 42. Median house value; 43. Occupied more 1.5 person per room; 44. Occupied less 0.5 person per room; 45. Lacking complete plumbing facilities; 46. Lacking complete kitchen facilities; 47. Households without a mortgage; 48. Second mortgage or home equity; 49. Owner cost with mortgage; 50. Owner cost without mortgage; 51. Percent of monthly cost (with mortgage) exceeds 40; 52. Percent of monthly cost (with mortgage) does not exceed 10; 53. Percent of monthly cost (without mortgage) exceeds 25; 54. High School Degree; 55. Higher Education Attainment; 56. No car households; 57. 2+ car households; 58. Work at home; 59. Car to work; 60. Carpool to work; 61. Public Transport to Work; 62. Bicycle or Walk to Work; 63. Long commuters (60+ min to work); 64. Short commuters (less than 15 min to work); 65. To work from 12 to 5 am; 66. To work from 8 to 10 am; 67. To work from 4 pm to 11:59 pm; 68. Standardised Disability Ratio; 69. Unemployed; 70. Working part-time; 71. People living below poverty line; 72. Retirement Income; 73. Public Assistance Income; 74. Supplemental Security Income; 75. Social Security Income for HH; 76. Interest, Dividends or Net rental income for HH; 77. Median HH income. Some of them had to be removed...
  • 10. Variables Selection. The initial list of variables was reviewed and reduced by applying three analytical tools: (1) Principal Component Analysis (PCA) / factor analysis: variables with high factor loadings were selected; (2) correlation matrix: pairs of strongly correlated variables were examined and only one variable from each pair was kept for analysis; (3) standard deviation evaluation: variables with a low SD were excluded because they vary little between block groups and add no value to the clustering process.
  • 11. Variables Selection. The list of variables was reduced from 77 to 47: 1. Urban population; 2. Black; 3. Asian; 4. Aged under 5; 5. Aged 5-14; 6. Aged 15-24; 7. Aged 25-44; 8. Aged 45-64; 9. Aged 65+; 10. One person households (under 65); 11. One person households (over 65); 12. One parent households; 13. Households with no children; 14. Foreign born population; 15. Spanish language households; 16. Other Eurp languages households; 17. Never married; 18. Occupied by renters; 19. Median rent; 20. One bedroom; 21. Four bedrooms; 22. Bottled gas, oil, kerosene as fuel; 23. Median house value; 24. Occupied 1.5 person per room; 25. Households without a mortgage; 26. Second mortgage or home equity; 27. Monthly cost (with mortgage) less 10%; 28. Monthly cost (with mortgage) more 40%; 29. Higher education; 30. Two cars households; 31. Car to work; 32. Public transport to work; 33. Bicycle or walk to work; 34. Long commuters (60+ min to work); 35. To work from 8 to 10 am; 36. Standardized Disability Ratio; 37. Unemployed; 38. Working part-time; 39. People living below poverty line; 40. Retirement income; 41. Public assistance income; 42. Social security income; 43. Interest, dividends or net rental income; 44. Median household income; 45. Government workers; 46. Self-employed; 47. Agriculture, forestry, fishing and hunting and mining.
  • 12. Variable Transformation. Variable transformation and standardization allowed for the inclusion of different types of data (e.g. percent of black population, median number of rooms, median household income). Log transformation: by transforming the data to a log (logarithmic) scale, the problem of very high value outliers was greatly reduced, because the differences between values at the extremes of the data set shrink by more than the differences between more typical values. (Methods for Area classification for output areas, National Statistics, UK) http://www.statistics.gov.uk/about/methodology_by_theme/area_classification/oa/methodology.asp
  • 13. Variables Transformation. Z-score standardization: this is the most popular method of data standardization, but in our project it showed poor results because highly skewed variables were given too much weight, which could lead to poorly formed clusters in the clustering process. Range standardization: to reduce the effect of highly skewed variables, range standardization was chosen instead. This method rescales each variable to the range between 0 (minimum value) and 1 (maximum value).
  • 14. Clustering Process. K-means: to create the classification, the k-means method (performed in SPSS) was chosen because it works well on large datasets (Vickers, 2006). The major issue with k-means is that the number of clusters has to be specified beforehand. To determine the number of clusters, two tests were performed. Test 1: evaluation of the average distance to the cluster center. Test 2: cluster size range assessment.
  • 15. Test 1. Evaluation of average distance to cluster center. It was suggested that the most useful number of clusters for the classification would be around 6 (Vickers, 2006), so the target number of k-means clusters was sought between 4 and 8. We needed to find the solution with the most significant increase in average distance from the cluster center. The most significant increase between two consecutive solutions was found at the 6-cluster solution.
  • 16. Test 2. Cluster size range. Another important factor considered was the size of the clusters: the more homogeneous the clusters are in number of members, the better. The measure used was the mean range between the optimal cluster size (all clusters equal in number of members) and the actual cluster sizes for a given solution; the lower the range, the better.
  • 17. Number of Clusters. The 7-cluster solution was chosen. The 4-cluster solution worked well in both tests, but was outperformed by the 5- and 7-cluster solutions, which worked better in the second test. The 6-cluster solution showed poor results in the second test, while the 8-cluster solution did not pass the first test well.
  • 18. Group of Clusters. These 7 clusters form the highest level of the classification hierarchy. They represent the Groups of Clusters. To build the second level, each resulting cluster was split into a number of smaller clusters using the same methods applied before. As a result we obtained 18 distinct clusters for the American Clusters Classification.
  • 19. Clusters Hierarchy. 1st level of the classification hierarchy: 7 Groups of Clusters. 2nd level of the classification hierarchy: 18 clusters.
  • 20. Analyzing Clusters. The 18 identified clusters are now ready to be analyzed. The mean values of each cluster were compared with the mean values of the whole dataset.
  • 21. Describing Clusters. Based on the comparison, clusters were analyzed and described. Then clusters were mapped and named. Example: Cluster 5.1, Upscale Couples. "Upscale Couples" is a cluster of rich people with a large share of men and women between 45 and 65 who live in suburban areas within the vicinity of large US cities. "A significant share of this group's members is self-employed." "Their incomes are two times higher than the US average." "Those who work prefer to use a personal vehicle to get to work, and the majority of them leave home between 8 and 10 am." Image sources: http://www.flickr.com/photos/loungerie/3029049309/ http://www.flickr.com/photos/whartz/1066907283/
  • 22. Mapping Clusters Mapping clusters using free tools from Google
  • 24. Clusters Description. To get a cluster description, just click on one of the links below: Depressed Blocks, Satellite City, Young Families, Low Income Families, Small Town Communities, Established Suburbs, Upscale Couples, Settled Achievers, Prosperous B Boomers, Suburban Middles, Successful Families, Rural Despair, Country Life, Rural Communities, Farmers' Land, Unfortunate Countryside, Multicultural Communities, Retired Citizens, Hispanic Families.
  • 25. References. Callingham, M. (2005), From areal classification to geodemographics, paper presented at the Demographic User Group Conference, Royal Society, London, 10th November 2005. Debenham, J. E. (2002), Understanding Geodemographic Classification: Creating The Building Blocks For An Extension, Working Paper 02/1, School of Geography, University of Leeds [online] http://www.geog.leeds.ac.uk/wpapers/02-1.pdf. Harris, R., Sleight, P. and Webber, R. (2005), Geodemographics, GIS and Neighbourhood Targeting, London, Wiley. Longley, P. A. (2005), Geographical Information Systems: a renaissance of geodemographics for public service delivery, Progress in Human Geography, 29(1). Sleight, P. (2004), Targeting Customers: How to Use Geodemographic and Lifestyle Data in Your Business, Henley-on-Thames, World Advertising Research Centre. Vickers, D. (2006), Multi-level Integrated Classification Based on the 2001 Census, The University of Leeds. Webber, R. and Farr, M. (2001), MOSAIC: From an area classification system to household classification, Journal of Targeting, Measurement and Analysis for Marketing, 10(1).
  • 26. To find out more, please visit World Clusters.org
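
The log transformation and range standardization described on slides 12 and 13 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual workflow: the use of log(1 + x) so that zero values stay defined, the function names, and the sample income values are all hypothetical and do not come from the slides.

```python
import math

def log_transform(values):
    """Compress large values with log(1 + x); the +1 offset is an assumption
    made here so that zeros stay defined."""
    return [math.log1p(v) for v in values]

def range_standardize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical median household incomes for five block groups,
# including one very high outlier.
incomes = [18_000, 32_000, 41_000, 55_000, 250_000]

scaled_raw = range_standardize(incomes)
scaled_log = range_standardize(log_transform(incomes))

for raw, a, b in zip(incomes, scaled_raw, scaled_log):
    print(f"{raw:>8}: range only {a:.3f}, log then range {b:.3f}")
```

Without the log step the single outlier pushes the four typical values toward 0; after the log step they spread more evenly across the 0 to 1 range, which is the effect slide 12 describes.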

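The two tests used to pick the number of clusters (slides 14 to 16) can be sketched as below. Several things here are assumptions rather than the authors' procedure: the slides used k-means in SPSS, whereas this is a plain Lloyd's k-means written with the Python standard library; the "mean range" of Test 2 is interpreted as the mean absolute difference between each cluster's size and the equal-size ideal n / k; and the data are synthetic random points standing in for range-standardized block-group variables.

```python
import math
import random

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centers, labels)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def average_distance(points, centers, labels):
    """Test 1 input: mean distance from each point to its cluster center."""
    return sum(math.dist(p, centers[l]) for p, l in zip(points, labels)) / len(points)

def mean_size_deviation(labels, k):
    """Test 2 (assumed form): mean absolute gap between cluster sizes and n / k."""
    ideal = len(labels) / k
    return sum(abs(labels.count(c) - ideal) for c in range(k)) / k

# Synthetic stand-in for range-standardized data (all values in [0, 1]).
rng = random.Random(1)
data = [(rng.random(), rng.random(), rng.random()) for _ in range(300)]

previous = None
for k in range(4, 9):  # candidate solutions with 4 to 8 clusters
    centers, labels = kmeans(data, k)
    avg = average_distance(data, centers, labels)
    change = "" if previous is None else f", change vs k-1: {avg - previous:+.4f}"
    print(f"k={k}: avg distance {avg:.4f}{change}, "
          f"size deviation {mean_size_deviation(labels, k):.1f}")
    previous = avg
```

Comparing the change in average distance between consecutive values of k corresponds to Test 1, and the size deviation column corresponds to Test 2, where lower values indicate more evenly sized clusters.
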
Editor's Notes