On the social scientific value of
transactional data

Ben Anderson* & Alexei Vernitski†
* Department of Sociology
† Department of Mathematical Sciences
26 May 2011
Outline
• What do we mean?
• A 21st Century Sociology?
• Case studies:
  • Do deprived areas have different telephone calling patterns?
  • Do households have similar calling patterns?
  • Can we usefully classify households' calling patterns?
• Future directions
What do we mean?
• Transactional data:
  • Generated by everyday life
  • Automatically captured as part of 'business as usual'
  • N = millions
  • Billions of data points

• Literature commentary:
  • Surveillance, Computer Science
  • Social Science
    • Savage & Burrows, 2007
    • doi:10.1177/0038038507080443
    • 101 citations (Google Scholar)
    • http://www.youtube.com/watch?v=ARLARDwLJhw
Examples
Traditional uses

“PayCheck profiles all 1.6 million postcodes in the UK
using information on over 4 million households from
lifestyle surveys and Census and Market Research data.
It is available as a mean, median and mode figure for
each postcode or as a PayCheck type.
PayCheck can be used for:
• Selecting names and addresses from the Ocean
• Coding up customer records for profiling or campaign
selections
• Profiling to understand how your customer group
compares to the rest of the UK population”

“Mosaic UK classifies consumers by household
or postcode, allowing you to optimise the use of the
segmentation depending upon the application.
• 46% of the data used to build Mosaic is non-Census sourced information that is updated
annually. This enables Mosaic to monitor changes
in consumer behaviour and incorporate these each
year within the classification.
• Mosaic UK is validated by a comprehensive
programme of fieldwork and observational
research covering each of the UK's 120 postal
areas.”
New uses

Mobile Graz: © MIT
http://senseable.mit.edu/grazrealtime/

Abstracted 'Segments' aren't needed
- fuzzy matching by items, relationship, location
A 21st Century Sociology?
• Re-assessing old questions
• Networks, place, space & social relationships (capital)
• Consumption, leisure and class?
• Public performance of self?

• Imagining new questions?
• Software & social stratification?
• ?

• What might our students need to know?
• Data provenance & politics?
• Imputation, visualisation, signal analysis…?
• An Industrial Sociology?
Case study data
• BT 100,000 data
  • Sample of 103,113 households covering all of the UK in 1995
  • All outgoing billable calls recorded for the months of October 1995, March 1996, October 1996, March 1997, October 1997, March 1998
  • Linked to customer data – postcode, billing flags, ACORN code

• BT/Essex Home OnLine household panel data
  • Representative sample of 1000 households covering all of GB in 1998
  • 3 wave household panel survey (1998/1999/2000)
  • All incoming & outgoing billable calls recorded for 438 of the 1000 households who:
    a) were BT customers and
    b) gave consent for call records to be linked to survey
Data: BT 100,000
• October 1995 + October/March 1996/7/8
• A sample, not complete coverage

                                  England  Northern Ireland  Scotland   Wales  Unknown (phone number unmatched)  Unknown (postcode unmatched)    Total
In first sample (Oct 95) only                                                                             1,606                                    1,606
In first & second samples only                                                                            2,245                                    2,245
…
In all samples                     70,195            10,305    11,485  11,131                             1,149                           261  104,526
Total                              70,195            10,305    11,485  11,131                            11,245                           261  114,622

Useful longitudinal sample = 103,116
Data: BT 100,000 sample

Aggregated to
OAs
Datazones
LSOA (Eng & Wales)
SOA (NI)
Datazone (Scotland)
Data: BT 100,000 sample
• October 1995 + October/March 1996/7/8
• A sample, not complete coverage
• So is it a representative sample?
• Sample tends to have fewer at the margins, especially in the most deprived areas.

[Figure: % of households by IMD 2004 decile (x axis: deciles 0 to 9; y axis: 0.0% to 16.0%), comparing % of sampled households (unfiltered), % of sampled households (filtered) and % of all households]

• IMD = Index of Multiple Deprivation (area level Government statistics)
• England data only (n = 32,482 LSOAs of which 1,743 contained at least one sampled number and 593 contained over 33 or 7.5%)
• 70,195 households/numbers (unfiltered), 51,118 (filtered)
• Filter removes areas where number of households logged > 30
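The representativeness check on this slide compares the share of sampled households in each IMD decile with the share of all households. A minimal sketch of that comparison follows, assuming each household has already been assigned a decile; the function names and the toy data are hypothetical and are not taken from the BT sample.

```python
from collections import Counter


def decile_shares(household_deciles):
    """Return {decile: % of households} for a list holding one decile per household."""
    counts = Counter(household_deciles)
    total = sum(counts.values())
    return {decile: 100.0 * counts[decile] / total for decile in sorted(counts)}


def decile_share_gaps(sample_deciles, population_deciles):
    """Percentage-point gap (sample minus population) for each population decile."""
    sample = decile_shares(sample_deciles)
    population = decile_shares(population_deciles)
    return {d: sample.get(d, 0.0) - population[d] for d in population}


# Toy data (invented for illustration): the population is spread evenly,
# while the sample is thin in the two end deciles.
population = [d for d in range(10) for _ in range(10)]
sample = [0] * 7 + [d for d in range(1, 9) for _ in range(11)] + [9] * 5

gaps = decile_share_gaps(sample, population)
for decile, gap in gaps.items():
    print(f"decile {decile}: {gap:+.1f} percentage points")
```

Negative gaps mark deciles that are under-represented in the sample, which is the pattern the slide describes at the margins.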
Data: BT ‘100,000’ sample selection bias?
• Areas where calls were collected are more likely to have…
  • Slightly higher % of households in higher socio-economic groups
  • Lower % of households with 2+ cars
  • Lower IMD employment deprivation scores
  • West Midlands & Yorkshire location

• Implications:
  • Generalisability?
  • Ideally we’d want a random sample
  • But…

• England data only (n = 32,482 LSOAs of which 1,743 contained at least one sampled number and 593 contained over 33 or 7.5%)
• 70,195 households/numbers (unfiltered), 51,118 (filtered)
• Filter removes areas where number of households logged > 30
Data: Home OnLine Household Panel
• June 1998 - July 2001, 999 households
• A representative sample by design
• A total of 438 households were monitored at some point

[Figure: % of households (x axis: 0 to 20) by household type, un-monitored vs monitored. Household types: partner, kids > 15; partner, kids > 11; partner, kids < 12; partner, no kids, > 55; partner, no kids, < 56; couple, no kids, aged < 36; lone parent, all kids > 15; lone parent, all kids < 16; un-related others; alone over 55; alone under 56]
Case study 1: Deprivation & Communication

Network Diversity and Economic Development
Nathan Eagle, et al. Science 328, 1029 (2010); DOI: 10.1126/science.1186605
Case study 1: Deprivation & Communication
• Our approach:
  • Simplified ‘localness’ metric:
    – Ratio of local to national calls (ratio = n local† / n national)
    – Higher localness -> lower economic opportunity?
  • BT 100,000 data
    – March 1998 data to match IMD 2004 data

† Local call = same dialing code area; Regional calls ~= Government Office Region (GOR); National calls ~= between GORs. Not geographic distance.
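The ‘localness’ ratio defined above can be sketched per household as follows. This is an illustration only: the call labels and the function name are assumptions, and returning `None` when a household made no national calls is one possible way of handling an undefined ratio, not necessarily the treatment used in the study.

```python
def localness(call_types):
    """Ratio of local to national calls for one household.

    call_types: iterable of labels, assumed here to be 'local',
    'regional' or 'national' (hypothetical coding).
    Returns None when there are no national calls, because the
    ratio is undefined in that case.
    """
    labels = list(call_types)
    n_local = sum(1 for label in labels if label == "local")
    n_national = sum(1 for label in labels if label == "national")
    if n_national == 0:
        return None
    return n_local / n_national


# Toy household: 6 local calls, 1 regional call, 2 national calls.
example_calls = ["local"] * 6 + ["regional"] + ["national"] * 2
print(localness(example_calls))
```

Regional calls are ignored by the ratio, which matches the definition on the slide (local over national only).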
Case Study 1: ‘Localness’ and ‘deprivation’
• Distance wasn’t dead in 1998

[Figure: % calls by IMD 2004 (filtered)]

• LSOA level
• March 1998, England data only (1,743 LSOAs contained at least one sampled number and 593 contained over 33 or 7.5%). 1,743 LSOAs = 5M calls, 593 LSOAs = 3.7M calls
• IMD 2004 (2001 data)
Case Study 1: ‘Localness’ by region

• Household level, highest 10% of ‘Localness’ (outliers) removed - predominantly in Wales & Northern Ireland
• March 1998, All = 78,127 ‘households’ (11,737 made no local calls and 30,894 made no national calls), England = 50,971 ‘households’
• IMD 2004 (2001 data)
Case Study 1: ‘Localness’ and deprivation

• Household level, highest 10% of ‘Localness’ (outliers) removed - predominantly in Wales & Northern Ireland
• March 1998, All = 78,127 ‘households’ (11,737 made no local calls and 30,894 made no national calls), England = 50,971 ‘households’
• IMD 2004 (2001 data)
Case Study 1: ‘Localness’ and ‘employment deprivation’:
• ‘Localness’
  – Ratio of local to national calls (= local*/national)
• IMD 2004
  – Employment score
  – R = 0.53

• Household level, highest 10% of ‘Localness’ (outliers) removed - predominantly in Wales & Northern Ireland
• March 1998, All = 78,127 ‘households’ (11,737 made no local calls and 30,894 made no national calls), England = 50,971 ‘households’
• IMD 2004 (2001 data)
* Local call = same dialing code area; Regional calls ~= Government Office Region (GOR); National calls ~= between GORs. Not geographic distance.
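The R value on this slide reads as a correlation coefficient between ‘localness’ and the IMD employment score. Assuming it is a Pearson correlation (the slide does not say), it could be computed along these lines; the paired values below are invented purely to show the call and are not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations: localness ratio and employment score.
toy_localness = [1.2, 2.5, 3.1, 4.8, 6.0]
toy_employment_score = [0.05, 0.09, 0.08, 0.15, 0.14]
print(round(pearson_r(toy_localness, toy_employment_score), 2))
```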
Case study 1: Deprivation and communication
• Probably!
• Localness measure
• For both:
  • Higher rates correlate (+) with deprivation
  • Especially employment deprivation
Case study 2: Patterns of communication
• Why?
• Habits & rhythms of life…
• What’s all this about?

BT “100,000” data – sample of 103,113 households
October 1995 call data (7,935,195 calls)
Case study 2: Patterns of communication

BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
Local calls = n/10
October 1995 data (7,935,195 calls)
Case study 2: Patterns of communication
BT “Home OnLine Panel” households
• Sundays:
  • Longer calls
  • Fewer calls
• But what’s this?
  • A data blip!

[Figure: mean number of calls per household (axis: 0 to 6) and mean call duration in seconds (axis: 0 to 500) by day, 4 Jan to 5 Apr]

• BT “Home OnLine Panel” data (423 households) - slightly more likely to be ‘alone over 55’ and living in a semi-detached house/bungalow
• January - April 2000
Case study 2: Patterns of communication
Duration of calls…
• Autocorrelation analysis:
  • Every 7th day is similar
    • weekly autocorrelation
  • Every 14th day is similar
    • fortnightly autocorrelation even after allowing for weekly autocorrelation
• Humans are creatures of (localised) habit!

[Figure: partial ACF of mean call duration (s) by lag number (lags 1 to 35), showing coefficients and confidence limits on an axis from -1.0 to 1.0]

BT “Home Online Panel” data – 1999/2000 excerpt
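The weekly rhythm described above can be illustrated with a plain sample autocorrelation on a daily series. The sketch below is not the partial ACF shown on the slide (that needs a further step, such as the Durbin-Levinson recursion, to remove the effect of shorter lags), and the daily durations are made-up values rather than panel data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at a given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Hypothetical mean call durations (seconds) for one week, with one day
# much longer than the rest, repeated for ten weeks.
weekly_pattern = [100, 110, 105, 108, 112, 130, 180]
daily_duration = weekly_pattern * 10

for lag in (1, 7, 14):
    print(lag, round(autocorrelation(daily_duration, lag), 3))
```

On a series with a repeating weekly shape, the coefficients at lags 7 and 14 stand well above the lag 1 value, which is the signature the slide points to.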
Case study 2: Patterns of communication
• Biological networks:
  • Many nodes with few connections & few nodes with many connections: y = a/x or y = a/x² (log-linear or ‘scale-free’)
• Human calling networks:
  • Same principle
  • McCarty et al (2001) Comparing two methods for estimating network size, Human Organization; Spring 2001; 60, 1; pg. 28
• Do we find the same thing?
Case study 2: Patterns of communication
• Outgoing calls
  • X: log (number of calls)
  • Y: log (number of telephone numbers)
  • Fitted line: y = -0.9494x + 7.2, R² = 0.9289
• Informally:
  • 1/2 the numbers called are called once
  • Of the rest 1/3 are called twice
  • Of the remainder 1/4 are called three times
  • etc

[Figure: two plots for one household: the distribution on linear axes (x axis: 0 to 900; y axis: 0 to 400) and the same data on log axes (0 to 8) with the fitted line]

• BT “Home Online Panel” data – 1999/2000 excerpt
• Data for 1 ‘typical’ household
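A fitted line of this kind can be obtained by ordinary least squares on the logged frequency table. The sketch below assumes natural logarithms and assumes that y counts the telephone numbers called exactly x times; the slide states neither, and the function names and toy data are hypothetical. The second function simply turns the informal rule (half called once, a third of the rest twice, and so on) into exact shares.

```python
import math
from collections import Counter
from fractions import Fraction


def loglog_fit(calls_per_number):
    """Least-squares line through (log x, log y).

    calls_per_number: one entry per distinct telephone number, giving
    how many calls were made to it. x is a number of calls and y is
    how many telephone numbers received exactly x calls.
    Returns (slope, intercept).
    """
    frequency = Counter(calls_per_number)
    xs = [math.log(calls) for calls in frequency]
    ys = [math.log(count) for count in frequency.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct call counts")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def informal_rule_shares(max_calls):
    """Share of numbers called exactly 1, 2, ..., max_calls times under the informal rule."""
    shares = []
    remaining = Fraction(1)
    for calls in range(1, max_calls + 1):
        share = remaining * Fraction(1, calls + 1)
        shares.append(share)
        remaining -= share
    return shares


# Toy household whose counts follow y = 120 / x exactly.
toy_calls = [1] * 120 + [2] * 60 + [3] * 40 + [4] * 30
slope, intercept = loglog_fit(toy_calls)
print(round(slope, 4), round(intercept, 4))
print(informal_rule_shares(3))
```

Under the informal rule the share of numbers called exactly k times works out as 1/(k(k+1)), so the shares fall away quickly after the first few values.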
Case study 2: Patterns of communication

• Outgoing calls and Incoming calls show roughly the same pattern

• BT “Home Online Panel” data – 1999/2000 excerpt
• Data for 1 ‘typical’ household
Case study 3: Classifying households

• ‘Mirror’ curves for two households

[Figure: incoming and outgoing curves]

• BT “Home Online Panel” data – 1999/2000 excerpt
• Data for 1 ‘typical’ household
Case study 3: Classifying households
• ‘Mirror’ curves for several households

[Figure: curve labelled ‘4045’ (x axis: -4 to 1; y axis: 0 to 4.5)]

• BT “Home Online Panel” data – 1999/2000 excerpt
Case study 3: Classifying households
• Can we use these curves to classify households?
  • Of the 391 who match to wave 1 survey data, 16 are ‘unusual shapes’
  • E.g. ‘4045’
• 4045
  • Single person
  • White male
  • Aged 46
  • Lives alone, divorced
  • Speaks to neighbours at least once a week
  • 0 local friends
  • 4 non local friends
  • 2 local relatives
  • 12 non-local relatives

[Figure: curve for ‘4045’ (x axis: -4 to 1; y axis: 0 to 5)]

• BT “Home Online Panel” data – 1999/2000 excerpt
• Data for 1 ‘typical’ household
Case study 3: Classifying households
• Of the 16 ‘unusual shapes’
  • Predominantly single adults
    • Alone > 55
    • Alone < 55
    • Lone parent with young child(ren)
• Why?

[Figure: % of households (x axis: 0 to 30) by household type, "Normal" vs "Unusual". Household types: partner+kid>15; partner+kid>11; partner+kid<12; part,nokid,>55; part,nokid,<56; part,nokid,<36; lonepar,kid>15; lonepar,kid<16; un-other rel; alone over 55; alone under 56]

• BT “Home Online Panel” data – 1999/2000 excerpt
• Data for 1 ‘typical’ household
Future directions:
• Analysis of call duration
• Do we get the same shape?
• How do we combine frequency & duration?
• Can the curves predict anything about the households?
Future directions:
• Household composition & network structure?
Household transition
Future directions:
• Network characteristics in places
• Do people in different kinds of places have different kinds of (ego) networks?
• Does this change over time?
• Can we represent a place by some combination of ego networks?
• To what extent do ego networks overlap?
• And how do we deal with the sampling issue?
BUT: Data Issues
• Business calls?
• Some of the (longer) calls may be to ISPs
  • Filter?

• Some things don’t match up:
  • 36 of the Home OnLine panel households monitored were not BT customers (or so they said in the survey!)
  • Of the 136,732 different callers, 17,947 had no recorded number
  • Of the 124,032 different callees, 1,963 had no recorded number and 21,295 had a number 3 digits long (54% of these were calls to 192, 13% to ‘123’)
  • There are many semi-duplicates (impossible calls) - internal BT data?
  • And there are call type codes no-one understands any more!

> When you didn’t design the data source…
  • Provenance is crucial!
  • Data forensics skills are critical!
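Checks like the ones listed above (missing numbers, 3-digit numbers) are easy to script as a first forensic pass over call records. The sketch below is a hypothetical audit: the category names, the function names and the sample values are assumptions made for illustration, not the project's actual procedure.

```python
from collections import Counter


def classify_number(number):
    """Put one recorded telephone number into a coarse quality category."""
    if number is None or number.strip() == "":
        return "missing"
    digits = number.strip()
    if not digits.isdigit():
        return "malformed"
    if len(digits) == 3:
        return "three_digit"  # short service codes such as 192 or 123
    return "ok"


def audit_numbers(numbers):
    """Count how many recorded numbers fall into each category."""
    return Counter(classify_number(number) for number in numbers)


# Invented records; the long number uses a range reserved for fictional use.
toy_callees = ["192", "123", None, "", "01632960000", "0163-2960"]
print(audit_numbers(toy_callees))
```

Counting the categories before any analysis gives a quick picture of how much of the data needs explaining, in the spirit of the provenance point on the slide.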
Thank you!

• Dr Ben Anderson
• benander@essex.ac.uk
• http://cresi.essex.ac.uk/getperson?personID=1

• Dr Alexei Vernitski
• asvern@essex.ac.uk
• http://www.essex.ac.uk/maths/staff/profile.aspx?ID=1275

• BT funded feasibility project:
• http://cresi.essex.ac.uk/getproject?projectID=46

More Related Content

Similar to On the social scientific value of transactional data

Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?
Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?
Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?
Sociology@Essex
 
Presentation - Using open data to develop statistical literacy in schools - U...
Presentation - Using open data to develop statistical literacy in schools - U...Presentation - Using open data to develop statistical literacy in schools - U...
Presentation - Using open data to develop statistical literacy in schools - U...
celiamac58
 
TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...
TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...
TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...
TCI Network
 
Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...
Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...
Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...
Guy Lansley
 
Homelessness and Labour Force Participation. Evidence from an Original Data C...
Homelessness and Labour Force Participation. Evidence from an Original Data C...Homelessness and Labour Force Participation. Evidence from an Original Data C...
Homelessness and Labour Force Participation. Evidence from an Original Data C...
FEANTSA
 
Presentation - Using open data to develop statistical literacy in schools - c...
Presentation - Using open data to develop statistical literacy in schools - c...Presentation - Using open data to develop statistical literacy in schools - c...
Presentation - Using open data to develop statistical literacy in schools - c...
celiamac58
 
Introduction automated zone design
Introduction automated zone designIntroduction automated zone design
Introduction automated zone design
University of Southampton
 
Monitoring Internal Migration in the United Kingdom
Monitoring Internal Migration in the United KingdomMonitoring Internal Migration in the United Kingdom
Monitoring Internal Migration in the United Kingdom
UKDSCensus
 
Measuring and Mapping Population
Measuring and Mapping PopulationMeasuring and Mapping Population
Measuring and Mapping Population
hantsga
 
Practice Hunting
Practice HuntingPractice Hunting
Practice Hunting
Ben Anderson
 
Big Data in Economic Research: Twitter, Phone calls and Political events
Big Data in Economic Research: Twitter, Phone calls and Political eventsBig Data in Economic Research: Twitter, Phone calls and Political events
Big Data in Economic Research: Twitter, Phone calls and Political events
PhDSofiaUniversity
 
Seriously Mixed Methods - a GRIDy Challenge?
Seriously Mixed Methods - a GRIDy Challenge?Seriously Mixed Methods - a GRIDy Challenge?
Seriously Mixed Methods - a GRIDy Challenge?
Ben Anderson
 
Sutton in partnership presentation residents survey and one planet sutton -...
Sutton in partnership presentation   residents survey and one planet sutton -...Sutton in partnership presentation   residents survey and one planet sutton -...
Sutton in partnership presentation residents survey and one planet sutton -...
SuttoninPartnership
 
Paul ita 2015 telework
Paul ita 2015 teleworkPaul ita 2015 telework
Paul ita 2015 telework
pauljackson1966
 
SAGT Conference 2015 - Scottish Government Stats and teaching
SAGT Conference 2015 - Scottish Government Stats and teachingSAGT Conference 2015 - Scottish Government Stats and teaching
SAGT Conference 2015 - Scottish Government Stats and teaching
celiamac58
 
ESS IA -Survey IA -2
ESS IA -Survey IA -2ESS IA -Survey IA -2
ESS IA -Survey IA -2
GURU CHARAN KUMAR
 
Employment leakage by local government are in the Northern Territory, Austral...
Employment leakage by local government are in the Northern Territory, Austral...Employment leakage by local government are in the Northern Territory, Austral...
Employment leakage by local government are in the Northern Territory, Austral...
Ninti_One
 
Introduction to Census data and practical applications - Geography Skills Abe...
Introduction to Census data and practical applications - Geography Skills Abe...Introduction to Census data and practical applications - Geography Skills Abe...
Introduction to Census data and practical applications - Geography Skills Abe...
celiamac58
 
Jerker statistics sweden covid 19 response
Jerker statistics sweden covid 19 responseJerker statistics sweden covid 19 response
Jerker statistics sweden covid 19 response
Juha Saarentaus
 
A. Bates, Results of the experience in the UK using Travel to Work Areas, an...
A. Bates,  Results of the experience in the UK using Travel to Work Areas, an...A. Bates,  Results of the experience in the UK using Travel to Work Areas, an...
A. Bates, Results of the experience in the UK using Travel to Work Areas, an...
Istituto nazionale di statistica
 

Similar to On the social scientific value of transactional data (20)

Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?
Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?
Arnaldo Barreto & Ben Anderson - The sociological value of transactional data?
 
Presentation - Using open data to develop statistical literacy in schools - U...
Presentation - Using open data to develop statistical literacy in schools - U...Presentation - Using open data to develop statistical literacy in schools - U...
Presentation - Using open data to develop statistical literacy in schools - U...
 
TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...
TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...
TCI 2015 Cluster Mapping: Pattern in Irish National and Regional Economic Act...
 
Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...
Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...
Linking Socio-economic and Demographic Characteristics to Twitter Topics - Gu...
 
Homelessness and Labour Force Participation. Evidence from an Original Data C...
Homelessness and Labour Force Participation. Evidence from an Original Data C...Homelessness and Labour Force Participation. Evidence from an Original Data C...
Homelessness and Labour Force Participation. Evidence from an Original Data C...
 
Presentation - Using open data to develop statistical literacy in schools - c...
Presentation - Using open data to develop statistical literacy in schools - c...Presentation - Using open data to develop statistical literacy in schools - c...
Presentation - Using open data to develop statistical literacy in schools - c...
 
Introduction automated zone design
Introduction automated zone designIntroduction automated zone design
Introduction automated zone design
 
Monitoring Internal Migration in the United Kingdom
Monitoring Internal Migration in the United KingdomMonitoring Internal Migration in the United Kingdom
Monitoring Internal Migration in the United Kingdom
 
Measuring and Mapping Population
Measuring and Mapping PopulationMeasuring and Mapping Population
Measuring and Mapping Population
 
Practice Hunting
Practice HuntingPractice Hunting
Practice Hunting
 
Big Data in Economic Research: Twitter, Phone calls and Political events
Big Data in Economic Research: Twitter, Phone calls and Political eventsBig Data in Economic Research: Twitter, Phone calls and Political events
Big Data in Economic Research: Twitter, Phone calls and Political events
 
Seriously Mixed Methods - a GRIDy Challenge?
Seriously Mixed Methods - a GRIDy Challenge?Seriously Mixed Methods - a GRIDy Challenge?
Seriously Mixed Methods - a GRIDy Challenge?
 
Sutton in partnership presentation residents survey and one planet sutton -...
Sutton in partnership presentation   residents survey and one planet sutton -...Sutton in partnership presentation   residents survey and one planet sutton -...
Sutton in partnership presentation residents survey and one planet sutton -...
 
Paul ita 2015 telework
Paul ita 2015 teleworkPaul ita 2015 telework
Paul ita 2015 telework
 
SAGT Conference 2015 - Scottish Government Stats and teaching
SAGT Conference 2015 - Scottish Government Stats and teachingSAGT Conference 2015 - Scottish Government Stats and teaching
SAGT Conference 2015 - Scottish Government Stats and teaching
 
ESS IA -Survey IA -2
ESS IA -Survey IA -2ESS IA -Survey IA -2
ESS IA -Survey IA -2
 
Employment leakage by local government are in the Northern Territory, Austral...
Employment leakage by local government are in the Northern Territory, Austral...Employment leakage by local government are in the Northern Territory, Austral...
Employment leakage by local government are in the Northern Territory, Austral...
 
Introduction to Census data and practical applications - Geography Skills Abe...
Introduction to Census data and practical applications - Geography Skills Abe...Introduction to Census data and practical applications - Geography Skills Abe...
Introduction to Census data and practical applications - Geography Skills Abe...
 
Jerker statistics sweden covid 19 response
Jerker statistics sweden covid 19 responseJerker statistics sweden covid 19 response
Jerker statistics sweden covid 19 response
 
A. Bates, Results of the experience in the UK using Travel to Work Areas, an...
A. Bates,  Results of the experience in the UK using Travel to Work Areas, an...A. Bates,  Results of the experience in the UK using Travel to Work Areas, an...
A. Bates, Results of the experience in the UK using Travel to Work Areas, an...
 

More from Ben Anderson

Using Time Use Data To Trace 'Energy Practices' Through Time
Using Time Use Data To Trace 'Energy Practices' Through TimeUsing Time Use Data To Trace 'Energy Practices' Through Time
Using Time Use Data To Trace 'Energy Practices' Through Time
Ben Anderson
 
Modeling Water Demand in Droughts (in England & Wales)
Modeling Water Demand in Droughts (in England & Wales)Modeling Water Demand in Droughts (in England & Wales)
Modeling Water Demand in Droughts (in England & Wales)
Ben Anderson
 
Modeling Water Demand in Droughts (in England & Wales)
Modeling Water Demand in Droughts (in England & Wales)Modeling Water Demand in Droughts (in England & Wales)
Modeling Water Demand in Droughts (in England & Wales)
Ben Anderson
 
A Social Practices-based Microsimulation Model for Estimating Domestic Hot Wa...
A Social Practices-based Microsimulation Model for Estimating Domestic Hot Wa...A Social Practices-based Microsimulation Model for Estimating Domestic Hot Wa...
A Social Practices-based Microsimulation Model for Estimating Domestic Hot Wa...
Ben Anderson
 
SAVE: Lightning Talk
SAVE: Lightning TalkSAVE: Lightning Talk
SAVE: Lightning Talk
Ben Anderson
 
SAVE: A large scale randomised control trial approach to testing domestic ele...
SAVE: A large scale randomised control trial approach to testing domestic ele...SAVE: A large scale randomised control trial approach to testing domestic ele...
SAVE: A large scale randomised control trial approach to testing domestic ele...
Ben Anderson
 
Hunting for (energy) demanding practices using big & medium sized data
Hunting for (energy) demanding practices using big & medium sized dataHunting for (energy) demanding practices using big & medium sized data
Hunting for (energy) demanding practices using big & medium sized data
Ben Anderson
 
Electricity consumption and household characteristics: Implications for censu...
Electricity consumption and household characteristics: Implications for censu...Electricity consumption and household characteristics: Implications for censu...
Electricity consumption and household characteristics: Implications for censu...
Ben Anderson
 
Small Area Estimation as a tool for thinking about temporal and spatial varia...
Small Area Estimation as a tool for thinking about temporal and spatial varia...Small Area Estimation as a tool for thinking about temporal and spatial varia...
Small Area Estimation as a tool for thinking about temporal and spatial varia...
Ben Anderson
 
The Time and Timing of UK Domestic Energy DEMAND
The Time and Timing of UK Domestic Energy DEMANDThe Time and Timing of UK Domestic Energy DEMAND
The Time and Timing of UK Domestic Energy DEMAND
Ben Anderson
 
Modeling Electricity Demand in Time and Space
Modeling Electricity Demand in Time and SpaceModeling Electricity Demand in Time and Space
Modeling Electricity Demand in Time and Space
Ben Anderson
 
Developing insight from commercial data to support #Census2022
Developing insight from commercial data to support #Census2022 Developing insight from commercial data to support #Census2022
Developing insight from commercial data to support #Census2022
Ben Anderson
 
PRACTICE HUNTING: Time Use Surveys for a quantification of practices distribu...
PRACTICE HUNTING: Time Use Surveys for a quantification of practices distribu...PRACTICE HUNTING: Time Use Surveys for a quantification of practices distribu...
PRACTICE HUNTING: Time Use Surveys for a quantification of practices distribu...
Ben Anderson
 
Census2022: Extracting value from domestic consumption data in a post­census era
Census2022: Extracting value from domestic consumption data in a post­census eraCensus2022: Extracting value from domestic consumption data in a post­census era
Census2022: Extracting value from domestic consumption data in a post­census era
Ben Anderson
 
The Rhythms and Components of ‘Peak Energy’ Demand
The Rhythms and Components of ‘Peak Energy’ DemandThe Rhythms and Components of ‘Peak Energy’ Demand
The Rhythms and Components of ‘Peak Energy’ Demand
Ben Anderson
 
Tracking Social Practices with Big(ish) data
Tracking Social Practices with Big(ish) dataTracking Social Practices with Big(ish) data
Tracking Social Practices with Big(ish) data
Ben Anderson
 
Modes of commuting, workplace choice and energy use at home
Modes of commuting, workplace choice and energy use at homeModes of commuting, workplace choice and energy use at home
Modes of commuting, workplace choice and energy use at home
Ben Anderson
 
Do ‘eco’ attitudes & behaviours explain the uptake of domestic energy product...
Do ‘eco’ attitudes & behaviours explain the uptake of domestic energy product...Do ‘eco’ attitudes & behaviours explain the uptake of domestic energy product...
Do ‘eco’ attitudes & behaviours explain the uptake of domestic energy product...
Ben Anderson
 
Small Area Estimation as a tool for thinking about spatial variation in energ...
Small Area Estimation as a tool for thinking about spatial variation in energ...Small Area Estimation as a tool for thinking about spatial variation in energ...
Small Area Estimation as a tool for thinking about spatial variation in energ...
Ben Anderson
 
The Distribution of Domestic Energy-Tech in Great Britain: 2008 – 2011
The Distribution of Domestic Energy-Tech in Great Britain: 2008 – 2011The Distribution of Domestic Energy-Tech in Great Britain: 2008 – 2011
The Distribution of Domestic Energy-Tech in Great Britain: 2008 – 2011
Ben Anderson
 


On the social scientific value of transactional data

  • 1. On the social scientific value of transactional data Ben Anderson* & Alexei Vernitski† * Department of Sociology † Department of Mathematical Sciences 26 May 2011
  • 2. Outline • What do we mean? • A 21st Century Sociology? • Case studies: • Do deprived areas have different telephone calling patterns? • Do households have similar calling patterns? • Can we usefully classify households' calling patterns? • Future directions
  • 3. What do we mean? • Transactional data: • Generated by everyday life • Automatically captured as part of 'business as usual' • N = millions • Billions of data points • Literature commentary: • Surveillance, Computer Science
  • 4. What do we mean? • Transactional data: • Generated by everyday life • Automatically captured as part of 'business as usual' • N = millions • Billions of data points • Literature commentary: • Surveillance, Computer Science • Social Science • Savage & Burrows, 2007 • doi:10.1177/0038038507080443 • 101 citations (Google Scholar) • http://www.youtube.com/watch?v=ARLARDwLJhw
  • 6. Traditional uses • “PayCheck profiles all 1.6 million postcodes in the UK using information on over 4 million households from lifestyle surveys and Census and Market Research data. It is available as a mean, median and mode figure for each postcode or as a PayCheck type. PayCheck can be used for: • Selecting names and addresses from the Ocean • Coding up customer records for profiling or campaign selections • Profiling to understand how your customer group compares to the rest of the UK population” • “Mosaic UK classifies consumers by household or postcode, allowing you to optimise the use of the segmentation depending upon the application. • 46% of the data used to build Mosaic is non-Census sourced information that is updated annually. This enables Mosaic to monitor changes in consumer behaviour and incorporate these each year within the classification. • Mosaic UK is validated by a comprehensive programme of fieldwork and observational research covering each of the UK's 120 postal areas.”
  • 7. New uses • Mobile Graz (© MIT): http://senseable.mit.edu/grazrealtime/ • Abstracted 'Segments' aren't needed - fuzzy matching by items, relationship, location
  • 8. A 21st Century Sociology? • Re-assessing old questions • Networks, place, space & social relationships (capital) • Consumption, leisure and class? • Public performance of self? • Imagining new questions? • Software & social stratification? • ?
  • 9. A 21st Century Sociology? • Re-assessing old questions • Networks, place, space & social relationships (capital) • Consumption, leisure and class? • Public performance of self? • Imagining new questions? • Software & social stratification? • ? • What might our students need to know? • Data provenance & politics? • Imputation, visualisation, signal analysis…? • An Industrial Sociology?
  • 10. Case study data • BT 100,000 data • Sample of 103,113 households covering all of the UK in 1995 • All outgoing billable calls recorded for the months of October 1995, March 1996, October 1996, March 1997, October 1997, March 1998 • Linked to customer data – postcode, billing flags, ACORN code
  • 11. Case study data • BT 100,000 data • Sample of 103,113 households covering all of the UK in 1995 • All outgoing billable calls recorded for the months of October 1995, March 1996, October 1996, March 1997, October 1997, March 1998 • Linked to customer data – postcode, billing flags, ACORN code • BT/Essex Home OnLine household panel data • Representative sample of 1000 households covering all of GB in 1998 • 3 wave household panel survey (1998/1999/2000) • All incoming & outgoing billable calls recorded for 438 of the 1000 households who: a) were BT customers and b) gave consent for call records to be linked to the survey
  • 12. Data: BT 100,000 • October 1995 + October/March 1996/7/8 • A sample, not complete coverage • Useful longitudinal sample = 103,116

                                      England   Northern   Scotland   Wales    Unknown (phone       Unknown (postcode   Total
                                                Ireland                        number unmatched)    unmatched)
      In first sample (Oct 95) only                                            1,606                                    1,606
      In first & second samples only                                           2,245                                    2,245
      …
      In all samples                  70,195    10,305     11,485     11,131   1,149                261                 104,526
      Total                           70,195    10,305     11,485     11,131   11,245               261                 114,622
  • 13. Data: BT 100,000 sample • Aggregated to OAs and Datazones • LSOA (Eng & Wales) • SOA (NI) • Datazone (Scotland)
  • 14. Data: BT 100,000 sample • October 1995 + October/March 1996/7/8 • A sample, not complete coverage • So is it a representative sample?
  • 15. Data: BT 100,000 sample • October 1995 + October/March 1996/7/8 • A sample, not complete coverage • So is it a representative sample? • [Chart: % of sampled households (unfiltered), % of sampled households (filtered) and % of all households, by IMD 2004 decile (0 to 9); vertical axis 0% to 16%] • Sample tends to have fewer at the margins, especially in the most deprived areas. • IMD = Index of Multiple Deprivation (area level Government statistics) • England data only (n = 32,482 LSOAs of which 1,743 contained at least one sampled number and 593 contained over 33 or 7.5%) • 70,195 households/numbers (unfiltered), 51,118 (filtered) • Filter removes areas where number of households logged > 30
  • 16. Data: BT ‘100,000’ sample selection bias? • Areas where calls were collected are more likely to have… • Slightly higher % of households in higher socio-economic groups • Lower % of households with 2+ cars • Lower IMD employment deprivation scores • West Midlands & Yorkshire location • Implications: • Generalisability? • Ideally we’d want a random sample • But… • England data only (n = 32,482 LSOAs of which 1,743 contained at least one sampled number and 593 contained over 33 or 7.5%) • 70,195 households/numbers (unfiltered), 51,118 (filtered) • Filter removes areas where number of households logged > 30
  • 17. Data: Home OnLine Household Panel • June 1998 - July 2001, 999 households • A representative sample by design • [Chart: % of households (0 to 20) by household type, un-monitored vs monitored; types: partner, kids > 15; partner, kids > 11; partner, kids < 12; partner, no kids, > 55; partner, no kids, < 56; couple, no kids, aged < 36; lone parent, all kids > 15; lone parent, all kids < 16; un-related others; alone over 55; alone under 56] • A total of 438 households were monitored at some point
  • 18. Outline • What do we mean? • A 21st Century Sociology? • Case studies: • Do deprived areas have different telephone calling patterns? • Do households have similar calling patterns? • Can we usefully classify households' calling patterns? • Future directions
  • 19. Case study 1: Deprivation & Communication • Nathan Eagle, et al., ‘Network Diversity and Economic Development’, Science 328, 1029 (2010); DOI: 10.1126/science.1186605
  • 20. Case study 1: Deprivation & Communication • Our approach: • Simplified ‘localness’ metric: – Ratio of local to national calls (ratio = n local† / n national) – Higher localness -> lower economic opportunity? • BT 100,000 data: – March 1998 data to match IMD 2004 data • † Local call = same dialing code area; Regional calls ~= within a Government Office Region (GOR); National calls ~= between GORs. Not geographic distance.
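The ‘localness’ ratio on slide 20 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `localness`, the call-type labels and the example calls are all assumptions, and returning `None` for a household with no national calls is just one way to handle a ratio that is undefined there.

```python
from collections import Counter


def localness(call_types):
    """Return n_local / n_national for one household's outgoing calls.

    call_types: iterable of 'local', 'regional' or 'national' labels
    (hypothetical labels; the real BT call type codes are not given).
    Returns None when there are no national calls (ratio undefined).
    """
    counts = Counter(call_types)
    if counts["national"] == 0:
        return None
    return counts["local"] / counts["national"]


# Hypothetical household: 6 local, 2 regional and 3 national calls.
calls = ["local"] * 6 + ["regional"] * 2 + ["national"] * 3
print(localness(calls))  # 2.0
```

Regional calls are ignored by the ratio as defined on the slide, so they only appear here to show that they do not affect the result.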
  • 21. Case Study 1: ‘Localness’ and ‘deprivation’ • [Chart: % calls by IMD 2004, filtered] • Distance wasn’t dead in 1998 • LSOA level • March 1998, England data only (1,743 LSOAs contained at least one sampled number and 593 contained over 33 or 7.5%) • 1,743 LSOAs = 5M calls, 593 LSOAs = 3.7M calls • IMD 2004 (2001 data)
  • 22. Case Study 1: ‘Localness’ by region • Household level, highest 10% of ‘Localness’ (outliers) removed - predominantly in Wales & Northern Ireland • March 1998, All = 78,127 ‘households’ (11,737 made no local calls and 30,894 made no national calls), England = 50,971 ‘households’ • IMD 2004 (2001 data)
  • 23. Case Study 1: ‘Localness’ and deprivation • Household level, highest 10% of ‘Localness’ (outliers) removed - predominantly in Wales & Northern Ireland • March 1998, All = 78,127 ‘households’ (11,737 made no local calls and 30,894 made no national calls), England = 50,971 ‘households’ • IMD 2004 (2001 data)
  • 24. Case Study 1: ‘Localness’ and ‘employment deprivation’ • ‘Localness’: ratio of local to national calls (= local*/national) • IMD 2004 – Employment score – R = 0.53 • Household level, highest 10% of ‘Localness’ (outliers) removed - predominantly in Wales & Northern Ireland • March 1998, All = 78,127 ‘households’ (11,737 made no local calls and 30,894 made no national calls), England = 50,971 ‘households’ • IMD 2004 (2001 data) • * Local call = same dialing code area; Regional calls ~= within a Government Office Region (GOR); National calls ~= between GORs. Not geographic distance.
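As a rough sketch of how a figure like R = 0.53 might be obtained, the snippet below computes a Pearson correlation between household localness and an area employment deprivation score. Whether the slide reports Pearson's r is an assumption; the function name `pearson_r` and the four example values are hypothetical and are not taken from the BT data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: employment deprivation score and localness ratio
# for four households.
employment_score = [0.05, 0.10, 0.15, 0.20]
localness_ratio = [1.0, 3.0, 2.0, 4.0]

r = pearson_r(employment_score, localness_ratio)
print(round(r, 2))  # 0.8
```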
  • 25. Case study 1: Deprivation and communication • Probably! • Localness measure • For both: • Higher rates correlate (+) with deprivation • Especially employment deprivation
  • 26. Outline • What do we mean? • A 21st Century Sociology? • Case studies: • Do deprived areas have different telephone calling patterns? • Do households have similar calling patterns? • Can we usefully classify households' calling patterns? • Future directions
  • 27. Case study 2: Patterns of communication • Why? • Habits & rhythms of life… • What’s all this about? • BT “100,000” data – sample of 103,113 households • October 1995 call data (7,935,195 calls)
  • 28. Case study 2: Patterns of communication • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995 • Local calls = n/10 • October 1995 data (7,935,195 calls)
  • 29. Case study 2: Patterns of communication • BT “Home OnLine Panel” households • [Chart: mean number of calls per household (axis 0 to 6) and mean call duration in seconds (axis 0 to 500) by day, 4-Jan to 5-Apr] • Sundays: • Longer calls • Fewer calls • But what’s this? • A data blip! • BT “Home OnLine Panel” data (423 households) - slightly more likely to be ‘alone over 55’ and living in a semi-detached house/bungalow • January - April 2000
  • 30. Case study 2: Patterns of communication • Duration of calls… • [Chart: partial ACF of mean call duration (s) by lag number (1 to 35), coefficient with confidence limits, axis -1.0 to 1.0] • Autocorrelation analysis: • Every 7th day is similar: weekly autocorrelation • Every 14th day is similar: fortnightly autocorrelation even after allowing for weekly autocorrelation • Humans are creatures of (localised) habit! • BT “Home Online Panel” data – 1999/2000 excerpt
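The weekly pattern on slide 30 can be illustrated with a small autocorrelation sketch. Note the assumptions: the slide shows a partial ACF, whereas this computes the simpler plain sample autocorrelation, and the daily durations are synthetic (eight identical weeks with longer Sunday calls), not the Home OnLine data. The function name `autocorrelation` is hypothetical.

```python
def autocorrelation(series, lag):
    """Plain sample autocorrelation of `series` at a positive integer lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


# Synthetic mean call duration (seconds) per day: Monday to Saturday
# at 200 s, Sunday at 340 s, repeated for 8 weeks.
week = [200, 200, 200, 200, 200, 200, 340]
durations = week * 8

acf = {lag: autocorrelation(durations, lag) for lag in (1, 7, 14)}
print(acf[7], acf[14])  # 0.875 0.75
```

With a strictly weekly rhythm the coefficients at lags 7 and 14 are high while the lag 1 coefficient is slightly negative, which is the kind of signature the slide describes as "every 7th day is similar".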
  • 31. Case study 2: Patterns of communication • Biological networks: • Many nodes with few connections & few nodes with many connections: y = a/x or y = a/x² (log-linear or ‘scale-free’) • Human calling networks: • Same principle • McCarty et al (2001) Comparing two methods for estimating network size, Human Organization; Spring 2001; 60, 1; pg. 28 • Do we find the same thing?
  • 32. Case study 2: Patterns of communication • Outgoing calls • [Chart: distribution of calls across telephone numbers for one household, with a log-log plot] • X: log (number of calls) • Y: log (number of telephone numbers) • Fitted line: y = -0.9494x + 7.2, R² = 0.9289 • Informally: • 1/2 the numbers called are called once • Of the rest 1/3 are called twice • Of the remainder 1/4 are called three times • etc • BT “Home Online Panel” data – 1999/2000 excerpt • Data for 1 ‘typical’ household
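A fit like the one on slide 32 can be sketched as an ordinary least-squares line through the log-log points. This is an illustration under stated assumptions: natural logarithms are used (the slide does not say which base it uses, and the slope does not depend on it), the call log is synthetic and built so that the number of telephone numbers called k times is exactly 64/k, and the names `loglog_fit`, `calls_per_number` and `numbers_per_count` are hypothetical.

```python
import math
from collections import Counter


def loglog_fit(xs, ys):
    """Least-squares fit of ln(y) = slope * ln(x) + intercept."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic call log for one household: one entry per outgoing call.
calls = []
number_id = 0
for times_called, how_many in [(1, 64), (2, 32), (4, 16), (8, 8)]:
    for _ in range(how_many):
        calls.extend([f"number-{number_id}"] * times_called)
        number_id += 1

calls_per_number = Counter(calls)                        # number -> calls to it
numbers_per_count = Counter(calls_per_number.values())  # k -> numbers called k times

ks = sorted(numbers_per_count)
slope, intercept = loglog_fit(ks, [numbers_per_count[k] for k in ks])
print(round(slope, 3))  # -1.0
```

A slope close to -1, as on the slide (-0.9494), corresponds to the y = a/x form mentioned on slide 31.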
  • 33. Case study 2: Patterns of communication • Outgoing calls and Incoming calls show roughly the same pattern • BT “Home Online Panel” data – 1999/2000 excerpt • Data for 1 ‘typical’ household
  • 34. Outline • What do we mean? • A 21st Century Sociology? • Case studies: • Do deprived areas have different telephone calling patterns? • Do households have similar calling patterns? • Can we usefully classify households' calling patterns? • Future directions
  • 35. Case study 3: Classifying households • Incoming • Outgoing • ‘Mirror’ curves for two households • BT “Home Online Panel” data – 1999/2000 excerpt • Data for 1 ‘typical’ household
  • 36. Case study 3: Classifying households • ‘Mirror’ curves for several households • [Chart: ‘mirror’ curve for household 4045] • BT “Home Online Panel” data – 1999/2000 excerpt
  • 37. Case study 3: Classifying households • Can we use these curves to classify households? • Of the 391 who match to wave 1 survey data, 16 are ‘unusual shapes’ • E.g. ‘4045’ • [Chart: ‘mirror’ curve for household 4045] • Single person • White male • Aged 46 • Lives alone, divorced • Speaks to neighbours at least once a week • 0 local friends • 4 non local friends • 2 local relatives • 12 non-local relatives • BT “Home Online Panel” data – 1999/2000 excerpt • Data for 1 ‘typical’ household
  • 38. Case study 3: Classifying households • Of the 16 ‘unusual shapes’ • Predominantly single adults • Alone > 55 • Alone < 55 • Lone parent with young child(ren) • Why? • [Chart: % of households (0 to 30) by household type, "Normal" vs "Unusual"; types: partner+kid>15; partner+kid>11; partner+kid<12; part,nokid,>55; part,nokid,<56; part,nokid,<36; lonepar,kid>15; lonepar,kid<16; un-other rel; alone over 55; alone under 56] • BT “Home Online Panel” data – 1999/2000 excerpt • Data for 1 ‘typical’ household
  • 39. Outline • What do we mean? • A 21st Century Sociology? • Case studies: • Do deprived areas have different telephone calling patterns? • Do households have similar calling patterns? • Can we usefully classify households' calling patterns? • Future directions
  • 40. Future directions: • Analysis of call duration • Do we get the same shape? • How do we combine frequency & duration? • Can the curves predict anything about the households?
  • 41. Future directions: • Household composition & network structure? Household transition
  • 42. Future directions: • Network characteristics in places • Do people in different kinds of places have different kinds of (ego) networks? • Does this change over time? • Can we represent a place by some combination of ego networks? • To what extent do ego networks overlap? • And how do we deal with the sampling issue?
  • 43. Future directions: • Network characteristics in places • Do people in different kinds of places have different kinds of (ego) networks? • Does this change over time? • Can we represent a place by some combination of ego networks? • To what extent do ego networks overlap? • And how do we deal with the sampling issue?
  • 44. BUT: Data Issues • Business calls? • Some of the (longer) calls may be to ISPs • Filter? • Some things don’t match up: • 36 of the Home OnLine panel households monitored were not BT customers (or so they said in the survey!) • Of the 136,732 different callers, 17,947 had no recorded number • Of the 124,032 different callees 1,963 had no recorded number and 21,295 had a number 3 digits long (54% of these were calls to 192, 13% to ‘123’) • There are many semi-duplicates (impossible calls) - internal BT data? • And there are call type codes no-one understands any more! • When you didn’t design the data source… • Provenance is crucial! • Data forensics skills are critical!
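The kind of check behind the counts on slide 44 (missing numbers, 3-digit numbers) can be sketched as a simple audit pass over callee numbers. This is an assumed illustration of ‘data forensics’, not the authors' procedure: the function `audit_callee`, its category names and the example records are hypothetical, and the one full-length number is an invented placeholder.

```python
from collections import Counter


def audit_callee(number):
    """Classify one callee number string for a provenance check.

    The categories are illustrative only; they are not BT's own codes.
    """
    if number is None or number.strip() == "":
        return "missing"
    digits = number.strip()
    if not digits.isdigit():
        return "malformed"
    if len(digits) == 3:
        return "short_code"  # e.g. the 192 and 123 calls noted on the slide
    return "ordinary"


# Hypothetical callee field values from a call log.
records = ["01632960001", "192", "123", "", None, "0163 296"]
counts = Counter(audit_callee(r) for r in records)
print(counts["short_code"], counts["missing"])  # 2 2
```

Tabulating such categories before any analysis makes it easier to decide which records to filter and which to query with the data owner.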
  • 45. Thank you! • Dr Ben Anderson • benander@essex.ac.uk • http://cresi.essex.ac.uk/getperson?personID=1 • Dr Alexei Vernitski • asvern@essex.ac.uk • http://www.essex.ac.uk/maths/staff/profile.aspx?ID=1275 • BT funded feasibility project: • http://cresi.essex.ac.uk/getproject?projectID=46

Editor's Notes

  1. Tesco: Manchester - Sustainable Consumption Institute has a daily update of Tesco’s club card data - detailed purchase logs of all club card users. Nectar: direct and indirect collection.
  2. In a way ‘Sociology’ gets cut out of the loop
  3. Industrial not as in the study of industry but as an industry in itself. Knowing Capitalism. Software & social stratification - will the ability to be much smarter about neighborhood data lead to an increase in local homogeneity as similar people cluster?
  4. Industrial not as in the study of industry but as an industry in itself. Knowing Capitalism. Software & social stratification - will the ability to be much smarter about neighborhood data lead to an increase in local homogeneity as similar people cluster?
  5. To introduce the data…
  6. To introduce the data…
  7. To introduce the data…
  8. To introduce the data…
  9. To introduce the data…
  10. To introduce the data…
  11. Inspiration… the idea is that diversity/heterogeneity in communication networks should enable greater economic opportunity through ‘weak ties’ (Granovetter/Burt etc). Over-localised networks are seen as limited. Eagle et al used a complex measure of entropy and structural holes -> high diversity has negative correlation with deprivation.
  12. To introduce the data…
  13. Same story: dominance of local calls. % local calls increases as deprivation increases.
  14. Instead of aggregating to LSOA level, this is household level. Same story: dominance of local calls. % local calls increases as deprivation increases.
  15. Same story: dominance of local calls. % local calls increases as deprivation increases.
  16. Same story: dominance of local calls. % local calls increases as deprivation increases.
  17. % local calls increases as deprivation increases.
  18. Same story: dominance of local calls. % local calls increases as deprivation increases.
  19. Seems to work!