The sociological value of transactional data?
 


In this talk we will outline some of the potential sociological research value in transactional data and present for discussion some preliminary analysis of both internet usage log data from several Brazilian institutions and from UK households. We will also present some exploratory analysis of a large corpus of UK telephone call records from the late 1990s. In these cases we will offer few conclusions but rather hope to generate discussion of their potential value in sociological research and of the ethical dilemmas that surround their collection and (re)use.

Usage Rights

CC Attribution-NonCommercial-NoDerivs License

  • Industrial not as in the study of industry but as an industry in itself. Knowing Capitalism
  • To introduce the data…
  • i.e. why would anyone bother trying to identify people? Lots of effort for what return?
  • We only have between 5% and 25% of households in each LSOA, and the calls include those made to businesses etc. (need to remove hubs). Eagle, Claxton et al.: similar work using UK call records from 2005, submitted to Science(?)
  • Panel households knowing each other is not surprising: the sample is clustered, so some are neighbours. We also discovered people apparently calling themselves, an artefact of the way particular calling packages were billed (e.g. call-cards for children/students to call home free, charged to the home bill)!
  • Matched to some extent by the time-use data from the time-use survey on the same sample
  • Longitudinal social network analysis…
  • Contributes to triangulation: greater ‘transparency’, with no lies, no forgetfulness and no social construction EXCEPT in the coding/interpretation of the data during analysis (“are they doing what we think they are doing?”). Relevant to sociology, anthropology, administration, public policy, psychology, economics, public health... Maybe the most real picture of 21st Century society?

The sociological value of transactional data? Presentation Transcript

  • The sociological value of transactional data? Dr Arnaldo Barreto Dr Ben Anderson
  • Contents
    • What do we mean?
    • Why does it matter?
    • Examples: Case studies
        • Telephone call records
        • Internet usage logs
    • Discussion
  • What do we mean?
    • Transactional data
        • Generated by everyday life
        • Automatically captured as part of ‘business as usual’
        • N = millions (billions)
    • Savage & Burrows (2007) “The Coming Crisis of Empirical Sociology”
  • Where does (some of) it go?
    • “PayCheck profiles all 1.6 million postcodes in the UK using information on over 4 million households from lifestyle surveys and Census and Market Research data. It is available as a mean, median and mode figure for each postcode or as a PayCheck type.
    • PayCheck can be used for:
    • Selecting names and addresses from the Ocean
    • Coding up customer records for profiling or campaign selections
    • Profiling to understand how your customer group compares to the rest of the UK population”
    • “Mosaic UK classifies consumers by household or postcode, allowing you to optimise the use of the segmentation depending upon the application.
    • 46% of the data used to build Mosaic is non-Census sourced information that is updated annually. This enables Mosaic to monitor changes in consumer behaviour and incorporate these each year within the classification.
    • Mosaic UK is validated by a comprehensive programme of fieldwork and observational research covering each of the UK's 120 postal areas.”
    Disaggregation and identification are the point!
  • Why does it matter?
    • Savage & Burrows see a ‘crisis’
        • What is sociology’s ‘edge’? [Differentiator/USP/Claim]
        • What is sociology’s (empirical) role?
        • Towards a ‘politics of method’?
    • But we see opportunities
        • A 21st Century Sociology
            • Empirical resources with which to ask new questions
        • A Practical Sociology
            • Underpinning a 21st Century Analytics Industry
  • A 21st Century Sociology?
    • Re-assessing old questions
        • Networks, place, space and social capital?
        • Consumption, leisure and class?
        • Public performance of self?
    • Imagining new questions?
        • Software & social stratification?
        • ?
    • What might our students need to know?
        • Data provenance & politics?
        • Imputation, clustering, visualisation,…?
        • An Industrial Sociology ?
  • Contents
    • What do we mean?
    • Why does it matter?
    • Examples: Case studies
        • Telephone call records
        • Internet usage logs
    • Discussion
  • Case studies
    • BT 100,000 data
        • sample of 103,113 households covering all of the UK in 1995
        • All outgoing billable calls recorded for the months of October 1995, March 1996, October 1996, March 1997, October 1997, March 1998
        • Linked to Customer data – postcode, billing flags, ACORN code
    • BT/Essex Home OnLine household panel data
        • representative sample of 1000 households covering all of GB in 1998
        • 3 wave household panel survey (1998/1999/2000)
        • All incoming & outgoing billable calls recorded for 423 of the 1000 households who i) were BT customers and ii) gave consent for linkage to survey
    • Brazilian Internet log data
        • Logs of traffic to/from six large organisations
        • Government/private/public education
        • Collected by internet service provider
  • Ethics/Law and Data

    | | BT 100,000 data | BT/Essex Home OnLine Panel Data | Brazil Internet Data |
    | --- | --- | --- | --- |
    | Consent to collect? | As part of commercial service provision – monitoring/research? | Yes (but not the third party!) | As part of commercial service provision – monitoring/research? |
    | Consent to link? | As above (to customer data – postcode, billing flags etc) but no ‘sensitive personal data’ | Yes (to survey data) – contains ‘sensitive personal data’ | N/A |
    | Consent for future research? And by whom? | ? | Yes | ? |
    | Risk of ‘disclosure’ | Medium – in principle could locate postcode and ‘ask around’ but it’s 10 years old and no ‘sensitive personal data’ | Medium – could search on phone number & locate using date of birth/household characteristics? -> security applied | Low – in principle could link to specific PC in organisation and so to a user but requires internal organisation data |

    But big effort for what return?
  • Contents
    • What do we mean?
    • Why does it matter?
    • Examples: Case studies
        • Telephone call records
        • Internet usage logs
    • Discussion
    • BT “100,000” households data
    Analysing social communication
    • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
    • Non-random selection – all households in the first digital exchanges
    • All outgoing billable calls recorded for the months of October 1995, March 1996, October 1996, March 1997, October 1997, March 1998
    • Each month = c. 8 million calls
  • Analysing social communication
    • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
    • Aggregated to LSOA level
    • Linked to:
      • Indices of deprivation
      • Urban/rural
      • Census data
      • etc
    • Monthly costs (total)
    • Duration (mean)
    • Cost per call (mean)
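    A minimal sketch of the aggregation step listed above, for illustration only. The record layout (household id, LSOA code, duration in seconds, cost in pence) and all values are hypothetical and are not the BT data format:

```python
from collections import defaultdict

# Hypothetical call records: (household_id, lsoa_code, duration_seconds, cost_pence).
# These rows are made up for illustration; they are not BT data.
calls = [
    ("h1", "E01000001", 120, 10.0),
    ("h1", "E01000001", 60, 5.0),
    ("h2", "E01000001", 300, 30.0),
    ("h3", "E01000002", 30, 3.0),
]


def summarise_by_lsoa(records):
    """Return per-LSOA call count, total cost, mean duration and mean cost per call."""
    totals = defaultdict(lambda: {"n_calls": 0, "duration": 0.0, "cost": 0.0})
    for _household, lsoa, duration, cost in records:
        t = totals[lsoa]
        t["n_calls"] += 1
        t["duration"] += duration
        t["cost"] += cost
    return {
        lsoa: {
            "n_calls": t["n_calls"],
            "total_cost": t["cost"],
            "mean_duration": t["duration"] / t["n_calls"],
            "mean_cost_per_call": t["cost"] / t["n_calls"],
        }
        for lsoa, t in totals.items()
    }


summary = summarise_by_lsoa(calls)
```

    The resulting per-LSOA rows could then be joined on the LSOA code to area-level data such as deprivation indices or Census tables.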
    Analysing social resources
    • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
    • October 1995 call data (England only)
    • English IMD 2004 Income Domain Score (based on benefit counts from 2001)
    • Local calls
    • National calls
    Analysing social resources
    • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
    • October 1995 call data (England only)
    • English IMD 2004 Index of Multiple Deprivation (based on a range of data from 2001)
    But there are data problems...

    | Call type | % of calls |
    | --- | --- |
    | Local | 77.8 |
    | National | 8.4 |
    | Regional | 4.9 |
    | Mobiles? | 2.9 |
    | International | 1.0 |
  • Analysing temporal communication
    • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
    • October 1995 call data (7,935,195 calls)
    • Why?
        • Habits & rhythms of life…
        • What’s all this about?
  • Analysing temporal communication
    • BT “100,000” data – sample of 103,113 households covering all of the UK in 1995
    • October 1995 data (7,935,195 calls)
    But this is still aggregated data!
    • Localised creatures of habit!
    • BT “Home OnLine Panel” households data
    Analysing temporal communication
    • BT “Home Online Panel” data – representative sample of 1000 households covering all of GB in 1998
    • 3 wave panel survey (1998/1999/2000)
    • All incoming & outgoing billable calls recorded for 423 households who i) were BT customers and ii) gave consent for linkage to survey
    • These were similar to non-logged households except for an over-representation of “alone over 55s” who appeared more likely to give consent
    • Sundays:
      • Longer calls
      • Fewer calls
    • But what’s this?
      • A data blip!
    • Duration of calls…
    Analysing temporal communication
    • BT “Home Online Panel” data – 1999/2000 excerpt
    • Autocorrelation analysis:
      • Every 7th day is similar
        • weekly autocorrelation
      • Every 14th day is similar
        • fortnightly autocorrelation even after allowing for weekly autocorrelation
    • Humans are creatures of habit!
    • Number of calls…
    Analysing temporal communication
    • BT “Home Online Panel” data – 1999/2000 excerpt
    • Autocorrelation analysis:
      • Today is like yesterday
        • Daily autocorrelation
      • Every 21st day is similar
      • Every 29th day is similar
    • Humans are creatures of habit!
    • Number of emails sent…
    Analysing temporal communication
    • BT “Home Online Panel” data – internet logs of 16 households
    • Plus logs for 64 households on an experimental BT internet trial
    • Autocorrelation analysis:
      • Today is like yesterday
        • Daily autocorrelation
      • Every 6th day is similar (huh?)
      • Every 7th day is similar
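    The autocorrelation slides above can be illustrated with a small sketch. It computes the plain sample autocorrelation of a daily series at a given lag. The series here is synthetic (a repeated weekly pattern), not the panel data, and the function name is our own. Note that "fortnightly even after allowing for weekly" would need a partial autocorrelation, which this sketch does not compute:

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Synthetic daily call counts with a weekly cycle (illustrative only).
weekly_pattern = [30, 28, 27, 29, 31, 22, 15]
daily_calls = weekly_pattern * 8  # 56 days

# Autocorrelation at lags of 1 to 14 days.
acf = {lag: autocorrelation(daily_calls, lag) for lag in range(1, 15)}
```

    On a series like this the values at lags 7 and 14 stand well above the neighbouring lags, which is the "every 7th day is similar" pattern; real logs would be noisier.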
  • Visualising social interactions
    • BT “Home Online Panel” data – 1999/2000 excerpt
    • All called numbers
    • All calls
    • Clusters of calls to the same number
    • Some of our panel households know each other!
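    A rough sketch of how such a calling network might be assembled before plotting. The record layout (caller number, called number), the sample values and the function names are all hypothetical. It counts calls per pair, picks out links between panel households, and flags heavily shared numbers ("hubs", e.g. businesses) that one might want to remove:

```python
from collections import defaultdict

# Hypothetical call records: (caller household's number, number called).
calls = [
    ("A", "X"), ("A", "X"), ("A", "B"),
    ("B", "X"), ("B", "Y"),
    ("C", "X"), ("C", "Z"), ("C", "A"),
]
panel_numbers = {"A", "B", "C"}  # numbers belonging to panel households


def build_edges(records):
    """Count calls per (caller, callee) pair: a weighted, directed edge list."""
    edges = defaultdict(int)
    for caller, callee in records:
        edges[(caller, callee)] += 1
    return dict(edges)


def panel_to_panel(edges, panel):
    """Edges where both ends are panel households, excluding self-calls."""
    return {
        pair: n
        for pair, n in edges.items()
        if pair[0] in panel and pair[1] in panel and pair[0] != pair[1]
    }


def hubs(edges, min_callers):
    """Called numbers reached by at least `min_callers` distinct callers."""
    callers = defaultdict(set)
    for (caller, callee), _n in edges.items():
        callers[callee].add(caller)
    return {num for num, who in callers.items() if len(who) >= min_callers}


edges = build_edges(calls)
links = panel_to_panel(edges, panel_numbers)
hub_numbers = hubs(edges, min_callers=3)
```

    The edge counts give the "clusters of calls to the same number"; the threshold for what counts as a hub is a judgment call and is an assumption here.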
  • What else could we ask?
    • Social network (calling) structures <-> household composition?
  • What else could we ask?
    • Social network (calling) structures <-> household composition?
  • What else could we ask?
    • Social/economic/cultural capitals <-> calling behaviour?
    • household composition & network structure?
    Household transitions?
  • Contents
    • What do we mean?
    • Why does it matter?
    • Examples: Case studies
        • Telephone call records
        • Internet usage logs
    • Discussion
  • Why discuss Internet use?
    Source: comScore World Metrix 2007

    | Country | Internet Penetration | Monthly Unique Users ('000) | Average Daily Users ('000) | Average Usage Days per User per Month | Average Monthly Hours per User | Average Monthly Pages per User |
    | --- | --- | --- | --- | --- | --- | --- |
    | Austria | 53% | 3,721 | 1,485 | 12 | 16.3 | 1,906 |
    | Belgium | 54% | 4,728 | 2,447 | 15.5 | 20.6 | 2,399 |
    | Denmark | 68% | 3,045 | 1,493 | 14.7 | 22 | 3,058 |
    | Finland | 65% | 2,818 | 1,544 | 16.4 | 29.7 | 3,749 |
    | France | 51% | 25,388 | 14,531 | 17.2 | 26.1 | 2,768 |
    | Germany | 46% | 32,578 | 18,359 | 16.9 | 22.6 | 2,807 |
    | Ireland | 42% | 1,365 | 591 | 13 | 18.9 | 1,871 |
    | Italy | 36% | 18,086 | 7,783 | 12.9 | 17.7 | 1,862 |
    | Netherlands | 83% | 11,292 | 7,35 | 19.5 | 27 | 3,131 |
    | Norway | 70% | 2,62 | 1,288 | 14.7 | 27.4 | 3,08 |
    | Portugal | 44% | 3,882 | 1,731 | 13.4 | 23.3 | 2,454 |
    | Russia | 11% | 13,255 | 5,048 | 11.4 | 13.3 | 1,695 |
    | Spain | 39% | 13,628 | 8,828 | 19.4 | 30.6 | 2,675 |
    | Sweden | 70% | 5,259 | 2,895 | 16.5 | 31.7 | 4,019 |
    | Switzerland | 58% | 3,666 | 1,846 | 15.1 | 22.7 | 2,676 |
    | UK | 62% | 31,15 | 21,767 | 21 | 34.4 | 3,44 |
    | Europe* | 40% | 221,463 | 121,774 | 16.5 | 24.1 | 2,662 |
    | US* | 66% | 156,697 | 114,472 | 21.9 | 31.4 | 2,826 |
    • Internet Access 70% of households had access in 2009 (Sources: National Statistics Omnibus Survey)
        • Households with access to the Internet, UK 18.3 million households in the UK (70 per cent) had Internet access in 2009. This is an increase of just under 2 million households (11 per cent) over the last year and 4 million households (28 per cent) since 2006. UK estimates are not available prior to 2006.
    Internet numbers
    • E-commerce
        • This is only the beginning!
        • Source: UK Expenditure & Food Survey 2002-2007
          • Expenditure – all expenditure coded as (total) and within that (online)
          • Ordering – list of 17 items commonly ordered online
    Internet numbers
    • Brazilian Case:
        • We have some services only by Internet... (e.g. Income Tax Return)
    • => The Digital Society is a reality
    Internet numbers
    • “Today we have an interim report from Lord Carter setting out the scale of our ambition to compete in the digital economy and that's a market worth about £50bn a year,” Gordon Brown, January 2009
    • http://news.bbc.co.uk/1/hi/technology/7858183.stm
  • But what are people doing online!?
    • By interviews
      • questionnaires (census bureau and other surveys of households / enterprises / schools);
    • By automatic collection on your PC
      • with your consent
      • programs in a toolbar
    • By automatic collection at your Internet provider
      • without your consent (usually).
  • Some problems with questionnaires
    • ‘Confessional’ methods and lies...
        • The idea of productivity
          • number of hours in front of TV / PC for entertainment...?
        • The problem of social acceptance
          • Nazism, pornography, paedophilia, violence...
        • The idea of non-pleasure
          • “I am almost always studying, working... never for fun...”
    • Well known in survey research
        • But often ‘ignored’
  • Automatic Collection with consent
    • The idea of BIG BROTHER
      • I know there is always somebody looking... So I will do only ‘the right’ things...
      • Hawthorne effect
    • The Panopticon of Bentham/Foucault...
    • People use a ‘shared’ account
      • Hard to tell who did what in a multi-person household (Kraut et al, 2002)
    Prison building at Presidio Modelo, Isla De la Juventud, Cuba (2005) http://en.wikipedia.org/wiki/File:Presidio-modelo2.JPG
  • For example
    • Nielsen NetRatings
    Top 10 Global Web Parent Companies, Home & Work, December 2009
    Source: Nielsen NetView

    | Rank | Parent | Unique Audience (000) | Active Reach % | Time per Person (HH:MM:SS) |
    | --- | --- | --- | --- | --- |
    | 1 | GOOGLE | 353,851 | 83.91 | 2:38:50 |
    | 2 | MICROSOFT | 315,490 | 74.81 | 3:01:38 |
    | 3 | YAHOO! | 228,711 | 54.23 | 2:12:36 |
    | 4 | FACEBOOK | 206,878 | 49.06 | 5:57:17 |
    | 5 | EBAY | 163,844 | 38.85 | 1:41:31 |
    | 6 | WIKIMEDIA FOUNDATION | 141,239 | 33.49 | 0:16:01 |
    | 7 | AMAZON | 137,364 | 32.57 | 0:32:11 |
    | 8 | AOL LLC | 129,360 | 30.67 | 2:21:03 |
    | 9 | NEWS CORP. ONLINE | 120,316 | 28.53 | 0:59:17 |
    | 10 | INTERACTIVECORP | 115,131 | 27.30 | 0:11:36 |
  • Automatic Collection without Consent (Internet Provider mode)
    • How do Police find cybercrimes?
    • Concepts of traceability... Lessig.
    • Yes, we have a Big Brother...
    [Diagram: Home/Workplace router -> Internet Service Provider (ISP) router -> Internet email / web servers etc. Logs along the path: browser log, router log, ISP user log, web server log]
  • ‘Better’ Social Science?
    • Poor Social Science
        • Technologies have ‘impact’
        • Activities get ‘displaced’
        • Internet Good vs Internet Bad
    • Better Social Science
        • These technologies are altering patterns of self-service consumption
        • Proper question:
            • How are they being integrated into everyday activities?
        • “We will discover what the actual chains of provision are only by asking, and seeing, what people are actually using the net for, how it relates to other aspects of peoples’ lives – by observing how the technology is embodied in the chains of provision for the various final services we consume”. Gershuny (2003)
    Behaviour Technology
  • Just one example in Brazil... Research about e-Gov with ISP authorization
  • Facts:
    • 1. The data has always been collected by Internet providers
        • Used in police cases and for internal statistics/research;
    • 2. Much personal data exists, such as medical, police and census records
        • it is used within ethical limits;
    • 3. Banks and credit card companies use personal data for their statistics
        • but nobody is identified (anonymised & aggregated);
    • 4. People don't like ‘confessional’ methods
        • they usually give inaccurate information about their preferences and behaviour.
  • What is the real contribution of this?
    • What are we doing with 34.4 hours in front of the PC?
    Data triangulation: observational and qualitative methods; survey methods; tracking, tracing and logging methods
  • Thank you
    • [email_address]
    • [email_address]
    • http://cresi.essex.ac.uk
    • All mentioned datasets are available for further research…!
    • Savage & Burrows: http://soc.sagepub.com/cgi/content/abstract/41/5/885 (Sociology 41 no 5)
    • Radical data: http://www.ccsr.ac.uk/methods/events/RadicalData/Programme.htm
    • Home OnLine (no call records): http://www.data-archive.ac.uk/findingdata/snDescription.asp?sn=4607