StatMine – prototype
Exploring official statistics
Martijn Tennekes, Edwin de Jonge, Jan van der Laan & Jessica
Statistics Netherlands (CBS)
Visweek 2013
StatMine, statistical goldmine
Edwin de Jonge (@edwindjonge)
Jan van der Laan, Jessica Solcer
Statistics Netherlands / CBS
Dutch Information Visualisation Event 2014, June 19, 2014
StatMine 0.2
Statistics Netherlands / CBS
- Creates and publishes official statistics on economics,
demographics, health care and others.
- Since 1899
- Website: www.cbs.nl
- Online DB: http://statline.cbs.nl (since 1997)
Why StatMine?
– Online StatLine contains more than one billion (10^9) facts
‐ Policy makers
‐ Journalists
‐ Citizens
‐ Enterprises
‐ Economists
‐ Social scientists
‐ Historians
‐ etc.
Problem 1
Numbers ≠ Information
1. Numbers ≠ Information
We know from a user study that:
1. Many interesting patterns in StatLine are not spotted by
users
2. Many important topics in StatLine are scattered across
multiple tables
H1: Data analysis = Data insight
H1. Data insight
Goal of StatMine 0.1 was to provide more
insight into StatLine numbers by
• Presenting these facts visually and
interactively
• We tested this successfully on 4 “difficult”
StatLine tables.
Bar chart - compare
Line chart - development
Bubble/scatter chart - correlation
Mosaic chart - structure
an exploration of dissemination data: StatMine
Chart type – bar chart
Small multiples?
Demo
StatMine 0.1 Results
Tested on 25 users:
Findings:
- Test persons think that visualizing data adds
value (small multiples)
- Data owners look at their data differently
- They want this tool to check their data before
publication.
Problem 2:
Fragmented Information
2. Fragmented information
Most information in StatLine is fragmented:
‐ Energy consumption wrt economic growth
‐ Perceived public safety wrt registered crime
– Users currently need to look into multiple tables and
combine the information by hand.
2. Merge data!
H2. Table joining
Goal of StatMine 0.2: create more insight by:
- Letting users combine tables
- Condition: the tables share at least one column/data
dimension.
- Tested on a small set of tables.
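The joining condition above can be illustrated with a minimal sketch. It assumes tables are plain lists of dictionaries and that a shared dimension is simply a column name present in both tables; the function name `join_tables` and all figures are invented for illustration and are not taken from StatMine or StatLine.

```python
def join_tables(left, right):
    """Inner-join two tables (lists of dicts) on every column name they share."""
    if not left or not right:
        return []
    # The shared columns play the role of the common data dimension(s).
    shared = sorted(set(left[0]) & set(right[0]))
    if not shared:
        raise ValueError("tables share no column, so they cannot be joined")

    # Index the right-hand table by its values on the shared columns.
    index = {}
    for row in right:
        key = tuple(row[col] for col in shared)
        index.setdefault(key, []).append(row)

    # Combine each left-hand row with every matching right-hand row.
    joined = []
    for row in left:
        key = tuple(row[col] for col in shared)
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined


# Hypothetical tables with made-up numbers; "year" is the shared dimension.
energy = [
    {"year": 2010, "energy_use": 100.0},
    {"year": 2011, "energy_use": 97.0},
]
growth = [
    {"year": 2011, "gdp_growth": 1.5},
    {"year": 2012, "gdp_growth": -1.0},
]

combined = join_tables(energy, growth)
print(combined)
```

Only the year present in both tables survives the join, which is the behaviour of an inner join; whether StatMine keeps or drops unmatched rows is not stated in the slides.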
StatMine 0.2 Results
Test persons: 20 internal, 40 external (policy
makers, journalists).
Findings:
- External users are enthusiastic about the visual
possibilities of StatMine
- Joining data meets a user need.
Problem 3
Statistical numbers are uncertain
H3. Confidence intervals
– All Statistics Netherlands facts have a confidence interval
– European Statistics Code of Practice (12.2):
‐ “sampling and non sampling errors should be
systematically documented”
Goal of StatMine 0.3:
Investigate how uncertainty in numbers can be presented
understandably to users.
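For readers unfamiliar with the term, a confidence interval around a point estimate can be sketched as follows. This is a textbook normal-approximation interval for the mean of a simple random sample (estimate ± 1.96 standard errors for roughly 95% coverage); it is an assumption for illustration only, since the slides do not say how the intervals are computed and official statistics typically rest on more complex survey designs. The function name and the sample values are hypothetical.

```python
import math
import statistics


def mean_with_ci(sample, z=1.96):
    """Return (estimate, lower, upper) for the mean of a simple random sample.

    Uses the normal approximation: estimate +/- z * standard error.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate the spread")
    estimate = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = z * standard_error
    return estimate, estimate - margin, estimate + margin


# Made-up observations, purely for illustration.
sample = [4.0, 5.0, 6.0, 5.0, 5.0]
estimate, lower, upper = mean_with_ci(sample)
print(f"estimate {estimate:.2f}, interval [{lower:.2f}, {upper:.2f}]")
```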
Restricted to:
‐ How do users interpret CIs, and how does that affect
the interpretation of facts?
‐ Do users need CIs?
Assumption:
‐ For the test data set, point estimates with CIs are available
StatMine 0.3
User test CIs
User test (100+) with synthetic data shows that:
‐ CIs improve validity of user statements (they are
more correct)
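One way to see why intervals can make statements more correct: two point estimates may look different while their uncertainty does not support the claim. The sketch below is not the method used in the user test; it assumes two independent estimates with known standard errors and a normal approximation, and all names and numbers are hypothetical.

```python
import math


def difference_with_ci(est_a, se_a, est_b, se_b, z=1.96):
    """Return (difference, lower, upper) for est_b - est_a.

    Assumes the two estimates are independent, so their variances add.
    """
    diff = est_b - est_a
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - z * se_diff, diff + z * se_diff


def is_distinguishable(est_a, se_a, est_b, se_b, z=1.96):
    """True when the interval for the difference excludes zero."""
    _, lower, upper = difference_with_ci(est_a, se_a, est_b, se_b, z)
    return lower > 0 or upper < 0


# Made-up figures: a small rise that the uncertainty does not support ...
small_rise = is_distinguishable(10.0, 0.5, 10.8, 0.5)
# ... and a larger rise that it does.
large_rise = is_distinguishable(10.0, 0.5, 12.0, 0.5)
print(small_rise, large_rise)
```

A reader who sees only the point estimates 10.0 and 10.8 might state that the figure rose; with the intervals in view, the more defensible statement is that no change can be demonstrated.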
StatMine 0.3
– Prototype StatMine 0.3:
‐ Show uncertainty in line charts
‐ Show uncertainty in bar charts
‐ Tested on 25 test persons.
Line charts with uncertainty
Bar charts with uncertainty
StatMine 0.4
– Built on the CBS open data API
– Will be public
– Currently in beta test, ETA 2014 Q3
Questions?