StatMine – prototype 0.2
Edwin de Jonge, Jan van der Laan & Jessica Solcer
Statistics Netherlands (CBS)
NTTS 2013, March 6 2013
StatMine
Goal: Improve the use of Statistics Netherlands figures
How: Add Analysis layer to OutputDB (StatLine)
Working approach:
• Formulate improvement
• Develop software prototype
• Test prototype on (real) users
• Evaluate

But why?

Mission SN

“The mission of Statistics Netherlands is to publish
reliable and coherent statistical information that
meets the needs of society” (source: www.cbs.nl)

Evidence-based policy

What is the state of the Netherlands?

StatLine contains over 1,000,000,000 figures!

Problem 1
Figures ≠ Information

1. Figures ≠ Information
We know (from user study):
• Some important users don't get the most out of StatLine:
  • Data journalists
  • Policy makers
• They don't find and see interesting information because of the tabular presentation (data = table)

Solution 1
Visualize data!

Problem 2
Fragmented information

2. Fragmented information
For policy makers and journalists, most information in OutputDB is fragmented:
• Users need to combine fragments from different statistics:
  • Diabetes (insulin usage, hospital admissions, mortality, visits to the doctor, obesity)
  • Energy consumption vs economic growth
  • Income vs economic growth
  • (Perceived) public safety vs registered crimes
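
Combining fragments can be pictured as a join on a dimension the tables share, such as the year. The slides do not say how StatMine does this, so the following is only a minimal sketch under that assumption. The function `combineByKey`, the field names, and all values are hypothetical placeholders, not StatLine data.

```javascript
// Hypothetical fragments from two different tables.
// All numbers are invented placeholders, not real statistics.
const energy = [
  { year: 2010, energyUse: 100 },
  { year: 2011, energyUse: 98 },
  { year: 2012, energyUse: 97 },
];
const growth = [
  { year: 2011, gdpGrowth: 1.0 },
  { year: 2012, gdpGrowth: -1.0 },
  { year: 2013, gdpGrowth: 0.5 },
];

// Inner join on a shared key: only periods present in both
// fragments survive, so every combined row is complete.
function combineByKey(left, right, key) {
  const index = new Map(right.map((row) => [row[key], row]));
  return left
    .filter((row) => index.has(row[key]))
    .map((row) => ({ ...row, ...index.get(row[key]) }));
}

const combined = combineByKey(energy, growth, "year");
console.log(combined);
```

An inner join is used here so that a scatter chart of the two measures never has to deal with missing values; keeping unmatched periods would be an equally valid design choice.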

Solution 2
Let users combine tables

(even if we wouldn’t …)

Prototype StatMine 0.2
Implements:
• Visual interactive data browsing
• Combining fragments of different tables

Tested on:
• 40 SN employees (++)
• 40 policy makers (++)

• Line chart: show development
• Bar chart: compare
• Bubble/scatter chart: show correlation
• Mosaic chart: show structure

Small multiples

Technical
Technologies: HTML5, JSON, R, JavaScript, CSS, SVG

• Runs on desktop
• Easy to port to a web server
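
The slide names the technologies but not how they fit together. One plausible arrangement, assumed here, is that data arrives as JSON and JavaScript turns it into SVG in the browser. The sketch below illustrates that idea only. The payload shape, the function `barChartSvg`, and the values are hypothetical, not StatMine's actual code.

```javascript
// Hypothetical JSON payload, such as a back end might emit.
// Labels and values are placeholders.
const payload = JSON.parse(
  '{"data":[{"label":"A","value":10},{"label":"B","value":20}]}'
);

// Build a minimal SVG bar chart as a string.
// Bars are scaled so the largest value fills the full height.
// Assumes labels contain no markup characters that would need escaping.
function barChartSvg(data, width = 200, height = 100) {
  const max = Math.max(...data.map((d) => d.value));
  const barWidth = width / data.length;
  const bars = data.map((d, i) => {
    const h = (d.value / max) * height;
    return (
      `<rect x="${i * barWidth}" y="${height - h}" ` +
      `width="${barWidth - 2}" height="${h}">` +
      `<title>${d.label}: ${d.value}</title></rect>`
    );
  });
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" ` +
    `width="${width}" height="${height}">${bars.join("")}</svg>`
  );
}

const svg = barChartSvg(payload.data);
console.log(svg);
```

Because the output is a plain string, the same function works on a desktop page and behind a web server, which fits the portability point above.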

Currently (2013)
• All official statistics have a confidence interval.
• StatMine 0.3 will test whether showing uncertainty improves or changes the understanding of (the quality of) figures.
• This may lead to publishing interval estimates (instead of point estimates).
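
To make the difference between a point estimate and an interval estimate concrete, here is a minimal sketch. It assumes the textbook normal approximation (estimate ± 1.96 × standard error for roughly 95% coverage); the slides do not state how Statistics Netherlands computes its intervals. The function name and the input values are hypothetical.

```javascript
// Normal-approximation interval: estimate ± z × standard error.
// z = 1.96 gives roughly 95% coverage under the normality assumption.
function intervalEstimate(estimate, standardError, z = 1.96) {
  const margin = z * standardError;
  return { lower: estimate - margin, upper: estimate + margin };
}

// Placeholder inputs: a point estimate of 100 with standard error 5.
const interval = intervalEstimate(100, 5);
console.log(interval);
```

A chart could then draw `lower` and `upper` as an error bar or band around the point estimate.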

Conclusion
• Visual data browsing is promising for
• Our own statisticians (quality control)
• External policy makers and journalists

• Using real end users for testing is very helpful:
• Lots of suggestions for improvement from users
• Users feel involved in innovation process of NSI

How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

StatMine (New Technologies and Techniques for Statistics)

  • 1. StatMine – prototype 0.2 Edwin de Jonge, Jan van der Laan & Jessica Solcer Statistics Netherlands (CBS) NTTS 2013, March 6 2013
  • 2. StatMine. Goal: improve the use of Statistics Netherlands figures. How: add an analysis layer to OutputDB (StatLine). Working approach:
      • Formulate improvement
      • Develop software prototype
      • Test prototype on (real) users
      • Evaluate
    But why?
  • 3. Mission SN “The mission of Statistics Netherlands is to publish reliable and coherent statistical information that meets the needs of society” (source: www.cbs.nl) StatMine 0.2 3
  • 6. What is the state of the Netherlands? StatLine contains over 1,000,000,000 figures!
  • 7. Problem 1: Figures ≠ Information
  • 8. 1. Figures ≠ Information. We know (from a user study):
      • Some important users don’t get the most out of StatLine:
          • Data journalists
          • Policy makers
      • They don’t find and see interesting information because of the tabular presentation (data = table)
  • 11. 2. Fragmented information. For policy makers and journalists, most information in OutputDB is fragmented:
      • Users need to combine fragments from different statistics:
          • Diabetes (insulin usage, hospital admissions, mortality, visits to doctor, obesity)
          • Energy consumption vs economic growth
          • Income vs economic growth
          • (Perceived) public safety vs registered crimes
  • 12. 2. Solution: Let users combine tables (even if we wouldn’t …)
  • 13. Prototype StatMine 0.2
      Implements:
      • Visual interactive data browsing
      • Combining fragments of different tables
      Tested on:
      • 40 SN employees (++)
      • 40 policy makers (++)
  • 14. Chart types:
      • Line chart: show development
      • Bar chart: compare
      • Bubble/scatter chart: show correlation
      • Mosaic chart: show structure
  • 17. Technical: HTML5, JSON, R, JavaScript, CSS, SVG
      • Runs on desktop
      • Easy to port to a web server
  • 18. Currently (2013)
      • All official statistics have a confidence interval.
      • StatMine 0.3 will test whether showing uncertainty improves or changes understanding of the (quality of) figures.
      • May lead to publishing interval estimates (instead of point estimates).
  • 19. Conclusion
      • Visual data browsing is promising for:
          • Our own statisticians (quality control)
          • External policy makers and journalists
      • Using real end users for testing is very helpful:
          • Lots of suggestions for improvement from users
          • Users feel involved in the innovation process of the NSI
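
The "combine tables" idea from slides 11 to 13 can be illustrated with a minimal sketch. The slides do not describe how StatMine joins tables, so everything here is an assumption: the function name `combine_tables`, the shared dimensions (region, year), the measure names, and the placeholder values, which are not CBS figures. The sketch only shows the general technique of joining two statistics on the dimensions they have in common.

```python
from typing import Dict, List, Tuple

# Hypothetical shared dimensions: (region, year). Not taken from the slides.
Key = Tuple[str, int]
Table = Dict[Key, Dict[str, float]]


def combine_tables(left: Table, right: Table) -> List[dict]:
    """Inner-join two tables on the (region, year) keys they share."""
    shared_keys = sorted(left.keys() & right.keys())
    return [
        {"region": region, "year": year, **left[(region, year)], **right[(region, year)]}
        for region, year in shared_keys
    ]


# Placeholder values for illustration only; these are not real statistics.
energy_use: Table = {
    ("NL", 2010): {"energy_use": 100.0},
    ("NL", 2011): {"energy_use": 95.0},
}
gdp_growth: Table = {
    ("NL", 2011): {"gdp_growth": 1.5},
    ("NL", 2012): {"gdp_growth": -1.0},
}

combined = combine_tables(energy_use, gdp_growth)
```

Only rows present in both tables survive the join, so a combined view such as "energy consumption vs economic growth" is limited to the periods and regions both statistics cover.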