Experimental transformation of
ABS data into Data Cube
Vocabulary (DCV) format
Why?
How?
What was learned?
Outline
I. Context
– Transforming national & international statistical
systems
– Semantic Web / Linked Data meets Official Statistics
– SemStats 2013
– Parameters for the R&D project
II. Investigation of existing tools
III. Summary of the transformation process
IV. Lessons learned
V. Discussion
2009 (Australia)
• The case for an international statistical innovation program
Transforming national and international statistics systems
• Future capabilities
1. From static data products to “common information services”
2. From publications to communication
3. Support for transaction data flowing at a much higher volume
4. Ability to rapidly incorporate new issues and views of data into
standards and classifications
5. ‘Rapid-response’ capability
6. Connecting processes and passing metadata and data easily
between them
7. Analysing assemblies of data
The Challenges
• Increasing cost & difficulty of acquiring survey data
• New sources & changing expectations
• Rapid changes in the environment
• Competition for skilled resources
• Diminishing budgets
• Riding the big data wave
HLG
• High-Level Group for the Modernisation of Statistical Production and Services
• Comprises 10 heads of national and international statistical organisations
– Gosse van der Veen (Netherlands) - Chairman
– Brian Pink (Australia)
– Eduardo Sojo Garza-Aldape (Mexico)
– Enrico Giovannini (Italy)
– Woo, Ki-Jong (Republic of Korea)
– Irena Križman (Slovenia)
– Katherine Wallman (United States)
– Walter Radermacher (Eurostat)
– Martine Durand (OECD)
– Lidia Bratanova (UNECE)
The official statistics industry
and its place in the wider
information industry
From Strategy to implement the vision of
the HLG (2012)
Grouping the challenges
1. Product Challenge - Modernising Statistical Services
• Designing and delivering new and better statistical
outputs (products and services)
2. Process Challenge – Modernising Statistical
Production
• Developing and implementing new and better production
processes and methods which are capable of delivering
statistical outputs with
i. reduced cost, and
ii. greater flexibility.
HLG Strategy
• Standards-based, collaborative modernisation of official statistics.
• Create an environment (eg “common architecture”) that facilitates
collaborative development, sharing and reuse of
– statistical business processes
– statistical methods
– IT components
– data repositories
• Explicit role for
– common conceptual frameworks, eg
• GSIM (Generic Statistical Information Model)
– and common implementation standards, eg
• SDMX (Statistical Data and Metadata eXchange), working with
• DDI (Data Documentation Initiative)
ABS main data services support SDMX
• ABS.Stat Beta
– Dissemination from predefined aggregate data cubes
• eg Consumer Price Index
– Featured at GovHack 2013
– Based on OECD.Stat
• Now used by OECD, IMF, UNESCO, European Commission, ABS,
Statistics New Zealand, Statistics Italy
• Further development through SIS Collaboration Community
• TableBuilder
– Dissemination of on demand tabulations from microdata
• Includes Population Census
Harnessing the opportunities
• Global community around SDMX
– intersects with SIS Collaboration Community
• Working on
– SDMX to JSON (JavaScript Object Notation)
• Making life easier for third party developers
– No need to parse SDMX-ML
• Object model similar to Data Cube Vocabulary (DCV)
• Expected to be released for review in September
– SDMX to Data Cube Vocabulary (DCV)
• Much earlier stage within SIS Collaboration Community
Layering standards on standards
• RDF Data Cube Vocabulary (DCV) developed
under W3C
– designed for publishing multi-dimensional data, such
as statistics, on the web in such a way that it can be
linked to related data sets and concepts
– based upon the approach used by the SDMX ISO
standard for statistical data exchange
– very general and can be used for other data sets such
as survey data, spreadsheets and OLAP data cubes
Use of DCV
• Usage within
– data.gov.uk
– Eurostat
– Other institutions within the European Union via
the EU’s Open Data Portal
• eg European Environment Agency
– Experimental use within data.gov.au
Linked Data view on Official Statistics
• Official Statistics and the Practice of Data Fidelity
– Official statistics are the “crown jewels” of a nation’s public data
– Provide empirical evidence for policy making and economic research
– Statistical offices are among the most “data-savvy” organisations in
government
– Handling of Statistical Data as Linked Data requires particular attention
to maintain its integrity and fidelity
• Linked SDMX Data
– Challenges
• Automation of the transformation of data from high-profile statistical
organizations
• Minimization of third-party interpretation of the source data and metadata and
lossless transformations
(Unofficial) view from Official Statistics
• Semantic Statistics opportunities include :
– external application of statistical classifications, and other statistical
concept schemes, as ontologies
– simpler, more flexible and more powerful use of statistical data alongside
other data
– partnering more closely with other “data” communities
• Semantic Statistics issues and risks include
– ensuring production process is sustainable
– ensuring semantics are identified consistently across all statistical outputs
from a single agency
– possible lack of rigour when defining and linking concepts to outputs from
other sources
– the possibility of “fuzzy” semantics leading to incorrect data analyses
SemStats 2013
• Interest in “Semantic Statistics” is growing rapidly
within Statistical and Semantic Web communities
• There are existing semantic web developments
building on both SDMX and DDI
• SemStats 2013 provides a rare opportunity to interact
with world experts while they’re in Australia
• We are interested in what entrants might create and
demonstrate in regard to SemStats 2013 Challenge
SemStats 2013 Challenge
• Provides Australian and French Census data in
Data Cube Vocabulary (DCV) format
– Data is Geography x Sex x Age x “Activity” status
– Entrants are asked to demonstrate value from innovative
application of semantic web technologies to the data.
Aim when preparing Australian content
• use as an opportunity for practical learning
• start with SDMX-ML (not, eg, CSV) (if possible)
– Plan A: SDMX-ML from TableBuilder
• use existing international tools for SDMX-ML to
DCV transformations (if possible)
• do the work within the ABS (if possible)
• Plan B was to ask INSEE (Statistics France) to help us with the
transformation
Investigation
• Datalift
– Supports multiple input types
– Generic transformation
– Supports dissemination to the web
• Mimas
– XSLT based
– Complicated
• Guillaume report
– From INSEE
– Highly tailored to the input data
Datalift
• Free to use – source code also available
• Java web application
• Supports multiple input types
– Semantic graphs
– Relational databases
– Files (CSV, XML, etc)
• Supports entire cycle
– INSEE plan to use in future
• SDMX -> DCV plug-in in development
Mimas
• Inflexible
– XML input only
– XML output only
• Cumbersome
– Requires multiple intermediate conversions
• Inefficient for large volumes of data
Guillaume Report
• INSEE short term solution
• Datalift was not mature enough
• MIMAS identified as cumbersome and
inefficient
• Opted to use Apache Jena for small Java
application
Technology Overview
• Census TableBuilder
– Data extracted in SDMX and CSV
• Java
– Apache Jena library
– SDMX 2.0 XML beans
• Ontologies used
– Simple Knowledge Organisation System
– Data Cube Vocabulary
• Turtle RDF syntax
– Easy to read for humans and machines
SDMX Extraction Tool Overview
• Reads in SDMX structure file
– Uses SDMX 2.0 beans to parse file
• Disassembles XML to main components
– Code lists
– Concepts
– Key Families
• Build semantic model with Apache Jena
• Write to file in Turtle syntax
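The "disassemble" step above can be sketched roughly as follows. This is only an illustration: the project itself used Java with SDMX 2.0 XML beans and Apache Jena, whereas this sketch uses Python's standard library. The sample message is a hand-written, simplified fragment modelled on SDMX-ML 2.0; its namespaces, identifiers (CL_SEX, CENSUS_DEMO) and codes are assumptions for illustration, not ABS content.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match SDMX-ML 2.0 structure messages;
# check them against the real schema before relying on them.
NS = {
    "message": "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message",
    "structure": "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure",
}

# Hypothetical, heavily simplified structure file (not real ABS metadata).
SAMPLE = """<message:Structure
    xmlns:message="http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message"
    xmlns:structure="http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure">
  <message:CodeLists>
    <structure:CodeList id="CL_SEX" agencyID="ABS">
      <structure:Code value="1"><structure:Description>Male</structure:Description></structure:Code>
      <structure:Code value="2"><structure:Description>Female</structure:Description></structure:Code>
    </structure:CodeList>
  </message:CodeLists>
  <message:Concepts>
    <structure:Concept id="SEX"><structure:Name>Sex</structure:Name></structure:Concept>
    <structure:Concept id="OBS_VALUE"><structure:Name>Persons</structure:Name></structure:Concept>
  </message:Concepts>
  <message:KeyFamilies>
    <structure:KeyFamily id="CENSUS_DEMO">
      <structure:Components>
        <structure:Dimension conceptRef="SEX" codelist="CL_SEX"/>
        <structure:PrimaryMeasure conceptRef="OBS_VALUE"/>
      </structure:Components>
    </structure:KeyFamily>
  </message:KeyFamilies>
</message:Structure>"""


def disassemble(xml_text):
    """Split a structure message into code lists, concepts and key families."""
    root = ET.fromstring(xml_text)

    # Code lists: id -> {code value -> description}
    code_lists = {
        cl.get("id"): {
            code.get("value"): code.findtext(
                "structure:Description", default="", namespaces=NS
            )
            for code in cl.findall("structure:Code", NS)
        }
        for cl in root.findall("message:CodeLists/structure:CodeList", NS)
    }

    # Concepts: id -> name
    concepts = {
        c.get("id"): c.findtext("structure:Name", default="", namespaces=NS)
        for c in root.findall("message:Concepts/structure:Concept", NS)
    }

    # Key families: id -> dimensions (concept, code list) and primary measure
    key_families = {}
    for kf in root.findall("message:KeyFamilies/structure:KeyFamily", NS):
        comps = kf.find("structure:Components", NS)
        if comps is None:
            continue
        measure = comps.find("structure:PrimaryMeasure", NS)
        key_families[kf.get("id")] = {
            "dimensions": [
                (d.get("conceptRef"), d.get("codelist"))
                for d in comps.findall("structure:Dimension", NS)
            ],
            "measure": measure.get("conceptRef") if measure is not None else None,
        }

    return {
        "code_lists": code_lists,
        "concepts": concepts,
        "key_families": key_families,
    }


parts = disassemble(SAMPLE)
```

The three dictionaries correspond to the three main components listed above; building the semantic model and writing Turtle would follow from them.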
Code Lists
• Representation of a classification
– Can be hierarchical or flat
[Diagram: code schemes (code scheme information) and codes (code information) are used to generate a SKOS concept scheme]
SKOS Concept Schemes
• Elements annotated in the example:
– Unique identifier
– Type
– Parent category
– Label
– Classification/concept scheme
– Code
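A minimal sketch of how a code list might be written out as a SKOS concept scheme in Turtle. The mapping of the annotated elements to SKOS terms (identifier as the IRI, parent category as skos:broader, label as skos:prefLabel, classification as skos:inScheme, code as skos:notation) is an assumption about the slide's example, and the base URI, function name and sample codes are hypothetical. The project used Apache Jena for this step; plain string building is used here only to stay self-contained.

```python
def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def code_list_to_skos(base, scheme_id, scheme_label, codes):
    """Render a code list as Turtle.

    codes is a list of (code, label, parent_code_or_None) tuples;
    codes are assumed to be safe to embed in an IRI path.
    """
    scheme = f"<{base}{scheme_id}>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme ;",
        f'    skos:prefLabel "{escape(scheme_label)}"@en .',
    ]
    for code, label, parent in codes:
        concept = f"<{base}{scheme_id}/{code}>"        # unique identifier
        props = [
            "a skos:Concept",                          # type
            f"skos:inScheme {scheme}",                 # classification
            f'skos:notation "{escape(code)}"',         # code
            f'skos:prefLabel "{escape(label)}"@en',    # label
        ]
        if parent is None:
            props.append(f"skos:topConceptOf {scheme}")
        else:
            # parent category, for hierarchical classifications
            props.append(f"skos:broader <{base}{scheme_id}/{parent}>")
        lines.append("")
        lines.append(f"{concept} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines) + "\n"


# Hypothetical two-level age classification.
ttl = code_list_to_skos(
    "http://example.org/code/",
    "CL_AGE",
    "Age",
    [("A", "0-14 years", None), ("A1", "0-4 years", "A")],
)
print(ttl)
```

A flat code list is simply the case where every parent is None.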
Concepts & Components
• Links observations to their:
– Classification
– Concept
[Diagram: concept schemes (concept information), components (component information) and key families are used to create the data structure definition]
Data Structure Definition
• Elements annotated in the example:
– Can only be values of this type
– List of codes to use
– Concept the dimension is measuring
– What the observation is measuring
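A hedged sketch of what a data structure definition along these lines could look like in Turtle, generated from Python. The reading of the annotations as rdfs:range (allowed value type), qb:codeList (list of codes), qb:concept (concept being measured) and qb:measure (what the observation measures) is an assumption; the ex: namespace and all identifiers are hypothetical.

```python
PREFIXES = """@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/census/> .
"""


def dimension_ttl(name, concept, code_list):
    """Describe one coded dimension property."""
    return (
        f"ex:{name} a qb:DimensionProperty ;\n"
        f"    rdfs:range skos:Concept ;\n"         # can only be values of this type
        f"    qb:codeList ex:{code_list} ;\n"      # list of codes to use
        f"    qb:concept ex:{concept} .\n"         # concept the dimension is measuring
    )


def dsd_ttl(dsd, dimensions, measure):
    """Describe the structure: one component per dimension plus the measure."""
    components = [f"    qb:component [ qb:dimension ex:{d} ]" for d in dimensions]
    # what the observation is measuring
    components.append(f"    qb:component [ qb:measure ex:{measure} ]")
    return (
        f"ex:{dsd} a qb:DataStructureDefinition ;\n"
        + " ;\n".join(components)
        + " .\n"
    )


# Hypothetical structure with two dimensions and one measure.
document = "\n".join(
    [
        PREFIXES,
        dimension_ttl("sex", "SEX", "CL_SEX"),
        dimension_ttl("age", "AGE", "CL_AGE"),
        "ex:count a qb:MeasureProperty .\n",
        dsd_ttl("dsd-census", ["sex", "age"], "count"),
    ]
)
print(document)
```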
The Data - SDMX
• Series key – dimensions being measured
• Attributes – extra metadata about observation
• Obs – the value of the observation (i.e. people
counted)
The Data - DCV
• More condensed – attributes attached to the
dataset instead of the observation
• Elements annotated in the example:
– Dimensions
– Coded values
– Observation value
– Dataset the observation is from
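To illustrate the "more condensed" point, here is a rough sketch that turns one SDMX-style series key and value into a DCV observation, with the unit attribute stated once on the dataset rather than on every observation. All ex: identifiers, the function names and the sample figures are hypothetical; the sdmx-measure and sdmx-attribute prefixes refer to the SDMX-derived vocabularies commonly used with DCV. Codes are assumed to be valid in Turtle local names.

```python
PREFIXES = """@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix ex: <http://example.org/census/> .
"""


def dataset_ttl(dataset, dsd, unit):
    """Dataset description; the attribute is attached here, once."""
    return (
        f"ex:{dataset} a qb:DataSet ;\n"
        f"    qb:structure ex:{dsd} ;\n"
        f"    sdmx-attribute:unitMeasure ex:{unit} .\n"
    )


def observation_ttl(dataset, key, value):
    """One observation; key maps dimension name -> code (the series key)."""
    obs_id = "obs-" + "-".join(str(code) for code in key.values())
    lines = [
        f"ex:{obs_id} a qb:Observation ;",
        f"    qb:dataSet ex:{dataset} ;",           # dataset the observation is from
    ]
    for dim, code in key.items():
        lines.append(f"    ex:{dim} ex:{dim}-{code} ;")   # dimension and coded value
    lines.append(f"    sdmx-measure:obsValue {int(value)} .")  # observation value
    return "\n".join(lines) + "\n"


# Hypothetical figures, for illustration only.
obs = observation_ttl("census2011", {"sex": "1", "age": "A15"}, 1234)
document = "\n".join(
    [PREFIXES, dataset_ttl("census2011", "dsd-census", "persons"), obs]
)
print(document)
```

In DCV, an attribute attached at dataset level would normally also be declared as such in the data structure definition (via qb:componentAttachment); that declaration is omitted here for brevity.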
Lessons Learned (1)
• Subject Matter Experts needed
– What dimensions to use?
– What attributes to use?
– What concepts are we measuring?
• Current tools not yet mature
• Full validation of data is complex
• Heavy resource usage for large data
– Unable to process SA2-level data on 32-bit
Lessons Learned (2)
• Conversion is straightforward
– Standards very similar
• Promotes reuse
– Power comes from linking data
• Linked nature makes you think about what
you are doing
– E.g. How close is INSEE activity to ABS labour force
status?
Semantic Considerations
• How much, how soon, do we aim to harness opportunities
for carrying more usable semantics in Data Cube
Vocabulary?
– Expected an external ontology for sex – but most are for Gender
• How close is “close enough” for semantic assertions in Linked Open
Data?
• Aim for statistical harmonisation first (eg SDMX Cross Domain
Concepts) then explore links to broader ontologies?
• Even data producers are not sure if Age is a common
concept across ABS & INSEE (Statistics France).
• Risk of overselling the technical format before semantic
payload is sorted?
Laying the foundations
• The project confirmed that, in order to deliver more useable semantics in
our outputs, on a sustainable basis, we need statistical data and metadata
to be defined and managed on a consistent, standards aligned basis across
the organisation, including
– across all statistical subject matter domains (social, economic, environmental)
– “end to end” (ie spanning design, collection, processing/integration, analysis
and dissemination)
• We also need production processes to be automated & sustainable.
• This is one example of why ABS needs to “modernise statistical
production” to reflect the changed world in which we operate and to offer
new services that address new needs and expectations of users.
• In the 2013/14 Budget Papers, funding of $2.1 million was provided to
develop a second pass business case for a major statistical infrastructure
and business process reengineering project.
Discussion

DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Experimental transformation of ABS data into Data Cube Vocabulary (DCV) format: Why, How and What was learned

  • 1. Experimental transformation of ABS data into Data Cube Vocabulary (DCV) format Why? How? What was learned?
  • 2. Outline I. Context – Transforming national & international statistical systems – Semantic Web / Linked Data meets Official Statistics – SemStats 2013 – Parameters for the R&D project II. Investigation of existing tools III. Summary of the transformation process IV. Lessons learned V. Discussion
  • 3. 2009 (Australia) • The case for an international statistical innovation program Transforming national and international statistics systems • Future capabilities 1. From static data products to “common information services” 2. From publications to communication 3. Support for transaction data flowing at a much higher volume 4. Ability to rapidly incorporate new issues and views of data into standards and classifications 5. ‘Rapid-response’ capability 6. Connecting processes and passing metadata and data easily between them 7. Analysing assemblies of data
  • 4. The Challenges • Increasing cost & difficulty of acquiring survey data • New sources & changing expectations • Rapid changes in the environment • Competition for skilled resources • Diminishing budgets • Riding the big data wave
  • 5. HLG • High-Level Group for the Modernisation of Statistical Production and Services • Comprises 10 heads of national and international statistical organisations – Gosse van der Veen (Netherlands) - Chairman – Brian Pink (Australia) – Eduardo Sojo Garza-Aldape (Mexico) – Enrico Giovannini (Italy) – Woo, Ki-Jong (Republic of Korea) – Irena Križman (Slovenia) – Katherine Wallman (United States) – Walter Radermacher (Eurostat) – Martine Durand (OECD) – Lidia Bratanova (UNECE) The official statistics industry and its place in the wider information industry From Strategy to implement the vision of the HLG (2012)
  • 6. Grouping the challenges 1. Product Challenge - Modernising Statistical Services • Designing and delivering new and better statistical outputs (products and services) 2. Process Challenge – Modernising Statistical Production • Developing and implementing new and better production processes and methods which are capable of delivering statistical outputs with i. reduced cost, and ii. greater flexibility.
  • 7. HLG Strategy • Standards-based, collaborative modernisation of official statistics. • Create an environment (eg “common architecture”) that facilitates collaborative development, sharing and reuse of – statistical business processes – statistical methods – IT components – data repositories • Explicit role for – common conceptual frameworks, eg • GSIM (Generic Statistical Information Model) – and common implementation standards, eg • SDMX (Statistical Data and Metadata eXchange), working with • DDI (Data Documentation Initiative)
  • 8. ABS main data services support SDMX • ABS.Stat Beta – Dissemination from predefined aggregate data cubes • eg Consumer Price Index – Featured at GovHack 2013 – Based on OECD.Stat • Now used by OECD, IMF, UNESCO, European Commission, ABS, Statistics New Zealand, Statistics Italy • Further development through SIS Collaboration Community • TableBuilder – Dissemination of on-demand tabulations from microdata • Includes Population Census
  • 9. Harnessing the opportunities • Global community around SDMX – intersects with SIS Collaboration Community • Working on – SDMX to JSON (JavaScript Object Notation) • Making life easier for third party developers – No need to parse SDMX-ML • Object model similar to Data Cube Vocabulary (DCV) • Expected to be released for review in September – SDMX to Data Cube Vocabulary (DCV) • Much earlier stage within SIS Collaboration Community
  • 10. Layering standards on standards • RDF Data Cube Vocabulary (DCV) developed under W3C – designed for publishing multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts – based upon the approach used by the SDMX ISO standard for statistical data exchange – very general and can be used for other data sets such as survey data, spreadsheets and OLAP data cubes
  • 11. Use of DCV • Usage within – data.gov.uk – Eurostat – Other institutions within the European Union via the EU’s Open Data Portal • eg European Environment Agency – Experimental use within data.gov.au
  • 12. Linked Data view on Official Statistics • Official Statistics and the Practice of Data Fidelity – Official statistics are the “crown jewels” of a nation’s public data – Provide empirical evidence for policy making and economic research – Statistical offices are among the most “data-savvy” organisations in government – Handling of Statistical Data as Linked Data requires particular attention to maintain its integrity and fidelity • Linked SDMX Data – Challenges • Automation of the transformation of data from high-profile statistical organizations • Minimization of third-party interpretation of the source data and metadata, and lossless transformations
  • 13. (Unofficial) view from Official Statistics • Semantic Statistics opportunities include: – external application of statistical classifications, and other statistical concept schemes, as ontologies – simpler, more flexible and more powerful use of statistical data alongside other data – partnering more closely with other “data” communities • Semantic Statistics issues and risks include: – ensuring production process is sustainable – ensuring semantics are identified consistently across all statistical outputs from a single agency – possible lack of rigour when defining and linking concepts to outputs from other sources – the possibility of “fuzzy” semantics leading to incorrect data analyses
  • 14. SemStats 2013 • Interest in “Semantic Statistics” is growing rapidly within Statistical and Semantic Web communities • There are existing semantic web developments building on both SDMX and DDI • SemStats 2013 provides a rare opportunity to interact with world experts while they’re in Australia • We are interested in what entrants might create and demonstrate in regard to SemStats 2013 Challenge
  • 15. SemStats 2013 Challenge • Provides Australian and French Census data in Data Cube Vocabulary (DCV) format – Data is Geography x Sex x Age x “Activity” status – Entrants are asked to demonstrate value from innovative application of semantic web technologies to the data.
  • 16. Aim when preparing Australian content • use as an opportunity for practical learning • start with SDMX-ML (not, eg, CSV) (if possible) – Plan A: SDMX-ML from TableBuilder • use existing international tools for SDMX-ML to DCV transformations (if possible) • do the work within the ABS (if possible) • Plan B was to ask INSEE (Statistics France) to help us with the transformation
  • 17. Investigation • Datalift – Supports multiple input types – Generic transformation – Supports dissemination to the web • Mimas – XSLT based – Complicated • Guillaume report – From INSEE – Highly tailored to the input data
  • 18. Datalift • Free to use – source code also available • Java web application • Supports multiple input types – Semantic graphs – Relational databases – Files (CSV, XML, etc) • Supports entire cycle – INSEE plan to use in future • SDMX -> DCV plug-in in development
  • 19. Mimas • Inflexible – XML input only – XML output only • Cumbersome – Requires multiple intermediate conversions • Inefficient for large volumes of data
  • 20. Guillaume Report • INSEE short term solution • Datalift was not mature enough • Mimas identified as cumbersome and inefficient • Opted to use Apache Jena for small Java application
  • 21. Technology Overview • Census TableBuilder – Data extracted in SDMX and CSV • Java – Apache Jena library – SDMX 2.0 XML beans • Ontologies used – Simple Knowledge Organisation System – Data Cube Vocabulary • Turtle RDF syntax – Easy to read for humans and machines
  • 22. SDMX Extraction Tool Overview • Reads in SDMX structure file – Uses SDMX 2.0 beans to parse file • Disassembles XML to main components – Code lists – Concepts – Key Families • Build semantic model with Apache Jena • Write to file in Turtle syntax
  • 23. Code Lists • Representation of a classification – Can be hierarchical or flat
  • 24. Code Schemes Code scheme information Code information Codes Code schemes Generate SKOS concept scheme
  • 25. SKOS Concept Schemes • Unique identifier • Type • Parent category • Label • Classification/concept scheme • Code
  • 26. Concepts & Components • Links observations to their: – Classification – Concept
  • 29. Data Structure Definition • Can only be values of this type • List of codes to use • Concept dimension is measuring • What the observation is measuring
  • 30. The Data - SDMX • Series key – dimensions being measured • Attributes – extra metadata about observation • Obs – the value of the observation (i.e. people counted)
  • 31. The Data - DCV • More condensed – attributes attached to the dataset instead of the observation • Dimensions • Coded values • Observation value • Dataset observation is from
  • 32. Lessons Learned (1) • Subject Matter Experts needed – What dimensions to use? – What attributes to use? – What concepts are we measuring? • Current tools not yet mature • Full validation of data is complex • Heavy resource usage for large data – Unable to process SA2 level data on 32-bit
  • 33. Lessons Learned (2) • Conversion straightforward – Standards very similar • Promotes reuse – Power comes from linking data • Linked nature makes you think about what you are doing – E.g. How close is INSEE activity to ABS labour force status?
  • 34. Semantic Considerations • How much, how soon, do we aim to harness opportunities for carrying more usable semantics in Data Cube Vocabulary? – Expected an external ontology for sex – but most are for Gender • How close is “close enough” for semantic assertions in Linked Open Data? • Aim for statistical harmonisation first (eg SDMX Cross Domain Concepts) then explore links to broader ontologies? • Even data producers are not sure if Age is a common concept across ABS & INSEE (Statistics France). • Risk of overselling the technical format before semantic payload is sorted?
  • 35. Laying the foundations • The project confirmed that, in order to deliver more usable semantics in our outputs, on a sustainable basis, we need statistical data and metadata to be defined and managed on a consistent, standards-aligned basis across the organisation, including – across all statistical subject matter domains (social, economic, environmental) – “end to end” (ie spanning design, collection, processing/integration, analysis and dissemination) • We also need production processes to be automated & sustainable. • This is one example of why ABS needs to “modernise statistical production” to reflect the changed world in which we operate and to offer new services that address new needs and expectations of users. • In the 13/14 Budget Papers funding of $2.1 million was provided to develop a second pass business case for a major statistical infrastructure and business process reengineering project.
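
The extraction steps summarised in slides 22 to 31 (read the SDMX structure file, turn code lists into SKOS concept schemes, describe the data structure definition, then write observations in Turtle) can be sketched roughly as follows. The ABS tool was a Java application built on Apache Jena and SDMX 2.0 XML beans; this is not that code. It is a minimal Python sketch that uses only the standard library and writes Turtle by plain string formatting. The sample XML, the `ex:` namespace, the function names, the `CL_SEX` code list, the `ex:count` measure and the observation value are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# SDMX-ML 2.0 namespaces for the message envelope and the structure elements.
NS = {
    "message": "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message",
    "structure": "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure",
}

# Hypothetical, heavily simplified structure file (not real ABS output).
SAMPLE = f"""<message:Structure xmlns:message="{NS['message']}"
                   xmlns:structure="{NS['structure']}">
  <message:CodeLists>
    <structure:CodeList id="CL_SEX" agencyID="ABS">
      <structure:Name xml:lang="en">Sex</structure:Name>
      <structure:Code value="1">
        <structure:Description xml:lang="en">Male</structure:Description>
      </structure:Code>
      <structure:Code value="2">
        <structure:Description xml:lang="en">Female</structure:Description>
      </structure:Code>
    </structure:CodeList>
  </message:CodeLists>
</message:Structure>"""

# Prefix declarations; ex: is a placeholder namespace, not an ABS URI.
PREFIXES = """@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/abs/> .
"""


def read_code_lists(xml_text):
    """Return {code list id: {code value: description}} from a structure file."""
    root = ET.fromstring(xml_text)
    result = {}
    for code_list in root.iterfind(".//structure:CodeList", NS):
        codes = {}
        for code in code_list.iterfind("structure:Code", NS):
            label = code.findtext("structure:Description", default="", namespaces=NS)
            codes[code.get("value")] = label
        result[code_list.get("id")] = codes
    return result


def skos_scheme(list_id, codes):
    """Write one code list as a SKOS concept scheme (labels are not escaped)."""
    lines = [f"ex:{list_id} a skos:ConceptScheme ."]
    for value, label in codes.items():
        lines.append(
            f"ex:{list_id}-{value} a skos:Concept ;\n"
            f"    skos:inScheme ex:{list_id} ;\n"
            f'    skos:notation "{value}" ;\n'
            f'    skos:prefLabel "{label}"@en .'
        )
    return "\n".join(lines)


def data_structure(dim_name, list_id):
    """Write a one-dimension, one-measure qb:DataStructureDefinition."""
    return (
        f"ex:{dim_name} a qb:DimensionProperty ;\n"
        f"    rdfs:range skos:Concept ;\n"      # can only be values of this type
        f"    qb:codeList ex:{list_id} .\n"     # list of codes to use
        f"ex:count a qb:MeasureProperty .\n"    # what the observation is measuring
        f"ex:dsd a qb:DataStructureDefinition ;\n"
        f"    qb:component [ qb:dimension ex:{dim_name} ] ,\n"
        f"                 [ qb:measure ex:count ] ."
    )


def observation(obs_id, dim_name, list_id, code, count):
    """Write one qb:Observation with a coded dimension value and a count."""
    return (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:dataset ;\n"
        f"    ex:{dim_name} ex:{list_id}-{code} ;\n"
        f"    ex:count {count} ."
    )


if __name__ == "__main__":
    code_lists = read_code_lists(SAMPLE)
    print(PREFIXES)
    print(skos_scheme("CL_SEX", code_lists["CL_SEX"]))
    print(data_structure("sex", "CL_SEX"))
    print(observation("obs1", "sex", "CL_SEX", "1", 42))  # 42 is a placeholder value
```

Building Turtle with string formatting keeps the sketch dependency-free, but it does no escaping or validation. An RDF library (as the Jena-based tool used) would normally build the model and handle serialisation.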

Editor's Notes

  1. National Statistical Institutions face shared constraints and challenges.
     External challenges
     – Rapidly changing external environment: 24/7 access to information
     – Increasing demand by sophisticated users for more timely, relevant statistical data to meet ‘current’ day issues
     – Increasing demand for more accessible and ‘joined up’ data to solve complex policy questions
     Constraints
     – Reduced funding and volatility in funding
     – Our costs are increasing significantly: unable to contact many households, response rates are dropping, and it is becoming more and more difficult to recruit and retain interviewers
     – Skills shortages: competing for statistical and ICT skills across government
     – Complex work programs
     – Siloed processes and aging infrastructure