A tool for the automatic collection of administrative data
to produce official statistics
Conference of European Statistics Stakeholders
Budapest, 20-21 October 2016
Alessandro Capezzuoli, Emanuela Recchini
1. Official statistics and data integration
2. Technology
3. Model
4. Architecture
5. Concluding remarks
DataSTAT Hub: a tool for the automatic collection of administrative data to produce official statistics
Alessandro Capezzuoli, Emanuela Recchini – Conference of European Statistics Stakeholders, Budapest, 20-21 October 2016
1. Official statistics and data integration
Bringing together information from different
sources makes it possible to fill information
gaps or provide insights which cannot be
gleaned from unlinked data and to improve
the knowledge and understanding of
specific phenomena.
Introductory remarks (1)
There is worldwide recognition of the
increasing role played by administrative data
in the production of more timely, more
disaggregated statistics at higher frequencies
than traditional survey data.
The efficient use of all available information
to produce timely, accurate and high-quality
statistics is a challenge for National Statistical
Offices (NSOs), which are increasingly
committed to developing suitable methods and
tools for the production, collection,
standardization and integration of different
types of statistical data.
Nowadays, the exploitation of administrative data for statistical purposes is a normal
practice for a large number of NSOs. This improves the quality of statistical outputs, reduces
the statistical burden on respondents and minimizes costs.
The Italian National Institute of Statistics (Istat) collects and manages a large amount of
administrative data from different sources, including:
• Italian Agency of Revenue
• Bank of Italy
• Ministries
• Social Security Institutions
• Government Institutions
• Private Institutions
• …
From 2009 to 2015, administrative data supplied to Istat trebled
Introductory remarks (2)
According to the provisions of the Italian Digital Administration Code:
➢ before proceeding to the collection of new data, public administrations are required
to verify whether the information they need can be acquired through access to
information already in the possession of other public authorities or public bodies.
➢ the technical options for the usability of data are:
  • web access through the website of the supplier institution or an ad hoc thematic
    website
  • interoperability among public administrations for data collection and data
    integration
➢ the user can process the data collected exclusively for the pursuit of its institutional
goals
➢ the transfer of data from one information system to another does not change data
ownership
The Italian legislation on data collection
(Guidelines for the drafting of conventions on the usability of Public Administrations' data; Legislative Decree n. 82/2005,
commonly referred to as the “Digital Administration Code”, as amended by Legislative Decree n. 235/2010)
Administrative data collected by Istat
Data collected by Istat are very different from each other in type, content and structure
DATA SUPPLIER
- receives data requests
- processes data requests
- prepares data to be sent
- sends data to data collector
DATA COLLECTOR
- manages data requests
- defines methods and standards
- manages reminders
- stores data and metadata
- standardizes and disseminates data
Data collection process (1)
Data collection process (2)
✓ Data collection through File Transfer Protocol (FTP)
✓ Data uploading through an ad hoc website to manage reminders and data supply requests
THESE SOLUTIONS DO NOT PERMIT PROCESS AUTOMATION
✓ Management of data requests and reminders
✓ Complex IT infrastructure
✓ Burden for data suppliers
✓ Human resources for transactions management
2. Technology
…the World Wide Web offers a possible solution!
HTTP (Hypertext Transfer Protocol), the set of rules for transferring files on the Web, can be
conveniently used for data collection and data exchange.
It is a request/response protocol based on the client-server architecture.
Representational State Transfer (REST)
• is not a standard, but an architectural style for designing networked applications
• defines a set of guidelines for using the HTTP protocol to perform the four operations summarized in the acronym
CRUD (Create, Read, Update, Delete), by means of an API (Application Programming Interface).
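As a minimal sketch of how the four CRUD operations are conventionally carried by HTTP methods, the Python fragment below builds a request without sending it. The mapping is the usual REST convention rather than something HTTP itself mandates, and the URL, the payload and the helper name build_request are hypothetical, not taken from DataSTAT Hub.

```python
import json
from urllib.request import Request

# Conventional CRUD-to-HTTP mapping (a common REST convention, assumed here).
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None):
    """Build (but do not send) the HTTP request for one CRUD operation."""
    method = CRUD_TO_HTTP[operation]
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(url, data=body, headers=headers, method=method)


# Hypothetical endpoint and payload, used only for illustration.
example_url = "http://localhost:8080/supplies"
create_req = build_request("create", example_url, {"reference_year": 2015})
read_req = build_request("read", example_url)
print(create_req.get_method(), create_req.full_url)
```

Passing the resulting object to urllib.request.urlopen would actually send it; that step is left out here because no real endpoint is assumed.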
CRUD principles
REST is a service concept that may be summarized by the CRUD principles
REST allows data suppliers to create, read and update resources with a logic
similar to that used to perform operations on any SQL database.
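The analogy with SQL can be illustrated with a small, self-contained sketch using Python's built-in sqlite3 module. Each statement is annotated with the CRUD operation and the HTTP method it conventionally corresponds to; the table name, columns and values are hypothetical placeholders.

```python
import sqlite3

# In-memory database; the table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supply (id INTEGER PRIMARY KEY, supplier TEXT, year INTEGER)"
)

# Create  ~ INSERT (HTTP POST)
conn.execute(
    "INSERT INTO supply (id, supplier, year) VALUES (?, ?, ?)",
    (1, "example-supplier", 2014),
)

# Read    ~ SELECT (HTTP GET)
row = conn.execute(
    "SELECT supplier, year FROM supply WHERE id = ?", (1,)
).fetchone()

# Update  ~ UPDATE (HTTP PUT)
conn.execute("UPDATE supply SET year = ? WHERE id = ?", (2015, 1))
updated_year = conn.execute(
    "SELECT year FROM supply WHERE id = ?", (1,)
).fetchone()[0]

# Delete  ~ DELETE (HTTP DELETE)
conn.execute("DELETE FROM supply WHERE id = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM supply").fetchone()[0]

print(row, updated_year, remaining)
```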
A REST architecture separates the relational database from
the client through an API, which uses HTTP to transmit data
and exchange information.
3. Model
The different types of data, IT tools and skills of data suppliers require a model implying:
UNSTRUCTURED DATA - a model collecting data in their essence (key/value) is more convenient
and immediate than defining multiple standards for data representation;
SCALABILITY - a highly extensible architecture is needed to accommodate possible future
conceptual or architectural upgrades;
INTUITIVE SCHEMA - the model should be easy for data suppliers to apply, without requiring
complex study of any imposed standard;
BIG-DATA-ORIENTED ARCHITECTURE - the system should be in line with big-data processing
techniques;
INTEGRATION WITH MODERN IT TOOLS FOR BIG DATA - storage is closely linked to the tools
used for semantic search, data analysis and data visualization. Elasticsearch, Hadoop, Solr and Cassandra
provide a complete integrated environment for managing them.
KEY/VALUE storage model
{
  "keyspace": {
    "columnfamily": {
      "rowkey": {
        "supercolumn": {
          "column name": "column value"
        }
      }
    }
  }
}
Statistical Key Value Data Model
The format best suited for use over HTTP is JSON (JavaScript
Object Notation), with which different models for data
representation can be associated. In particular, when dealing
with highly heterogeneous data, it is advisable to use a
model that represents them in their simplest form: a key/value pair.
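To make the key/value idea concrete, the sketch below takes the nested structure of the storage model (keyspace, column family, row key, super column, column) and reduces it to flat key/value pairs, where each key is the path to a leaf. The helper name flatten and the use of a tuple as the key are illustrative assumptions, not part of DataSTAT Hub.

```python
import json


def flatten(nested, path=()):
    """Yield (path, value) pairs for every leaf of a nested dictionary."""
    for key, value in nested.items():
        if isinstance(value, dict):
            # Descend one level, remembering the keys crossed so far.
            yield from flatten(value, path + (key,))
        else:
            yield path + (key,), value


# Same placeholder structure as the storage model shown on the slide.
record = {
    "keyspace": {
        "columnfamily": {
            "rowkey": {
                "supercolumn": {"column name": "column value"}
            }
        }
    }
}

pairs = list(flatten(record))
serialized = json.dumps(record)  # the JSON text that would travel over HTTP
for key_path, value in pairs:
    print(" / ".join(key_path), "=", value)
```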
4. Architecture
DataSTAT Hub is a data collection tool that takes advantage of the potential
offered by HTTP 2.0 and the REST architecture, and exploits the four CRUD
operations (Create, Read, Update, Delete).
Most entities or objects in most applications can be serialized into a
JSON object, with keys and values. A key is the name of a field or
property, and a value can be a string, a number, a Boolean, another
object, an array of values, or some other specialized type such as a
string representing a date or an object representing a geolocation.
Elasticsearch is an open source search engine that
can be conveniently used for collection and
release of data. Through Elasticsearch it is
possible to index and map documents/data
through querystrings to be sent via HTTP in JSON
format.
Documents are indexed (stored and made searchable) by
using the index API; the index, type and ID together uniquely identify the document.
Mapping is the process of defining how a document,
and the fields it contains, are stored and indexed.
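A rough sketch of what indexing and mapping look like as HTTP requests is given below. It only builds the requests and does not send them. The paths follow the index/type/id layout of the Elasticsearch REST API of that period (mapping types were removed in later versions); the host, index name, type name and field names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"   # assumed local Elasticsearch node
INDEX = "datastat-example"           # hypothetical index name
DOC_TYPE = "supply"                  # hypothetical type name
JSON_HEADERS = {"Content-Type": "application/json"}


def mapping_request():
    """PUT /{index}: create the index and declare how fields are stored."""
    body = {
        "mappings": {
            DOC_TYPE: {
                "properties": {
                    "reference_year": {"type": "integer"},
                    "received_on": {"type": "date"},
                }
            }
        }
    }
    return Request(
        f"{BASE_URL}/{INDEX}",
        data=json.dumps(body).encode("utf-8"),
        headers=JSON_HEADERS,
        method="PUT",
    )


def index_request(doc_id, document):
    """PUT /{index}/{type}/{id}: store one document and make it searchable."""
    return Request(
        f"{BASE_URL}/{INDEX}/{DOC_TYPE}/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers=JSON_HEADERS,
        method="PUT",
    )


doc_req = index_request("1", {"reference_year": 2015, "received_on": "2016-10-20"})
map_req = mapping_request()
print(doc_req.get_method(), doc_req.full_url)
```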
[Slide diagram labels: DOCUMENT, INDEX / TYPE, MAPPING]
ELASTICSEARCH
DATA SUPPLIER: Data suppliers can autonomously create a data index, describe the data
content and perform any operation on the data (put/update/delete/get).
DATA STORAGE: Data contained in the index can easily be stored in a database that uses the
key/value model (e.g. Cassandra).
OUTPUT CHANNEL: Indexed data have an immediate dissemination channel: Elasticsearch acts as a
powerful engine for searching big data and, possibly, an API standardizes the output.
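The output channel can be sketched as a plain search URL. The fragment below composes an Elasticsearch URI search (the _search endpoint with a q parameter) for a single field; the host, index name and field name are hypothetical, and the URL is built but not requested.

```python
from urllib.parse import urlencode


def search_url(base_url, index, field, value, size=10):
    """Compose a URI search on one field of one index (no request is sent)."""
    query = urlencode({"q": f"{field}:{value}", "size": size})
    return f"{base_url}/{index}/_search?{query}"


# Hypothetical node, index and field, used only for illustration.
url = search_url("http://localhost:9200", "datastat-example", "reference_year", 2015)
print(url)
```

A widget or user interface would issue a GET on such a URL and render the JSON response.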
DataSTAT Hub applied to statistical classifications: www.statisticlass.eu
[Slide diagram labels: SEARCH ENGINE, REST WEBSERVICES, WIDGET / USER INTERFACE]
5. Concluding remarks
DataSTAT Hub is a suitable and easy tool for the automated collection,
standardization and integration of administrative data.
Reduction of burden on users: the hub does not require knowledge of the
internal database, since updates are performed through HTTP query strings
and can be issued from any programming language; once created, the procedure
can be reused for each subsequent data supply.
Reduction of costs in terms of the human resources employed for organizational,
bureaucratic and IT management.
By allowing us to overcome some critical issues related to the use of
administrative data, including those connected with privacy and security, a tool
such as DataSTAT Hub is time-saving and cost-effective.
It is a user-friendly tool developed by making use of open source technologies and
can be conveniently shared among NSOs, while it is extensible to any other
institution.
THANK YOU FOR YOUR ATTENTION
FOR ANY QUESTIONS CONTACT US:
alessandro.capezzuoli@istat.it
emanuela.recchini@istat.it

Open Data, Datasharing, Sematic Web, Linked Open DataOpen Data, Datasharing, Sematic Web, Linked Open Data
Open Data, Datasharing, Sematic Web, Linked Open Data
 

Recently uploaded

Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22LHelferty
 
527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdf527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdfrajpreetkaur75080
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationAccess Innovations, Inc.
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Orkestra
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic AbusersOWASP Beja
 
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfOracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfSkillCertProExams
 
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...Rahsaan L. Browne
 
The Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDFThe Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDFRahsaan L. Browne
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerVladimir Samoylov
 
Hi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptxHi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptxShivamM16
 
05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community Networking05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community NetworkingMichael Orias
 
123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptx123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptxgargh1099
 
Introduction of Biology in living organisms
Introduction of Biology in living organismsIntroduction of Biology in living organisms
Introduction of Biology in living organismssoumyapottola
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesIP ServerOne
 

Recently uploaded (14)

Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
Pollinator Ambassador Earth Steward Day Presentation 2024-05-22
 
527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdf527598851-ppc-due-to-various-govt-policies.pdf
527598851-ppc-due-to-various-govt-policies.pdf
 
Eureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 PresentationEureka, I found it! - Special Libraries Association 2021 Presentation
Eureka, I found it! - Special Libraries Association 2021 Presentation
 
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
 
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdfOracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
Oracle Database Administration I (1Z0-082) Exam Dumps 2024.pdf
 
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
Writing Sample 2 -Bridging the Divide: Enhancing Public Engagement in Urban D...
 
The Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDFThe Canoga Gardens Development Project. PDF
The Canoga Gardens Development Project. PDF
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Hi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptxHi-Tech Industry 2024-25 Prospective.pptx
Hi-Tech Industry 2024-25 Prospective.pptx
 
05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community Networking05232024 Joint Meeting - Community Networking
05232024 Joint Meeting - Community Networking
 
123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptx123445566544333222333444dxcvbcvcvharsh.pptx
123445566544333222333444dxcvbcvcvharsh.pptx
 
Introduction of Biology in living organisms
Introduction of Biology in living organismsIntroduction of Biology in living organisms
Introduction of Biology in living organisms
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
 

DATASTAT HUB

  • 1. A tool for the automatic collection of administrative data to produce official statistics. Conference of European Statistics Stakeholders, Budapest, 20-21 October 2016. Alessandro Capezzuoli, Emanuela Recchini.
  • 2. Outline: 1. Official statistics and data integration; 2. Technology; 3. Model; 4. Architecture; 5. Concluding remarks.
  • 3. Section 1, Official statistics and data integration: Introductory remarks (1). Bringing together information from different sources makes it possible to fill information gaps, to provide insights which cannot be gleaned from unlinked data, and to improve the knowledge and understanding of specific phenomena. There is worldwide recognition of the increasing role played by administrative data in the production of more timely, more disaggregated statistics at higher frequencies than traditional survey data allow. The efficient use of all available information to produce timely, accurate and high quality statistics is a challenge for National Statistical Offices (NSOs), which are ever more committed to developing methods and suitable tools for the production, collection, standardization and integration of different types of statistical data.
  • 4. Section 1, Official statistics and data integration: Introductory remarks (2). Nowadays, the exploitation of administrative data for statistical purposes is normal practice for a large number of NSOs. This improves the quality of statistical outputs, reduces the statistical burden on respondents and minimizes costs. The Italian National Institute of Statistics (Istat) collects and manages a large amount of administrative data from different sources, among which: the Italian Revenue Agency; the Bank of Italy; Ministries; Social Security Institutions; Government Institutions; Private Institutions; … From 2009 to 2015, the administrative data supplied to Istat trebled.
  • 5. Section 1, Official statistics and data integration: The Italian legislation on data collection (Guidelines for the drafting of conventions on the usability of Public Administrations data; Legislative Decree n. 82/2005, commonly referred to as the “Digital Administration Code”, modified by Legislative Decree n. 235/2010). According to the provisions of the Italian Digital Administration Code: ➢ before proceeding to the collection of new data, public administrations are required to verify whether the information they need can be acquired through access to information already in the possession of other public authorities or public bodies; ➢ the technical options for the usability of data are web access through the website of the supplier institution or an ad hoc thematic website, and interoperability among public administrations for data collection and data integration. The user can process the data collected exclusively for the pursuit of its institutional goals. The transfer of data from one information system to another does not change the ownership of the data.
  • 6. Section 1, Official statistics and data integration: Administrative data collected by Istat. Data collected by Istat differ widely in type, content and structure.
  • 7. Section 1, Official statistics and data integration: Data collection process (1). DATA SUPPLIER: receives data requests; processes data requests; prepares data to be sent; sends data to the data collector. DATA COLLECTOR: manages data requests; defines methods and standards; manages reminders; stores data and metadata; standardizes and disseminates data.
  • 8. Section 1, Official statistics and data integration: Data collection process (2). ✓ Data collection through File Transfer Protocol (FTP); ✓ data uploading through an ad hoc website to manage reminders and data supply requests. THESE SOLUTIONS DO NOT PERMIT PROCESS AUTOMATION. These solutions involve: ✓ management of data requests and reminders; ✓ a complex IT infrastructure; ✓ a burden for data suppliers; ✓ human resources for transaction management.
  • 9. Section 2, Technology. ...the World Wide Web offers a possible solution! HTTP (Hypertext Transfer Protocol), the set of rules for transferring files on the Web, can be conveniently used for data collection and data exchange. It is a request/response protocol based on the client-server architecture. Representational State Transfer (REST): • is not a standard, just an architectural style for designing networked applications; • defines a set of guidelines for using the HTTP protocol to perform the 4 operations summarized in the acronym CRUD (Create, Read, Update, Delete), by means of an API (Application Programming Interface).
  • 10. Section 2, Technology: CRUD principles. REST is a service concept that may be summarized by the CRUD principles. REST allows data suppliers to create, read and update resources with a logic similar to that used to perform operations on any SQL database.
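
The correspondence between CRUD operations and HTTP can be sketched as follows. This is a minimal illustration, not the DataSTAT Hub implementation: the verb mapping is the conventional REST one (the slides do not say which verbs the hub uses), and the endpoint URL is hypothetical. The request is only built, never sent.

```python
import json
from urllib.request import Request

# Conventional CRUD-to-HTTP mapping (an assumption; not stated in the slides).
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None):
    """Build (but do not send) the HTTP request for a CRUD operation."""
    method = CRUD_TO_HTTP[operation]
    data = None
    headers = {}
    if payload is not None:
        # Resources travel as JSON in the request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(url, data=data, headers=headers, method=method)


# Hypothetical endpoint, for illustration only.
req = build_request("update", "http://localhost:8080/datasets/42", {"status": "final"})
print(req.get_method(), req.full_url)
```

The same function covers all four operations, which is the point of the analogy with SQL: one uniform interface, different verbs.
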
  • 11. Section 2, Technology. The REST architecture makes it possible to separate the relational DB from the client through an API, which uses HTTP to transmit data and exchange information.
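
A toy sketch of that separation: the client only speaks in terms of a method, a resource key and a JSON body, while the SQL stays behind the API. The class name `ResourceApi`, the table layout and the status codes returned are all assumptions made for this example; an in-memory SQLite database stands in for the relational DB, and no real HTTP server is involved.

```python
import json
import sqlite3


class ResourceApi:
    """Toy API layer: only this class touches SQL, clients never do."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE resources (key TEXT PRIMARY KEY, body TEXT)")

    def handle(self, method, key, body=None):
        """Return a (status_code, payload) pair, mimicking an HTTP response."""
        if method == "PUT":  # create or replace the resource
            self._db.execute(
                "INSERT OR REPLACE INTO resources (key, body) VALUES (?, ?)",
                (key, json.dumps(body)),
            )
            return 200, body
        if method == "GET":  # read the resource
            row = self._db.execute(
                "SELECT body FROM resources WHERE key = ?", (key,)
            ).fetchone()
            return (200, json.loads(row[0])) if row else (404, None)
        if method == "DELETE":  # remove the resource
            cur = self._db.execute("DELETE FROM resources WHERE key = ?", (key,))
            return (204, None) if cur.rowcount else (404, None)
        return 405, None  # method not supported by this sketch


api = ResourceApi()
api.handle("PUT", "supply/2016-10", {"records": 3})
print(api.handle("GET", "supply/2016-10"))
```

Because the client never sees table or column names, the storage behind `handle` could be swapped without changing the supplier's code.
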
  • 12. Section 3, Model. The different types of data, IT tools and skills of data suppliers require a model implying: UNSTRUCTURED DATA: a model collecting data in their essence (key/value) is more convenient and immediate than defining multiple standards for data representation; SCALABILITY: a highly extensible architecture is needed, to allow for possible future conceptual/architectural upgrades; INTUITIVE SCHEMA: the model should be easily applied by data suppliers, without resorting to complex studies of any imposed standard; BIG-DATA-ORIENTED ARCHITECTURE: the system should be in line with big-data processing techniques; INTEGRATION WITH MODERN IT TOOLS FOR BIG DATA: storage is closely linked to the tools used for semantic search, data analysis and data visualization. Elasticsearch, Hadoop, Solr and Cassandra provide a complete integrated environment for managing them.
  • 13. Section 3, Model: Statistical Key Value Data Model. The format best suited for HTTP use is JSON (JavaScript Object Notation), with which different models for data representation can be associated. In particular, when dealing with highly heterogeneous data, it is recommended to use a model that represents them in their simplest form: a key/value pair. KEY/VALUE storage model: { "keyspace" : { "columnfamily" : { "rowkey" : { "supercolumn" : { "column name" : "column value" } } } } }
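
The nested layout on this slide can be built and serialized with nothing but the standard library. The sketch below is illustrative only: the helper names `to_key_value` and `flatten` and every sample name and value are invented for the example and do not come from DataSTAT Hub.

```python
import json


def to_key_value(keyspace, column_family, row_key, super_column, columns):
    """Wrap flat column name/value pairs in the nested layout shown on the slide."""
    return {keyspace: {column_family: {row_key: {super_column: dict(columns)}}}}


def flatten(document, prefix=()):
    """Yield (path, value) for every leaf, i.e. the data as plain key/value pairs."""
    for key, value in document.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# All names and values below are made up for illustration.
doc = to_key_value("demo", "population", "2015", "region", {"name": "Example", "total": 100})
print(json.dumps(doc, indent=2))
print(list(flatten(doc)))
```

Flattening shows why the model suits heterogeneous data: whatever the supplier sends reduces to a path and a value, with no schema agreed in advance.
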
  • 14. Section 4, Architecture. DataSTAT Hub is a tool for data collection that takes advantage of the potential offered by HTTP/2 and the REST architecture and exploits the CRUD methods (Create, Read, Update, Delete).
  • 15. Section 4, Architecture. Elasticsearch is an open source search engine that can be conveniently used for the collection and release of data. Through Elasticsearch it is possible to index and map documents/data through query strings sent via HTTP in JSON format. DOCUMENT: most entities or objects in most applications can be serialized into a JSON object, with keys and values. A key is the name of a field or property, and a value can be a string, a number, a Boolean, another object, an array of values, or some other specialized type such as a string representing a date or an object representing a geolocation. INDEX / TYPE: documents are indexed (stored and made searchable) by using the index API, which uniquely identifies the document. MAPPING: mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
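
As a rough sketch of what indexing and mapping look like on the wire, the requests below are built with the standard library and not sent. Several things are assumed rather than taken from the slides: a local node at `http://localhost:9200`, the index name `supplies`, the field names, and the URL shapes of recent Elasticsearch releases (`PUT /<index>` with a `mappings` body, `PUT /<index>/_doc/<id>`). The 2016 slides refer to an INDEX / TYPE pair, which older releases addressed differently.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumed local Elasticsearch node


def _json_request(method, url, body):
    """Build (but do not send) an HTTP request carrying a JSON body."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def mapping_request(index, properties):
    """Create an index whose mapping declares how each field is stored."""
    body = {"mappings": {"properties": properties}}
    return _json_request("PUT", f"{ES_URL}/{index}", body)


def index_request(index, doc_id, document):
    """Index (store and make searchable) one document under an explicit id."""
    return _json_request("PUT", f"{ES_URL}/{index}/_doc/{doc_id}", document)


# Hypothetical index, fields and document, for illustration only.
create = mapping_request("supplies", {"supplier": {"type": "keyword"},
                                      "period": {"type": "date"},
                                      "records": {"type": "integer"}})
put_doc = index_request("supplies", "1", {"supplier": "demo",
                                          "period": "2016-10-20",
                                          "records": 3})
print(create.get_method(), create.full_url)
print(put_doc.get_method(), put_doc.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running node; that step is left out so the sketch runs without one.
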
  • 16. Section 4, Architecture: Elasticsearch. DATA SUPPLIER: data suppliers can autonomously create a data index, describe the data content and perform any operation on the data (put/update/delete/get). DATA STORAGE: data contained in the index can easily be stored in a database that uses the key/value model (e.g. Cassandra). OUTPUT CHANNEL: indexed data have an immediate dissemination channel, with which Elasticsearch is associated as a powerful engine for searching among big data and, possibly, an API that standardizes the output.
  • 17. Section 4, Architecture: DataSTAT Hub applied to statistical classifications (www.statisticlass.eu). Diagram components: DATA SUPPLIER; Elasticsearch; SEARCH ENGINE; REST WEBSERVICES; WIDGET / USERS INTERFACE.
  • 18. Section 5, Concluding remarks. DataSTAT Hub is a suitable and easy-to-use tool for the automated collection, standardization and integration of administrative data. Reduction of burden on users: the hub does not require knowledge of the internal database, since updating is performed through HTTP query strings and can be done with any programming language; once created, the procedure can be reused for each subsequent data supply. Reduction of costs in terms of the human resources employed for organizational, bureaucratic and IT management. By allowing us to overcome some critical issues related to the use of administrative data, including those connected with privacy and security, a tool such as DataSTAT Hub is time-saving and cost-effective. It is a user-friendly tool developed with open source technologies; it can be conveniently shared among NSOs and is extensible to any other institution.
  • 19. THANK YOU FOR YOUR ATTENTION. FOR ANY QUESTIONS CONTACT US: alessandro.capezzuoli@istat.it, emanuela.recchini@istat.it