The Census European Hub Project
Workshop on Data Transmission, 17-19 June, Becici, Montenegro
Vincenzo PATRUNO
Overview: The Census Hub is a proposed new system for publishing the 2011 Census data on the Eurostat website using SDMX standards.
Overview: Census taking is a very cost-intensive exercise, justified by the unparalleled quality of the result. Important aspects of that quality are:
• the flexibility to cross-tabulate different variables
• easy access to data
• detailed data that are methodologically comparable
Overview: Flexibility and Harmonization
Harmonization: access to detailed Census data that are methodologically comparable among the Member States and structured in the same way.
Flexibility: final users should be able to cross-tabulate different variables.
The Goals: The dissemination of the results of the censuses in the EU should reflect those advantages to the highest possible extent.
The Traditional Approach:
(1) Member States provide microdata to Eurostat. Eurostat aggregates the microdata and stores the resulting data in a central repository, which is then used for data dissemination.
(2) Member States provide predefined tables to Eurostat. Eurostat publishes those tables on its website.
The Traditional Approach: Approach (1) maximises flexibility in offering data to final users. But:
– Aggregation functions on the central system could be very difficult to implement, due to:
    • different confidentiality rules to be applied to microdata from different countries;
    • whether data come from a "full" census (conventional or register-based) or from a sample survey.
– Data maintenance could be very cumbersome, because every time a revision is issued an entire set of microdata needs to be updated or replaced.
The Traditional Approach: Approach (2) greatly simplifies the exercise. But it doesn't offer enough flexibility to final users, who would have limited possibilities to tailor data to their information needs.
[Figure: The Traditional Approach (NSIs, EUROSTAT)]
Push and Pull: There are normally two different approaches to exchanging data: PUSH and PULL.
Push and Pull: PUSH mode means that the data provider takes action to send the data to the party collecting the data. PULL mode means that the data provider makes the data available via the Internet; the data consumer then fetches the data on its own initiative.
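The difference between the two modes comes down to which party initiates the transfer. The following is a minimal Python sketch of that idea only; the class and method names (`Provider`, `Collector`, `push_to`, `fetch_from`) are hypothetical and are not part of SDMX or of the Census Hub.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A data provider (for example an NSI). Hypothetical illustration."""
    name: str
    dataset: list

    def push_to(self, collector):
        # PUSH: the provider takes the initiative and sends the data.
        collector.receive(self.name, self.dataset)

    def published(self):
        # PULL: the provider only makes the data available;
        # a copy is returned so consumers cannot alter the source.
        return list(self.dataset)


@dataclass
class Collector:
    """The party collecting the data. Hypothetical illustration."""
    received: dict = field(default_factory=dict)

    def receive(self, provider_name, dataset):
        # Called by the provider in PUSH mode.
        self.received[provider_name] = list(dataset)

    def fetch_from(self, provider):
        # PULL: the collector decides when to fetch what has been published.
        self.received[provider.name] = provider.published()
```

In both cases the collector ends up with the same data; what changes is who has to act, and therefore who has to know when the data are ready.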
Data Sharing Model: SDMX is primarily focused on the exchange and dissemination of statistical data and metadata. SDMX promotes a "data sharing" model to facilitate low-cost, high-quality statistical data and metadata exchange. Data providers publish the availability of data and metadata to data consumers, who are responsible for fetching them at will.
Notes about Data Sharing:
• Data sharing only works if there are standard formats.
• Like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges.
• Data consumers discover the data they need, and its location, and then go and get it.
• Data producers don't have to send data.
The Census Hub Idea: The Census Hub is based on the concept of data sharing: a group of partners agree to provide access to their data according to standard processes, formats and technologies.
Countries involved: IT, IE, DE, PT, MT, SI, EE, BG
Additional countries involved before the end of the year: GB, ES and GR
The Census Hub Idea: SDMX standards support the "pull" mode of data sharing, where the collecting organization retrieves the data from the providers' web servers. The data:
• may be made available for download in an SDMX-conformant file
• may be retrieved from a database in response to an SDMX-conformant query
This architecture often also includes an SDMX registry, which implements the general idea of a metadata registry.
The Census Hub Idea: Each National Statistics Institute (NSI) creates a set of non-disclosive data. These data are delivered via an information hub that enables data sharing on the Internet. Each NSI provides web access to its data according to standard formats and technologies. A data user browses the hub to search for a dataset of interest using structural metadata (dimensions, attributes, code lists, etc.). The data are then retrieved directly from the NSI system by the Hub.
The Pilot Project Architecture
Census Hub pilot project architecture:
• The central Hub (Eurostat side)
• The web service (NSI side)
• The pilot hypercube, with dimensions Sex, Age, Current Activity Status and Geography
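A hypercube with these four dimensions is what makes cross-tabulation possible: any smaller table can be derived by summing the cells over the dimensions that are not wanted. The Python sketch below illustrates this under stated assumptions: the dimension identifiers and the dictionary representation of cells are hypothetical and are not taken from the pilot's Data Structure Definition.

```python
from collections import defaultdict

# Assumed identifiers for the four pilot dimensions, in key order.
DIMENSIONS = ("sex", "age", "activity", "geo")


def cross_tabulate(cells, keep):
    """Sum cell counts over every dimension not listed in `keep`.

    `cells` maps a 4-tuple (sex, age, activity, geo) to a count.
    `keep` is a sequence of dimension names to retain, in output order.
    """
    positions = [DIMENSIONS.index(name) for name in keep]
    table = defaultdict(int)
    for key, count in cells.items():
        reduced_key = tuple(key[i] for i in positions)
        table[reduced_key] += count
    return dict(table)
```

For example, `cross_tabulate(cells, ("sex", "activity"))` would give a Sex by Current Activity Status table, while `cross_tabulate(cells, ("geo",))` would give totals by territory, both derived from the same hypercube.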
[Figure: Data Sharing in Census Hub (labels: Query, SDMX, Data, SDMX-ML, WS, NSI)]
The Pilot Project Architecture: The Query builder constructs one or more SDMX queries, which are sent to the relevant NSI web services through the Web service client. When the Web service client receives the responses (in the format of an SDMX cross-sectional data message) from the queried web services, it forwards them to the Result aggregation manager. The Result aggregation manager puts together all the received SDMX data messages and sends the result to the Dissemination transformer, which transforms it from XML to HTML or CSV.
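The Hub-side flow described above can be sketched in Python as follows. This is a minimal illustration, not the pilot's implementation: the XML shapes are simplified stand-ins for SDMX query and cross-sectional data messages, the NSI web services are replaced by in-memory functions instead of real HTTP endpoints, and all function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def make_fake_nsi(country, observations):
    """Return an in-memory stand-in for one NSI web service (assumption)."""
    def service(query_xml):
        query = ET.fromstring(query_xml)
        wanted = {w.get("dimension"): w.get("value") for w in query.findall("Where")}
        root = ET.Element("DataSet", country=country)
        for obs in observations:
            if all(obs.get(dim) == val for dim, val in wanted.items()):
                ET.SubElement(root, "Obs", dict(obs))
        return ET.tostring(root, encoding="unicode")
    return service


def build_query(filters):
    """Query builder: turn the user's selection into a (simplified) query."""
    root = ET.Element("Query")
    for dim, val in filters.items():
        ET.SubElement(root, "Where", dimension=dim, value=val)
    return ET.tostring(root, encoding="unicode")


def call_services(services, query_xml):
    """Web service client: send the query to every NSI, collect responses."""
    return [service(query_xml) for service in services.values()]


def aggregate(responses):
    """Result aggregation manager: merge all responses into one document."""
    merged = ET.Element("Result")
    for xml_text in responses:
        merged.append(ET.fromstring(xml_text))
    return merged


def to_csv(merged):
    """Dissemination transformer: convert the merged XML to CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["country", "sex", "age", "value"])
    for dataset in merged.findall("DataSet"):
        for obs in dataset.findall("Obs"):
            writer.writerow([dataset.get("country"), obs.get("sex"),
                             obs.get("age"), obs.get("value")])
    return out.getvalue()


def run_hub(services, filters):
    """Chain the four components in the order described in the slide."""
    query_xml = build_query(filters)
    return to_csv(aggregate(call_services(services, query_xml)))
```

The point of the sketch is the separation of roles: the Hub never stores the data, it only builds queries, merges the answers and reformats them for the user.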
The Pilot Project Architecture (NSI): The web service receives an SDMX query and forwards it to the SDMX query parser. The SDMX query parser breaks down the query and sends it to the SQL query builder. The SQL query builder creates one or more SQL queries and sends them to the Database. The result is assembled by the SDMX-ML assembler into an SDMX cross-sectional message, which the web service sends to the central Hub.
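The NSI-side chain (query parser, SQL query builder, database, SDMX-ML assembler) can be sketched in Python as follows. This is an illustration under assumptions, not the pilot's code: the query and response XML are simplified stand-ins for real SDMX messages, the table name `census` and its column names are hypothetical, and SQLite is used only so that the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed column names; only these may appear in a query.
ALLOWED_DIMENSIONS = ("sex", "age", "activity", "geo")


def parse_query(query_xml):
    """Query parser: extract dimension filters from a (simplified) query."""
    root = ET.fromstring(query_xml)
    filters = {}
    for cond in root.findall("Where"):
        dim = cond.get("dimension")
        if dim not in ALLOWED_DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        filters[dim] = cond.get("value")
    return filters


def build_sql(filters):
    """SQL query builder: produce a parameterised SELECT statement.

    Column names come from the whitelist above; values are bound as
    parameters rather than interpolated into the SQL text.
    """
    sql = "SELECT sex, age, activity, geo, value FROM census"
    if filters:
        sql += " WHERE " + " AND ".join(f"{dim} = ?" for dim in filters)
    sql += " ORDER BY geo, sex, age, activity"
    return sql, list(filters.values())


def assemble_response(rows):
    """Assembler: wrap the database rows in a (simplified) XML message."""
    root = ET.Element("DataSet")
    for sex, age, activity, geo, value in rows:
        ET.SubElement(root, "Obs", sex=sex, age=age, activity=activity,
                      geo=geo, value=str(value))
    return ET.tostring(root, encoding="unicode")


def handle_request(conn, query_xml):
    """Web service entry point: parse, query the database, assemble."""
    sql, params = build_sql(parse_query(query_xml))
    rows = conn.execute(sql, params).fetchall()
    return assemble_response(rows)
```

Because each NSI keeps this translation layer on its own side, the Hub only needs to speak the agreed message format and does not need to know how any national database is organised.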
[Figure: The Pilot Project Architecture, Statistics Portugal Architecture Model]
[Figure: The Pilot Project Architecture, Statistisches Bundesamt Architecture Model]
The Pilot Project: The Census Task Force (at its April 2007 meeting) agreed to explore the Hub solution and decided to launch a pilot project involving DE, IE, IT and PT. Eurostat defined some guidelines for this project:
• a simple hypercube, so that the NSIs could produce it quickly;
• data comprising the following dimensions: Sex, Age, Current Activity Status and Territory;
• a Data Structure Definition was also provided.
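A Data Structure Definition fixes which dimensions a dataset has and which codes each dimension may take, so that every NSI structures its hypercube in the same way. The Python sketch below shows the idea only: the dimension identifiers and code lists are invented placeholders, not the contents of the DSD provided for the pilot, and the classes are hypothetical rather than part of any SDMX library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    codes: tuple  # the code list of permitted values


@dataclass(frozen=True)
class DataStructureDefinition:
    dimensions: tuple

    def validate(self, key):
        """Return a list of problems found in an observation key (a dict)."""
        problems = []
        expected = {d.name for d in self.dimensions}
        for missing in sorted(expected - set(key)):
            problems.append(f"missing dimension: {missing}")
        for extra in sorted(set(key) - expected):
            problems.append(f"unexpected dimension: {extra}")
        for d in self.dimensions:
            if d.name in key and key[d.name] not in d.codes:
                problems.append(f"invalid code for {d.name}: {key[d.name]}")
        return problems


# Placeholder code lists (assumptions for illustration only).
PILOT_DSD = DataStructureDefinition(dimensions=(
    Dimension("SEX", ("F", "M")),
    Dimension("AGE", ("Y0-14", "Y15-64", "Y65+")),
    Dimension("CAS", ("EMP", "UNE", "INAC")),
    Dimension("GEO", ("DE", "IE", "IT", "PT")),
))
```

An observation key such as `{"SEX": "F", "AGE": "Y15-64", "CAS": "EMP", "GEO": "IT"}` passes validation with an empty problem list, whereas a key with an unknown code or a missing dimension is reported, which is the kind of check a shared structure makes possible across countries.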
The Pilot Project Roadmap:
• January 2008: start of the pilot project. Four countries decided to participate (Germany, Ireland, Italy and Portugal);
• March 2008: preparation of the requirement specification, functional and technical analysis;
• April 2008: choice of one data hypercube and related breakdowns to use during the pilot; development of the Data Structure Definition (DSD);
• June - September 2008: building of application modules (both Eurostat and NSI side); tests;
• October 2008: evaluation report of the pilot; functional and technical analysis for the full 2011 Census Hub.
Results of the pilot project:
• Eurostat has developed the central Hub and, at the beginning of February 2009, it will be accessible in a test environment.
• Italy, Portugal, Germany and Ireland have already set up the architecture.
• Italy, Portugal and Ireland have produced documents (available on CIRCA) regarding their experience during the pilot phase (http://circa.europa.eu/Members/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/case_studies).
• In addition, the Census Hub Web Service implementation Guidelines3 were produced, explaining how to build web services, using different IT technologies, that can communicate correctly with the central hub (http://circa.europa.eu/Members/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/documents).
• Finally, it is important to highlight that sharing experience and software among all the actors involved (Eurostat and the NSIs) has reduced production costs and development time.
