Census Hub Project


The Census Hub Project can currently be considered the most advanced project in which Internet technologies and SDMX solutions for data transmission come together for an ambitious goal: the dissemination of the 2011 Census results.
We analyze the Census Hub architecture, in which a central Hub on the Eurostat side manages the user interface and transforms all the selections the user makes on screen into an SDMX query. This query is sent to the web service on the NSI side, which parses it and transforms it into an SQL query that can be run against a database containing census data. The Hub queries the web service of each country involved in the answer. Finally, the Hub receives the answers from the NSIs and assembles them into a final table. The importance of this implementation is that it is an entirely new system that completely changes the way official data are disseminated and exchanged among organizations.

Published in: Technology, Economy & Finance

Census Hub Project

  1. 1. The Census European Hub Project. Workshop on Data Transmission, 17-19 June, Becici, Montenegro. Vincenzo PATRUNO
  2. 2. Overview: a proposal for a new system to publish the 2011 Census data on the Eurostat website using SDMX standards.
  3. 3. Overview: census taking is a very cost-intensive exercise, justified by the unparalleled quality of the result. Important aspects of that quality are: the flexibility to cross-tabulate different variables;
  4. 4. easy access to data;
  5. 5. detailed data that are methodologically comparable.
  6. 6. Overview: FLEXIBILITY, HARMONIZATION
  7. 7. Harmonization: access to detailed Census data that are methodologically comparable among the Member States and structured in the same way.
  8. 8. Flexibility: final users should be able to cross-tabulate different variables.
  9. 9. The Goals: the dissemination of the results of the censuses in the EU should reflect those advantages to the highest possible extent.
  10. 10. The Traditional Approach: (1) Member States provide microdata to Eurostat. Eurostat aggregates the microdata and stores the resulting data in a central repository, which is then used for data dissemination. (2) Member States provide predefined tables to Eurostat. Eurostat publishes those tables on its website.
  11. 11. The Traditional Approach: approach (1) maximises flexibility in offering data to final users. But aggregation functions on the central system could be very difficult to implement, because different confidentiality rules apply to microdata from different countries, and because the data may come either from a "full" census (conventional or register-based) or from a sample survey. Data maintenance could also be very cumbersome, because every time a revision is issued an entire set of microdata needs to be updated or replaced.
  12. 12. The Traditional Approach: approach (2) greatly simplifies the exercise. But it doesn't offer enough flexibility to final users, who would have limited possibilities to tailor the data to their information needs.
  13. 13. The Traditional Approach [diagram labels: NSIs, EUROSTAT]
  14. 14. Push and Pull: there are normally two different approaches to exchanging data: PUSH and PULL.
  15. 15. Push and Pull: PUSH mode means that the data provider takes action to send the data to the party collecting the data. PULL mode implies that the data provider makes the data available via the Internet. The data consumer then fetches the data on its own initiative.
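
The difference between the two modes can be illustrated with a minimal in-memory sketch. The class and method names (`Provider`, `Collector`, `push_to`, `fetch_from`) and the sample record are hypothetical and only serve to show which party initiates the exchange.

```python
from dataclasses import dataclass, field


@dataclass
class Collector:
    """The party collecting the data (hypothetical name)."""
    received: list = field(default_factory=list)

    def receive(self, dataset: dict) -> None:
        # PUSH: the collector passively accepts what the provider sends.
        self.received.append(dataset)

    def fetch_from(self, provider: "Provider") -> None:
        # PULL: the collector (data consumer) acts on its own initiative.
        self.received.append(provider.published())


@dataclass
class Provider:
    """A data provider such as an NSI (hypothetical name)."""
    dataset: dict

    def push_to(self, collector: Collector) -> None:
        # PUSH: the provider takes action and sends the data.
        collector.receive(self.dataset)

    def published(self) -> dict:
        # PULL: the provider only makes the data available.
        return self.dataset


provider = Provider({"table": "placeholder", "value": 1})  # made-up record

pushed = Collector()
provider.push_to(pushed)      # initiated by the provider

pulled = Collector()
pulled.fetch_from(provider)   # initiated by the consumer
```

Both collectors end up with the same record; what changes is who has to act, which is why a pull model relieves providers of sending data to every consumer.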
  16. 16. Data Sharing Model: SDMX is primarily focused on the exchange and dissemination of statistical data and metadata. SDMX promotes a “data sharing” model to facilitate low-cost, high-quality exchange of statistical data and metadata. Data providers publish the availability of data/metadata to data consumers, and the latter are responsible for fetching the data/metadata at will.
  17. 17. Notes about Data Sharing: data sharing only works if there are standard formats;
  18. 18. like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges: data consumers discover the data they need, and its location, and then go and get it;
  19. 19. data producers don’t have to send data.
  20. 20. The Census Hub Idea: the Census Hub is based on the concept of data sharing: a group of partners agree to provide access to their data according to standard processes, formats and technologies. Countries involved: IT, IE, DE, PT, MT, SI, EE, BG. Additional countries involved before the end of the year: GB, ES and GR.
  21. 21. The Census Hub Idea: SDMX standards support the "pull" mode of data sharing, where the collecting organization retrieves the data from the providers' web servers. The data may be made available for download in an SDMX-conformant file,
  22. 22. or may be retrieved from a database in response to an SDMX-conformant query. This architecture often also includes an SDMX registry, which implements the general idea of a metadata registry.
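
As a rough illustration of the registry idea, the sketch below keeps, for each dataset, a list of where and how a consumer can pull it (as a file or through a query). It is not the SDMX registry interface: the `Registry` and `Registration` names, the dataset identifier and the `example.org` locations are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    provider: str   # e.g. a country code
    mode: str       # "file" (download) or "query" (web service)
    location: str   # where the consumer goes to pull the data


class Registry:
    """Toy metadata registry: maps a dataset id to its registrations."""

    def __init__(self):
        self._by_dataset = {}

    def register(self, dataset_id, registration):
        # Only the two access modes named on the slide are accepted.
        if registration.mode not in ("file", "query"):
            raise ValueError(f"unknown mode: {registration.mode}")
        self._by_dataset.setdefault(dataset_id, []).append(registration)

    def lookup(self, dataset_id):
        # Return a copy so callers cannot mutate the registry's state.
        return list(self._by_dataset.get(dataset_id, []))


registry = Registry()
registry.register(
    "PILOT_HYPERCUBE",  # hypothetical dataset id
    Registration("IT", "query", "https://example.org/it/sdmx"),
)
registry.register(
    "PILOT_HYPERCUBE",
    Registration("PT", "file", "https://example.org/pt/cube.xml"),
)

providers = [r.provider for r in registry.lookup("PILOT_HYPERCUBE")]
```

A consumer would first ask the registry which providers hold a dataset and then pull from each location in turn.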
  23. 23. The Census Hub Idea: each National Statistics Institute (NSI) creates a set of non-disclosive data. These data are delivered through an information hub that enables data sharing on the Internet. Each NSI provides web access to its data according to standard formats and technologies. A data user browses the hub to search for a dataset of interest using structural metadata (dimensions, attributes, code lists, etc.). The Hub then retrieves the data directly from the NSI system.
  24. 24. The Pilot Project Architecture
  25. 25. Census Hub pilot project architecture: the central Hub (Eurostat side); the web service (NSI side); the pilot hypercube, with the dimensions Sex,
  26. 26. Age,
  27. 27. Current Activity Status,
  28. 28. Geography.
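
To make the hypercube concrete, the sketch below treats it as four dimensions, each with a code list, and enumerates the cells of the cross-tabulation a user selects. The dimension names follow the slide; the identifiers (`SEX`, `AGE`, `CAS`, `GEO`) and every code in the lists are hypothetical placeholders, not the code lists of the pilot.

```python
from itertools import product

# Dimensions from the slide; all codes below are made-up placeholders.
HYPERCUBE = {
    "SEX": ["M", "F"],
    "AGE": ["Y0-14", "Y15-64", "Y65+"],
    "CAS": ["EMP", "UNE", "INAC"],   # Current Activity Status
    "GEO": ["IT", "PT"],
}


def cells(selection):
    """Return one {dimension: code} key per cell of the selected cross-tab."""
    for dim, codes in selection.items():
        if dim not in HYPERCUBE:
            raise KeyError(f"unknown dimension: {dim}")
        unknown = set(codes) - set(HYPERCUBE[dim])
        if unknown:
            raise ValueError(f"unknown codes for {dim}: {sorted(unknown)}")
    # A dimension left out of the selection is kept in full.
    axes = [selection.get(dim, HYPERCUBE[dim]) for dim in HYPERCUBE]
    return [dict(zip(HYPERCUBE, combo)) for combo in product(*axes)]


selected = cells({"SEX": ["F"], "GEO": ["IT"]})
```

With these placeholder code lists, fixing Sex and Geography to one code each leaves the 3 x 3 combinations of Age and Current Activity Status, which is the kind of tailoring that predefined tables cannot offer.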
  29. 29. Data Sharing in Census Hub [diagram labels: Query (SDMX), Data (SDMX-ML), WS, NSI]
  30. 30. The Pilot Project Architecture: the Query builder constructs one or more SDMX queries, which are sent to the relevant NSI web services through the Web service client. When the Web service client receives the responses (in the form of SDMX cross-sectional data messages) from the queried web services, it forwards them to the Result aggregation manager. The Result aggregation manager puts together all the received SDMX data messages and sends the result to the Dissemination transformer, which converts it from XML to HTML or CSV.
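
The Hub-side flow described on this slide can be sketched as four small functions. This is a simplified illustration, not the Eurostat implementation: queries and responses are plain Python dictionaries rather than SDMX-ML messages, the web service client is injected as a function so that no network call is made, and all names (`build_queries`, `aggregate`, `to_csv`, `run_hub`, `fake_client`) are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List

Query = Dict[str, List[str]]
Observation = Dict[str, str]


def build_queries(selection: Query) -> Dict[str, Query]:
    """Query builder: one query per country in the GEO selection."""
    others = {dim: codes for dim, codes in selection.items() if dim != "GEO"}
    return {geo: {**others, "GEO": [geo]} for geo in selection["GEO"]}


def aggregate(responses: List[List[Observation]]) -> List[Observation]:
    """Result aggregation manager: put all per-NSI answers together."""
    return [obs for response in responses for obs in response]


def to_csv(observations: List[Observation]) -> str:
    """Dissemination transformer: render the merged result as CSV."""
    if not observations:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(observations[0]), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(observations)
    return buffer.getvalue()


def run_hub(selection: Query,
            client: Callable[[str, Query], List[Observation]]) -> str:
    """Send one query per country through the client and merge the answers."""
    queries = build_queries(selection)
    responses = [client(geo, query) for geo, query in queries.items()]
    return to_csv(aggregate(responses))


def fake_client(geo: str, query: Query) -> List[Observation]:
    # Stand-in for a call to an NSI web service; the values are dummies.
    return [{"GEO": geo, "SEX": sex, "OBS_VALUE": "0"} for sex in query["SEX"]]


table = run_hub({"GEO": ["IT", "PT"], "SEX": ["F"]}, fake_client)
```

Injecting the client keeps the aggregation and transformation steps testable without any NSI web service being available.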
  31. 31. The Pilot Project Architecture (NSI side): the web service receives an SDMX query and forwards it to the SDMX query parser. The SDMX query parser breaks down the query and sends it to the SQL query builder. The SQL query builder creates one or more SQL queries and sends them to the database. The result is assembled by the SDMX-ML assembler into an SDMX cross-sectional message, which the web service sends to the central Hub.
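
The NSI-side chain (query parser, SQL query builder, database, assembler) can be sketched with the Python standard library. The XML shapes used here are simplified stand-ins and not the SDMX-ML schemas, and the table name `census`, its column names and the sample rows are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

DIMENSIONS = ("SEX", "AGE", "CAS", "GEO")  # assumed column names


def parse_query(xml_text):
    """Query parser: extract {dimension id: [codes]} from a toy XML query."""
    root = ET.fromstring(xml_text)
    return {
        dim.get("id"): [code.text for code in dim.findall("Code")]
        for dim in root.findall("Dimension")
    }


def build_sql(query):
    """SQL query builder: parametrised SELECT over known dimensions only."""
    clauses, params = [], []
    for dim in DIMENSIONS:          # ids outside the whitelist are ignored
        codes = query.get(dim)
        if codes:
            marks = ", ".join("?" for _ in codes)
            clauses.append(f"{dim} IN ({marks})")
            params.extend(codes)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = (
        "SELECT SEX, AGE, CAS, GEO, OBS_VALUE FROM census"
        f"{where} ORDER BY SEX, AGE, CAS, GEO"
    )
    return sql, params


def assemble(rows):
    """Assembler: wrap the rows in a toy cross-sectional XML message."""
    root = ET.Element("CrossSectionalData")  # hypothetical element name
    for sex, age, cas, geo, value in rows:
        ET.SubElement(root, "Obs", SEX=sex, AGE=age, CAS=cas, GEO=geo,
                      OBS_VALUE=str(value))
    return ET.tostring(root, encoding="unicode")


def handle(xml_text, connection):
    """Web service entry point: parse, query the database, assemble."""
    sql, params = build_sql(parse_query(xml_text))
    return assemble(connection.execute(sql, params).fetchall())


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE census (SEX TEXT, AGE TEXT, CAS TEXT, GEO TEXT, "
    "OBS_VALUE INTEGER)"
)
connection.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?, ?)",
    [("F", "Y15-64", "EMP", "IT", 1), ("M", "Y15-64", "EMP", "IT", 2)],  # dummy rows
)
reply = handle(
    '<Query><Dimension id="SEX"><Code>F</Code></Dimension></Query>',
    connection,
)
```

Column names are taken only from the fixed `DIMENSIONS` tuple and codes are passed as bound parameters, so text arriving in a query is never interpolated into the SQL string.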
  32. 32. The Pilot Project Architecture Statistics Portugal Architecture Model
  33. 33. The Pilot Project Architecture Statistisches Bundesamt Architecture Model
  34. 34. The Pilot Project: the Census Task Force (at its April 2007 meeting) agreed to explore the Hub solution and decided to launch a pilot project (involving DE, IE, IT and PT). Eurostat defined some guidelines for this project: a simple hypercube, so that the NSIs could produce it quickly;
  35. 35. data comprising the following dimensions: Sex, Age, Current Activity Status and Territory;
  36. 36. a Data Structure Definition, which was also provided.
  37. 37. The Pilot Project Roadmap: January 2008: start of the pilot project. Four countries decided to participate (Germany, Ireland, Italy and Portugal);
  38. 38. March 2008: preparation of the requirement specification, functional and technical analysis;
  39. 39. April 2008: choice of one data hypercube and related breakdowns to use during the pilot; development of the Data Structure Definition (DSD);
  40. 40. June - September 2008: building of application modules (on both the Eurostat and the NSI side); tests;
  41. 41. October 2008: evaluation report of the pilot; functional and technical analysis for the full 2011 Census Hub.
  42. 42. Results of the pilot project: Eurostat has developed the central Hub and, at the beginning of February 2009, it will be accessible in a test environment. Italy, Portugal, Germany and Ireland have already set up the architecture. Italy, Portugal and Ireland have produced documents (available on CIRCA) on their experience during the pilot phase (http://circa.europa.eu/Members/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/case_studies).
  43. 44. Results of the pilot project: in addition, the Census Hub Web Service Implementation Guidelines were produced. They explain how to build web services, using different IT technologies, that can communicate correctly with the central hub (http://circa.europa.eu/Members/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/documents). Finally, it is important to highlight that sharing experience and software among all the actors involved (Eurostat and the NSIs) has reduced production costs and development time.
  44. 45. Benefits of participating in the project: participants will be part of a project that allows the different actors, both statisticians and IT personnel, to share experience at different levels (planning, production, etc.);
  45. 46. participants will build an IT infrastructure that is useful not only for the pilot exercise but also for their 2011 census data warehouse, using internationally recognized standards;
  46. 47. the same SDMX architecture could be used in other projects with few or no changes.
  47. 48. Costs of participating in the project: the costs of implementing the SDMX infrastructure needed for the Census Hub Pilot Project are limited and can be absorbed into the more general project that each NSI will run for the 2011 Census;
  48. 49. the use of an XML-based data format will help to reduce implementation costs, because many NSIs are already using, or planning to use, XML as the basis for their data management and dissemination systems;
  49. 50. a wide selection of commercial IT applications and tools is available for working with XML-based data;
  50. 51. expertise in working with XML is readily available, often in-house. In addition, the knowledge and software developed by the participants in the first phase of the pilot are available and can be used immediately.
  51. 52. Milestones in 2009: involve more Member States in the exercise; develop and test additional functionality: a cache system
  52. 53. and a new GUI; develop all the necessary DSDs for the more than 100 hypercubes foreseen in the “population and housing regulation”.
  53. 54. Conclusion: the Census Hub pilot project was necessary in order to understand properly how to proceed for the 2011 Census;
  54. 55. the architecture used is the most advanced example of the data sharing described in the SDMX standards;
  55. 56. volunteer NSIs can gain good experience in managing complex IT projects and a good knowledge of the SDMX standards;
  56. 57. because the pilot was planned to be as simple as possible, so that all NSIs can participate with little effort, this project is a good opportunity for all those who want to start using SDMX.
  57. 58. Thank you for your attention.