Census Hub Project


The Census Hub Project can currently be considered the most advanced project in which Internet technologies and SDMX solutions for data transmission come together for an ambitious goal: the dissemination of the 2011 Census results.
We analyze the Census Hub architecture, in which a central Hub on the Eurostat side manages the user interface and transforms all selections made by the user on screen into an SDMX query. This query is sent to the web service on the NSI side, which parses it and transforms it into an SQL query that can be run against a database containing census data. Depending on which countries are involved in the answer, the Hub queries the web service provided by each of those countries. Finally, the Hub receives the answers from the NSIs and builds a final table by putting them all together. The importance of this implementation is that it is a completely new system that changes the way official data are disseminated and exchanged among organizations.

Published in: Technology, Economy & Finance


  • 1. The Census European Hub Project. Workshop on Data Transmission, 17-19 June, Becici, Montenegro. Vincenzo PATRUNO
  • 2. Overview: a proposal for a new system to publish the 2011 Census data on the Eurostat website using SDMX standards
  • 3. Overview: Census taking is a very cost-intensive exercise, justified by the unparalleled quality of the result. Important aspects of that quality are:
    • The flexibility to cross-tabulate different variables
    • Easy access to data
    • Detailed, methodologically comparable data
  • 6. Overview: Flexibility, Harmonization
  • 7. Harmonization: access to detailed Census data that are methodologically comparable among the Member States and structured in the same way
  • 8. Flexibility: final users should be able to cross-tabulate different variables
  • 9. The Goals: the dissemination of the results of the censuses in the EU should reflect those advantages to the highest possible extent.
  • 10. The Traditional Approach
    • (1) Member States provide microdata to Eurostat. Eurostat aggregates the microdata and stores the resulting data in a central repository, which is then used for data dissemination.
    • (2) Member States provide predefined tables to Eurostat. Eurostat publishes those tables on its website.
  • 11. The Traditional Approach: Approach (1) maximises flexibility in offering data to final users. But:
    • Aggregation functions on the central system could be very difficult to implement due to:
      • different confidentiality rules to be applied to microdata from different countries;
      • whether data come from a "full" census (conventional or register-based) or from a sample survey.
    • Data maintenance could be very cumbersome, because every time a revision is issued an entire set of microdata needs to be updated or replaced.
  • 12. The Traditional Approach: Approach (2) greatly simplifies the exercise. But it doesn't offer enough flexibility to final users, who would have limited possibilities to tailor data to their information needs.
  • 13. The Traditional Approach (diagram: NSIs, Eurostat)
  • 14. Push and Pull: there are normally two different approaches to exchanging data: PUSH and PULL
  • 15. Push and Pull: PUSH mode means that the data provider takes action to send the data to the party collecting the data. PULL mode implies that the data provider makes the data available via the Internet. The data consumer then fetches the data on its own initiative.
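
The difference between the two modes can be illustrated with a minimal, in-memory sketch. The class and method names (`Provider`, `Consumer`, `push_to`, `fetch_from`) are hypothetical and chosen only for this illustration; no network transport is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    """Party collecting the data (for example, a central hub)."""
    received: list = field(default_factory=list)

    def receive(self, dataset: dict) -> None:
        # PUSH: called by the provider, on the provider's initiative.
        self.received.append(dataset)

    def fetch_from(self, provider: "Provider", name: str) -> dict:
        # PULL: the consumer decides when to fetch and what to fetch.
        dataset = provider.get(name)
        self.received.append(dataset)
        return dataset


@dataclass
class Provider:
    """Party holding the data (for example, an NSI)."""
    published: dict = field(default_factory=dict)

    def push_to(self, consumer: Consumer, name: str) -> None:
        # PUSH: the provider takes action and sends the data.
        consumer.receive(self.published[name])

    def get(self, name: str) -> dict:
        # PULL: the provider only makes the data available.
        return self.published[name]


provider = Provider({"population": {"IT": 1}})
consumer = Consumer()
provider.push_to(consumer, "population")      # push exchange
consumer.fetch_from(provider, "population")   # pull exchange
```

In the pull case the provider never needs to know who its consumers are, which is the property the data-sharing model relies on.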
  • 16. Data Sharing Model: SDMX is primarily focused on the exchange and dissemination of statistical data and metadata. SDMX promotes a "data sharing" model to facilitate low-cost, high-quality statistical data and metadata exchange. Data providers publish the availability of data and metadata to data consumers, and the latter are responsible for fetching them at will.
  • 17. Notes about Data Sharing
    • Data sharing only works if there are standard formats
    • Like the Web itself, a data-sharing model relies on pull exchanges, not push exchanges
      • Data consumers discover the data they need, and its location, and then go and get it
      • Data producers don't have to send data
  • 20. The Census Hub Idea: the Census Hub is based on the concept of data sharing: a group of partners agree to provide access to their data according to standard processes, formats and technologies. Countries involved: IT, IE, DE, PT, MT, SI, EE, BG. Additional countries involved before the end of the year: GB, ES and GR.
  • 21. The Census Hub Idea
      SDMX standards support the "pull" mode of data sharing, where the collecting organization retrieves the data from the providers' web servers. The data:
      • may be made available for download in an SDMX-conformant file
      • may be retrieved from a database in response to an SDMX-conformant query
      This architecture often also includes an SDMX registry that implements the general idea of a metadata registry
  • 23. The Census Hub Idea: Each National Statistical Institute (NSI) creates a set of non-disclosive data. These data would be delivered via an information hub that enables data sharing on the Internet. Each NSI would provide web access to its data according to standard formats and technologies. A data user browses the hub to search for a dataset of interest using structural metadata (dimensions, attributes, code lists, etc.). Data are retrieved by the Hub directly from the NSI system.
  • 24. The Pilot Project Architecture
  • 25. Census Hub pilot project architecture
    • The central Hub – Eurostat side
    • The web service – NSI side
    • The pilot hypercube
  • 29. Data Sharing in Census Hub (diagram labels: SDMX query, data in SDMX-ML, NSI web service)
  • 30. The Pilot Project Architecture: The Query builder constructs one or more SDMX queries that are sent to the relevant NSIs' web services through the Web service client. When the Web service client receives the responses (in the format of an SDMX cross-sectional data message) from the queried web services, it forwards them to the Result aggregation manager. The Result aggregation manager puts together all the received SDMX data messages and sends the result to the Dissemination transformer, which transforms it from XML to HTML or CSV.
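
The four Hub components described above can be sketched as a small pipeline. This is an illustration only: the queries are plain dictionaries and the responses use a simplified XML layout, not real SDMX-ML, and every function name and the `fake_nsi` stand-in are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def build_queries(selection: dict, countries: list) -> dict:
    """Query builder: one simplified query per selected country."""
    return {c: {"country": c, **selection} for c in countries}


def call_web_services(queries: dict, services: dict) -> list:
    """Web service client: send each query to the matching NSI service."""
    return [services[country](query) for country, query in queries.items()]


def aggregate(responses: list) -> ET.Element:
    """Result aggregation manager: merge all responses into one document."""
    merged = ET.Element("DataSet")
    for xml_text in responses:
        merged.extend(ET.fromstring(xml_text).findall("Obs"))
    return merged


def to_csv(dataset: ET.Element) -> str:
    """Dissemination transformer: XML to CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["country", "sex", "value"])
    for obs in dataset.findall("Obs"):
        writer.writerow([obs.get("country"), obs.get("sex"), obs.get("value")])
    return out.getvalue()


def fake_nsi(value: int):
    """Stand-in for a remote NSI web service (hypothetical, no network)."""
    def service(query: dict) -> str:
        return (
            f'<DataSet><Obs country="{query["country"]}" '
            f'sex="{query["sex"]}" value="{value}"/></DataSet>'
        )
    return service


services = {"IT": fake_nsi(100), "PT": fake_nsi(40)}
queries = build_queries({"sex": "F"}, ["IT", "PT"])
table = to_csv(aggregate(call_web_services(queries, services)))
print(table)
```

Keeping each stage as a separate function mirrors the component split on the slide: only the web service client would need to change to talk to real NSI endpoints.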
  • 31. The Pilot Project Architecture (NSI side): The web service receives an SDMX query and forwards it to the SDMX query parser. The SDMX query parser breaks down the query and sends it to the SQL query builder. The SQL query builder creates one or more SQL queries and sends them to the database. The result is assembled, by the SDMX-ML assembler, into an SDMX cross-sectional message that is sent, by the web service, to the central Hub.
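
The NSI-side chain (query parser, SQL query builder, database, assembler) can be sketched in the same spirit. The query and response formats below are simplified XML rather than real SDMX messages, and the `census` table, its column names and the `ALLOWED` set are hypothetical; an in-memory SQLite database stands in for the NSI's census database.

```python
import sqlite3
import xml.etree.ElementTree as ET

ALLOWED = {"sex", "age", "territory"}  # hypothetical column names


def parse_query(xml_text: str) -> dict:
    """Query parser: extract dimension filters from a simplified XML query."""
    root = ET.fromstring(xml_text)
    return {d.get("id"): d.text for d in root.findall("Dimension")}


def build_sql(filters: dict):
    """SQL query builder: parameterised SELECT over a hypothetical table."""
    unknown = set(filters) - ALLOWED
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    sql = "SELECT sex, age, territory, value FROM census"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return sql, list(filters.values())


def assemble(rows) -> str:
    """Assembler: wrap the rows in a simplified XML data message."""
    dataset = ET.Element("DataSet")
    for sex, age, territory, value in rows:
        ET.SubElement(dataset, "Obs", sex=sex, age=age,
                      territory=territory, value=str(value))
    return ET.tostring(dataset, encoding="unicode")


def handle(xml_query: str, db: sqlite3.Connection) -> str:
    """Web service entry point: parse, query the database, assemble."""
    sql, params = build_sql(parse_query(xml_query))
    return assemble(db.execute(sql, params).fetchall())


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE census (sex TEXT, age TEXT, territory TEXT, value INTEGER)")
db.executemany("INSERT INTO census VALUES (?, ?, ?, ?)",
               [("F", "Y20-24", "PT", 40), ("M", "Y20-24", "PT", 38)])
query = '<Query><Dimension id="sex">F</Dimension></Query>'
response = handle(query, db)
print(response)
```

Column names are checked against a fixed set and values are passed as bound parameters, so nothing from the incoming query is concatenated into the SQL text unchecked.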
  • 32. The Pilot Project Architecture Statistics Portugal Architecture Model
  • 33. The Pilot Project Architecture Statistisches Bundesamt Architecture Model
  • 34. The Pilot Project: The Census Task Force (at its April 2007 meeting) agreed to explore the Hub solution and decided to launch a pilot project (DE, IE, IT and PT involved). Eurostat defined some guidelines for this project:
      • A simple hypercube, so that NSIs can produce it quickly;
      • Data should comprise the following dimensions: Sex, Age, Current Activity Status and Territory;
      • A Data Structure Definition is also provided
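
As a rough illustration of what such a hypercube and its structure definition make possible, the sketch below models the four dimensions named on the slide as a dictionary of code lists and cross-tabulates a few cells. The code values (`Y15-29`, `EMP`, and so on) and the figures are invented for the example and are not taken from the pilot's actual Data Structure Definition.

```python
from collections import defaultdict

# Dimension names come from the slide; the code lists are illustrative only.
DSD = {
    "SEX": {"F", "M"},
    "AGE": {"Y15-29", "Y30-49"},
    "ACTIVITY": {"EMP", "UNE"},
    "TERRITORY": {"IT", "PT"},
}


def validate(obs: dict) -> None:
    """Reject a cell whose dimension values are not in the code lists."""
    for dim, codes in DSD.items():
        if obs.get(dim) not in codes:
            raise ValueError(f"invalid code for {dim}: {obs.get(dim)!r}")


def cross_tabulate(cells: list, rows: str, cols: str) -> dict:
    """Sum cell values over every dimension except the two requested."""
    table = defaultdict(int)
    for obs in cells:
        validate(obs)
        table[(obs[rows], obs[cols])] += obs["value"]
    return dict(table)


cells = [
    {"SEX": "F", "AGE": "Y15-29", "ACTIVITY": "EMP", "TERRITORY": "IT", "value": 10},
    {"SEX": "F", "AGE": "Y30-49", "ACTIVITY": "EMP", "TERRITORY": "IT", "value": 20},
    {"SEX": "M", "AGE": "Y15-29", "ACTIVITY": "UNE", "TERRITORY": "PT", "value": 5},
]
by_sex_territory = cross_tabulate(cells, "SEX", "TERRITORY")
print(by_sex_territory)
```

Because every NSI would use the same dimensions and code lists, the same cross-tabulation can be applied to cells from any country without country-specific logic.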
  • 37. The Pilot Project Roadmap
    • January 2008: start of the pilot project. Four countries decided to participate (Germany, Ireland, Italy and Portugal);
    • March 2008: preparation of requirement specification, functional and technical analysis;
    • April 2008: choice of one data hypercube and related breakdowns to use during the pilot; development of the Data Structure Definition (DSD);
    • June - September 2008: building of application modules (both Eurostat and NSI side); tests;
    • October 2008: evaluation report of the pilot; functional and technical analysis for the full 2011 Census Hub.
  • 42. Results of the pilot project: Eurostat has developed the central Hub and, at the beginning of February 2009, it will be accessible in a test environment. Italy, Portugal, Germany and Ireland have already set up the architecture. Italy, Portugal and Ireland have produced documents (available on CIRCA) regarding their experience during the pilot phase.
  • 44. Results of the pilot project: Moreover, the Census Hub Web Service Implementation Guidelines were produced, explaining how to build web services, using different IT technologies, that are capable of communicating correctly with the central Hub. Finally, it is important to highlight how sharing experience and software among all the actors involved (Eurostat and NSIs) has reduced production costs and development time.
  • 45. Benefits of participating in the project. The following benefits will materialise:
    • Participants will be part of a project that will allow them to share experiences among the different actors, both statisticians and IT personnel, at different levels (planning, production, etc.);
    • Participants will build an IT infrastructure useful not only for the pilot exercise but also for their 2011 census data warehouse, using standards recognized at international level;
    • The same SDMX architecture could be used in other projects with few or no changes.
  • 48. Costs of participating in the project
    • Costs for implementing the SDMX infrastructure needed for the Census Hub Pilot Project are limited and can be embedded in the more general project that each NSI will support for the 2011 Census;
    • The use of an XML-based data format will help to reduce costs of implementation as follows:
      • many NSIs are already using, or planning to use, XML as the basis for their data management and dissemination systems;
      • a wide selection of commercial IT applications and tools is available to work with XML-based data;
      • expertise for working with XML is readily available and will often be available in-house
    • Knowledge and software developed by the participants in the first phase of the pilot are available and can be used immediately
  • 52. Milestones in 2009
    • Involve more Member States in the exercise
    • Develop and test additional functionalities
      • Cache system
      • New GUI
    • Develop all the necessary DSDs related to the more than 100 hypercubes foreseen in the "population and housing regulation"
  • 54.
    • The Census Hub pilot project was necessary in order to understand properly how to proceed for the 2011 Census
    • The architecture used represents the most advanced example of the data sharing model described in the SDMX standards
    • Volunteer NSIs can acquire good experience in managing complex IT projects and a good knowledge of SDMX standards
    • As the pilot was planned to be as simple as possible, so that all NSIs can participate with minor effort, this project is a good opportunity for all those who want to start using SDMX
  • 58. Thank You for Your Attention