SDMX interface for ILOSTAT

ILOSTAT, the new database of labour statistics, was designed around a few key ideas: reducing the burden on data providers by supporting as many data channels as possible, being metadata driven, and adopting every possible standard.
With these in mind, we developed a bi-directional interface that allows data and metadata to be disseminated from, and collected into, ILOSTAT through SDMX datasets and related artefacts.
The implementation project had to overcome several issues, especially on the conceptual side.
This presentation covers how the software architecture for the interface was defined, the concepts that make up the ILOSTAT concept scheme, how the interface deals with descriptive metadata (a crucial resource in ILOSTAT), the definition of the scope of the DSD with its pros and cons, and the implementation of a virtual registry and versioning system.
(Presented at SDMX Global Conference 2013, Paris)

  • Introduction. SDMX has been «around» the ILO Department of Statistics since 2002. The difficulties in implementing it were: (1) LABORSTA’s information model was very hard to match to the SDMX IM: different DB schemas for each type of series, missing keys and multiple unstructured descriptive metadata were some of the characteristics of the LABORSTA system that made SDMX mapping too difficult; (2) a lack of resources to undertake the project, and the existence of other priorities; (3) the perception that the SDMX standard was not mature enough to be adopted. ILOSTAT design started in 2010, and the new information model was conceived taking into consideration the SDMX Content-Oriented Guidelines. At the same time, SDMX was considered an integral part of the ILOSTAT project. ILOSTAT development started in October 2011 and included the concept of multiple data channels for collection and dissemination, SDMX being one of them in both directions. As ILOSTAT is a metadata-driven information system, and its information model was designed following SDMX standard recommendations, the implementation of SDMX resulted in an interface for the data flows established from SDMX to the ILOSTAT data warehouse and vice versa.
  • The system can be split into three main modules that map to the three main stages of the data compilation process: data collection, data processing and data dissemination. The Data collection module comprises the design and building of the data collection instruments, which vary according to the data channel used. Currently ILOSTAT collects data through Excel questionnaires and CSV files. The SDMX connector will be released very soon and will allow data to be uploaded through SDMX data flows. An electronic questionnaire (online web form) is on the roadmap for this year as well. Some of the activities in this stage include sending e-mails with questionnaires and reminders, answering questions from countries, uploading the data, etc. Once data is collected, regardless of the means used, it goes through an exhaustive consistency checking and correction process. It is at this stage that some descriptive metadata (in the form of footnotes and annotations) is coded, based on free text provided by the countries. Each weekend (or whenever the amount of data incorporated justifies it) an automatic process computes a set of derived or calculated indicators from those indicators that have passed the consistency rules and are considered “ready for dissemination”. After this process, all these indicators (collected or calculated) are moved to the dissemination database. The last module comprises the tools for data dissemination. It includes a dynamic website, data download in CSV, PDF and Excel formats, and SDMX web services (very soon). The Workflow control module tracks the progress of the questionnaires and the questionnaires’ tables through the overall process, and the Metadata module provides the tools and procedures for general metadata maintenance.
  • An indicator is the combination of one Represented variable broken down by one or more classifications. The Represented variable concept is assumed by the OBS_VALUE concept, the primary measure. Note Type attributes can be Mandatory or Conditional, depending on the structural metadata information in ILOSTAT.
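The indicator definition in the notes above can be pictured as a small data model. The following is only a sketch under stated assumptions: the class names, the upper-case dimension identifiers and the helper `dimensions_for` are hypothetical and are not taken from ILOSTAT; only the concepts themselves (the fixed dimensions, the classifications, OBS_VALUE as primary measure, Mandatory or Conditional note types) come from the presentation.

```python
from dataclasses import dataclass
from enum import Enum

# Dimension concepts named on the concept scheme slide; the identifiers
# themselves are illustrative, not real ILOSTAT ids.
FIXED_DIMENSIONS = ("COLLECTION", "COUNTRY", "FREQUENCY", "SURVEY")
PRIMARY_MEASURE = "OBS_VALUE"  # carries the represented variable


class AssignmentStatus(Enum):
    MANDATORY = "Mandatory"
    CONDITIONAL = "Conditional"


@dataclass(frozen=True)
class Indicator:
    represented_variable: str  # e.g. "Employment"
    classifications: tuple     # one or more classification types

    def __post_init__(self):
        if not self.classifications:
            raise ValueError("an indicator needs at least one classification")


@dataclass(frozen=True)
class NoteTypeAttribute:
    note_type: str             # hypothetical note type name
    status: AssignmentStatus   # driven by structural metadata in ILOSTAT


def dimensions_for(indicator: Indicator) -> tuple:
    """Fixed dimensions, then one dimension per classification, then time."""
    breakdown = tuple(c.upper() for c in indicator.classifications)
    return FIXED_DIMENSIONS + breakdown + ("TIME",)


if __name__ == "__main__":
    employment = Indicator("Employment", ("Sex", "Age"))
    print(dimensions_for(employment))
```

Under these assumptions, "Employment by sex and age" becomes one represented variable plus two classification dimensions, which is the shape a per-indicator data structure would take.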
  • SDMX interface for ILOSTAT

    1. SDMX Global Conference - Paris, September 2013. ILO Department of Statistics. Edgardo Greising, greising@ilo.org
    2. I. Introduction; II. Design; III. Software Architecture; IV. Data Collection & Dissemination; V. Next Steps
    3. SDMX has been “around” since 2002: LABORSTA’s information model drawbacks; lack of resources; waiting for the standard to mature. ILOSTAT design in 2010: new information model following SDMX COG; SDMX included as part of the ILOSTAT project. ILOSTAT development in 2011: SDMX interface for data collection and dissemination.
    4. ILOSTAT modules: Data collection; Data cleaning process; Data dissemination; Workflow control; Metadata
    5. ILOSTAT concept scheme. Dimensions: Collection; Country; Frequency; Survey; Represented Variable (OBS_VALUE); Classification Type (1..6); Time. Attributes: Note Types; Value Status; Unit of measure; Unit multiplier; Time format.
    6. Data Structure Definition: scope of the DSD
         1. One general DSD? Easy to maintain, but huge and volatile.
         2. One DSD per topic (~20 topics)? Still too big and volatile.
         3. One DSD per indicator (~100 indicators, e.g. Employment by sex and age)? OK for dissemination, but too many useless entries in country-specific code lists.
         4. One DSD per questionnaire table (indicator + country)? OK, but how to maintain ~100 indicators x ~200 countries = 20,000 DSDs? Solution: Virtual Registry & Versioning module.
    7. Virtual Registry. Key factors: (1) ILOSTAT is metadata driven; (2) the ILOSTAT Information Model is very similar to the SDMX Information Model. All SDMX artifacts are considered as «virtually» existing. The SDMX connector creates and delivers «on-the-fly» any requested artifact.
    8. Versioning
         • Automatic for data structures and related data flows.
         • The version increases with any change in the structural metadata (code lists, classification versions, required notes, etc.).
         • Process:
             • The data structure is generated with the default 1.0 version and full references.
             • The result is serialized to an in-memory buffer and a SHA-1 hash is computed.
             • The hash is compared to the one stored in the database: if no hash is stored yet, the new hash is stored and the version initialized at 1; if the hashes are equal, the current version is returned; if the hashes differ, the version is incremented and the new hash stored.
             • The resulting version number is passed to the actual structure generation process, to be included in the returned flow.
    9. Descriptive metadata (metacontent)
         • ILOSTAT includes many notes at different levels.
         • All the notes are coded and classified by Note_Type.
         • MSD usage was avoided for simplification.
         • Notes are included in the DSD/DF as coded attributes.
         • Attachment level. Currently: all notes are attached at Observation_value level; the actual level is determined by the attribute name. Future: notes attached at the proper level (format change required).
         • Only for collection: a special “Free_Text” note type allows for capturing non-coded annotations.
    10. Java EE application based on the following components: SDMXsource; Oracle Application Development Framework (ADF); ILOSTAT Taskflow Library (also used for the ILOSTAT website).
    11. Dissemination: standard SDMX RESTful API (partial), http://www.ilo.org/ilostat/sdmx/ws/rest/... Collection: triggered by an APEX interface for a given file.
    12. Next steps
         • Set up provision agreements: International Organizations; Countries’ NSO & MoL
         • Develop new interfaces: JSON SDMX 2.1 SDMX-RI Gateway
         • End-user access tools: ILO Information & Knowledge Management Gateway; ILOSTAT country profile report; Grapher tool; Mobile Excel add-in
         • Capacity building + Tools
    13. E-mail: greising@ilo.org; Skype: egreising; Twitter: egreising; LinkedIn: http://www.linkedin.com/in/egreising
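The versioning process on the Versioning slide, combined with the on-the-fly generation of the Virtual Registry, can be sketched roughly as follows. This is a minimal illustration and not the ILOSTAT implementation (which is a Java EE application): the in-memory `VersionStore` stands in for the database, JSON stands in for the real SDMX serialization, the `"N.0"` version format is an assumption, and every function name and structure id here is hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    """In-memory stand-in for the table holding one hash and version per structure."""
    rows: dict = field(default_factory=dict)  # structure id -> (sha1 hex, version)

    def resolve_version(self, structure_id: str, serialized: bytes) -> int:
        digest = hashlib.sha1(serialized).hexdigest()
        stored = self.rows.get(structure_id)
        if stored is None:
            # No hash stored yet: store it and initialize the version at 1.
            self.rows[structure_id] = (digest, 1)
            return 1
        old_digest, version = stored
        if old_digest == digest:
            # Structural metadata unchanged: return the current version.
            return version
        # Structural metadata changed: increment and store the new hash.
        self.rows[structure_id] = (digest, version + 1)
        return version + 1


def generate_structure(structural_metadata: dict, version: str = "1.0") -> bytes:
    """Hypothetical generator; sorted keys keep the serialization deterministic."""
    document = {"version": version, **structural_metadata}
    return json.dumps(document, sort_keys=True).encode("utf-8")


def deliver_structure(store: VersionStore, structure_id: str, metadata: dict) -> bytes:
    # First pass: always generated with the default version, then hashed.
    draft = generate_structure(metadata)
    version = store.resolve_version(structure_id, draft)
    # Second pass: the resolved version is included in the returned artifact.
    return generate_structure(metadata, version=f"{version}.0")


if __name__ == "__main__":
    store = VersionStore()
    metadata = {"id": "DSD_EXAMPLE", "codelists": {"SEX": ["F", "M", "T"]}}
    print(deliver_structure(store, "DSD_EXAMPLE", metadata))
    metadata["codelists"]["SEX"].append("X")  # a structural metadata change
    print(deliver_structure(store, "DSD_EXAMPLE", metadata))
```

Because the hash is always taken over the draft generated with the default version, the version number itself never influences the comparison; only a change in the structural metadata does. Nothing but one hash and one counter per structure needs to be stored, which is what would make some 20,000 virtual DSDs manageable.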