JSON-stat in the session "The future of standards in statistics", United Nations, Geneva

In 2015, I was invited to present the JSON-stat standard at the Workshop on International Collaboration for Standards-Based Modernisation (Geneva, 5-7 May 2015).

http://www1.unece.org/stat/platform/display/WICSBM/Geneva%2C+5-7+May+2015

Published in: Data & Analytics

  1. 1. JSON-stat: The future of standards in statistics. Workshop on International Collaboration for Standards-Based Modernisation, United Nations Economic Commission for Europe, Geneva, 7 May 2015. Xavier Badosa, Statistical Institute of Catalonia
  2. 2. standards as conversation enablers
  3. 3. standards as conversation enablers What Who Why
  4. 4. standards in statistics What Who Why
  5. 5. Abbreviation
  6. 6. Cubic Model Describe data in dimension terms
  7. 7. standards as conversation enablers What Who Why
  8. 8. data as an infrastructure
  9. 9. API data as an infrastructure NSO as a platform
  10. 10. A B Shared environment EXCHANGE
  11. 11. DISSEMINATION
  12. 12. ?
  13. 13. self service
  14. 14. simple
  15. 15. simple
  16. 16. general
  17. 17. standards as conversation enablers What Who Why
  18. 18. data provider developer
  19. 19. data provider end user app
  20. 20. data API end user app
  21. 21. end user app
  22. 22. lightweight
  23. 23. Cubic Simple General Light
  24. 24. Cubic Simple General Light
  25. 25. Cubic Simple General Light <?xml version='1.0' encoding='UTF-8'?> <message:GenericData xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xml="http://www.w3.org/XML/1998/namespace"> <message:Header> <message:ID>45a9f625f3887526f72e5341577e13e5</message:ID> <message:Test>false</message:Test> <message:Prepared>2015-04-20T13:37:23</message:Prepared> <message:Sender id="ESTAT"> <common:Name xml:lang="en">Eurostat</common:Name> <message:Timezone>+01:00</message:Timezone> </message:Sender> <message:Receiver id="RECEIVER"/> <message:Structure structureID="ESTAT_DSD_cdh_e_fos_1_0" dimensionAtObservation="TIME_PERIOD"> <common:Structure> <Ref agencyID="ESTAT" id="DSD_cdh_e_fos" version="1.0"/> </common:Structure> </message:Structure> <message:DataSetAction>Append</message:DataSetAction> <message:DataSetID>cdh_e_fos</message:DataSetID> </message:Header> <message:DataSet structureRef="ESTAT_DSD_cdh_e_fos_1_0"> <generic:Series> <generic:SeriesKey> <generic:Value id="UNIT" value="PC"/> <generic:Value id="Y_GRAD" value="TOTAL"/> <generic:Value id="FOS07" value="FOS1"/> <generic:Value id="GEO" value="BE"/> <generic:Value id="FREQ" value="A"/> </generic:SeriesKey> <generic:Obs> <generic:ObsDimension value="2009"/> <generic:ObsValue value="NaN"/> <generic:Attributes> <generic:Value id="OBS_STATUS" value="na"/> </generic:Attributes> </generic:Obs> <generic:Obs> <generic:ObsDimension value="2006"/> <generic:ObsValue value="NaN"/> <generic:Attributes> <generic:Value id="OBS_STATUS" value="na"/> </generic:Attributes> </generic:Obs> </generic:Series> <generic:Series> <generic:SeriesKey> <generic:Value id="UNIT" value="PC"/> <generic:Value id="Y_GRAD" value="Y_GE1990"/> <generic:Value id="FOS07" value="FOS1"/> <generic:Value id="GEO" value="BE"/> <generic:Value id="FREQ" value="A"/> </generic:SeriesKey> <generic:Obs> <generic:ObsDimension value="2009"/> <generic:ObsValue value="43.75"/> </generic:Obs> <generic:Obs> <generic:ObsDimension value="2006"/> <generic:ObsValue value="NaN"/> <generic:Attributes> <generic:Value id="OBS_STATUS" value="na"/> </generic:Attributes> </generic:Obs> </generic:Series> </message:DataSet> </message:GenericData> SDMX-ML
  26. 26. Cubic Simple General Light <?xml version='1.0' encoding='UTF-8'?> <message:GenericData xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer" xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common" xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xml="http://www.w3.org/XML/1998/namespace"> <message:Header> <message:ID>45a9f625f3887526f72e5341577e13e5</message:ID> <message:Test>false</message:Test> <message:Prepared>2015-04-20T13:37:23</message:Prepared> <message:Sender id="ESTAT"> <common:Name xml:lang="en">Eurostat</common:Name> <message:Timezone>+01:00</message:Timezone> </message:Sender> <message:Receiver id="RECEIVER"/> <message:Structure structureID="ESTAT_DSD_cdh_e_fos_1_0" dimensionAtObservation="TIME_PERIOD"> <common:Structure> <Ref agencyID="ESTAT" id="DSD_cdh_e_fos" version="1.0"/> </common:Structure> </message:Structure> <message:DataSetAction>Append</message:DataSetAction> <message:DataSetID>cdh_e_fos</message:DataSetID> </message:Header> <message:DataSet structureRef="ESTAT_DSD_cdh_e_fos_1_0"> <generic:Series> <generic:SeriesKey> <generic:Value id="UNIT" value="PC"/> <generic:Value id="Y_GRAD" value="TOTAL"/> <generic:Value id="FOS07" value="FOS1"/> <generic:Value id="GEO" value="BE"/> <generic:Value id="FREQ" value="A"/> </generic:SeriesKey> <generic:Obs> <generic:ObsDimension value="2009"/> <generic:ObsValue value="NaN"/> <generic:Attributes> <generic:Value id="OBS_STATUS" value="na"/> </generic:Attributes> </generic:Obs> <generic:Obs> <generic:ObsDimension value="2006"/> <generic:ObsValue value="NaN"/> <generic:Attributes> <generic:Value id="OBS_STATUS" value="na"/> </generic:Attributes> </generic:Obs> </generic:Series> <generic:Series> <generic:SeriesKey> <generic:Value id="UNIT" value="PC"/> <generic:Value id="Y_GRAD" value="Y_GE1990"/> <generic:Value id="FOS07" value="FOS1"/> <generic:Value id="GEO" value="BE"/> <generic:Value id="FREQ" value="A"/> </generic:SeriesKey> <generic:Obs> <generic:ObsDimension value="2009"/> <generic:ObsValue value="43.75"/> </generic:Obs> <generic:Obs> <generic:ObsDimension value="2006"/> <generic:ObsValue value="NaN"/> <generic:Attributes> <generic:Value id="OBS_STATUS" value="na"/> </generic:Attributes> </generic:Obs> </generic:Series> </message:DataSet> </message:GenericData> SDMX-ML
  27. 27. [ { "area_name": "Panama", "measurement": "tonnes", "value": 1152.87890625, "year": 2007, "footnotes": [...] }, ... ]
  28. 28. [ {...}, [ { "indicator": { "id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)" }, "country": { "id": "BR", "value": "Brazil" }, "value": "1620165226993.77", "decimal": "0", "date": "2009" }, ... ] ]
  29. 29. { "JSONDataResult": [ { "AgencyName": "MCC", "Amount": "-6983525", "BenefitingLocation": "Ghana", "Category": "Economic Development", "FiscalYear": "2013", "Sector": "Agriculture" }, ... ] }
  30. 30. { ..., "Results": [{ "series": [{ "seriesID": "LAUCN040010000000005", "data": [ { "year": "2013", "period": "M11", "periodName": "November", "value": "16393", "footnotes": [...] }, ... ] }] }] }
  31. 31. { "BEAAPI": { "Request": {...}, "Results": { "Statistic": "Per capita personal income", "UnitOfMeasure":"dollars", "PublicTable":"CA1-3 Personal income summary", "UTCProductionTime":"2014-05-11T17:02:55.817", "NoteRef":"2", "Dimensions": [...], "Data": [ { "GeoFips":"00000", "GeoName":"United States", "Code":"PCPI_CI", "TimePeriod":"2009", "CL_UNIT":"USD", "UNIT_MULT":"0", "DataValue":"39357" }, ... ], "Notes": [...] } } }
  32. 32. { "BEAAPI": { "Request": {...}, "Results": { "Statistic": "Per capita personal income", "UnitOfMeasure":"dollars", "PublicTable":"CA1-3 Personal income summary", "UTCProductionTime":"2014-05-11T17:02:55.817", "NoteRef":"2", "Dimensions": [...], "Data": [ { "GeoFips":"00000", "GeoName":"United States", "Code":"PCPI_CI", "TimePeriod":"2009", "CL_UNIT":"USD", "UNIT_MULT":"0", "DataValue":"39357" }, ... ], "Notes": [...] } } } General
  33. 33. { "BEAAPI": { "Request": {...}, "Results": { "Statistic": "Per capita personal income", "UnitOfMeasure":"dollars", "PublicTable":"CA1-3 Personal income summary", "UTCProductionTime":"2014-05-11T17:02:55.817", "NoteRef":"2", "Dimensions": [...], "Data": [ { "GeoFips":"00000", "GeoName":"United States", "Code":"PCPI_CI", "TimePeriod":"2009", "CL_UNIT":"USD", "UNIT_MULT":"0", "DataValue":"39357" }, ... ], "Notes": [...] } } } General Light?
  34. 34. { "BEAAPI": { "Request": {...}, "Results": { "Statistic": "Per capita personal income", "UnitOfMeasure":"dollars", "PublicTable":"CA1-3 Personal income summary", "UTCProductionTime":"2014-05-11T17:02:55.817", "NoteRef":"2", "Dimensions": [...], "Data": [ { "GeoFips":"00000", "GeoName":"United States", "Code":"PCPI_CI", "TimePeriod":"2009", "CL_UNIT":"USD", "UNIT_MULT":"0", "DataValue":"39357" }, ... ], "Notes": [...] } } } General Light? Cubic
  35. 35. JSON-stat 1. Keeps metadata and data apart Cubic Simple General Light
  36. 36. { "columns": [ { "code":"region", "text": "Region" }, { "code":"ageG5", "text":"Age", "comment": ... }, { "code":"period", "text":"Time", "type":"t" }, { "code":"x", "text":"Population", "type":"c", "unit":"amount" } ], "comments": [...], "data":[ { "key": ["02","0-7","2003"], "values": [ 100 ] }, { "key": ["02","0-7","2004"], "values": [ 101 ] }, ... ] }
  37. 37. "dataSets" : [ { "action" : "Information", "series" : { "0:0:0:0" : { "attributes" : [0, 0, 0], "observations" : { "0" : [100.0, null], "1" : [103.3038, null], "2" : [105.1249, null], "3" : [107.7003, null] } }, "0:0:0:1" : { "attributes" : [0, 0, 0], "observations" : { ...
  38. 38. JSON-stat 1. Keeps metadata and data apart 2. Avoids indices Cubic Simple General Light
  39. 39. "value" : [ , , , , , ... ]
  40. 40. { "dataset": { "label": "Value (NOK 1 000) by imports/exports...", "source": "Statistics Norway", "updated": "2014-05-13T18:36:18Z", "dimension": {...}, "value": [ 77287250, 70377617, 73499048, ... ], ... } }
  41. 41. function arr2num( arr, size ){ for(var i=0, num=0, mult=1, ndims=size.length; i<ndims; i++){ mult*=(i>0) ? size[ndims-i] : 1; num+=mult*arr[ndims-i-1]; } return num; } Maths is what computers do best. Row-major order.
  42. 42. dev data API
  43. 43. dev data API SDK
  44. 44. JSON-stat 1. Keeps metadata and data apart 2. Avoids indices 3. Has a simple ontology Cubic Simple General Light
  45. 45. metadata data
  46. 46. dataset dimension category value
  47. 47. JSON-stat 1. Keeps metadata and data apart 2. Avoids indices 3. Has a simple ontology 4. Requires very few properties Cubic Simple General Light
  48. 48. dimension category value id size index or label (from 27 reserved words)
  49. 49. buildable from existing systems
  50. 50. JSON-stat.org a simple light standard for all kinds of data disseminators
  51. 51. end user app
  52. 52. http://bl.ocks.org/badosa Examples
  53. 53. http://bl.ocks.org/badosa Examples
  54. 54. Thank you Xavier Badosa @badosa JSON-stat @jsonstat Statistical Institute of Catalonia (Idescat)
  55. 55. Credits: “Soma” (blocks’ background) by Dru! (CC BY–NC); “Deep in conversation” (bar conversation) by Ross Pollack (CC BY–NC–SA); “Metal movable type” by Willi Heidelbach (CC BY–SA); “Portrait” (cubic head) by Thomas Leth-Olsen (CC BY); “Sterile” (walking girl) by Lee Nachtigal (CC BY); “Railroad” by Xavier Badosa (CC BY); “Dartboard” by Jacob Vance (CC BY–NC)
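
Slides 40, 41 and 48 together suggest how a client can read a single cell without any per-value index: the category ids are translated to positions, and the row-major arithmetic of `arr2num` turns those positions into an offset in the flat `value` array. The following is a minimal sketch of that idea, not the JSON-stat toolkit's implementation. The dataset, its labels and figures are invented for illustration, `getValue` is a hypothetical helper, and the layout (with `id` and `size` inside `dimension`) is assumed from the property names listed on slide 48.

```javascript
// Hypothetical two-dimensional dataset, shaped like the response on slide 40.
// All labels and figures are invented for illustration.
var dataset = {
  label: "Illustrative population counts",
  value: [ 100, 101, 102, 200, 201, 202 ],
  dimension: {
    id: [ "region", "year" ],   // dimension order
    size: [ 2, 3 ],             // number of categories per dimension
    region: { category: { index: { "01": 0, "02": 1 } } },
    year: { category: { index: { "2003": 0, "2004": 1, "2005": 2 } } }
  }
};

// Same row-major arithmetic as arr2num on slide 41:
// the last dimension varies fastest.
function arr2num( arr, size ){
  for(var i=0, num=0, mult=1, ndims=size.length; i<ndims; i++){
    mult*=(i>0) ? size[ndims-i] : 1;
    num+=mult*arr[ndims-i-1];
  }
  return num;
}

// Hypothetical helper: map category ids to positions, then to one value.
function getValue( ds, ids ){
  var dim = ds.dimension;
  var pos = dim.id.map(function( d ){
    return dim[d].category.index[ ids[d] ];
  });
  return ds.value[ arr2num( pos, dim.size ) ];
}

var result = getValue( dataset, { region: "02", year: "2004" } );
console.log( result ); // 201
```

Here region "02" is at position 1 and year "2004" at position 1, so the offset is 1 × 3 + 1 = 4, which is why the metadata can stay apart from the data: the `value` array carries nothing but the numbers.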
