JSON-stat, a simple light standard for all kinds of data disseminators

An introduction to the JSON-stat ecosystem. Originally published in July 2013, it was edited in 2015 to reflect the latest changes in the standard and to remove aspects not directly related to the JSON-stat document format.

A very brief version of this presentation was used at the Data Tuesday BCN (Sept. 17th, 2013).

Published in: Technology, Business

  1. 1. JSON-stat a simple light standard for all kinds of data disseminators Xavier Badosa @badosa http://xavierbadosa.com http://json-stat.org December, 2015
  2. 2. Who needs to disseminate data?
  3. 3. Who needs to disseminate data? Of course: NSOs*, central banks, intl. orgs. But also: companies, NGOs, the media, citizens… Nowadays? Everybody! (* National Statistical Offices)
  4. 4. How is data usually disseminated?
  5. 5. How is data usually disseminated? In table form
  6. 6. In table form
  7. 7. Wherever there’s data addressed to humans there is (usually) a table
  8. 8. plain old tables Why are tables so popular?
  9. 9. Why are tables so popular? Tables are a display device
  10. 10. a display device with analytical features
  11. 11. an abbreviation, a compressor, a metadata saver
  12. 12. an abbreviation, a compressor, a metadata saver: a cube model that avoids repeating metadata for every cell
  13. 13. Cubic Thinking Describe data in dimension terms
  14. 14. Simple, for everybody? How?
  15. 15. Simple, for everybody? How? If you managed to disseminate data for humans in tables, you should be able to do it for machines with no effort!
  16. 16. Simple, for everybody? How? In JSON. JSON is a data format used in most APIs. It can include data and metadata in a single doc.
  17. 17. Simple, for everybody? How? In JSON-stat. Using a very simple cube model that mimics a plain old table.
  18. 18. A Canadian Example
  19. 19. table
  20. 20. data
  21. 21. What’s the simplest way to express these data in JSON?
  22. 22. What’s the simplest way to express these data in JSON? An array (flat): [ … ]
  23. 23. [ … ] Basic metadata?
  24. 24. { "class" : "dataset", "label" : "Population by sex and age group. Canada. 2012", "source" : "Statistics Canada, CANSIM, table 051-0001", "updated" : "2012-09-27", "value" : [ … ] }
  25. 25. { "class" : "dataset", "label" : "Population by sex and age group. Canada. 2012", "source" : "Statistics Canada, CANSIM, table 051-0001", "updated" : "2012-09-27", "id" : [ "country" , "year" , "age" , "concept" , "sex" ], "size" : [ 1 , 1 , 20 , 2 , 3 ], "dimension" : { … }, "value" : [ … ] } id and size are needed to “unflatten” the value array.
  26. 26. "id" : [ "country" , "year" , "age" , "concept" , "sex" ], "size" : [ 1 , 1 , 20 , 2 , 3 ], "dimension" : { … }, "value" : [ … ] id and size are needed to “unflatten” the value array. Method: row-major order. In computing, row-major order and column-major order describe methods for arranging multidimensional arrays in linear storage such as memory.
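
To make the row-major idea on slide 26 concrete, here is a minimal sketch that derives, from `size` alone, how far apart consecutive categories of each dimension sit in the flat `value` array. The helper names `strides` and `cellCount` are illustrative assumptions, not part of JSON-stat or of any JSON-stat library.

```javascript
// Row-major strides: how many positions in the flat "value" array
// one step along each dimension is worth.
// (Hypothetical helper, not part of any JSON-stat library.)
function strides(size) {
  const result = new Array(size.length);
  let step = 1;
  // Walk from the last dimension (which changes fastest) to the first.
  for (let i = size.length - 1; i >= 0; i--) {
    result[i] = step;
    step *= size[i];
  }
  return result;
}

// Total number of cells the "value" array is expected to hold.
function cellCount(size) {
  return size.reduce((total, n) => total * n, 1);
}

const size = [1, 1, 20, 2, 3]; // "size" from the Canadian example
console.log(strides(size));    // [ 120, 120, 6, 3, 1 ]
console.log(cellCount(size));  // 120
```

Under these assumptions, moving one age group forward skips 6 values (2 concepts × 3 sexes), while moving one sex forward skips just 1.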
  27. 27. value note/source/updated label
  28. 28. dimension
  29. 29. age dimension
  30. 30. Dimension: age. Size: 20 categories.
  31. 31. Dimension: age. Size: 20. Role: class.
  32. 32. Dimensions: age, concept. Sizes: 20, 2. Roles: class, metric.
  33. 33. Dimensions: age, concept, sex. Sizes: 20, 2, 3. Roles: class, metric, class.
  34. 34. Dimensions: age, concept, sex, country. Sizes: 20, 2, 3, 1. Roles: class, metric, class, geo.
  35. 35. Dimensions: age, concept, sex, country, year. Sizes: 20, 2, 3, 1, 1. Roles: class, metric, class, geo, time.
  36. 36. Persons (thousands) 2012 Canada
  37. 37. [ … ] To make sense of this array, dimensions must be ordered.
  38. 38. [ … ] ["country", "year", "age", "concept", "sex"] To make sense of this array, dimensions must be ordered.
  39. 39. ["country", "year", "age", "concept", "sex"] Criterion: What does not change, first. To make sense of this array, dimensions must be ordered. (Position of dimensions of size 1 is irrelevant.)
  40. 40. What does not change, first. Columns: country, year, age, concept, sex. Rows: (CA, 2012, Total, Persons, Total), (CA, 2012, Total, Persons, M), (CA, 2012, Total, Persons, F), (CA, 2012, Total, %, Total), (CA, 2012, Total, %, M), (CA, 2012, Total, %, F).
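
The ordering shown on slide 40 can be reproduced with a short sketch that lists cell positions the way an odometer counts: the last dimension ticks fastest and the first one slowest. The generator name `cellPositions` is an illustrative assumption; positions are zero-based, as in the rest of the slides.

```javascript
// Yield every cell's dimension positions in row-major order.
// (Hypothetical helper, not part of any JSON-stat library.)
function* cellPositions(size) {
  if (size.some((n) => n === 0)) return; // no cells at all
  const pos = size.map(() => 0);
  while (true) {
    yield pos.slice();
    // Increment like an odometer, starting from the last dimension.
    let i = size.length - 1;
    while (i >= 0 && ++pos[i] === size[i]) {
      pos[i] = 0; // this dimension wrapped around, carry to the previous one
      i--;
    }
    if (i < 0) return; // every dimension wrapped around: done
  }
}

// First six cells of the example: country, year and age stay at
// position 0 while concept and sex cycle.
const firstSix = [];
for (const pos of cellPositions([1, 1, 20, 2, 3])) {
  if (firstSix.length === 6) break;
  firstSix.push(pos);
}
console.log(firstSix);
```

The six positions printed correspond to the six rows of the slide: sex changes on every row, concept changes every three rows, and nothing to the left of them changes at all.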
  41. 41. { "version" : "2.0", "class" : "dataset", "label" : "Population by sex and age group. Canada. 2012", "source" : "Statistics Canada, CANSIM, table 051-0001", "updated" : "2012-09-27", "id" : [ "country" , "year" , "age" , "concept" , "sex" ], "size" : [ 1 , 1 , 20 , 2 , 3 ], "role" : { "time" : ["year"] , "geo" : ["country"] , "metric" : ["concept"] }, "dimension" : { … }, "value" : [ … ] }
  42. 42. { "version" : "2.0", "class" : "dataset", "label" : "Population by sex and age group. Canada. 2012", "source" : "Statistics Canada, CANSIM, table 051-0001", "updated" : "2012-09-27", "id" : [ "country" , "year" , "age" , "concept" , "sex" ], "size" : [ 1 , 1 , 20 , 2 , 3 ], "role" : { "time" : ["year"] , "geo" : ["country"] , "metric" : ["concept"] }, "dimension" : { "country" : { … }, "year" : { … }, "age" : { … }, "concept" : { … }, "sex" : { … } }, "value" : [ … ] }
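
As a small illustration of how the `role` object on slides 41 and 42 might be read, the sketch below looks up the role of each dimension. The function name `roleOf` is hypothetical, and treating every dimension not listed under `role` as "class" is an assumption taken from the labels on slides 31 to 35, not a rule quoted from the standard.

```javascript
// Return the role of a dimension: "time", "geo" or "metric" when the
// dataset's "role" object lists it, otherwise "class" (assumption: this
// mirrors the "class" label the slides give to unlisted dimensions).
function roleOf(dataset, dimensionId) {
  const roles = dataset.role || {};
  for (const role of Object.keys(roles)) {
    if (roles[role].includes(dimensionId)) return role;
  }
  return "class";
}

// Only the properties needed here, copied from the example dataset.
const dataset = {
  id: ["country", "year", "age", "concept", "sex"],
  size: [1, 1, 20, 2, 3],
  role: { time: ["year"], geo: ["country"], metric: ["concept"] }
};

console.log(dataset.id.map((id) => roleOf(dataset, id)));
// [ 'geo', 'time', 'class', 'metric', 'class' ]
```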
  43. 43. Columns: country, year, age, concept, sex. Rows: (CA, 2012, Total, Persons, Total), (CA, 2012, Total, Persons, M), (CA, 2012, Total, Persons, F), (CA, 2012, Total, %, Total), (CA, 2012, Total, %, M), (CA, 2012, Total, %, F).
  44. 44. { "version" : "2.0", "class" : "dataset", "label" : "Population by sex and age group. Canada. 2012", "source" : "Statistics Canada, CANSIM, table 051-0001", "updated" : "2012-09-27", "id" : [ "country" , "year" , "age" , "concept" , "sex" ], "size" : [ 1 , 1 , 20 , 2 , 3 ], "role" : { "time" : ["year"] , "geo" : ["country"] , "metric" : ["concept"] }, "dimension" : { "country" : { … }, "year" : { … }, "age" : { … }, "concept" : { … }, "sex" : { … } }, "value" : [ … ] }
  45. 45. "sex" : { "label" : "sex", "category" : { "index" : ["T", "M", "F"], "label" : { "T" : "Total", "M" : "Male", "F" : "Female" } } }
  46. 46. "sex" : { "label" : "sex", "category" : { "index" : ["T", "M", "F"], "label" : { "T" : "Total", "M" : "Male", "F" : "Female" } } } Also accepted as the value of "index" (faster access)*: {"T" : 0, "M" : 1, "F" : 2} * See “Arrays vs. Objects” http://bl.ocks.org/5708161
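
Since slide 46 allows `index` to be either an array of category ids or an object mapping ids to positions, code that reads a dimension has to cope with both. The following is a minimal sketch under that assumption; `categoryPosition` is a hypothetical helper name, and it only covers the two forms shown on the slide.

```javascript
// Position of a category inside its dimension. "index" may be an array
// of ids or an object mapping each id to its position.
// (Hypothetical helper, not part of any JSON-stat library.)
function categoryPosition(dimension, categoryId) {
  const index = dimension.category.index;
  let pos;
  if (Array.isArray(index)) {
    pos = index.indexOf(categoryId); // linear search, -1 when missing
  } else {
    // Direct lookup; own properties only, so "toString" etc. do not match.
    pos = Object.prototype.hasOwnProperty.call(index, categoryId)
      ? index[categoryId]
      : -1;
  }
  if (pos < 0) {
    throw new Error("Unknown category: " + categoryId);
  }
  return pos;
}

const labels = { T: "Total", M: "Male", F: "Female" };
const sexAsArray = { label: "sex", category: { index: ["T", "M", "F"], label: labels } };
const sexAsObject = { label: "sex", category: { index: { T: 0, M: 1, F: 2 }, label: labels } };

console.log(categoryPosition(sexAsArray, "F"));  // 2
console.log(categoryPosition(sexAsObject, "F")); // 2
```

The object form avoids the linear search, which is presumably what the slide's "faster access" remark refers to.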
  47. 47. The “unflattening” problem
  48. 48. The “unflattening” problem from dimension positions [0,0,7,0,2]
  49. 49. The “unflattening” problem: from dimension positions [0,0,7,0,2] to value position 44
  50. 50. [ … ] ["country", "year", "age", "concept", "sex"]
  51. 51. [ … ] ["country", "year", "age", "concept", "sex"] 0 1 2 3 4 5 … 44 … 119 (120 values)
  52. 52. Persons (thousands) 2012 Canada ["country", "year", "age", "concept", "sex"] 0 0 0 is the first position (first category of the dimension)
  53. 53. Persons (thousands) 2012 Canada ["country", "year", "age", "concept", "sex"] 0 0 0 1 2 3 4 5 6 7
  54. 54. Persons (thousands) 2012 Canada ["country", "year", "age", "concept", "sex"] 0 0 0 1 0
  55. 55. Persons (thousands) 2012 Canada ["country", "year", "age", "concept", "sex"] 0 0 0 0 1 2
  56. 56. Persons (thousands) 2012 Canada ["country", "year", "age", "concept", "sex"] 0 0 0 [0, 0, 7, 0, 2] → 44
  57. 57. Persons (thousands) 2012 Canada ["country", "year", "age", "concept", "sex"] 0 0 0 Size: [ 1, 1, 20, 2, 3 ] [0, 0, 7, 0, 2] → 44
  58. 58. The “unflattening” problem ["country", "year", "age", "concept", "sex"] [ 1, 1, 20, 2, 3 ] [0, 0, 7, 0, 2] → 44
  59. 59. The “unflattening” problem ["country", "year", "age", "concept", "sex"] [ 1, 1, 20, 2, 3 ] [0, 0, 7, 0, 2] → 44 It’s a simple mathematical problem: compute the value position using dimension positions and sizes.
  60. 60. Lost in cells? Method: row-major order. In computing, row-major order and column-major order describe methods for arranging multidimensional arrays in linear storage such as memory.
  61. 61. Lost in cells? There’s a JavaScript library that takes care of this.
  62. 62. Lost in cells? Are you a coder? Do you want to develop your own library? arr2num( [0,0,7,0,2], [1,1,20,2,3] ) → 44
  63. 63. Lost in cells? Here’s a simple solution to the “unflattening” problem. function arr2num( arr, size ){ for(var i=0, num=0, mult=1, ndims=size.length; i<ndims; i++){ mult*=(i>0) ? size[ndims-i] : 1; num+=mult*arr[ndims-i-1]; } return num; } arr2num( [0,0,7,0,2], [1,1,20,2,3] ) → 44
  64. 64. Lost in cells? Or check the sample code section at http://json-stat.org/tools/ function arr2num( arr, size ){ for(var i=0, num=0, mult=1, ndims=size.length; i<ndims; i++){ mult*=(i>0) ? size[ndims-i] : 1; num+=mult*arr[ndims-i-1]; } return num; }
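
The slides only show the direction from dimension positions to value position. For completeness, here is a sketch of the opposite direction, which can be handy when iterating over `value` and asking which categories a given cell belongs to. The name `num2arr` is made up here to mirror `arr2num`; it does not come from the slides or from the JSON-stat tools.

```javascript
// Inverse of arr2num: from a position in the flat value array
// back to one position per dimension (row-major order).
// (Hypothetical helper, named to mirror arr2num from the slides.)
function num2arr(num, size) {
  const arr = new Array(size.length);
  // Peel off the fastest-changing (last) dimension first.
  for (let i = size.length - 1; i >= 0; i--) {
    arr[i] = num % size[i];
    num = Math.floor(num / size[i]);
  }
  return arr;
}

console.log(num2arr(44, [1, 1, 20, 2, 3])); // [ 0, 0, 7, 0, 2 ]
```

Feeding the result back into `arr2num` with the same `size` should return the original position, which makes the pair easy to check against each other.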
  65. 65. The JSON-stat Ecosystem format libs conn. schema
  66. 66. thank you
  67. 67. all pictures from Blocks picture in slide 1: Soma, by Dru! (CC BY-NC) Cubic head in slide 13: Portrait by Thomas Leth-Olsen (CC BY) Rubik’s Cube in slide 18: BW Rubik’s Cube, by Gerwin Sturm (CC BY-SA) Shiny cube in slide 48: SONY DSC, by Javier Manso (CC BY-NC-SA) Walking girl in slide 61: Sterile, by Lee Nachtigal (CC BY) Atomium in slide 66: Fighting Gravity – Atomium, Brussels, by Jan Faborsky (CC BY-NC-ND) Eggs in slide 77: Eggs n. 3, by Leonardo D’Amico (CC BY-SA-ND)
