JSON-stat, a simple light standard for all kinds of data disseminators

An introduction to the JSON-stat ecosystem. Originally published in July 2013, it was revised in 2015 to reflect the latest changes in the standard, and aspects not directly related to the JSON-stat document format were removed.

A very brief version of this presentation was used at the Data Tuesday BCN (Sept. 17th, 2013).

  1. JSON-stat: a simple light standard for all kinds of data disseminators. Xavier Badosa (@badosa), http://xavierbadosa.com, http://json-stat.org. December 2015.
  2. Who needs to disseminate data?
  3. Who needs to disseminate data? Nowadays? Everybody! Of course! NSOs*, central banks, intl. orgs… But also NGOs, companies, the media, citizens… (* National Statistical Offices)
  4. How is data usually disseminated?
  5. How is data usually disseminated? In table form.
  6. In table form.
  7. Wherever there’s data addressed to humans, there is (usually) a table.
  8. Plain old tables. Why are tables so popular?
  9. Why are tables so popular? Tables are a display device…
  10. …a display device with analytical features…
  11. …an abbreviation, a compressor, a metadata saver…
  12. …a cube model that avoids repeating metadata for every cell: an abbreviation, a compressor, a metadata saver.
  13. Cubic Thinking: describe data in dimension terms.
  14. Simple, for everybody? How?
  15. Simple, for everybody? How? If you managed to disseminate data for humans in tables, you should be able to do it for machines with no effort!
  16. Simple, for everybody? How? In JSON. JSON is a data format used in most APIs. It can include data and metadata in a single doc.
  17. Simple, for everybody? How? In JSON-stat, using a very simple cube model that mimics a plain old table.
  18. A Canadian Example
  19. Table
  20. Data
  21. What’s the simplest way to express these data in JSON?
  22. What’s the simplest way to express these data in JSON? An array (flat): [ … ]
  23. [ … ] Basic metadata?
  24.
      {
        "class" : "dataset",
        "label" : "Population by sex and age group. Canada. 2012",
        "source" : "Statistics Canada, CANSIM, table 051-0001",
        "updated" : "2012-09-27",
        "value" : [ … ]
      }
  25. id and size are needed to “unflatten” the value array.
      {
        "class" : "dataset",
        "label" : "Population by sex and age group. Canada. 2012",
        "source" : "Statistics Canada, CANSIM, table 051-0001",
        "updated" : "2012-09-27",
        "id" : [ "country" , "year" , "age" , "concept" , "sex" ],
        "size" : [ 1 , 1 , 20 , 2 , 3 ],
        "dimension" : { … },
        "value" : [ … ]
      }
  26. "id" : [ "country" , "year" , "age" , "concept" , "sex" ], "size" : [ 1 , 1 , 20 , 2 , 3 ], "dimension" : { … }. id and size are needed to “unflatten” the value array. Method: row-major order. In computing, row-major order and column-major order describe methods for arranging multidimensional arrays in linear storage such as memory.
  27. value, note/source/updated, label
  28. dimension
  29. dimension: age
  30. dimension: age. Size: 20 categories
  31. dimension (Size, Role): age (20, class)
  32. dimension (Size, Role): age (20, class), concept (2, metric)
  33. dimension (Size, Role): age (20, class), concept (2, metric), sex (3, class)
  34. dimension (Size, Role): age (20, class), concept (2, metric), sex (3, class), country (1, geo)
  35. dimension (Size, Role): age (20, class), concept (2, metric), sex (3, class), country (1, geo), year (1, time)
  36. Persons (thousands), 2012, Canada
  37. [ … ] To make sense of this array, dimensions must be ordered.
  38. [ … ] ["country", "year", "age", "concept", "sex"]. To make sense of this array, dimensions must be ordered.
  39. ["country", "year", "age", "concept", "sex"]. Criterion: what does not change, first. To make sense of this array, dimensions must be ordered. (The position of dimensions of size 1 is irrelevant.)
  40. What does not change, first:
      country | year | age   | concept | sex
      CA      | 2012 | Total | Persons | Total
      CA      | 2012 | Total | Persons | M
      CA      | 2012 | Total | Persons | F
      CA      | 2012 | Total | %       | Total
      CA      | 2012 | Total | %       | M
      CA      | 2012 | Total | %       | F
  41.
      {
        "version" : "2.0",
        "class" : "dataset",
        "label" : "Population by sex and age group. Canada. 2012",
        "source" : "Statistics Canada, CANSIM, table 051-0001",
        "updated" : "2012-09-27",
        "id" : [ "country" , "year" , "age" , "concept" , "sex" ],
        "size" : [ 1 , 1 , 20 , 2 , 3 ],
        "role" : { "time" : ["year"] , "geo" : ["country"] , "metric" : ["concept"] },
        "dimension" : { … },
        "value" : [ … ]
      }
  42.
      {
        "version" : "2.0",
        "class" : "dataset",
        "label" : "Population by sex and age group. Canada. 2012",
        "source" : "Statistics Canada, CANSIM, table 051-0001",
        "updated" : "2012-09-27",
        "id" : [ "country" , "year" , "age" , "concept" , "sex" ],
        "size" : [ 1 , 1 , 20 , 2 , 3 ],
        "role" : { "time" : ["year"] , "geo" : ["country"] , "metric" : ["concept"] },
        "dimension" : {
          "country" : { … },
          "year" : { … },
          "age" : { … },
          "concept" : { … },
          "sex" : { … }
        },
        "value" : [ … ]
      }
  45.
      "sex" : {
        "label" : "sex",
        "category" : {
          "index" : ["T", "M", "F"],
          "label" : { "T" : "Total", "M" : "Male", "F" : "Female" }
        }
      }
  46. Same "sex" dimension. Also accepted (faster access)*: "index" : {"T" : 0, "M" : 1, "F" : 2},
      * See “Arrays vs. Objects”: http://bl.ocks.org/5708161
  47. The “unflattening” problem
  48. The “unflattening” problem: from dimension positions [0,0,7,0,2]…
  49. The “unflattening” problem: from dimension positions [0,0,7,0,2] to value position 44
  50. [ … ] ["country", "year", "age", "concept", "sex"]
  51. [ … ] ["country", "year", "age", "concept", "sex"]. Value positions: 0, 1, 2, 3, 4, 5, … 44, … 119 (120 values).
  52. Persons (thousands), 2012, Canada. ["country", "year", "age", "concept", "sex"]: country 0, year 0. 0 is the first position (first category of the dimension).
  53. Persons (thousands), 2012, Canada. ["country", "year", "age", "concept", "sex"]: country 0, year 0; age 0, 1, 2, 3, 4, 5, 6, 7
  54. Persons (thousands), 2012, Canada. ["country", "year", "age", "concept", "sex"]: country 0, year 0; concept 0, 1 → 0
  55. Persons (thousands), 2012, Canada. ["country", "year", "age", "concept", "sex"]: country 0, year 0, concept 0; sex 0, 1, 2
  56. Persons (thousands), 2012, Canada. ["country", "year", "age", "concept", "sex"]: [0, 0, 7, 0, 2] → 44
  57. Persons (thousands), 2012, Canada. ["country", "year", "age", "concept", "sex"]. Size: [ 1, 1, 20, 2, 3 ]. [0, 0, 7, 0, 2] → 44
  58. The “unflattening” problem. ["country", "year", "age", "concept", "sex"], [ 1, 1, 20, 2, 3 ]: [0, 0, 7, 0, 2] → 44
  59. The “unflattening” problem. ["country", "year", "age", "concept", "sex"], [ 1, 1, 20, 2, 3 ]. It’s a simple mathematical problem: compute the value position using the dimension positions and sizes. [0, 0, 7, 0, 2] → 44
  60. Lost in cells? Method: row-major order. In computing, row-major order and column-major order describe methods for arranging multidimensional arrays in linear storage such as memory.
  61. Lost in cells? There’s a JavaScript library that takes care of this.
  62. Lost in cells? Are you a coder? Do you want to develop your own library? arr2num( [0,0,7,0,2], [1,1,20,2,3] ) → 44
  63. Lost in cells? Here’s a simple solution to the “unflattening” problem.
      // Converts dimension positions (arr) into the position of the value
      // in the flat value array, given the size of each dimension (size).
      // Walks from the last dimension (fastest-varying) to the first.
      function arr2num( arr, size ){
        for(var i=0, num=0, mult=1, ndims=size.length; i<ndims; i++){
          // mult = product of the sizes of the dimensions after ndims-i-1
          mult*=(i>0) ? size[ndims-i] : 1;
          num+=mult*arr[ndims-i-1];
        }
        return num;
      }
      arr2num( [0,0,7,0,2], [1,1,20,2,3] ) // 44
  64. Lost in cells? Or check the sample code section at http://json-stat.org/tools/
  65. The JSON-stat Ecosystem: format, libs, conn., schema
  66. Thank you
  67. All pictures from … Blocks picture in slide 1: Soma, by Dru! (CC BY-NC); Cubic head in slide 13: Portrait by Thomas Leth-Olsen (CC BY); Rubik’s Cube in slide 18: BW Rubik’s Cube, by Gerwin Sturm (CC BY-SA); Shiny cube in slide 48: SONY DSC, by Javier Manso (CC BY-NC-SA); Walking girl in slide 61: Sterile, by Lee Nachtigal (CC BY); Atomium in slide 66: Fighting Gravity – Atomium, Brussels, by Jan Faborsky (CC BY-NC-ND); Eggs in slide 77: Eggs n. 3, by Leonardo D’Amico (CC BY-SA-ND)
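
Slides 25 and 26 state that `id` and `size` are needed to “unflatten” the value array. The sketch below shows one thing that enables. It assumes a toy JavaScript object shaped like the Canadian example. The zero-filled `value` array is a placeholder, not the Statistics Canada figures, and `cellCount` and `isConsistent` are hypothetical helper names. The flat array must hold exactly one value per combination of categories, which is the product of the sizes.

```javascript
// Toy object mirroring the id/size of the Canadian example.
// The value array is placeholder zeros, NOT real data.
const dataset = {
  class: "dataset",
  id: ["country", "year", "age", "concept", "sex"],
  size: [1, 1, 20, 2, 3],
  value: new Array(120).fill(0),
};

// Number of cells implied by size: the product of all dimension sizes.
function cellCount(size) {
  return size.reduce((product, n) => product * n, 1);
}

// Hypothetical helper: checks that id, size and value agree.
function isConsistent(ds) {
  return ds.id.length === ds.size.length &&
    ds.value.length === cellCount(ds.size);
}

console.log(cellCount(dataset.size)); // 120
console.log(isConsistent(dataset));   // true
```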
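
For reference, the row-major rule of slides 26 and 60 can be written as a formula. The notation is introduced here and is not part of the JSON-stat format:

- $K$ is the number of dimensions, taken in `id` order.
- $p_k$ is the zero-based category position in dimension $k$.
- $s_k$ is the size of dimension $k$.

The last dimension varies fastest:

$$
n = \sum_{k=1}^{K} p_k \prod_{j=k+1}^{K} s_j
$$

For $k = K$ the product is empty and equals 1. $s_1$ never appears, so the size of the first dimension does not affect the value position.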
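
Slide 39 notes that the position of dimensions of size 1 is irrelevant. The check below tests that claim with a Horner-style row-major index written here for illustration. `flatIndex` is a hypothetical name; for valid input it gives the same result as `arr2num` in slide 63. Moving `country` and `year` to the end keeps position 44. Swapping `concept` and `sex`, both larger than 1, does not.

```javascript
// Row-major flat index in Horner form: multiply the running total by the
// size of the next dimension, then add that dimension's position.
// (Hypothetical helper, same result as arr2num in slide 63.)
function flatIndex(pos, size) {
  let n = 0;
  for (let k = 0; k < size.length; k++) {
    n = n * size[k] + pos[k];
  }
  return n;
}

// Order from the slides: country, year, age, concept, sex
console.log(flatIndex([0, 0, 7, 0, 2], [1, 1, 20, 2, 3])); // 44
// Size-1 dimensions moved to the end: age, concept, sex, country, year
console.log(flatIndex([7, 0, 2, 0, 0], [20, 2, 3, 1, 1])); // 44
// concept and sex swapped: country, year, age, sex, concept
console.log(flatIndex([0, 0, 7, 2, 0], [1, 1, 20, 3, 2])); // 46
```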
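
Slide 40 illustrates the criterion “what does not change, first”: the last dimension in `id` (sex) varies fastest. The sketch below walks the cube in row-major order and reproduces the six rows of that table. It assumes the category labels shown there and cuts age down to its “Total” category to keep the output short. `nextPositions` is a hypothetical helper.

```javascript
// Category labels in index order, as in the slide 40 table.
// Age is reduced to its "Total" category, for brevity.
const dims = [
  ["CA"],              // country
  ["2012"],            // year
  ["Total"],           // age (first category only)
  ["Persons", "%"],    // concept
  ["Total", "M", "F"], // sex
];

// Step positions forward in row-major order: bump the last
// dimension and carry into earlier ones when it overflows.
function nextPositions(pos, size) {
  const next = pos.slice();
  for (let k = size.length - 1; k >= 0; k--) {
    next[k]++;
    if (next[k] < size[k]) return next;
    next[k] = 0; // overflow: reset this dimension and carry
  }
  return null; // wrapped around: every cell has been visited
}

const size = dims.map(d => d.length);
let pos = size.map(() => 0);
const rows = [];
while (pos !== null) {
  rows.push(pos.map((p, k) => dims[k][p]).join(" | "));
  pos = nextPositions(pos, size);
}
console.log(rows.join("\n"));
```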
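
Slides 41 and 42 add the `role` object, which lists dimension ids under `time`, `geo` and `metric`. The sketch below assumes the role object of the example and inverts it into a per-dimension lookup. Following the diagram in slides 31 to 35, dimensions without an explicit role are labeled `"class"` here; that default is an assumption of this sketch. `rolesById` is a hypothetical helper.

```javascript
// id and role as in the Canadian example (slides 41-42).
const id = ["country", "year", "age", "concept", "sex"];
const role = { time: ["year"], geo: ["country"], metric: ["concept"] };

// Hypothetical helper: map each dimension id to its role name.
// Dimensions not listed in role default to "class", mirroring the
// labels in the slides' diagram (an assumption of this sketch).
function rolesById(id, role) {
  const out = {};
  for (const dim of id) out[dim] = "class";
  for (const [name, dims] of Object.entries(role)) {
    for (const dim of dims) out[dim] = name;
  }
  return out;
}

const roles = rolesById(id, role);
console.log(roles.concept); // metric
console.log(roles.age);     // class
```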
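
Slide 46 shows that `category.index` may be either of two forms:

- an array of ids;
- an object mapping each id to its position, which offers faster access.

The sketch below assumes only those two forms. It shows a helper, hypothetically named `indexLookup`, that normalizes either one into an id-to-position object, so a consumer can accept both.

```javascript
// Hypothetical helper: accept category.index as an array of ids
// or as an object { id: position }, and return the object form.
function indexLookup(index) {
  if (Array.isArray(index)) {
    const lookup = {};
    index.forEach((catId, pos) => { lookup[catId] = pos; });
    return lookup;
  }
  return index; // already { id: position }
}

const fromArray = indexLookup(["T", "M", "F"]);
const fromObject = indexLookup({ T: 0, M: 1, F: 2 });
console.log(fromArray.F, fromObject.F); // 2 2
```

The array form keeps the category order explicit in the document. Converting it once into an object turns every later lookup into a single property access.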
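
Slide 59 calls this a simple mathematical problem: compute the value position from the dimension positions and sizes. Let $p_k$ and $s_k$ be the position and size of the $k$-th dimension in `id` order (notation introduced here). Row-major order can then be evaluated in nested (Horner) form. For the example, $p = (0, 0, 7, 0, 2)$ and $s = (1, 1, 20, 2, 3)$:

$$
\begin{aligned}
n &= \big(\big((p_1 s_2 + p_2)\, s_3 + p_3\big)\, s_4 + p_4\big)\, s_5 + p_5 \\
  &= \big(\big((0 \cdot 1 + 0) \cdot 20 + 7\big) \cdot 2 + 0\big) \cdot 3 + 2 \\
  &= 14 \cdot 3 + 2 = 44
\end{aligned}
$$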
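
Slide 63 converts dimension positions into a value position. The reverse step, from a value position back to one category position per dimension, is handy when iterating over `value`. The sketch below assumes the same row-major order as `arr2num`. `num2arr` is a hypothetical name, not part of the slides.

```javascript
// Hypothetical inverse of arr2num: from a flat value position back to
// one position per dimension, assuming the same row-major order.
function num2arr(num, size) {
  const arr = new Array(size.length);
  for (let k = size.length - 1; k >= 0; k--) {
    arr[k] = num % size[k];          // position within dimension k
    num = Math.floor(num / size[k]); // carry the rest to earlier dims
  }
  return arr;
}

console.log(num2arr(44, [1, 1, 20, 2, 3]).join(", ")); // 0, 0, 7, 0, 2
```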
