Strata NY: Best Practices for Publishing Data

A presentation by Hjalmar Gislason, founder and CEO of DataMarket, at the Strata Conference in New York, October 2012.

  1. FIND AND UNDERSTAND DATA. Best Practices for Publishing Data. Hjalmar Gislason, founder & CEO - hg@datamarket.com, October 2012
  2. Hjalmar Gislason, Founder and CEO. Twitter: @datamarket. Slides: http://blog.datamarket.com/
  3. Heavy Data Consumers / Providers of Data Delivery Technology
  4. Computers / Humans | Best Practices for Publishing Data | Hjalmar Gislason, hg@datamarket.com | October 2012
  5. Computers / Humans
     • Computers: Structure
     • Humans: Understand and use
  6. Computers / Humans
     • Computers: Structure
     • Humans: Understand and use
  7. Publishing for Computers
     1. Simple formats
     2. Indexes, unique IDs and meta-data
     3. FAQs and feedback channels
  8. Simple Formats: "Don't anthropomorphize computers - they hate it." - Unknown
  9. Simple Formats
  10. Simple Formats: Tim Berners-Lee’s Five Stars
  11. Simple Formats: You lost me at “Semantics”
  12. Standards will emerge and there will be more and more of them
      • RDF
      • OData vs. GData
      • DSPL
      • SDMX
  13. Indexes, unique IDs and meta-data
  14. Indexes, unique IDs and meta-data
  15. Indexes, unique IDs and meta-data
  16. Indexes, unique IDs and meta-data
      • Must: Unique ID, Title, Last updated
      • Should: Meta-data
      • Why?
        • No need for scraping
        • Less load on your end
        • Ensures full coverage
        • Ensures content removal and updates
  17. Indexes, unique IDs and meta-data
      • Hard to emphasize enough!
      • Unique IDs for everything: Datasets, columns, entities, ...
      • Why?
        • Continuity: A small change for a man = giant leap for a computer
  18. Indexes, unique IDs and meta-data
      • Any relevant contextual information
        • URL(s), descriptions, methodology, next update, authors, keywords, units, license information, ...
  19. FAQs and feedback channels. #1 reason for not publishing data: “There are errors in the data and I don't want others to discover them”
  20. FAQs and feedback channels. #1 reason for not publishing data: “There are errors in the data and I do want others to discover them”
  21. FAQs and feedback channels
  22. FAQs and feedback channels
  23. Publishing for Computers
      1. Simple formats
      2. Indexes, unique IDs and meta-data
      3. FAQs and feedback channels
  24. Computers / Humans
      • Computers: Structure
      • Humans: Understand and use
  25. Publishing for Humans
      1. Search / Discovery
      2. Visualization
      3. Download
  26. Search / Discovery
      • Requirements differ from web/text search
        • A lot less textual content to base on
          • Synonyms, dictionaries, autocomplete
        • But (hopefully) good meta-data = facets and filtering
      • Give people ways to browse
        • Categories vs. tags vs. search
        • Serendipity: Random, related, interesting...
  27. Search / Discovery
  28. Visualize
  29. 109 columns x 340 lines = 37,060 cells
  30. Visualize
      • What you should offer depends on the data
      • Statistical data
        • Focus on the most common charts and get them right
        • Do NOT invent new visualizations or chart types
      • Use standards-compatible technologies
        • No Flash!
        • Charting and visualization libraries
  31. Visualize
  32. Visualize
  33. Download
      • Make it easy to use your data outside your tools
        • Play nicely with those providing functionality beyond what you can offer: Tableau, R, SAS, MATLAB, Mathematica, SPSS, ...
        • Provide downloads in the formats most commonly used by your users:
          • Raw data: Excel, CSV, feeds (R, Excel live feeds, APIs)
          • Charts and visualizations: Bitmap, vector, PPT, embeds?
  34. Computers / Humans
      • Computers: Structure
        • Simple formats
        • Indexes, unique IDs and meta-data
        • FAQs and feedback channels
      • Humans: Understand and use
        • Search / Discovery
        • Visualization
        • Download
  35. FIND AND UNDERSTAND DATA. Hjalmar Gislason, founder & CEO. Twitter: @datamarket · Facebook: DataMarket · E-mail: hg@datamarket.com
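
The index recommendations on slides 16 to 18 (must have: unique ID, title, last updated; should have: meta-data) could be sketched roughly as below. This is only an illustration, not DataMarket's implementation: the IndexEntry class, the diff_indexes helper, the dataset IDs, titles, dates and meta-data values are all hypothetical. It shows how a consumer with two snapshots of such an index might detect added, removed and updated datasets without scraping.

```python
import json
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One row of a hypothetical dataset index."""
    id: str                # unique ID that stays stable across updates
    title: str
    last_updated: str      # ISO 8601 date, e.g. "2012-10-01"
    metadata: dict = field(default_factory=dict)  # units, license, ...


def diff_indexes(old, new):
    """Compare two index snapshots, matching entries by unique ID."""
    old_by_id = {entry.id: entry for entry in old}
    new_by_id = {entry.id: entry for entry in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    # ISO 8601 dates in the same format compare correctly as strings.
    updated = sorted(
        i for i in new_by_id.keys() & old_by_id.keys()
        if new_by_id[i].last_updated > old_by_id[i].last_updated
    )
    return {"added": added, "removed": removed, "updated": updated}


# Made-up snapshots of the same index on two different days.
yesterday = [
    IndexEntry("ds-001", "Population by region", "2012-09-01"),
    IndexEntry("ds-002", "Unemployment rate", "2012-09-15"),
]
today = [
    # The title changed, but the ID did not, so this is still ds-001.
    IndexEntry("ds-001", "Population by region (revised)", "2012-10-01",
               {"units": "persons", "license": "CC-BY"}),
    IndexEntry("ds-003", "Fish catch by species", "2012-10-02"),
]

changes = diff_indexes(yesterday, today)
print(json.dumps(changes, indent=2))
```

Because entries are matched on the ID rather than the title, the renamed dataset is reported as updated instead of as one removal plus one addition, which is the continuity point made on slide 17.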
