Metadata, Extrametadata & Crowdknowing      Fostering Big Open Data in government            through Open Collaboration   ...
CROWDKNOWING                     Human-powered,                  Machine-accelerated,        Collective Knowledge Systems ...
0. Huge Open Data1. Extract Metadata2. Derive ExtraMetadata  (Semantics + Statistics + Algorithm + Crowd)3. Do Federated Q...
Crowdknowing     Human-powered, Machine-accelerated,        Collective Knowledge Systems                                  ...
a Semantic Data Dictionary                             5
Semantic Steroids• Searchable  • Faceted Search  • Drilldown• Interlinked• Semantic Browsing• Queryable• Query Results For...
NYCFacets Spider             v0.5• Crawls NYC Open Data Catalog every  weekend• RESTFul API• Extracts metadata & derive ex...
MetadataTop Level Metadata         Detail Metadata   •   Name/ID                •   Column Names   •   Category           ...
9
ExtraMetadata?• Derived using Algorithm & the Crowd”   “Semantics, Statistics,• “Supercharacterize” by sampling the underl...
ExtraMetadataTop Level                    DetailExtraMetadata                ExtraMetadata  •   Number of Rows           •...
12
13
“Crowd”Microconversations/contributions  •   Overall Rating  •   Comments (comment rating)  •   Bug Reports (data quality)...
Crowdknowing     Human-powered, Machine-accelerated,        Collective Knowledge Systems                                  ...
• More Datasources!• Not just Metadata!• Federated Queries!• SPARQL endpoint• Bugzilla Integration• Collaborative Ontology...
We need your help & feedback        A Smart Data Exchange for All Data NYC                  Find out more at          http...
CREDITS• Flickr User Weston Price, Paleo-Caveman-  Omnivore-LowCarb-Meat-Diet-Info (http://  www.flickr.com/photos/paleo-at...
Upcoming SlideShare
Loading in …5
×

NYCFacets: Metadata, Extrametadata and Crowdknowing

358 views

Published on

A quick "under-the-hood" look at Ontodia's winning entry for NYCBigApps 3.0 - NYCFacets.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

NYCFacets: Metadata, Extrametadata and Crowdknowing

  1. 1. Metadata, Extrametadata & Crowdknowing Fostering Big Open Data in government through Open Collaboration Ontolog - “Big Open Data” session 2 May 17, 2012 Joel Natividad, co-founder @jqnatividad 1
  2. 2. CROWDKNOWING Human-powered, Machine-accelerated, Collective Knowledge Systems 2
  3. 3. 0. Huge Open Data1. Extract Metadata2. Derive ExtraMetadata (Semantics + Statistics + Algorithm + Crowd)3. Do Federated Queries on both the Metadata AND the DataCrowdknowing 3
  4. 4. Crowdknowing Human-powered, Machine-accelerated, Collective Knowledge Systems Ontology, Inferencing, Semantic Curation, Comments, Mapping, Query Federation, Statistics, Feedback, Bug Reports, Pattern Recognition, MultivariateLikes, Shares, Profile, Votes, Analysis & Forecasting, Automated Subscribes, Tagging, linking, Feeds, Notifications etc. etc. etc. etc. etc. etc. 4
  5. 5. a Semantic Data Dictionary 5
  6. 6. Semantic Steroids• Searchable • Faceted Search • Drilldown• Interlinked• Semantic Browsing• Queryable• Query Results Formats ~3.5M facts~950 datasets/views 6
  7. 7. NYCFacets Spider v0.5• Crawls NYC Open Data Catalog every weekend• RESTFul API• Extracts metadata & derive extrametadata• Pumps the data into NYCFacets 7
  8. 8. MetadataTop Level Metadata Detail Metadata • Name/ID • Column Names • Category • Datatype • Dataset Type • Width, etc. • Attribution • Owner ID, etc. 8
  9. 9. 9
  10. 10. ExtraMetadata?• Derived using Algorithm & the Crowd” “Semantics, Statistics,• “Supercharacterize” by sampling the underlying not just the schema, but each dataset data as well• Score each dataset - Pediacities Rank• Virtuous Feedback Loop around the Data micro-conversations/contributions 10
  11. 11. ExtraMetadataTop Level DetailExtraMetadata ExtraMetadata • Number of Rows • Top Values • Pediacities Rank • Descriptive statistics • Freshness Score • Nulls/Non-nulls • Sparseness Score • Smallest Value • Social Score • Largest Value • Views Score • “Uniqueness” • Download Score • Rating Score • Simple Visualization 11
  12. 12. 12
  13. 13. 13
  14. 14. “Crowd”Microconversations/contributions • Overall Rating • Comments (comment rating) • Bug Reports (data quality) • Likes/Shares • Downloads 14
  15. 15. Crowdknowing Human-powered, Machine-accelerated, Collective Knowledge Systems Ontology, Inferencing, Semantic Curation, Comments, Mapping, Query Federation, Statistics, Feedback, Bug Reports, Pattern Recognition, MultivariateLikes, Shares, Profile, Votes, Analysis & Forecasting, Automated Subscribes, Tagging, linking, Feeds, Notifications etc. etc. etc. etc. etc. etc. 15
  16. 16. • More Datasources!• Not just Metadata!• Federated Queries!• SPARQL endpoint• Bugzilla Integration• Collaborative Ontology Modeling• Feeds• Microcontributions• Gamification• In time for NYCBigApps 4.0 16
  17. 17. We need your help & feedback A Smart Data Exchange for All Data NYC Find out more at http://nyc.pediacities.com/facets@jqnatividad @samimirzabaig @pediacities @ontodia 17
  18. 18. CREDITS• Flickr User Weston Price, Paleo-Caveman- Omnivore-LowCarb-Meat-Diet-Info (http:// www.flickr.com/photos/paleo-atkins-meat- diet-info/with/6718805047/)• Flickr User Gao Yi (http://www.flickr.com/ photos/gaoyi/178514677/) 18

×