NYCFacets: Metadata, Extrametadata and Crowdknowing

A quick "under-the-hood" look at Ontodia's winning entry for NYCBigApps 3.0 - NYCFacets.

Published in: Technology, Education

  1. Metadata, Extrametadata & Crowdknowing: Fostering Big Open Data in government through Open Collaboration. Ontolog “Big Open Data” session 2, May 17, 2012. Joel Natividad, co-founder, @jqnatividad
  2. CROWDKNOWING: Human-powered, Machine-accelerated, Collective Knowledge Systems
  3. Crowdknowing
     0. Huge Open Data
     1. Extract Metadata
     2. Derive ExtraMetadata (Semantics + Statistics + Algorithm + Crowd)
     3. Do Federated Queries on both the Metadata AND the Data
  4. Crowdknowing: Human-powered, Machine-accelerated, Collective Knowledge Systems
     • Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics, Pattern Recognition, Multivariate Analysis & Forecasting, Automated linking, Feeds, Notifications etc. etc. etc.
     • Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes, Subscribes, Tagging, etc. etc. etc.
  5. a Semantic Data Dictionary
  6. Semantic Steroids
     • Searchable
     • Faceted Search
     • Drilldown
     • Interlinked
     • Semantic Browsing
     • Queryable
     • Query Results Formats
     ~3.5M facts, ~950 datasets/views
  7. NYCFacets Spider v0.5
     • Crawls NYC Open Data Catalog every weekend
     • RESTful API
     • Extracts metadata & derives extrametadata
     • Pumps the data into NYCFacets
  8. Metadata
     Top Level Metadata
     • Name/ID
     • Category
     • Dataset Type
     • Attribution
     • Owner ID, etc.
     Detail Metadata
     • Column Names
     • Datatype
     • Width, etc.
  9. [Image-only slide; no text]
  10. ExtraMetadata?
     • Derived using “Semantics, Statistics, Algorithm & the Crowd”
     • “Supercharacterize” each dataset by sampling not just the schema, but the underlying data as well
     • Score each dataset - Pediacities Rank
     • Virtuous Feedback Loop around the Data micro-conversations/contributions
  11. ExtraMetadata
     Top Level ExtraMetadata
     • Number of Rows
     • Pediacities Rank
     • Freshness Score
     • Sparseness Score
     • Social Score
     • Views Score
     • Download Score
     • Rating Score
     Detail ExtraMetadata
     • Top Values
     • Descriptive statistics
     • Nulls/Non-nulls
     • Smallest Value
     • Largest Value
     • “Uniqueness”
     • Simple Visualization
  12. [Image-only slide; no text]
  13. [Image-only slide; no text]
  14. “Crowd” Microconversations/contributions
     • Overall Rating
     • Comments (comment rating)
     • Bug Reports (data quality)
     • Likes/Shares
     • Downloads
  15. Crowdknowing: Human-powered, Machine-accelerated, Collective Knowledge Systems
     • Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics, Pattern Recognition, Multivariate Analysis & Forecasting, Automated linking, Feeds, Notifications etc. etc. etc.
     • Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes, Subscribes, Tagging, etc. etc. etc.
  16. • More Datasources!
     • Not just Metadata!
     • Federated Queries!
     • SPARQL endpoint
     • Bugzilla Integration
     • Collaborative Ontology Modeling
     • Feeds
     • Microcontributions
     • Gamification
     • In time for NYCBigApps 4.0
  17. We need your help & feedback. A Smart Data Exchange for All Data NYC. Find out more at http://nyc.pediacities.com/facets (@jqnatividad, @samimirzabaig, @pediacities, @ontodia)
  18. CREDITS
     • Flickr User Weston Price, Paleo-Caveman-Omnivore-LowCarb-Meat-Diet-Info (http://www.flickr.com/photos/paleo-atkins-meat-diet-info/with/6718805047/)
     • Flickr User Gao Yi (http://www.flickr.com/photos/gaoyi/178514677/)
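
The ExtraMetadata slides describe sampling the underlying data of each dataset and deriving per-column details such as top values, nulls/non-nulls, smallest and largest value, and “uniqueness”. The deck does not show how NYCFacets computes these, so the following is only a minimal Python sketch under stated assumptions: `profile_column` is a hypothetical helper, `None` stands for a null cell, and “uniqueness” is assumed here to mean the share of distinct values among the non-null cells.

```python
from collections import Counter
from typing import Any, Optional, Sequence


def profile_column(values: Sequence[Optional[Any]], top_n: int = 3) -> dict:
    """Summarize one sampled column; None stands for a null cell.

    Hypothetical helper for illustration, not the NYCFacets implementation.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "non_nulls": len(non_null),
        "smallest": min(non_null) if non_null else None,
        "largest": max(non_null) if non_null else None,
        "top_values": counts.most_common(top_n),
        # Assumed definition: distinct values / non-null values.
        "uniqueness": len(counts) / len(non_null) if non_null else 0.0,
    }


# Made-up sample column, standing in for rows sampled from a dataset.
sample = ["Bronx", "Queens", None, "Bronx", "Brooklyn", None, "Bronx", "Queens"]
profile = profile_column(sample)
print(profile)
```

Profiling a sample rather than the full dataset keeps the cost bounded for large datasets, at the price of approximate values; the sample size and sampling method are left open here because the slides do not specify them.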
