Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Metadata, Extrametadata & Crowdknowing      Fostering Big Open Data in government            through Open Collaboration   ...
CROWDKNOWING                     Human-powered,                  Machine-accelerated,        Collective Knowledge Systems ...
0. Huge Open Data1. Extract Metadata2. Derive ExtraMetadata  (Semantics + Statistics + Algorithm + Crowd)3. Do Federated Q...
Crowdknowing     Human-powered, Machine-accelerated,        Collective Knowledge Systems                                  ...
a Semantic Data Dictionary                             5
Semantic Steroids• Searchable  • Faceted Search  • Drilldown• Interlinked• Semantic Browsing• Queryable• Query Results For...
NYCFacets Spider             v0.5• Crawls NYC Open Data Catalog every  weekend• RESTFul API• Extracts metadata & derive ex...
MetadataTop Level Metadata         Detail Metadata   •   Name/ID                •   Column Names   •   Category           ...
9
ExtraMetadata?• Derived using Algorithm & the Crowd”   “Semantics, Statistics,• “Supercharacterize” by sampling the underl...
ExtraMetadataTop Level                    DetailExtraMetadata                ExtraMetadata  •   Number of Rows           •...
12
13
“Crowd”Microconversations/contributions  •   Overall Rating  •   Comments (comment rating)  •   Bug Reports (data quality)...
Crowdknowing     Human-powered, Machine-accelerated,        Collective Knowledge Systems                                  ...
• More Datasources!• Not just Metadata!• Federated Queries!• SPARQL endpoint• Bugzilla Integration• Collaborative Ontology...
We need your help & feedback        A Smart Data Exchange for All Data NYC                  Find out more at          http...
CREDITS• Flickr User Weston Price, Paleo-Caveman-  Omnivore-LowCarb-Meat-Diet-Info (http://  www.flickr.com/photos/paleo-at...
Upcoming SlideShare
Loading in …5
×

NYCFacets: Metadata, Extrametadata and Crowdknowing

366 views

Published on

A quick "under-the-hood" look at Ontodia's winning entry for NYCBigApps 3.0 - NYCFacets.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

NYCFacets: Metadata, Extrametadata and Crowdknowing

  1. 1. Metadata, Extrametadata & Crowdknowing Fostering Big Open Data in government through Open Collaboration Ontolog - “Big Open Data” session 2 May 17, 2012 Joel Natividad, co-founder @jqnatividad 1
  2. 2. CROWDKNOWING Human-powered, Machine-accelerated, Collective Knowledge Systems 2
  3. 3. 0. Huge Open Data1. Extract Metadata2. Derive ExtraMetadata (Semantics + Statistics + Algorithm + Crowd)3. Do Federated Queries on both the Metadata AND the DataCrowdknowing 3
  4. 4. Crowdknowing Human-powered, Machine-accelerated, Collective Knowledge Systems Ontology, Inferencing, Semantic Curation, Comments, Mapping, Query Federation, Statistics, Feedback, Bug Reports, Pattern Recognition, MultivariateLikes, Shares, Profile, Votes, Analysis & Forecasting, Automated Subscribes, Tagging, linking, Feeds, Notifications etc. etc. etc. etc. etc. etc. 4
  5. 5. a Semantic Data Dictionary 5
  6. 6. Semantic Steroids• Searchable • Faceted Search • Drilldown• Interlinked• Semantic Browsing• Queryable• Query Results Formats ~3.5M facts~950 datasets/views 6
  7. 7. NYCFacets Spider v0.5• Crawls NYC Open Data Catalog every weekend• RESTFul API• Extracts metadata & derive extrametadata• Pumps the data into NYCFacets 7
  8. 8. MetadataTop Level Metadata Detail Metadata • Name/ID • Column Names • Category • Datatype • Dataset Type • Width, etc. • Attribution • Owner ID, etc. 8
  9. 9. 9
  10. 10. ExtraMetadata?• Derived using Algorithm & the Crowd” “Semantics, Statistics,• “Supercharacterize” by sampling the underlying not just the schema, but each dataset data as well• Score each dataset - Pediacities Rank• Virtuous Feedback Loop around the Data micro-conversations/contributions 10
  11. 11. ExtraMetadataTop Level DetailExtraMetadata ExtraMetadata • Number of Rows • Top Values • Pediacities Rank • Descriptive statistics • Freshness Score • Nulls/Non-nulls • Sparseness Score • Smallest Value • Social Score • Largest Value • Views Score • “Uniqueness” • Download Score • Rating Score • Simple Visualization 11
  12. 12. 12
  13. 13. 13
  14. 14. “Crowd”Microconversations/contributions • Overall Rating • Comments (comment rating) • Bug Reports (data quality) • Likes/Shares • Downloads 14
  15. 15. Crowdknowing Human-powered, Machine-accelerated, Collective Knowledge Systems Ontology, Inferencing, Semantic Curation, Comments, Mapping, Query Federation, Statistics, Feedback, Bug Reports, Pattern Recognition, MultivariateLikes, Shares, Profile, Votes, Analysis & Forecasting, Automated Subscribes, Tagging, linking, Feeds, Notifications etc. etc. etc. etc. etc. etc. 15
  16. 16. • More Datasources!• Not just Metadata!• Federated Queries!• SPARQL endpoint• Bugzilla Integration• Collaborative Ontology Modeling• Feeds• Microcontributions• Gamification• In time for NYCBigApps 4.0 16
  17. 17. We need your help & feedback A Smart Data Exchange for All Data NYC Find out more at http://nyc.pediacities.com/facets@jqnatividad @samimirzabaig @pediacities @ontodia 17
  18. 18. CREDITS• Flickr User Weston Price, Paleo-Caveman- Omnivore-LowCarb-Meat-Diet-Info (http:// www.flickr.com/photos/paleo-atkins-meat- diet-info/with/6718805047/)• Flickr User Gao Yi (http://www.flickr.com/ photos/gaoyi/178514677/) 18

×