Metadata, Extrametadata & Crowdknowing
      Fostering 'Big Open Data' in government
            through Open Collaboration
             Ontolog - “Big Open Data” session 2
                        May 17, 2012




          Joel Natividad, co-founder
                     @jqnatividad
CROWDKNOWING




                     Human-powered,
                  Machine-accelerated,
        Collective Knowledge Systems
0. Huge Open Data
1. Extract Metadata
2. Derive ExtraMetadata
   (Semantics + Statistics + Algorithm + Crowd)
3. Do Federated Queries on both the
   Metadata AND the Data

Crowdknowing
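Step 3 can be illustrated with a small sketch. It assumes each source exposes a list of datasets carrying both metadata and rows; the source names, dataset names, field names, and values below are all made up for illustration and are not taken from NYCFacets. The query first selects datasets by metadata, then filters the rows of the matching datasets across every source.

```python
def federated_query(sources, category, column, predicate):
    """Select datasets by metadata, then filter their rows, across all sources."""
    hits = []
    for source_name, datasets in sources.items():
        for ds in datasets:
            meta = ds["metadata"]
            # Metadata step: keep datasets in the category that expose the column.
            if meta["category"] != category or column not in meta["columns"]:
                continue
            # Data step: apply the row-level predicate to the matching datasets.
            for row in ds["rows"]:
                if predicate(row[column]):
                    hits.append((source_name, meta["name"], row))
    return hits


# Hypothetical sources; every name and number here is invented for the example.
sources = {
    "catalog_a": [
        {"metadata": {"name": "Street Trees", "category": "Environment",
                      "columns": ["borough", "count"]},
         "rows": [{"borough": "Queens", "count": 120},
                  {"borough": "Bronx", "count": 40}]},
    ],
    "catalog_b": [
        {"metadata": {"name": "Park Benches", "category": "Environment",
                      "columns": ["borough", "count"]},
         "rows": [{"borough": "Queens", "count": 75}]},
        {"metadata": {"name": "Taxi Trips", "category": "Transportation",
                      "columns": ["borough", "count"]},
         "rows": [{"borough": "Queens", "count": 900}]},
    ],
}

results = federated_query(sources, "Environment", "count", lambda n: n > 50)
for source_name, dataset_name, row in results:
    print(source_name, dataset_name, row)
```

A real deployment would more likely push both steps down to each source's own query interface; the in-memory loop only shows the order of the two steps.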
Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems




Human-powered:
  Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes,
  Subscribes, Tagging, etc.

Machine-accelerated:
  Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics,
  Pattern Recognition, Multivariate Analysis & Forecasting, Automated
  linking, Feeds, Notifications, etc.
a Semantic Data Dictionary




Semantic Steroids
• Searchable
  • Faceted Search
  • Drilldown
• Interlinked
• Semantic Browsing
• Queryable
• Query Results Formats
~3.5M facts
~950 datasets/views
NYCFacets Spider v0.5
• Crawls the NYC Open Data Catalog every weekend
• RESTful API
• Extracts metadata & derives extrametadata
• Pumps the data into NYCFacets
Metadata

Top Level Metadata
   • Name/ID
   • Category
   • Dataset Type
   • Attribution
   • Owner ID, etc.

Detail Metadata
   • Column Names
   • Datatype
   • Width, etc.
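The two metadata levels above could be held in a structure like the following. This is a minimal sketch, assuming a raw catalog record arrives as a dictionary; the record's key names (`id`, `type`, `owner_id`, and so on) and the sample values are hypothetical and do not describe the actual NYC Open Data Catalog API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnMetadata:
    """Detail metadata for one column."""
    name: str
    datatype: str
    width: Optional[int] = None


@dataclass
class DatasetMetadata:
    """Top-level metadata plus the per-column detail metadata."""
    dataset_id: str
    name: str
    category: str
    dataset_type: str
    attribution: str
    owner_id: str
    columns: List[ColumnMetadata] = field(default_factory=list)


def extract_metadata(record: dict) -> DatasetMetadata:
    """Map one raw catalog record (hypothetical shape) onto the two levels."""
    columns = [
        ColumnMetadata(c["name"], c.get("datatype", "text"), c.get("width"))
        for c in record.get("columns", [])
    ]
    return DatasetMetadata(
        dataset_id=record["id"],
        name=record["name"],
        category=record.get("category", ""),
        dataset_type=record.get("type", ""),
        attribution=record.get("attribution", ""),
        owner_id=record.get("owner_id", ""),
        columns=columns,
    )


# Invented record, used only to exercise the function.
sample_record = {
    "id": "abcd-1234",
    "name": "Example dataset",
    "category": "Environment",
    "type": "table",
    "attribution": "Example Agency",
    "owner_id": "owner-1",
    "columns": [
        {"name": "borough", "datatype": "text", "width": 20},
        {"name": "count", "datatype": "number"},
    ],
}

meta = extract_metadata(sample_record)
print(meta.name, [c.name for c in meta.columns])
```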
ExtraMetadata?
• Derived using “Semantics, Statistics, Algorithm & the Crowd”
• “Supercharacterize” each dataset by sampling not just the schema, but
  the underlying data as well
• Score each dataset - Pediacities Rank
• Virtuous Feedback Loop around the Data
  micro-conversations/contributions
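Sampling the underlying data, rather than reading only the schema, might look like the sketch below. The function name, the sample size, the treatment of empty strings as nulls, and the example values are assumptions made for illustration; the deck does not describe how NYCFacets actually samples or which statistics it computes at this step.

```python
import random
from collections import Counter


def profile_column(values, sample_size=1000, seed=0):
    """Profile one column from a random sample of its values."""
    rng = random.Random(seed)
    # Use the whole column when it is small, otherwise a fixed-size random sample.
    sample = values if len(values) <= sample_size else rng.sample(values, sample_size)
    # Assumption: None and the empty string both count as null.
    non_null = [v for v in sample if v is not None and v != ""]
    distinct = set(non_null)
    return {
        "sampled": len(sample),
        "nulls": len(sample) - len(non_null),
        "non_nulls": len(non_null),
        "smallest": min(non_null) if non_null else None,
        "largest": max(non_null) if non_null else None,
        "top_values": Counter(non_null).most_common(3),
        # Share of distinct values among the non-null sampled values.
        "uniqueness": len(distinct) / len(non_null) if non_null else 0.0,
    }


# Invented column values, used only to exercise the function.
boroughs = ["Bronx", "Queens", "Queens", None, "Brooklyn", "Queens", ""]
profile = profile_column(boroughs)
print(profile)
```

The fixed seed keeps repeated runs comparable, which matters if the profile is recomputed on every crawl.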
ExtraMetadata

Top Level ExtraMetadata
  • Number of Rows
  • Pediacities Rank
      • Freshness Score
      • Sparseness Score
      • Social Score
      • Views Score
      • Download Score
      • Rating Score

Detail ExtraMetadata
  • Top Values
  • Descriptive statistics
      • Nulls/Non-nulls
      • Smallest Value
      • Largest Value
      • “Uniqueness”
  • Simple Visualization
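The deck lists the sub-scores that feed the Pediacities Rank but not how they are combined. The sketch below is one possible combination, not the actual formula: it assumes every sub-score is already scaled to the range 0 to 1 with higher meaning better, and it defaults to equal weights. The example values are invented.

```python
SCORE_NAMES = ("freshness", "sparseness", "social", "views", "download", "rating")


def composite_rank(scores, weights=None):
    """Weighted mean of sub-scores, each assumed to be already scaled to [0, 1]."""
    # Assumption: equal weights unless the caller supplies others.
    weights = weights or {name: 1.0 for name in SCORE_NAMES}
    total_weight = sum(weights[name] for name in SCORE_NAMES)
    # A missing sub-score contributes 0.0 rather than raising.
    weighted = sum(weights[name] * scores.get(name, 0.0) for name in SCORE_NAMES)
    return weighted / total_weight


# Invented sub-scores for a single dataset.
example = {"freshness": 0.9, "sparseness": 0.8, "social": 0.2,
           "views": 0.5, "download": 0.4, "rating": 0.6}
rank = composite_rank(example)
print(round(rank, 3))
```

Passing a `weights` dictionary lets one sub-score, for instance freshness, count for more than the others without changing the function.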
“Crowd”

Microconversations/contributions
  •   Overall Rating

  •   Comments (comment rating)

  •   Bug Reports (data quality)

  •   Likes/Shares

  •   Downloads


Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems




Human-powered:
  Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes,
  Subscribes, Tagging, etc.

Machine-accelerated:
  Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics,
  Pattern Recognition, Multivariate Analysis & Forecasting, Automated
  linking, Feeds, Notifications, etc.
• More Datasources!
• Not just Metadata!
• Federated Queries!
• SPARQL endpoint
• Bugzilla Integration
• Collaborative Ontology Modeling
• Feeds
• Microcontributions
• Gamification
• In time for NYCBigApps 4.0
We need your help & feedback




        A Smart Data Exchange for All Data NYC

                  Find out more at
          http://nyc.pediacities.com/facets

@jqnatividad @samimirzabaig @pediacities @ontodia
CREDITS

• Flickr User Weston Price, Paleo-Caveman-Omnivore-LowCarb-Meat-Diet-Info
  (http://www.flickr.com/photos/paleo-atkins-meat-diet-info/with/6718805047/)
• Flickr User Gao Yi (http://www.flickr.com/photos/gaoyi/178514677/)
