Metadata, Extrametadata & Crowdknowing
      Fostering 'Big Open Data' in government
            through Open Collaboration
             Ontolog - “Big Open Data” session 2
                        May 17, 2012




          Joel Natividad, co-founder
                     @jqnatividad
CROWDKNOWING




                     Human-powered,
                  Machine-accelerated,
        Collective Knowledge Systems
0. Huge Open Data
1. Extract Metadata

2. Derive ExtraMetadata
  (Semantics + Statistics + Algorithm + Crowd)


3. Do Federated Queries on both the
   Metadata AND the Data



Crowdknowing
Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems




Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes,
Subscribes, Tagging, etc. etc. etc.

Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics,
Pattern Recognition, Multivariate Analysis & Forecasting, Automated
linking, Feeds, Notifications, etc. etc. etc.
a Semantic Data Dictionary




Semantic Steroids
• Searchable
  • Faceted Search
  • Drilldown
• Interlinked
• Semantic Browsing
• Queryable
• Query Results Formats
   ~3.5M facts
~950 datasets/views



NYCFacets Spider v0.5
• Crawls the NYC Open Data Catalog every weekend
• RESTful API
• Extracts metadata & derives extrametadata
• Pumps the data into NYCFacets
Metadata
Top Level Metadata         Detail Metadata

   •   Name/ID                •   Column Names

   •   Category               •   Datatype

   •   Dataset Type           •   Width, etc.

   •   Attribution

   •   Owner ID, etc.
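
The split between top-level and detail metadata can be sketched in code. This is a minimal illustration only, not the NYCFacets Spider itself: the record layout, the field names (`id`, `type`, `owner_id`, `columns`, ...) and the sample values are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnMetadata:
    """Detail metadata: one entry per column (hypothetical shape)."""
    name: str
    datatype: str
    width: Optional[int] = None


@dataclass
class DatasetMetadata:
    """Top-level metadata plus the per-column detail metadata."""
    dataset_id: str
    name: str
    category: str
    dataset_type: str
    attribution: str
    owner_id: str
    columns: List[ColumnMetadata] = field(default_factory=list)


def extract_metadata(record: dict) -> DatasetMetadata:
    """Split one catalog record into top-level and detail metadata."""
    columns = [
        ColumnMetadata(
            name=col["name"],
            datatype=col.get("datatype", "unknown"),
            width=col.get("width"),
        )
        for col in record.get("columns", [])
    ]
    return DatasetMetadata(
        dataset_id=record["id"],
        name=record["name"],
        category=record.get("category", ""),
        dataset_type=record.get("type", ""),
        attribution=record.get("attribution", ""),
        owner_id=record.get("owner_id", ""),
        columns=columns,
    )


# Made-up record, standing in for one entry returned by a catalog crawl.
sample_record = {
    "id": "example-0001",
    "name": "Example Dataset",
    "category": "Environment",
    "type": "tabular",
    "attribution": "Example Agency",
    "owner_id": "owner-42",
    "columns": [
        {"name": "borough", "datatype": "text", "width": 20},
        {"name": "tree_count", "datatype": "number"},
    ],
}

meta = extract_metadata(sample_record)
print(meta.name, [c.name for c in meta.columns])
```

Keeping the column entries in their own type makes it easy to attach derived, per-column statistics later without touching the top-level record.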



ExtraMetadata?
• Derived using “Semantics, Statistics, Algorithm & the Crowd”

• “Supercharacterize” each dataset by sampling not just the schema, but
  the underlying data as well

• Score each dataset - Pediacities Rank
• Virtuous Feedback Loop around the Data
  micro-conversations/contributions
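
The deck does not say how Pediacities Rank is computed. As a rough sketch only, one way to combine the sub-scores the deck lists for it (freshness, sparseness, social, views, download, rating) is a weighted average. The 0-to-1 scale, the equal default weights and the function name are assumptions for illustration, not the actual formula.

```python
from typing import Dict, Optional

# Sub-score names taken from the deck; everything else here is assumed.
COMPONENTS = ("freshness", "sparseness", "social", "views", "download", "rating")


def combine_scores(scores: Dict[str, float],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of sub-scores, each clamped to the range [0, 1].

    Missing sub-scores count as 0. Equal weights are used by default.
    """
    if weights is None:
        weights = {name: 1.0 for name in COMPONENTS}
    total_weight = sum(weights.get(name, 0.0) for name in COMPONENTS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = 0.0
    for name in COMPONENTS:
        value = min(max(scores.get(name, 0.0), 0.0), 1.0)
        weighted_sum += weights.get(name, 0.0) * value
    return weighted_sum / total_weight


# Made-up sub-scores for a single dataset.
example_scores = {
    "freshness": 0.9,
    "sparseness": 0.6,
    "social": 0.3,
    "views": 0.5,
    "download": 0.4,
    "rating": 0.9,
}
rank = combine_scores(example_scores)
print(round(rank, 2))
```

Because crowd signals such as ratings and downloads feed directly into the sub-scores, a scheme like this would let the feedback loop shift a dataset's rank over time.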
ExtraMetadata
Top Level ExtraMetadata      Detail ExtraMetadata

  •   Number of Rows           •   Top Values

  •   Pediacities Rank         •   Descriptive statistics
      •   Freshness Score          •   Nulls/Non-nulls
      •   Sparseness Score         •   Smallest Value
      •   Social Score             •   Largest Value
      •   Views Score              •   “Uniqueness”
      •   Download Score
      •   Rating Score
                               •   Simple Visualization
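
The detail ExtraMetadata above can be derived from a sample of a column's values. The following is a minimal sketch under stated assumptions: the function name, the treatment of empty strings as nulls, and the definition of "uniqueness" as distinct values divided by non-null values are all illustrative choices, not the deck's definitions.

```python
from collections import Counter


def profile_column(values, top_n=3):
    """Derive simple per-column statistics from a sample of values.

    Assumes the non-null values are mutually comparable (all text or
    all numbers), so that min() and max() are well defined.
    """
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "non_nulls": len(non_null),
        "smallest": min(non_null) if non_null else None,
        "largest": max(non_null) if non_null else None,
        # Share of non-null values that are distinct (assumed definition).
        "uniqueness": len(counts) / len(non_null) if non_null else 0.0,
        "top_values": counts.most_common(top_n),
    }


# Made-up sample of one text column.
sample = ["Bronx", "Queens", "Queens", None, "Brooklyn", ""]
profile = profile_column(sample)
print(profile)
```

Sampling the data rather than reading only the schema is what makes statistics like these possible; the same null counts could also feed a sparseness score.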


“Crowd”

Microconversations/contributions
  •   Overall Rating

  •   Comments (comment rating)

  •   Bug Reports (data quality)

  •   Likes/Shares

  •   Downloads


Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems




Curation, Comments, Feedback, Bug Reports, Likes, Shares, Profile, Votes,
Subscribes, Tagging, etc. etc. etc.

Ontology, Inferencing, Semantic Mapping, Query Federation, Statistics,
Pattern Recognition, Multivariate Analysis & Forecasting, Automated
linking, Feeds, Notifications, etc. etc. etc.
• More Datasources!
• Not just Metadata!
• Federated Queries!
• SPARQL endpoint
• Bugzilla Integration
• Collaborative Ontology Modeling
• Feeds
• Microcontributions
• Gamification
• In time for NYCBigApps 4.0
We need your help & feedback




        A Smart Data Exchange for All Data NYC

                  Find out more at
          http://nyc.pediacities.com/facets

@jqnatividad @samimirzabaig @pediacities @ontodia
CREDITS

• Flickr User Weston Price, Paleo-Caveman-Omnivore-LowCarb-Meat-Diet-Info
  (http://www.flickr.com/photos/paleo-atkins-meat-diet-info/with/6718805047/)
• Flickr User Gao Yi (http://www.flickr.com/photos/gaoyi/178514677/)
