The Knowledge Reengineering Bottleneck
Keynote talk at CSHALS 2012 in Boston on the Knowledge Reengineering Bottleneck: knowledge engineering in the linked data age.

The Knowledge Reengineering Bottleneck: Presentation Transcript

  • The Knowledge Reengineering Bottleneck
    Rinke Hoekstra, rinke.hoekstra@vu.nl
    VU University Amsterdam / University of Amsterdam
    Friday 24 February 2012
  • Knowledge Engineering
    “Critical scientific problem [...] successful applied AI requires that knowledge move from the heads of experts into programs”
    FEIGENBAUM, E. A. (1984), Knowledge Engineering. Annals of the New York Academy of Sciences, 426: 91–107. doi: 10.1111/j.1749-6632.1984.tb16513.x
  • Problems of Knowledge Engineering
    ‣ The lack of adequate and appropriate hardware
    ‣ Lack of cumulation of AI methods and techniques
    ‣ Shortage of trained knowledge engineers
    ‣ The problem of knowledge acquisition
    ‣ The development gap
  • Knowledge Acquisition Bottleneck
    “The problem of knowledge acquisition is the critical bottleneck problem in artificial intelligence”
    FEIGENBAUM, E. A. (1984), Knowledge Engineering. Annals of the New York Academy of Sciences, 426: 91–107. doi: 10.1111/j.1749-6632.1984.tb16513.x
  • The Dark Ages
  • Knowledge Elicitation: Repertory Grids; Think Aloud Method; Cardsorting; ...
    MYCIN and GUIDON; Knowledge Types
    CommonKADS; Engineering Methodology; Problem Solving Methods; Domain Models
    Ontolingua; “Explicit specification of a shared conceptualization”; Sharing ontologies
  • How to build the right ontology?
    Methodologies: Uschold & Gruninger; METHONTOLOGY; KACTUS; SENSUS; (KA)2
    Middle Out Approach
    Diagram labels: Identify Purpose and Scope; Ontology Capture; Ontology Coding; Integration; Evaluation; Documentation; Guidelines; Motivating Scenarios; Competency Questions; Specify Ontology
    Ontology Types: Top Ontology; Representation Ontology; Generic Ontology; Core Ontology; Domain Ontology
    Diagram labels: Top; Foundation; Generic; Domain; Application; Core
    Principles: OntoClean; Ontology vs. Epistemology
    Ontology Reuse: Merging & Alignment; Modularization; Ontology Design Patterns
  • Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • [Figure: Linking Open Data cloud diagram, as of September 2011, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ Legend: Media; Geographic; Publications; User-generated content; Government; Cross-domain; Life sciences.]
  • [Figure: the same cloud diagram overlaid with a chart; vertical axis from 0 to 400, time axis from 1 May 2007 to 23 February 2012.]
  • Performance
    2009: WebPIE
    [Figure: throughput (Ktriples/sec, 0 to 700) against input size (billions of statements, 0 to 100) for BigOWLIM, Oracle 11g, DAML DB, BigData and WebPIE, annotated “We are here!!”. Monday 10 May 2010.]
    2011: QueryPIE
    Backward-chaining inference at query-time, over 1B triples, in milliseconds, on just 8 parallel machines. Pre-computation in 8-300 sec against 1-3 hours in WebPIE.
    Urbani, J., Kotoulas, S., Maassen, J., van Harmelen, F. & Bal, H. (2010), OWL reasoning with WebPIE: calculating the closure of 100 billion triples, In Proceedings of ESWC 2010
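The contrast on this slide, computing the full closure up front versus inferring at query time, can be illustrated with a minimal sketch. This is not WebPIE or QueryPIE: it is a toy in-memory example covering only two RDFS-style rules (transitivity of rdfs:subClassOf and type inheritance), and the `ex:` resources and function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical toy data, not taken from the talk.
DATA: Set[Triple] = {
    ("ex:aspirin", TYPE, "ex:Drug"),
    ("ex:Drug", SUBCLASS, "ex:Chemical"),
    ("ex:Chemical", SUBCLASS, "ex:Substance"),
}


def forward_closure(triples: Set[Triple]) -> Set[Triple]:
    """Forward chaining: materialize every derivable triple before any query."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new: Set[Triple] = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # instances inherit the superclasses of their class
                    new.add((s, TYPE, o2))
        if not new <= closure:
            closure |= new
            changed = True
    return closure


def has_type_backward(triples: Set[Triple], instance: str, cls: str) -> bool:
    """Backward chaining: answer one query by walking superclasses on demand."""
    frontier = [o for (s, p, o) in triples if s == instance and p == TYPE]
    seen: Set[str] = set()
    while frontier:
        current = frontier.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(o for (s, p, o) in triples
                        if s == current and p == SUBCLASS)
    return False
```

Forward chaining pays the full derivation cost once and then answers queries by lookup; backward chaining stores nothing extra and does the work per query, which is the trade-off the slide's timings refer to.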
  • Dataset          Size   Terminological Closure   Full Closure   Ratio
    FactForge        862M   89 sec                   2h45min        1:111
    LinkedLifeData   649M   332 sec                  1h5min         1:11
    LUBM             1.1B   8 sec                    1h15min        1:562
    QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases. Urbani, J.; Harmelen, F. van; Schlobach, S.; and Bal, H. 2011. In Proceedings of ISWC 2011, Volume 5823, 730-745, Springer.
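The ratio column can be checked from the two timing columns. The sketch below assumes that the ratio is full-closure time divided by terminological-closure time, truncated to an integer; that reading is an assumption that happens to reproduce the slide's figures. The helper names are hypothetical.

```python
import re


def to_seconds(text: str) -> int:
    """Parse durations written like '89 sec' or '2h45min' into seconds."""
    compact = text.replace(" ", "")
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)min)?(?:(\d+)sec)?", compact)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def closure_ratio(terminological: str, full: str) -> int:
    """Full-closure time over terminological-closure time, truncated."""
    return to_seconds(full) // to_seconds(terminological)


# Timings as given on the slide: (dataset, terminological closure, full closure)
ROWS = [
    ("FactForge", "89 sec", "2h45min"),
    ("LinkedLifeData", "332 sec", "1h5min"),
    ("LUBM", "8 sec", "1h15min"),
]

for name, terminological, full in ROWS:
    print(f"{name}: 1:{closure_ratio(terminological, full)}")
```

For example, FactForge gives 9900 seconds against 89 seconds, which truncates to 111.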
  •                     1980 - 1995                  1995 - 2005                         2005 - now
    Knowledge           Executable Models            Task Independent Domain Knowledge   “Semantic” Data
    Knowledge Sharing   Reusable System Components   Ontology Reuse                      Data Interoperability
  • Knowledge Reengineering Bottleneck
    The difficulty of the correct and continuous use of preexisting knowledge for a new task
  • Challenge 1: Data Dependency
  • Ontology Alignment Evaluation Initiative
  • Design Patterns
  • “Data” Driven Knowledge Engineering
    [Image: Semantic Web cube, www.w3.org/Icons/SW/sw-cube-v.svg]
  • Challenge 2: Complexity
  • Neats vs. Scruffies
  • Reuse, Use: Use = 1 - Reuse
    Slide by Frank van Harmelen, ISWC 2011 Keynote, http://www.cs.vu.nl/~frankh/spool/ISWC2011Keynote/
  • Challenge 3: Limited Control
  • Data is Dirty, Verbose, Inconsistent, Redundant, Disconnected, Stale
  • Semantically-Interlinked Online Communities
  • Pedantic Web http://pedantic-web.org
  • LODStats http://stats.lod2.eu
  • 40.745.554.078 Triples! (1.6 Billion)
  • Open PHACTS, LOD Around the Clock, Data2Semantics
  • Challenge 4: Increasing Importance
  • Semantic Web Good News Quiz http://slideshare.net/Frank.van.Harmelen/semantic-web-good-news
  • ‣ New stakeholders
    ‣ No more fooling around
    ‣ Scary stuff...?
  • ‣ Bridging the development gap
    ‣ Data publishing licenses
    ‣ Access policies
    ‣ Attribution
    ‣ “Data Hoarding”
  • The Knowledge Reengineering Bottleneck
    ‣ The lack of adequate and appropriate hardware ✓
    ‣ Lack of cumulation of AI methods and techniques ✓
    ‣ Shortage of trained knowledge engineers ?
    ‣ The problem of knowledge acquisition ?
    ‣ The development gap ?
  • http://www.webont.org/owled/2012 @OWLED2012Worksh