in collaboration with  Georgiana Ifrim, Gjergji Kasneci, Josiane Parreira, Maya Ramanath,  Ralf Schenkel, Fabian Suchanek, Martin Theobald
DB and IR: Two Parallel Universes
Database Systems vs. Information Retrieval
canonical application: accounting vs. libraries
data type: numbers, short strings vs. text
foundation: algebraic / logic based vs. probabilistic / statistics based
search paradigm: Boolean retrieval (exact queries, result sets/bags) vs. ranked retrieval (vague queries, result lists)
market leaders: Oracle, IBM DB2, MS SQL Server, etc. vs. Google, Yahoo!, MSN, Verity, Fast, etc.
Parallel universes forever?
Why DB&IR Now? – Application Needs
Simplify life for application areas.
Typical data: Disease (DId, Name, Category, Pathogen …); UMLS-Categories ( … ); Patient (… Age, HId, Date, Report, TreatedDId); Hospital (HId, Address …)
Typical query: symptoms of tropical virus diseases and reported anomalies with young patients in central Europe in the last two weeks
Why DB&IR Now? – Platform Desiderata
Structured data (records) with structured search (SQL, XQuery): DB Systems
Unstructured data (documents) with unstructured search (keywords): IR Systems, Search Engines
Structured data (records) with unstructured search (keywords): Keyword Search on Relational Graphs (IIT Bombay, UCSD, MSR, Hebrew U, CU Hong Kong, Duke U, ...)
Unstructured data (documents) with structured search (SQL, XQuery): Querying entities & relations from IE (MSR Beijing, UW Seattle, IBM Almaden, UIUC, MPI, …)
Platform desiderata (from app developer's viewpoint): Integrated DB&IR Platform
Why DB&IR Forever?
Turn the Web, Web2.0, and Web3.0 into the world's most comprehensive knowledge base ("semantic DB")!
Growth from 2000 to 2007:
indexed Web: 2 billion → 20 billion
Flickr photos: --- → 100 million
digital photos: ? → 150 billion
Wikipedia: 8 000 → 1.8 million
OECD researchers: 7.4 million → 8.4 million
patents world-wide: ? → 60 million
US Library of Congress: 115 million → 134 million
Google Scholar: --- → 500 million
Outline
• Past: Matter, Antimatter, and Wormholes
• Present: XML and Graph IR
• Future: From Data to Knowledge
Parallel Universes: A Closer Look (Matter, Antimatter)
Timeline of DB and IR work (axis: 1990, 1995, 2000, 2005): VAGUE (Motro); Proximal Nodes (Baeza-Yates et al.); WHIRL (Cohen); Prob. Datalog (Fuhr et al.); INEX; XPath; XPath Full-Text; Prob. DB (Cavallo & Pittarelli); Prob. Tuples (Barbara et al.); Web Entity Search: Libra, Avatar, ExDB …; Faceted Search: Flamenco …; 1st Gen. XML IR: XXL, XIRQL, Elixir, JuruXML; Multimedia IR; Web Query Languages: W3QS, WebOQL, Araneus …; Semistructured Data: Lore, Xyleme …; 2nd Gen. XML IR: XRank, Timber, TIJAH, XSearch, FleXPath, CoXML, TopX, MarkLogic, Fast …; Uncertain & Prob. Relations: Mystiq, Trio …; Struct. Docs; Deep Web Search; Digital Libraries; Graph IR
WHIRL: IR over Relations [W.W. Cohen: SIGMOD'98]
Add text-similarity selection and join to relational algebra
Example: Select * From Movies M, Reviews R Where M.Plot ~ "fight" And M.Year > 1990 And R.Rating > 3 And M.Title ~ R.Title And M.Plot ~ R.Comment
Movies (Title, Plot, …, Year):
Matrix; In the near future … computer hacker Neo … fight training …; 1999
Hero; In ancient China … fights … sword fight … fights Broken Sword …; 2002
Shrek 2; In Far Far Away … our lovely hero fights with cat killer …; 2004
Reviews (Title, Comment, …, Rating):
Matrix 1; … cool fights … new techniques …; 4
Matrix Reloaded; … fights … and more fights … fairly boring …; 1
Matrix Eigenvalues; … matrix spectrum … orthonormal …; 5
Ying xiong aka. Hero; … fight for peace … sword fight … dramatic colors …; 5
Scoring and ranking:
$s(\langle x,y\rangle, q: A \sim B) = \mathrm{cosine}(x.A, y.B)$
$s(\langle x,y\rangle, q_1 \wedge \dots \wedge q_m) = \prod_{i=1}^{m} s(\langle x,y\rangle, q_i)$
with vector components $x_j \sim tf(\text{word } j \text{ in } x) \cdot idf(\text{word } j)$, with dampening & normalization
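The similarity join M.Title ~ R.Title can be illustrated with a small sketch. It is not Cohen's WHIRL implementation; the function names, the whitespace tokenizer, and the particular dampening (1 + log tf) and idf (log(1 + n/df)) formulas are assumptions chosen for illustration. It ranks title pairs from the example by cosine similarity of tf*idf vectors.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Map each text to a unit-length tf*idf vector (dict: word -> weight)."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per word
    vectors = []
    for d in docs:
        # dampened tf times idf (both formulas are illustrative choices)
        v = {w: (1 + math.log(tf)) * math.log(1 + n / df[w]) for w, tf in d.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({w: x / norm for w, x in v.items()})
    return vectors

def cosine(u, v):
    """Cosine of two unit-length sparse vectors."""
    return sum(x * v.get(w, 0.0) for w, x in u.items())

def similarity_join(left, right):
    """Return (score, i, j) for all pairs with non-zero similarity, best first."""
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    pairs = [(cosine(a, b), i, j) for i, a in enumerate(lv) for j, b in enumerate(rv)]
    return sorted((p for p in pairs if p[0] > 0), reverse=True)

movies = ["Matrix", "Hero", "Shrek 2"]
reviews = ["Matrix 1", "Matrix Reloaded", "Matrix Eigenvalues", "Ying xiong aka. Hero"]
ranked = similarity_join(movies, reviews)
for score, i, j in ranked:
    print(f"{movies[i]!r} ~ {reviews[j]!r}: {score:.3f}")
```

A conjunctive query would multiply such per-condition scores, so a pair that matches on title but not on plot/comment text drops in the ranking instead of being filtered out.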
XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00]
Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML?
Union of heterogeneous sources without global schema
Similarity-aware XPath: //~Professor [//* = "~SB"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]]
Example data, source 1: Professor (Name: Gerhard Weikum; Address ... City: SB, Country: Germany; Teaching: Course (Title: IR; Description: Information retrieval ...; Syllabus ...; Book, Article ...); Research: Project (Title: Intelligent Search of Heterogeneous XML Data; Funding: EU ...))
Example data, source 2: Lecturer (Name: Ralf Schenkel; Address: Max-Planck Institute for Informatics, Germany; Activities: Seminar (Contents: Ranked retrieval …; Literature: …); Scientific (Name: INEX task coordinator (Initiative for the Evaluation of XML …)); Other (Sponsor: EU …))
XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00]
Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML?
Motivation: Union of heterogeneous sources has no schema
Similarity-aware XPath: //~Professor [//* = "~Saarbruecken"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]]
Tag similarity: Wu & Palmer: |path| through lca(x,y); Dice coefficient: $2\,\#(x,y) / (\#x + \#y)$ on Web
Query expansion model: disjunction of tags
Ontology neighborhood of "professor" (weighted edges such as HYPONYM (0.749), RELATED (0.48)): magician, wizard, intellectual, artist, alchemist, director, primadonna, professor, teacher, scholar, academic / academician / faculty member, scientist, researcher, investigator, mentor, lecturer
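The Dice coefficient and the "disjunction of tags" expansion can be sketched as follows. This is an illustration, not the XXL engine: the function names, the threshold, and the similarity values attached to specific tag pairs are hypothetical (the slide shows weights such as 0.749 and 0.48 but the transcript does not say which edges they belong to).

```python
def dice(count_x, count_y, count_xy):
    """Dice coefficient 2 * #(x,y) / (#x + #y) from occurrence counts."""
    total = count_x + count_y
    return 2 * count_xy / total if total else 0.0

def expand_tag(tag, related, threshold):
    """Expand ~tag into a weighted disjunction of sufficiently similar tags."""
    expansion = {tag: 1.0}  # the original tag always matches with weight 1
    for other, sim in related.get(tag, {}).items():
        if sim >= threshold:
            expansion[other] = sim
    return expansion

# Hypothetical similarity table (the assignment of weights to tags is assumed)
related = {"professor": {"lecturer": 0.749, "mentor": 0.48}}
print(dice(10, 30, 5))                         # 0.25
print(expand_tag("professor", related, 0.5))   # professor and lecturer only
```

Raising the threshold favors precision; lowering it admits more distant tags and favors recall.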
The Past: Lessons Learned
Precision vs. recall: //~Professor [...] expands to //{Professor, Researcher, Lecturer, Scientist, Scholar, Academic, ...}[...]
Ontology terms shown: element, gold, produce, Golden Delicious, entity, food, substance, solid, edible fruit, apple, pome
Outline
✓ Past: Matter, Antimatter, and Wormholes
• Present: XML and Graph IR
• Future: From Data to Knowledge
TopX: 2nd Generation XML IR [Martin Theobald, Ralf Schenkel, GW: VLDB'05, VLDB Journal]
"Semantic" XPath Full-Text query: /Article [ftcontains(//Person, "Max Planck")] [ftcontains(//Work, "quantum physics")] //Children[@Gender = "female"]//Birthdates
supported by TopX engine: http://infao5501.ag5.mpi-sb.mpg.de:8080/topx/ http://topx.sourceforge.net
Commercial Break [Martin Theobald, Ralf Schenkel, GW: VLDB'05] TopX demo today 3:30 – 5:30
Principled Ranking by Probabilistic IR
"God does not play dice." (Einstein) IR does.
Odds for item d with terms $d_i$ being relevant for query $q = \{q_1, \dots, q_m\}$
Assumptions: binary features, conditional independence of features [Robertson & Sparck-Jones 1976]
Related to but different from statistical language models
Relationship to tf*idf
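The slide's formulas did not survive in this transcript. As a hedged illustration of the Robertson & Sparck-Jones model it cites, the sketch below uses the standard textbook term weight with 0.5 smoothing; the function and parameter names are assumptions. Without relevance feedback (R = r = 0) the weight reduces to log((N - n + 0.5) / (n + 0.5)), which behaves like idf and hints at the relationship to tf*idf.

```python
import math

def rsj_weight(N, n, R=0, r=0):
    """Robertson/Sparck-Jones term weight with 0.5 smoothing.

    N: number of documents, n: documents containing the term,
    R: known relevant documents, r: relevant documents containing the term.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def score(doc_terms, query_terms, N, df):
    """Sum the weights of query terms present in the document (binary features)."""
    return sum(rsj_weight(N, df[t]) for t in query_terms if t in doc_terms)

df = {"xml": 10, "the": 900}   # hypothetical document frequencies
print(rsj_weight(1000, 10))    # rare term: large positive weight
print(rsj_weight(1000, 900))   # very common term: negative weight
print(score({"xml", "the"}, ["xml", "ir"], 1000, df))
```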
Probabilistic Ranking for SQL [S. Chaudhuri, G. Das, V. Hristidis, GW: TODS'06]
SQL queries that return many answers need ranking
Odds for tuple d with attributes $X \cup Y$ being relevant for query $q: X_1 = x_1 \wedge \dots \wedge X_m = x_m$
Estimate probabilities, exploiting workload W
From Tables and Trees to Graphs
Example schema: Conferences (CId, Title, Location, Year); Journals (JId, Title); CPublications (PId, Title, CId); JPublications (PId, Title, Vol, No, Year); Authors (PId, Person); Editors (CId, Person)
Query: Select * From * Where * Contains "Gray, DeWitt, XML, Performance" And Year > 95
Schema-agnostic keyword search over multiple tables: graph of tuples with foreign-key relationships as edges [BANKS, Discover, DBExplorer, KUPS, SphereSearch, BLINKS]
Result is a connected tree with nodes that contain as many query keywords as possible
Ranking: nodeScore based on tf*idf or prob. IR, and edgeScore reflecting importance of relationships (or confidence, authority, etc.)
Top-k querying: compute best trees, e.g. Steiner trees (NP-hard)
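Because optimal Steiner trees are NP-hard, systems in this space rely on heuristics. The sketch below is a simplified distance-sum heuristic in the spirit of backward expansion, not the actual BANKS or BLINKS algorithm: it picks the tuple whose summed shortest-path distance to one matching tuple per keyword is smallest. The tuple identifiers, the adjacency-dict graph, and the function names are hypothetical.

```python
from collections import deque

def bfs_dist(graph, sources):
    """Shortest hop distance from the nearest node in `sources` to every reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def best_root(graph, keyword_nodes):
    """keyword_nodes: one set of matching tuples per keyword.

    Returns (cost, root) for the root minimizing the summed distances
    to all keywords, or None if no tuple reaches every keyword.
    """
    dists = [bfs_dist(graph, nodes) for nodes in keyword_nodes]
    candidates = [(sum(d[n] for d in dists), n)
                  for n in graph if all(n in d for d in dists)]
    return min(candidates) if candidates else None

# Undirected tuple graph: author tuples A1, A2 point to paper P1, which points to conference C1
graph = {"A1": ["P1"], "A2": ["P1"], "P1": ["A1", "A2", "C1"], "C1": ["P1"]}
# Keyword matches: "Gray" in A1, "DeWitt" in A2, "XML" in the title of P1
print(best_root(graph, [{"A1"}, {"A2"}, {"P1"}]))  # (2, 'P1')
```

A fuller implementation would weight the tree with nodeScore and edgeScore rather than counting hops.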
The Present: Observations & Opportunities
Example: keyword query "life physicist Max Planck" vs. structured query //article[//person "Max Planck"] [//category "physicist"] //biography
Graph labels shown: actor, movie, plot, director
Outline
✓ Past: Matter, Antimatter, and Wormholes
✓ Present: XML and Graph IR
• Future: From Data to Knowledge
Knowledge Queries
Turn the Web, Web2.0, and Web3.0 into the world's most comprehensive knowledge base ("semantic DB")!
Answer "knowledge queries" such as:
• Nobel laureate who survived both world wars and his children
• drama with three women making a prophecy to a British nobleman that he will become king
• proteins that inhibit both protease and some other enzyme
• connection between Thomas Mann and Goethe
• differences in Rembetiko music from Greece and from Turkey
• neutron stars with X-ray bursts > $10^{40}$ erg s$^{-1}$ & black holes in 10''
• market impact of Web2.0 technology in December 2006
• sympathy or antipathy for Germany from May to August 2006
Three Roads to Knowledge
High-Quality Knowledge Sources: growing with strong momentum
High-Quality Knowledge Sources
General-purpose thesauri and concept networks: WordNet family
enzyme -- (any of several complex proteins that are produced by cells and act as catalysts in specific biochemical reactions)
  => protein -- (any of a large group of nitrogenous organic compounds that are essential constituents of living cells; ...)
    => macromolecule, supermolecule ...
      => organic compound -- (any compound of carbon and another element or a radical) ...
  => catalyst, accelerator -- ((chemistry) a substance that initiates or accelerates a chemical reaction without itself being affected)
    => activator -- ((biology) any agency bringing about activation; ...)
High-Quality Knowledge Sources Wikipedia  and other lexical sources
Exploit Hand-Crafted Knowledge
Wikipedia, WordNet, and other lexical sources
{{Infobox_Scientist
| name = Max Planck
| birth_date = [[April 23]], [[1858]]
| birth_place = [[Kiel]], [[Germany]]
| death_date = [[October 4]], [[1947]]
| death_place = [[Göttingen]], [[Germany]]
| residence = [[Germany]]
| nationality = [[Germany|German]]
| field = [[Physicist]]
| work_institution = [[University of Kiel]]</br> [[Humboldt-Universität zu Berlin]]</br> [[Georg-August-Universität Göttingen]]
| alma_mater = [[Ludwig-Maximilians-Universität München]]
| doctoral_advisor = [[Philipp von Jolly]]
| doctoral_students = [[Gustav Ludwig Hertz]]</br> …
| known_for = [[Planck's constant]], [[Quantum mechanics|quantum theory]]
| prizes = [[Nobel Prize in Physics]] (1918) …
YAGO: Yet Another Great Ontology [F. Suchanek, G. Kasneci, GW: WWW 2007]
Facts are triples (entity1, relation, entity2). Examples: bornIn (Max_Planck, Kiel); isInstanceOf (Kiel, City)
YAGO Knowledge Representation
Three layers: concepts, individuals, words
Concepts (linked by subclass): Entity, Person, Location, City, Country, Scientist, Physicist, Biologist
Individuals: Max_Planck (instanceOf Physicist; bornOn April 23, 1858; diedOn October 4, 1947; bornIn Kiel; hasWon Nobel Prize; FatherOf Erwin_Planck); Kiel (instanceOf City)
Words: "Max Planck", "Dr. Planck", "Max Karl Ernst Ludwig Planck" (each linked by means to Max_Planck)
Online access and download at http://www.mpi-inf.mpg.de/~suchanek/yago/
Accuracy: 97%
Knowledge base: # facts
KnowItAll: 30 000
SUMO: 60 000
WordNet: 200 000
OpenCyc: 300 000
Cyc: 5 000 000
YAGO: 6 000 000
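The instanceOf / subclass layering can be sketched as a set of triples with a transitive closure over subclass. This is an illustration only, not YAGO's storage model; the exact edges of the class hierarchy are assumed from the labels on the slide, and the function names are hypothetical.

```python
FACTS = {
    ("Max_Planck", "instanceOf", "Physicist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Kiel", "instanceOf", "City"),
    # subclass edges below are assumed from the slide's labels
    ("Physicist", "subclass", "Scientist"),
    ("Scientist", "subclass", "Person"),
    ("Person", "subclass", "Entity"),
    ("City", "subclass", "Location"),
    ("Location", "subclass", "Entity"),
}

def superclasses(cls, facts):
    """All classes reachable from cls by following subclass edges."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in facts:
            if s == current and p == "subclass" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def is_a(entity, cls, facts):
    """True if entity is an instance of cls directly or via subclass edges."""
    direct = {o for s, p, o in facts if s == entity and p == "instanceOf"}
    return any(c == cls or cls in superclasses(c, facts) for c in direct)

print(is_a("Max_Planck", "Person", FACTS))  # True
print(is_a("Kiel", "Person", FACTS))        # False
```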
NAGA: Graph IR on YAGO [G. Kasneci et al.: WWW'07]
Graph-based search on YAGO-style knowledge bases with built-in ranking based on confidence and informativeness; statistical language model for result graphs
Conjunctive queries, e.g.: $x isa scientist; $x bornIn Kiel
Queries with regular expressions, e.g.: $x isa scientist; $x (hasFirstName | hasLastName) Ling; $x worksFor $y; $y locatedIn* Zhejiang; $x (coAuthor | advisor)* Beng Chin Ooi
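Evaluating a conjunctive query such as "$x isa scientist; $x bornIn Kiel" amounts to finding variable bindings that satisfy every triple pattern. The sketch below does this by backtracking over a list of facts. It is not NAGA's engine: it omits ranking and regular-expression paths, and the function name and sample facts are hypothetical.

```python
def match(query, facts, binding=None):
    """Yield every variable binding (dict) that satisfies all triple patterns.

    Terms starting with "$" are variables; everything else is a constant.
    """
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    (s, p, o), rest = query[0], query[1:]
    for fs, fp, fo in facts:
        if fp != p:
            continue
        new = dict(binding)
        ok = True
        for term, value in ((s, fs), (o, fo)):
            if term.startswith("$"):
                # bind the variable, or check consistency with an earlier binding
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, facts, new)

facts = [
    ("Max_Planck", "isa", "scientist"),
    ("Max_Planck", "bornIn", "Kiel"),
    ("Albert_Einstein", "isa", "scientist"),
    ("Albert_Einstein", "bornIn", "Ulm"),
]
query = [("$x", "isa", "scientist"), ("$x", "bornIn", "Kiel")]
print(list(match(query, facts)))  # [{'$x': 'Max_Planck'}]
```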
Ranking Factors
Confidence example: bornIn (Max Planck, Kiel) from "Max Planck was born in Kiel" (Wikipedia) vs. livesIn (Elvis Presley, Mars) from "They believe Elvis hides on Mars" (Martian Bloggeria)
Informativeness examples: for q: isa (Einstein, $y), compare isa (Einstein, scientist) with isa (Einstein, vegetarian); for q: isa ($x, vegetarian), compare isa (Einstein, vegetarian) with isa (Al Nobody, vegetarian)
Graph labels shown: Einstein, vegetarian, Bohr, Nobel Prize, Tom Cruise, 1962; relations isa, bornIn, diedIn, won
Information Extraction (IE): Text to Records
combine NLP, pattern matching, lexicons, statistical learning
Person, BirthDate, BirthPlace, ...: (Max Planck; 4/23, 1858; Kiel), (Albert Einstein; 3/14, 1879; Ulm), (Mahatma Gandhi; 10/2, 1869; Porbandar)
Person, ScientificResult: (Max Planck; Quantum Theory)
Person, Collaborator: (Max Planck; Albert Einstein), (Max Planck; Niels Bohr)
Constant, Value, Dimension: (Planck's constant; $6.626 \times 10^{-34}$; Js)
Knowledge Acquisition from the Web
Existing approaches and tools (Snowball [Gravano et al. 2000], KnowItAll [Etzioni et al. 2004], …): almost-unsupervised pattern matching and learning: seeds (known facts) → patterns (in text) → (extraction) rule → (new) facts
Methods for Web-Scale Fact Extraction
seeds → text → rules → new facts
Example:
city (Beijing) → old center of Beijing → old center of X
plays (Coltrane, sax) → sax player Coltrane → Y player X
city (Seattle) → in downtown Seattle → in downtown X
city (Seattle) → Seattle and other towns → X and other towns
city (Las Vegas) → Las Vegas and other towns → X and other towns
plays (Zappa, guitar) → playing guitar: … Zappa → playing Y: … X
plays (Davis, trumpet) → Davis … blows trumpet → X … blows Y
New facts: in downtown Beijing → city (Beijing); Coltrane blows sax → plays (C., sax)
Assessment of facts & generation of rules based on statistics
Rules can be more sophisticated: playing NN: (ADJ|ADV)* NP & class(NN)=instrument & class(head(NP))=person → plays(head(NP), NN)
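The seeds → text → rules → new facts loop can be sketched for the unary relation city(X). This is a toy illustration of the bootstrapping idea, not Snowball or KnowItAll: the function names, the capitalized-word entity regex, the whole-sentence patterns, and the support threshold are all assumptions.

```python
import re
from collections import Counter

# Assumed entity shape: one or more capitalized words
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def learn_patterns(seeds, sentences, min_support=1):
    """Generalize sentences that mention a seed by replacing the seed with X."""
    support = Counter()
    for sentence in sentences:
        for seed in seeds:
            if seed in sentence:
                support[sentence.replace(seed, "X")] += 1
    # keep only patterns observed often enough (simple statistical assessment)
    return {p for p, n in support.items() if n >= min_support}

def apply_patterns(patterns, sentences):
    """Return every entity that fills the X slot of some pattern."""
    found = set()
    for p in patterns:
        regex = re.compile("^" + re.escape(p).replace("X", ENTITY) + "$")
        for s in sentences:
            m = regex.match(s)
            if m:
                found.add(m.group(1))
    return found

seeds = {"Seattle", "Las Vegas"}
corpus = [
    "in downtown Seattle",
    "Seattle and other towns",
    "Las Vegas and other towns",
    "in downtown Beijing",
    "Coltrane blows sax",
]
patterns = learn_patterns(seeds, corpus)
new_cities = apply_patterns(patterns, corpus) - seeds
print(patterns)    # the two generalized patterns
print(new_cities)  # {'Beijing'}
```

With min_support=2 only "X and other towns" survives, so Beijing is no longer found: stricter pattern assessment trades recall for precision.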
Performance of Web-IE
State-of-the-art precision/recall results:
relation: precision; recall; corpus; systems
countries: 80%; 90%; Web; KnowItAll
cities: 80%; ???; Web; KnowItAll
scientists: 60%; ???; Web; KnowItAll
headquarters: 90%; 50%; News; Snowball, LEILA
birthdates: 80%; 70%; Wikipedia; LEILA
instanceOf: 40%; 20%; Web; Text2Onto, LEILA
Open IE: 80%; ???; Web; TextRunner
Precision value-chain: entities 80%, attributes 70%, facts 60%, events 50%
Anecdotal evidence:
invented (A.G. Bell, telephone); married (Hillary Clinton, Bill Clinton); isa (yoga, relaxation technique); isa (zearalenone, mycotoxin); contains (chocolate, theobromine); contains (Singapore sling, gin)
invented (Johannes Kepler, logarithm tables); married (Segolene Royal, Francois Hollande); isa (yoga, excellent way); isa (your day, good one); contains (chocolate, raisins); plays (the liver, central role); makes (everybody, mistakes)
Beyond Surface Learning with LEILA
Learning to Extract Information by Linguistic Analysis [F. Suchanek, G. Ifrim, GW: KDD'06]
Almost-unsupervised statistical learning with dependency parsing
Limitation of surface patterns: who discovered or invented what? "Tesla's work formed the basis of AC electric power" vs. "Al Gore funded more work for a better basis of the Internet"
Examples: (Cologne, Rhine), (Cairo, Nile), …
Counterexamples: (Cairo, Rhine), (Rome, 0911), ( , [0..9]*), …
Parsed sentences: "Cologne lies on the banks of the Rhine"; "People in Cairo like wine from the Rhine valley"; "Paris was founded on an island in the Seine" → (Paris, Seine)
IE Efficiency and Accuracy Tradeoffs
IE is cool, but what's in it for DB folks?
[see also tutorials by Cohen, Doan/Ramakrishnan/Vaithyanathan, Agichtein/Sarawagi]
The Future: Challenges
Outline
✓ Past: Matter, Antimatter, and Wormholes
✓ Present: XML and Graph IR
✓ Future: From Data to Knowledge
Major Trends in DB and IR
Between Database Systems and Information Retrieval: malleable schema (later); deep NLP, adding structure; record linkage; info extraction; graph mining; entity-relationship graph IR; ontologies; ranking; statistical language models; data uncertainty; programmability; search as Web Service; dataspaces; Web objects; Web 2.0 (on both sides)
Conclusion
DB&IR: Both Sides Now
Thank You!

More Related Content

What's hot

Instance-Based Ontological Knowledge Acquisition
Instance-Based Ontological Knowledge AcquisitionInstance-Based Ontological Knowledge Acquisition
Instance-Based Ontological Knowledge AcquisitionLihua Zhao
 
Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011Lihua Zhao
 
Automatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative NetworksAutomatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative NetworksMarko Rodriguez
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataPolytechnic University of Bari
 
A Model of the Scholarly Community
A Model of the Scholarly CommunityA Model of the Scholarly Community
A Model of the Scholarly CommunityMarko Rodriguez
 
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...Jennifer D'Souza
 
(Semi-)Automatic analysis of online contents
(Semi-)Automatic analysis of online contents(Semi-)Automatic analysis of online contents
(Semi-)Automatic analysis of online contentsSteffen Staab
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad DataSteffen Staab
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Polytechnic University of Bari
 
Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Basil Ell
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textJennifer D'Souza
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1ErhardRahm
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal databaseTPO TPO
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information RetrievalDustin Smith
 

What's hot (17)

Instance-Based Ontological Knowledge Acquisition
Instance-Based Ontological Knowledge AcquisitionInstance-Based Ontological Knowledge Acquisition
Instance-Based Ontological Knowledge Acquisition
 
Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011Mid-Ontology Learning from Linked Data @JIST2011
Mid-Ontology Learning from Linked Data @JIST2011
 
Automatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative NetworksAutomatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative Networks
 
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open DataSSSW 2013 - Feeding Recommender Systems with Linked Open Data
SSSW 2013 - Feeding Recommender Systems with Linked Open Data
 
A Model of the Scholarly Community
A Model of the Scholarly CommunityA Model of the Scholarly Community
A Model of the Scholarly Community
 
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
Pattern-based Acquisition of Scientific Entities from Scholarly Article Title...
 
(Semi-)Automatic analysis of online contents
(Semi-)Automatic analysis of online contents(Semi-)Automatic analysis of online contents
(Semi-)Automatic analysis of online contents
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad Data
 
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
Tutorial - Recommender systems meet linked open data - ICWE 2016 - Lugano - 0...
 
Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries Deriving human readable labels from sparql queries
Deriving human readable labels from sparql queries
 
Perspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from textPerspectives on mining knowledge graphs from text
Perspectives on mining knowledge graphs from text
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal database
 
Heuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQLHeuristic based Query Optimisation for SPARQL
Heuristic based Query Optimisation for SPARQL
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 

Similar to DB and IR Integration

osm.cs.byu.edu
osm.cs.byu.eduosm.cs.byu.edu
osm.cs.byu.edubutest
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webFabien Gandon
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
download
downloaddownload
downloadbutest
 
download
downloaddownload
downloadbutest
 
Aggregation for searching complex information spaces
Aggregation for searching complex information spacesAggregation for searching complex information spaces
Aggregation for searching complex information spacesMounia Lalmas-Roelleke
 
GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003GATE, HLT and Machine Learning, Sheffield, July 2003
GATE, HLT and Machine Learning, Sheffield, July 2003butest
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Search Me: Using Lucene.Net
Search Me: Using Lucene.NetSearch Me: Using Lucene.Net
Search Me: Using Lucene.Netgramana
 
Make your data great again - Ver 2
Make your data great again - Ver 2Make your data great again - Ver 2
Make your data great again - Ver 2Daniel JACOB
 
The Nature of Information
The Nature of InformationThe Nature of Information
The Nature of InformationAdrian Paschke
 
Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Integrating a Domain Ontology Development Environment and an Ontology Search ...
Integrating a Domain Ontology Development Environment and an Ontology Search ...Takeshi Morita
 
Tracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracing Networks: Ontology-based Software in a Nutshell
Tracing Networks: Ontology-based Software in a NutshellTracingNetworks
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Stuart Chalk
 
2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinal2011linked science4mccuskermcguinnessfinal
2011linked science4mccuskermcguinnessfinalDeborah McGuinness
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemTrey Grainger
 
Multi-Model Data Query Languages and Processing Paradigms
Multi-Model Data Query Languages and Processing ParadigmsMulti-Model Data Query Languages and Processing Paradigms
Multi-Model Data Query Languages and Processing ParadigmsJiaheng Lu
 
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...
Top 5 MOST VIEWED LANGUAGE COMPUTING ARTICLE - International Journal on Natur...kevig
 
Searching Heterogenous E Learning Resources
Searching Heterogenous E Learning ResourcesSearching Heterogenous E Learning Resources
Searching Heterogenous E Learning Resourcesimranlatif
 

DB and IR Integration

  • 1. in collaboration with Georgiana Ifrim, Gjergji Kasneci, Josiane Parreira, Maya Ramanath, Ralf Schenkel, Fabian Suchanek, Martin Theobald
  • 2. DB and IR: Two Parallel Universes. Database Systems vs. Information Retrieval: canonical application: accounting vs. libraries; data type: numbers, short strings vs. text; foundation: algebraic / logic based vs. probabilistic / statistics based; search paradigm: Boolean retrieval (exact queries, result sets/bags) vs. ranked retrieval (vague queries, result lists); market leaders: Oracle, IBM DB2, MS SQL Server, etc. vs. Google, Yahoo!, MSN, Verity, Fast, etc. Parallel universes forever?
  • 3.
  • 4.
  • 5.
  • 6. Outline • Past: Matter, Antimatter, and Wormholes • Present: XML and Graph IR • Future: From Data to Knowledge
  • 7.
  • 8. Timeline of work between DB and IR, 1990 to 2005 (figure): VAGUE (Motro); Proximal Nodes (Baeza-Yates et al.); WHIRL (Cohen); Prob. Datalog (Fuhr et al.); INEX; XPath; XPath Full-Text; Prob. DB (Cavallo & Pittarelli); Prob. Tuples (Barbara et al.); Web Entity Search: Libra, Avatar, ExDB …; Faceted Search: Flamenco …; 1st Gen. XML IR: XXL, XIRQL, Elixir, JuruXML; Multimedia IR; Web Query Languages: W3QS, WebOQL, Araneus …; Semistructured Data: Lore, Xyleme …; 2nd Gen. XML IR: XRank, Timber, TIJAH, XSearch, FleXPath, CoXML, TopX, MarkLogic, Fast …; Uncertain & Prob. Relations: Mystiq, Trio …; Struct. Docs; Deep Web Search; Digital Libraries; Graph IR
  • 9.
  • 10. XXL: Early XML IR [Anja Theobald, GW: Adding Relevance to XML, WebDB'00] Query: Which professors from Saarbruecken (SB) are teaching IR and have research projects on XML? Setting: union of heterogeneous sources without global schema. Similarity-aware XPath: //~Professor [//* = "~SB"] [//~Course [//* = "~IR"]] [//~Research [//* = "~XML"]] First sample document: Professor; Name: Gerhard Weikum; Address ...; City: SB; Country: Germany; Teaching; Research; Course; Title: IR; Description: Information retrieval ...; Syllabus ...; Book; Article ...; Project; Title: Intelligent Search of Heterogeneous XML Data; Funding: EU ... Second sample document: Lecturer; Name: Ralf Schenkel; Address: Max-Planck Institute for Informatics, Germany; Activities; Seminar; Contents: Ranked retrieval …; Literature: …; Scientific; Name: INEX task coordinator (Initiative for the Evaluation of XML …); Other; Sponsor: EU …
  • 11.
  • 12.
  • 13. Outline ✓ Past: Matter, Antimatter, and Wormholes • Present: XML and Graph IR • Future: From Data to Knowledge
  • 14.
  • 15. Commercial Break [Martin Theobald, Ralf Schenkel, GW: VLDB'05] TopX demo today 3:30 – 5:30
  • 16.
  • 17.
  • 18.
  • 19.
  • 20. Outline ✓ Past: Matter, Antimatter, and Wormholes ✓ Present: XML and Graph IR • Future: From Data to Knowledge
  • 21. Knowledge Queries. Turn the Web, Web2.0, and Web3.0 into the world's most comprehensive knowledge base ("semantic DB")! Answer "knowledge queries" such as: Nobel laureate who survived both world wars and his children; drama with three women making a prophecy to a British nobleman that he will become king; proteins that inhibit both protease and some other enzyme; connection between Thomas Mann and Goethe; differences in Rembetiko music from Greece and from Turkey; neutron stars with X-ray bursts > 10^40 erg s^-1 & black holes in 10''; market impact of Web2.0 technology in December 2006; sympathy or antipathy for Germany from May to August 2006
  • 22.
  • 23.
  • 24.
  • 25. High-Quality Knowledge Sources Wikipedia and other lexical sources
  • 26. Exploit Hand-Crafted Knowledge {{Infobox_Scientist | name = Max Planck | birth_date = [[April 23]], [[1858]] | birth_place = [[Kiel]], [[Germany]] | death_date = [[October 4]], [[1947]] | death_place = [[Göttingen]], [[Germany]] | residence = [[Germany]] | nationality = [[Germany|German]] | field = [[Physicist]] | work_institution = [[University of Kiel]]</br> [[Humboldt-Universität zu Berlin]]</br> [[Georg-August-Universität Göttingen]] | alma_mater = [[Ludwig-Maximilians-Universität München]] | doctoral_advisor = [[Philipp von Jolly]] | doctoral_students = [[Gustav Ludwig Hertz]]</br> … | known_for = [[Planck's constant]], [[Quantum mechanics|quantum theory]] | prizes = [[Nobel Prize in Physics]] (1918) … Wikipedia, WordNet, and other lexical sources
  • 27.
  • 28. YAGO Knowledge Representation (graph figure with three layers: concepts, individuals, words). Concepts, linked by subclass edges: Entity, Person, Location, City, Country, Scientist, Physicist, Biologist. Individuals and literals, linked by instanceOf, bornOn, diedOn, bornIn, hasWon, and FatherOf edges: Max_Planck, Erwin_Planck, Kiel, Nobel Prize, April 23, 1858, October 4, 1947. Words, linked by means edges: "Max Planck", "Dr. Planck", "Max Karl Ernst Ludwig Planck". Online access and download at http://www.mpi-inf.mpg.de/~suchanek/yago/ Accuracy: 97%. Knowledge base sizes (# facts): KnowItAll 30 000; SUMO 60 000; WordNet 200 000; OpenCyc 300 000; Cyc 5 000 000; YAGO 6 000 000
  • 29. NAGA: Graph IR on YAGO [G. Kasneci et al.: WWW'07] Graph-based search on YAGO-style knowledge bases with built-in ranking based on confidence and informativeness → statistical language model for result graphs. Conjunctive queries (figure): $x isa scientist, $x bornIn Kiel. Queries with regular expressions (figure), node labels: Ling, $x, scientist, $y, Zhejiang, Beng Chin Ooi; edge labels: isa, hasFirstName | hasLastName, worksFor, locatedIn*, (coAuthor | advisor)*
  • 30.
  • 31. Information Extraction (IE): Text to Records. Combine NLP, pattern matching, lexicons, statistical learning. Table (Person, BirthDate, BirthPlace): Max Planck, 4/23, 1858, Kiel; Albert Einstein, 3/14, 1879, Ulm; Mahatma Gandhi, 10/2, 1869, Porbandar; ... Table (Person, ScientificResult): Max Planck, Quantum Theory. Table (Person, Collaborator): Max Planck, Albert Einstein; Max Planck, Niels Bohr. Table (Constant, Value, Dimension): Planck's constant, 6.226 × 10^23, Js
  • 32.
  • 33. Methods for Web-Scale Fact Extraction. Bootstrapping loop: seeds → text → rules → new facts. Example (seed fact; text; derived rule): city(Seattle); "in downtown Seattle"; "in downtown X". city(Seattle); "Seattle and other towns"; "X and other towns". city(Las Vegas); "Las Vegas and other towns"; "X and other towns". plays(Zappa, guitar); "playing guitar: … Zappa"; "playing Y: … X". plays(Davis, trumpet); "Davis … blows trumpet"; "X … blows Y". Applying the rules to new text: "in downtown Beijing" → city(Beijing); "Coltrane blows sax" → plays(C., sax). The new facts in turn yield new rules: city(Beijing); "old center of Beijing"; "old center of X". plays(Coltrane, sax); "sax player Coltrane"; "Y player X". Assessment of facts & generation of rules based on statistics. Rules can be more sophisticated: playing NN: (ADJ|ADV)* NP & class(NN)=instrument & class(head(NP))=person → plays(head(NP), NN)
  • 34. Performance of Web-IE. State-of-the-art precision/recall results (relation: precision, recall, corpus, systems): countries: 80%, 90%, Web, KnowItAll; cities: 80%, ???, Web, KnowItAll; scientists: 60%, ???, Web, KnowItAll; headquarters: 90%, 50%, News, Snowball, LEILA; birthdates: 80%, 70%, Wikipedia, LEILA; instanceOf: 40%, 20%, Web, Text2Onto, LEILA; Open IE: 80%, ???, Web, TextRunner. Precision value-chain: entities 80%, attributes 70%, facts 60%, events 50%. Anecdotal evidence, first group: invented (A.G. Bell, telephone); married (Hillary Clinton, Bill Clinton); isa (yoga, relaxation technique); isa (zearalenone, mycotoxin); contains (chocolate, theobromine); contains (Singapore sling, gin). Second group: invented (Johannes Kepler, logarithm tables); married (Segolene Royal, Francois Hollande); isa (yoga, excellent way); isa (your day, good one); contains (chocolate, raisins); plays (the liver, central role); makes (everybody, mistakes)
  • 35.
  • 36.
  • 37.
  • 38. Outline ✓ Past: Matter, Antimatter, and Wormholes ✓ Present: XML and Graph IR ✓ Future: From Data to Knowledge
  • 39. Major Trends in DB and IR (figure spanning Database Systems and Information Retrieval): malleable schema (later); deep NLP, adding structure; record linkage; info extraction; graph mining; entity-relationship graph IR; ontologies; ranking; statistical language models; data uncertainty; programmability; search as Web Service; dataspaces; Web objects; Web 2.0
  • 40.
  • 41.
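
The bootstrapping loop on slide 33 (seeds → text → rules → new facts) can be illustrated with a small sketch. This is only a toy under stated assumptions, not the method of any system named in the slides: it handles the unary relation city(X) only, treats each text snippet as one phrase, and uses "number of distinct seeds that produced a rule" as a crude stand-in for the statistical assessment the slide mentions. The function names, the `<X>` placeholder, and the snippet "Ulm and other towns" are made up for illustration.

```python
import re

# Hypothetical placeholder marking the entity slot in a rule.
SLOT = "<X>"
# Assumption: an entity is one or more space-separated word tokens.
ENTITY = r"(\w+(?: \w+)*)"

# Text snippets from slide 33, plus one made-up snippet ("Ulm and other towns").
CORPUS = [
    "in downtown Seattle",
    "Seattle and other towns",
    "Las Vegas and other towns",
    "in downtown Beijing",
    "old center of Beijing",
    "Ulm and other towns",
]


def induce_rules(seeds, phrases):
    """Map each rule (phrase with the seed replaced by SLOT) to the seeds that produced it."""
    rules = {}
    for seed in seeds:
        for phrase in phrases:
            if seed in phrase:
                rule = phrase.replace(seed, SLOT)
                rules.setdefault(rule, set()).add(seed)
    return rules


def extract_entities(rules, phrases, known, min_support=1):
    """Apply sufficiently supported rules to the phrases and return entities not yet known."""
    found = set()
    for rule, supporters in rules.items():
        if len(supporters) < min_support:
            continue  # crude statistical filter: drop weakly supported rules
        # Escape the literal parts of the rule and put an entity group in the slot.
        regex = re.compile(ENTITY.join(re.escape(part) for part in rule.split(SLOT)))
        for phrase in phrases:
            match = regex.fullmatch(phrase)
            if match and match.group(1) not in known:
                found.add(match.group(1))
    return found


seeds = {"Seattle", "Las Vegas"}
rules = induce_rules(seeds, CORPUS)
new_cities = extract_entities(rules, CORPUS, known=seeds)
strict_cities = extract_entities(rules, CORPUS, known=seeds, min_support=2)
```

In a full loop, `new_cities` would be added to the seeds and the two steps repeated, which is how a snippet like "old center of Beijing" would eventually contribute the rule "old center of X". Raising `min_support` trades recall for precision, since only rules confirmed by several seeds are applied.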