
Semantic Search

Presentation at VfM Seminar 'Medieninformation und Mediendokumentation', Bonn, 12.03.2013

Published in: Education

Transcript

  • 1. Semantic Search (Semantische Suche). Bonn, 12 March 2013. Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering, University of Potsdam. (Slide footer date: Thursday, 14 March 2013)
  • 2. Hasso Plattner Institute for IT Systems Engineering, University of Potsdam ■ HPI was founded in October 1998 as a public-private partnership ■ HPI research and teaching is focussed on IT Systems Engineering ■ 10 professors and 100 scientific coworkers ■ 450 bachelor / master students ■ Winner of CHE ranking 2010. (Slide footer: vfm Seminar "Metadatenmanagement in Medienunternehmen", 5 September 2012, Bonn; Jörg Waitelonis, Hasso-Plattner-Institut Potsdam)
  • 3. Research Group Semantic Technologies & Multimedia Retrieval ■ Research Topics ■ Semantic Web Technologies ■ Ontological Engineering ■ Information Retrieval ■ Multimedia Analysis & Retrieval ■ Social Networking ■ Data/Information Visualization ■ Research Projects:
  • 4. Semantic Search, Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization. Image: Albrecht Dürer: Melancholia I, 1514
  • 5. The "Google Dilemma". (Slide footer: Lecture "Semantic Web", Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam)
  • 6. [Image-only slide]
  • 7. Google Multimedia Search. (Slide footer: Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, Workshop "Corporate Semantic Web", XInnovations 2011, Berlin, 19 Sep. 2011)
  • 8. Google Multimedia Search: Content-Based Retrieval is Based on Textual Metadata. (Slide footer: Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LDW 2011, Magdeburg, 30 Sep. 2011)
  • 9. Search by Media Content
  • 10. The Ordinary Archive is a Small World... Jules Verne
  • 11. But wouldn't it be nice if... Jules Verne ...but maybe you are also interested in: George Melies (2 videos); Mark Twain (1 video); H.G. Wells (2 videos); science fiction (11 videos); adventure (20 videos); France (101 videos); Moon (33 videos); literature (434 videos); art (1,205 videos)
  • 12. (Traditional) Information Retrieval
  • 13. (Simplified) Information Retrieval Model. Diagram labels: information requests → query formulation → set of queries; files of records → indexing → set of documents; both sides are expressed in an indexing language and compared by similarity. (According to Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983)
  • 14. Evaluation of Information Retrieval Systems. R = relevant documents; P = retrieved documents; R ∩ P = relevant documents that have been retrieved. Recall = |R ∩ P| / |R|; Precision = |R ∩ P| / |P|; Fα = ((1+α) ⋅ (Recall ⋅ Precision)) / (α ⋅ (Recall + Precision))
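The measures on slide 14 can be tried out on small sets of document IDs. The following is a minimal sketch and not part of the original slides: the function name and the example IDs are made up for illustration, and only the α = 1 case of the F-measure (the harmonic mean of recall and precision) is computed.

```python
def evaluate(relevant, retrieved):
    """Return (recall, precision, f1) for two sets of document IDs."""
    hits = relevant & retrieved  # R ∩ P: relevant documents that were retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    # Harmonic mean of recall and precision (the alpha = 1 case).
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1


# Hypothetical example: 4 relevant documents, 3 retrieved, 2 of them relevant.
R = {"d1", "d2", "d3", "d4"}
P = {"d3", "d4", "d5"}
recall, precision, f1 = evaluate(R, P)
print(recall, precision, f1)
```

Retrieving more documents can only keep recall equal or raise it, but usually lowers precision, which is why a combined measure is reported.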
  • 15. Search Engines in the World Wide Web • The World Wide Web is a distributed hypermedia system that • consists of multimedia documents and • is connected via hyperlinks
  • 16. Search Engines in the WWW, Information Gathering via Web Crawler (Robot). Diagram (steps numbered 1 to 4 in the figure): the crawler works through a URL list (http://www.xxxx.de/1234..., http://www.xxxx.de/2234..., http://www.xxxx.de/3234..., ...), sends an HTTP request to the WWW server, the WWW server delivers the requested HTML documents to the web crawler, and hyperlinks (<a href="..." .../>) found in the HTML documents feed the URL list.
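The crawler loop on slide 16 can be sketched in a few lines. This is an illustrative sketch only: the `fetch` callback, the in-memory `FAKE_WEB` dictionary and the example.org URLs are assumptions that stand in for real HTTP requests, and politeness rules such as robots.txt handling are left out.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, limit=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    url_list = deque([seed])  # URLs still to be requested
    seen = {seed}             # URLs already queued, to avoid loops
    documents = {}
    while url_list and len(documents) < limit:
        url = url_list.popleft()
        html = fetch(url)
        if html is None:
            continue
        documents[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                url_list.append(link)
    return documents


# Hypothetical in-memory "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.org/a": '<a href="http://example.org/b">b</a> '
                            '<a href="http://example.org/c">c</a>',
    "http://example.org/b": '<a href="http://example.org/a">back</a>',
    "http://example.org/c": "<p>no links</p>",
}
pages = crawl("http://example.org/a", FAKE_WEB.get)
print(sorted(pages))
```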
  • 17. Search Engines in the WWW, Preprocessing and Indexing. Web Crawler → Document Preprocessing (Data Normalization, Tokenization, Speech Identification, Word Stemming, POS-Tagging, Descriptor Generation) → Data Analysis and creation of index data structures
  • 18. Search Engines in the WWW, Efficient Index Data Structures.
      Inverted File (terms): Aachen, Altavista, Ananas, …, Zustand, Zypern
      Location List for the term "Ananas":
        DocID   Pos           Frequency   Weight
        D123    1;13;77;132   4           9.4
        D456    22;38         2           6.7
        …       …             …           …
        D998    15            1           1.2
      Index File entry for D123: columns Frequency, URL, <H1> … <H6>, <title>, …, text; values 4, 1, 1, 0, 1, …, 1
      Document D123: http://producers.ananas.org/index.htm <html> <head><title="Ananas around the World"> </head> <body> … </body> </html>
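The location list on slide 18 (document ID, positions, frequency) can be built with a dictionary of dictionaries. This is a simplified sketch under stated assumptions: the tokenizer, the function names and the two example documents are invented for illustration, and the weight column and per-tag counts from the slide are not computed.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (very simplified)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each term to {doc_id: [positions]}; positions are 1-based."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(tokenize(text), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index


# Hypothetical documents; only the IDs echo the slide.
docs = {
    "D123": "Ananas around the world: ananas producers",
    "D456": "Travel guide to Zypern",
}
index = build_inverted_index(docs)
for doc_id, positions in index["ananas"].items():
    # The frequency is simply the number of recorded positions.
    print(doc_id, positions, len(positions))
```

Looking up a term is then a single dictionary access, independent of the number of indexed documents.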
  • 19. Search Engines in the WWW, Relevance Ranking • based on Link Popularity (Google PageRank). Example graph with four pages A, B, C, D.
      Start: PR(A) = PR(B) = PR(C) = PR(D) = 1.0
      Iteration of the PageRank computation:
        Nr.   PR(A)   PR(B)    PR(C)   PR(D)
        1     1.0     1.0      1.0     1.0
        2     1.0     0.575    2.275   0.15
        3     2.083   0.575    1.191   0.15
        …     …       …        …       …
        n     1.49    0.7833   1.577   0.15
      Resulting PageRank: A = 1.49, B = 0.78, C = 1.57, D = 0.15
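The iteration table on slide 19 can be reproduced with the classic PageRank update PR(X) = (1 - d) + d · Σ PR(Y) / out(Y). The sketch below rests on two assumptions that the transcript does not state: a damping factor d = 0.85, and the link structure A→B, A→C, B→C, C→A, D→C, which was inferred from the numbers in the table because the drawing itself is lost.

```python
def pagerank_step(ranks, links, d=0.85):
    """One synchronous update: PR(X) = (1 - d) + d * sum(PR(Y) / out(Y))."""
    new_ranks = {}
    for page in ranks:
        incoming = sum(
            ranks[src] / len(targets)
            for src, targets in links.items()
            if page in targets
        )
        new_ranks[page] = (1 - d) + d * incoming
    return new_ranks


# Assumed link structure (inferred from the table, not given in the transcript).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = {page: 1.0 for page in links}

second = pagerank_step(ranks, links)  # row "2" of the table
for _ in range(100):                  # iterate until the values settle
    ranks = pagerank_step(ranks, links)

print({p: round(r, 3) for p, r in second.items()})
print({p: round(r, 3) for p, r in ranks.items()})
```

Page D has no incoming links under this assumption, so it stays at the minimum value 1 - d = 0.15 from the second row onwards.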
  • 20. The Web is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. (...according to Douglas Adams). (Slide footer: Semantic Web Technologies, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam)
  • 21. Language has its fallacies...
  • 22. ...in particular, if we don't know the language
  • 23. Semantic Search, Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization
  • 24. Searching a (Multi) Media Archive. Step 1: Digitization of analog media. Step 2: Annotation with (text-based) metadata. Step 3: Content-based retrieval based on available metadata
  • 25. Today: Manual Annotation
  • 26. Automated Media Analysis. Diagram labels: Audio Mining; Structural Analysis; Audio Event Detection; Automated Speech Recognition; Visual Analysis; Visual Concept Detection; Face Detection; Logo Detection; Text Recognition
  • 27. Automated Media Analysis ■ Result: multimedia data with spatio-temporal annotations. Temporal metadata (e.g. MPEG-7):
      ...
      <Video>
        <TemporalDecomposition>
          <VideoSegment>
            <TextAnnotation>
              <KeywordAnnotation>
                <Keyword>Astronaut</Keyword>
              </KeywordAnnotation>
            </TextAnnotation>
            <MediaTime>
              <MediaTimePoint> T00:05:05:0F25 </MediaTimePoint>
              <MediaDuration> PT00H00M31S0N25F </MediaDuration>
            </MediaTime>
            ...
          </VideoSegment>
        </TemporalDecomposition>
      </Video>
      ...
  • 28. Automated Media Analysis ■ Result: multimedia data with spatio-temporal annotations. Spatial metadata (e.g. MPEG-7):
      ...
      <SpatialDecomposition>
        <TextAnnotation>
          <KeywordAnnotation>
            <Keyword>Astronaut</Keyword>
          </KeywordAnnotation>
        </TextAnnotation>
        <SpatialMask>
          <SubRegion>
            <Polygon>
              <Coords> 480 150 620 480 </Coords>
            </Polygon>
          </SubRegion>
        </SpatialMask>
        ...
      </SpatialDecomposition>
      ...
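Annotations like the ones on slides 27 and 28 are easy to read back programmatically. The following sketch parses a completed copy of the temporal snippet with Python's standard library. It is an illustration under assumptions: the elided parts ("...") are simply left out, the function name is invented, and real MPEG-7 documents carry XML namespaces and richer structure that this simplified fragment ignores.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free copy of the temporal annotation shown on the slide.
SNIPPET = """
<Video>
  <TemporalDecomposition>
    <VideoSegment>
      <TextAnnotation>
        <KeywordAnnotation>
          <Keyword>Astronaut</Keyword>
        </KeywordAnnotation>
      </TextAnnotation>
      <MediaTime>
        <MediaTimePoint> T00:05:05:0F25 </MediaTimePoint>
        <MediaDuration> PT00H00M31S0N25F </MediaDuration>
      </MediaTime>
    </VideoSegment>
  </TemporalDecomposition>
</Video>
"""


def keyword_segments(xml_text):
    """Return (keyword, start, duration) for every annotated video segment."""
    root = ET.fromstring(xml_text)
    results = []
    for segment in root.iter("VideoSegment"):
        start = segment.findtext("MediaTime/MediaTimePoint", default="").strip()
        duration = segment.findtext("MediaTime/MediaDuration", default="").strip()
        for keyword in segment.iter("Keyword"):
            results.append(((keyword.text or "").strip(), start, duration))
    return results


segments = keyword_segments(SNIPPET)
print(segments)
```

A retrieval system could index these tuples so that a keyword hit leads directly to the matching position in the video.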
  • 29. [Image-only slide]
  • 30. How to Determine the Meaning of Metadata? • Authoritative Metadata: structured data; semi-structured data; natural language text • Non-authoritative Metadata: (free) user tags and comments; restricted vocabularies • (Media) Analysis Metadata: low level features; high level features; etc. Semantic Analysis (diagram labels): level of abstraction; accuracy; reliability; context; pragmatics; location dependency; time dependency
  • 31. Semantic Search, Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization
  • 32. Semantic Metadata, Multimedia Ontologies • MPEG-7 has been re-engineered to become an OWL-DL ontology (2007: Arndt et al., COMM model) • Localize a region → Draw a bounding box • Annotate the content → Interpret the content → Tag "Astronaut"
  • 33. Semantic Metadata, Multimedia Ontologies. Example: Tagging with an MPEG-7 Ontology. Graph nodes: Reg1, mpeg7:StillRegion, mpeg7:image, mpeg7:SpatialMask, mpeg7:Coords, "Neil Armstrong", "Man on the Moon". Edge labels: rdf:type, mpeg7:spatial_decomposition, mpeg7:depicts (twice), mpeg7:polygon
  • 34. "Neil Armstrong" is more than just a character string. Entities: Juri Gagarin (is a Kosmonaut); Neil Armstrong (is a Astronaut, is a Person). Ontologies: Kosmonaut same as Astronaut; Astronaut subClassOf Science Occupation; Science Occupation subClassOf Employment. Further edge labels in the diagram: is NOT a; has an
  • 35. Where does the knowledge come from...?
  • 36. Web of Data = Linked Open Data
  • 37. Where does the knowledge come from...? :Neil_Armstrong rdf:type dbpedia-owl:Astronaut . (subject: :Neil_Armstrong; property: rdf:type; object: dbpedia-owl:Astronaut)
  • 38. Named Entity Mapping. The label "Neil Armstrong" (rdfs:label) is mapped to the entity Neil Armstrong; "is a Astronaut" becomes rdf:type dbpedia-owl:Astronaut; "is a Person" becomes rdf:type foaf:Person; Astronaut subClassOf Science Occupation subClassOf Employment
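The subject / property / object statements on slides 37 and 38 can be held in a plain set of tuples and queried by pattern. This is a toy sketch rather than a real RDF store: the helper names are invented, and the `ex:` class identifiers are hypothetical placeholders for the "Science Occupation" and "Employment" classes, whose actual URIs the slides do not give.

```python
# Triples as (subject, property, object); the ex: names are hypothetical.
TRIPLES = {
    (":Neil_Armstrong", "rdfs:label", "Neil Armstrong"),
    (":Neil_Armstrong", "rdf:type", "dbpedia-owl:Astronaut"),
    (":Neil_Armstrong", "rdf:type", "foaf:Person"),
    ("dbpedia-owl:Astronaut", "rdfs:subClassOf", "ex:ScienceOccupation"),
    ("ex:ScienceOccupation", "rdfs:subClassOf", "ex:Employment"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given pattern (None = wildcard)."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def types_of(triples, entity):
    """Direct rdf:type classes plus superclasses reached via rdfs:subClassOf."""
    found = set()
    todo = [o for _, _, o in match(triples, s=entity, p="rdf:type")]
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(o for _, _, o in match(triples, s=cls, p="rdfs:subClassOf"))
    return found


# Map a character string to an entity, then ask what kind of thing it is.
entity = match(TRIPLES, p="rdfs:label", o="Neil Armstrong")[0][0]
print(entity, sorted(types_of(TRIPLES, entity)))
```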
  • 39. Semantic Multimedia Retrieval. Video Analysis / Metadata Extraction produces metadata along the time axis; Entity Recognition / Mapping links the metadata to entities (e.g., person xy, location yz, event abc) and to further data (e.g., bibliographical data, geographical data, encyclopedic data, ...). N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags. In Proc. of the 8th Int. Workshop on Text-based Information Retrieval, IEEE CS Press, 2011
  • 40. Semantic Analysis, Named Entity Mapping. Text: "Armstrong landed the Eagle on the Moon." → Entity Mapping → http://dbpedia.org/resource/Neil_Armstrong. How do I find the right entity?
  • 41. Armstrong
  • 42. Semantic Analysis, Named Entity Mapping, Entity Candidate Generation. Text: "Armstrong landed the Eagle on the Moon." Determine possible Entity Mapping Candidates: Anton Armstrong; Armstrong, Ontario; Armstrong Tools; Ian Armstrong; Armstrong (car); Edward Armstrong; Armstrong, Florida; Neil Armstrong; Armstrong (moon crater); Armstrong County, Texas; Gary Armstrong; The Armstrongs; George Armstrong; Armstrong Tunnel; Armstrong Bridge; Louis Armstrong; Craig Armstrong; The Armstrong Twins; Lance Armstrong; + 400 more...
  • 43. Semantic Analysis, Named Entity Mapping, Entity Candidate Generation. Text: "Armstrong landed the Eagle on the Moon." Determine possible Entity Mapping Candidates: • linguistic analysis (POS tagging) • normalization • encoding and spelling • special (language dependent) characters • language dependent spellings • abbreviations, acronyms • type dependent spellings • alternative names and synonyms • fuzzy string mapping • ...
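Two of the candidate generation steps from slide 43, normalization and fuzzy string mapping, can be sketched with the Python standard library. This is a hedged illustration, not the method used by the authors: the function names, the similarity cutoff of 0.8 and the short label list are assumptions, and `difflib` merely stands in for whatever string similarity a production system would use.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(kept.lower().split())


def candidates(mention, labels, cutoff=0.8):
    """Return labels with a token equal or very similar to the mention."""
    target = normalize(mention)
    result = []
    for label in labels:
        tokens = normalize(label).split()
        if target in tokens or difflib.get_close_matches(
            target, tokens, n=1, cutoff=cutoff
        ):
            result.append(label)
    return result


# A few labels taken from the candidate slide plus one non-matching label.
LABELS = [
    "Neil Armstrong",
    "Louis Armstrong",
    "Armstrong, Ontario",
    "Armstrong (moon crater)",
    "Eagle (lunar module)",
    "Lance Armstrong",
]
exact = candidates("Armstrong", LABELS)
fuzzy = candidates("Armstrnog", LABELS)  # transposed letters
print(exact)
print(fuzzy)
```

Candidate generation is deliberately generous; picking the right entity among the candidates is left to the selection step.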
  • 44. Semantic Analysis, Named Entity Mapping, Entity Selection Process. Text: "Armstrong landed the Eagle on the Moon." Entity Selection is determined by • context • ambiguity of source data / mapping • accuracy / reliability of source data / mapping. Candidates: Anton Armstrong; Armstrong, Ontario; Armstrong Tools; Ian Armstrong; Armstrong (car); Edward Armstrong; Armstrong, Florida; Neil Armstrong; Armstrong (moon crater); Armstrong County, Texas; Gary Armstrong; The Armstrongs; George Armstrong; Armstrong Tunnel; Armstrong Bridge; Louis Armstrong; Craig Armstrong
  • 45. Semantic Analysis, Named Entity Mapping, SEMEX Multimedia Context Model. A Context Item has a Contextual Description (Class Diversity, Level of Structure, Source Reliability, Source Diversity) and Context Dimensions (Temporal Context, Spatial Context, Social Context). The Contextual Description influences Ambiguity and Accuracy; the Context Dimensions determine Relevance. Text: "Armstrong landed the Eagle on the Moon." N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013
  • 46. Semantic Analysis, Named Entity Mapping. Consider all entities within the same context: "Armstrong landed the Eagle on the Moon."
      Armstrong (448 entities): George Armstrong Custer; Neil Armstrong; The Armstrong Twins; Armstrong, Florida; Craig Armstrong; Armstrong, Ontario; Armstrong (Moon Crater); Armstrong Gun; Armstrong's Theorem; Louis Armstrong International Airport; Armstrong County, Texas; Joe Armstrong; Ian Armstrong; Armstrong Tunnel; Armstrong Automobile; Sir Thomas Armstrong; Louis Armstrong; Armstrong (British Columbia); Karen Armstrong; Curtis Armstrong; Hilary Armstrong; Gillian Armstrong; William L. Armstrong
      Eagle (95 entities): Eagle (Bird); Eagle (heraldry); USCGC Eagle; The Eagle (2011 film); Eagle (comic); Eagle (song); Eagle (lunar module); The Eagle (newspaper); War Eagle; Eagle (Moon Crater); The Eagle (Pub); Eagle TV; Eagle Falls (Washington); Eagle (racehorse); John H. Eagle; Eagle (typeface); Angela Eagle; Linda Eagle; James Philipp Eagle
      Moon (156 entities): Man on the Moon (film); Moon (song); Moon Son-Ri; Moon 44; C Moon; The Moon (Tarot card); Man on the Moon (soundtrack); Moon; Man on the Moon (musical); Mr. Moon (song); Moon (Band); Moon OS; Moon 83; Lottie Moon; Edgar Moon; Darvin Moon; Gary Moon; William Moon; Francis Moon; Robert Charles Moon; Allan Moon; Fly me to the Moon (song); Black Moon; Ban-Ki Moon
  • 47. Semantic Analysis, Named Entity Recognition, Entity Selection Process. Select matching entities from all possible candidate entities: • Popularity based strategies • Linguistic strategies • Statistical strategies • Semantic based strategies. Resources: • reference text corpus (wikipedia) • link graph (wikipedia) • semantic graph (dbpedia). General Approach: 1. Make an assumption 2. Do the strategies support or contradict your assumption? 3. Make a decision according to logical and probabilistic rules/constraints. N. Ludwig, H. Sack: Named entity recognition for user-generated tags, TIR 2011
  • 48. Semantic Analysis, Named Entity Recognition, Entity Selection Process: (Semantic) Graph Analysis. "Armstrong landed the Eagle on the Moon." Armstrong (448 entities), Eagle (95 entities), Moon (156 entities); the candidate lists are the same as on slide 46. N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013
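The idea of selecting entities by (semantic) graph analysis can be illustrated with a brute-force coherence score: choose one candidate per mention so that the chosen entities are linked to each other as much as possible. This is only a sketch of the general idea, not the approach from the cited papers. The tiny `LINKS` set is a hypothetical stand-in for a semantic graph such as DBpedia, and trying every combination is feasible only for a handful of candidates.

```python
from itertools import product

# A few candidates per mention, taken from the lists on the slides.
CANDIDATES = {
    "Armstrong": ["Neil Armstrong", "Louis Armstrong", "Armstrong, Ontario"],
    "Eagle": ["Eagle (Bird)", "Eagle (lunar module)", "Eagle (typeface)"],
    "Moon": ["Moon", "Moon (song)", "Lottie Moon"],
}

# Hypothetical links standing in for a semantic graph such as DBpedia.
LINKS = {
    frozenset({"Neil Armstrong", "Eagle (lunar module)"}),
    frozenset({"Neil Armstrong", "Moon"}),
    frozenset({"Eagle (lunar module)", "Moon"}),
    frozenset({"Louis Armstrong", "Moon (song)"}),
}


def coherence(selection):
    """Count the linked pairs among the selected entities."""
    return sum(
        frozenset({a, b}) in LINKS
        for i, a in enumerate(selection)
        for b in selection[i + 1:]
    )


def disambiguate(candidates):
    """Try every combination and keep the most densely connected one."""
    mentions = list(candidates)
    best = max(product(*(candidates[m] for m in mentions)), key=coherence)
    return dict(zip(mentions, best))


selection = disambiguate(CANDIDATES)
print(selection)
```

With 448, 95 and 156 candidates the full product is already over six million combinations, which is why real systems prune candidates with popularity, linguistic and statistical strategies first.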
  • 49. Semantic Search, Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization. Image: Turmbau zu Babel (The Tower of Babel), Pieter Brueghel, 1563
  • 50. How to use semantic metadata in retrieval?
  • 51. Semantic Search (One of many Definitions...) • Annotation of (text-based) metadata with semantic entities • Entity-based Information Retrieval • Make use of semantic relations, e.g. content-based similarities of relationships • Interoperable metadata via semantic annotations • for content-based description • for structural / technical description (Multimedia Ontologies). Overall Goal: Quantitative and qualitative improvement of Information Retrieval
  • 52. Semantic metadata enable improvement of traditional keyword-based retrieval by (1) Query String Extension/Refinement: enables more precise or more complete search results (2) Cross Referencing: makes it possible to complement search results with additional associated or similar information (3) Exploratory Search: enables visualization and navigation of the search space (4) Reasoning: makes it possible to complement search results with implicitly given information
  • 53. Semantic Search, Query String Extension • Keyword-based search does not deliver all search results that are relevant for a query, because synonyms and metaphors might describe the queried content. • Extension of the original query string (Query Extension) • from dictionaries and thesauri • extend query with synonyms, hyponyms (specializations), etc. • from domain ontologies • extend query with meronyms (part-of), related concepts, etc. Original query string: Bank. Possible extensions: Bank ∨ depository financial institution ∨ credit union ∨ acquirer ∨ federal reserve ∨ ... Effect: increase recall
  • 54. Semantic Search, Query String Refinement • Keyword-based search also delivers search results that are not relevant for a query, because query terms and document terms might be ambiguous. • Refinement of the original query string (Query Refinement) • from dictionaries and thesauri • disambiguate polysemic terms with hypernyms (generalizations) • from domain ontologies • disambiguate polysemic terms with holonyms. Original query string: Bank. Possible refinements: (1) Bank ∧ financial institution (2) Bank ∧ incline ∧ slope ∧ side (3) Bank ∧ container (4) Bank ∧ deposit ∧ repository. Effect: increase precision
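The effect of query extension (∨, more recall) and query refinement (∧, more precision) from slides 53 and 54 can be seen on a toy collection. The sketch below is illustrative only: the three documents, the `search` helper and its plain substring matching are assumptions, and the extension and refinement terms are simply picked from the "Bank" examples on the slides instead of being looked up in a thesaurus or ontology.

```python
# Hypothetical toy collection.
DOCS = {
    "d1": "The bank raised its interest rates",
    "d2": "A credit union offers cheap loans",
    "d3": "We sat on the river bank, a gentle slope",
}


def search(documents, any_of=(), all_of=()):
    """Boolean search: at least one term of any_of and every term of all_of."""
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        if any_of and not any(term.lower() in lowered for term in any_of):
            continue
        if not all(term.lower() in lowered for term in all_of):
            continue
        hits.append(doc_id)
    return hits


# Plain keyword query.
plain = search(DOCS, any_of=["Bank"])
# Extension: Bank OR depository financial institution OR credit union.
extended = search(
    DOCS, any_of=["Bank", "depository financial institution", "credit union"]
)
# Refinement: Bank AND slope.
refined = search(DOCS, all_of=["Bank", "slope"])

print(plain, extended, refined)
```

The extended query also finds d2, which never mentions "bank", while the refined query drops the financial sense and keeps only d3.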
  • 55. Semantic Search, Cross Referencing • Provide search results that do not literally contain the query string but are closely related to the query by content • Apply domain ontologies for determining related concepts • Apply statistical analysis of large (text) document corpora. Example: query string "Neil Armstrong" → NER → dbpedia:Neil_Armstrong; dbpedia:Neil_Armstrong dbprop:mission dbpedia:Apollo_11; dbpedia:Michael_Collins dbprop:mission dbpedia:Apollo_11; dbpedia:Buzz_Aldrin dbprop:mission dbpedia:Apollo_11
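Cross referencing as on slide 55 boils down to finding other entities that share a property value with the query entity. A minimal sketch using the three triples from the slide follows; the function name is invented, and a real system would rank the shared properties instead of treating them all alike.

```python
# The three statements shown on the slide.
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Buzz_Aldrin", "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbprop:mission", "dbpedia:Apollo_11"),
]


def cross_references(entity, triples):
    """Entities that share at least one (property, object) pair with entity."""
    shared = {(p, o) for s, p, o in triples if s == entity}
    return sorted(
        {s for s, p, o in triples if s != entity and (p, o) in shared}
    )


related = cross_references("dbpedia:Neil_Armstrong", TRIPLES)
print(related)
```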
  • 56. Semantic Search, Exploratory Search • Provide additional search results that do not necessarily contain the query string but are related to the query by content or are related to the search results achieved by the direct query • Apply domain ontologies and heuristics to determine the relevance of facts. Example: dbpedia:Neil_Armstrong dbpedia-owl:mission dbpedia:Apollo_11; dbpedia:Apollo_11 dcterms:subject category:Apollo_program; dbpedia:Apollo_13 dcterms:subject category:Apollo_program; dbpedia:Apollo_13 rdf:type yago:Space_accidents_and_incidents; dbpedia:Space_Shuttle_Challenger rdf:type yago:Space_accidents_and_incidents
  • 57. Semantic Search, Reasoning • Provide additional search results (and information) that do not necessarily contain the query string but are related to the query by content, whereby the relation may not be a direct one, but can be derived via entailment. • Apply domain ontologies, reasoning algorithms and heuristics to find new facts and determine the relevance of facts
  • 58. Semantic Search, Reasoning. Example: query string = Neil Armstrong. (Hard) questions to solve via reasoning: • Will there be the Moon or documents about the Moon in the search results? • How is Neil Armstrong related to the Moon? (is he?) • Was Neil Armstrong (really) on the Moon? • ... Graph nodes: dbpedia:Neil_Armstrong, dbpedia:Apollo_11, dbpedia:Moon, category:Missions_to_the_Moon, category:Exploration_of_the_Moon, category:Spaceflight, category:Moon, category:Animals_in_Space. Edge labels: dbpedia-owl:mission, dcterms:subject (three times), skos:broader (three times)
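The question "How is Neil Armstrong related to the Moon?" can be approached as a path search over the graph from slide 58. The sketch below uses breadth-first search, which is plain graph traversal and not full logical entailment. The exact wiring of the edges is an assumption: the transcript preserves the node and edge labels but not which nodes each edge connects, so the chain below is one plausible reading.

```python
from collections import deque

# Assumed wiring of the slide's graph (only labels survive in the transcript).
EDGES = [
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Missions_to_the_Moon"),
    ("category:Missions_to_the_Moon", "skos:broader",
     "category:Exploration_of_the_Moon"),
    ("category:Exploration_of_the_Moon", "skos:broader", "category:Moon"),
    ("dbpedia:Moon", "dcterms:subject", "category:Moon"),
]


def find_path(start, goal, edges):
    """Breadth-first search, ignoring edge direction; returns a node list or None."""
    neighbours = {}
    for s, _, o in edges:
        neighbours.setdefault(s, []).append(o)
        neighbours.setdefault(o, []).append(s)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


path = find_path("dbpedia:Neil_Armstrong", "dbpedia:Moon", EDGES)
print(" -> ".join(path))
```

The length of the path gives a crude relevance heuristic: the longer the chain of categories, the weaker the connection between the query entity and the result.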
  • 59. Semantic Search, Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization. Image source: http://www.gocomics.com/calvinandhobbes/
  • 60. Searching is not always just searching
  • 61. I'm looking for the book "Brave New World" by Aldous Huxley in the first German edition... Catalogue card: Aldous HUXLEY. Brave New World. (Hamburg usw., Albatros Verlag, 1933). 257 S. 8“. The Albatros Continental Library, 47. Further numbers on the card: II 1, 25 06, 3454 8
  • 62. I really liked "Brave New World" by Aldous Huxley, but how should I find what to read next...?
  • 63. Exploratory Search • What if the user does not know which query string to use? • What if the user is looking for complex answers? • What if the user does not know the domain he/she is looking for? • What if the user wants to know all(!) about a specific topic? • ..."Browsing" instead of "Searching" • ...to find something by chance, i.e. Serendipity • ...to get an overview • ...enable content-based navigation
  • 64. Enable Exploratory Search based on Linked Open Data. http://dbpedia.org/page/Brave_New_World Gather knowledge about dbpedia:Brave_New_World and decide which interesting fact to follow....
  • 65. dbpedia:Brave_New_World dbpedia-owl:author dbpedia:Aldous_Huxley; three further dbpedia-owl:author edges point to dbpedia:Aldous_Huxley
  • 66. dbpedia:Brave_New_World dbpedia-owl:author dbpedia:Aldous_Huxley; dbpedia:ontology/influences edges connect dbpedia:Aldous_Huxley with dbpedia:H._G._Wells, dbpedia:George_Orwell and dbpedia:Michel_Houellebecq
  • 67. dbpedia:H._G._Wells dbpedia-owl:notableWork dbpedia:The_Time_Machine; dbpedia:George_Orwell dbpedia-owl:notableWork dbpedia:Nineteen_Eighty-Four; dbpedia:Michel_Houellebecq dbpedia-owl:notableWork dbpedia:Les_Particules_élémentaires
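The browsing path of slides 64 to 67 (book → author → influences → notable works) can be written as a small graph walk that answers "what should I read next?". This is a sketch under assumptions: the direction of the dbpedia:ontology/influences edges is not recoverable from the transcript, so the code follows them in both directions, and the function names are invented.

```python
# Statements from the slides; the direction of the influences edges is assumed.
TRIPLES = [
    ("dbpedia:Brave_New_World", "dbpedia-owl:author", "dbpedia:Aldous_Huxley"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:H._G._Wells"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:George_Orwell"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences",
     "dbpedia:Michel_Houellebecq"),
    ("dbpedia:H._G._Wells", "dbpedia-owl:notableWork", "dbpedia:The_Time_Machine"),
    ("dbpedia:George_Orwell", "dbpedia-owl:notableWork",
     "dbpedia:Nineteen_Eighty-Four"),
    ("dbpedia:Michel_Houellebecq", "dbpedia-owl:notableWork",
     "dbpedia:Les_Particules_élémentaires"),
]


def objects(subject, prop):
    """All objects o with (subject, prop, o) in the data."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def linked(entity, prop):
    """Entities connected to entity via prop, regardless of edge direction."""
    forward = [o for s, p, o in TRIPLES if s == entity and p == prop]
    backward = [s for s, p, o in TRIPLES if o == entity and p == prop]
    return forward + backward


def suggest(book):
    """Notable works of authors connected to the book's author by influence."""
    works = set()
    for author in objects(book, "dbpedia-owl:author"):
        for other in linked(author, "dbpedia:ontology/influences"):
            works.update(objects(other, "dbpedia-owl:notableWork"))
    return sorted(works)


suggestions = suggest("dbpedia:Brave_New_World")
print(suggestions)
```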
  • 68. ...and now please surprise me..... SERENDIPITY. dbpedia:Aldous_Huxley, dbpedia:Patrick_Stewart and dbpedia:Tim_Berners-Lee are each rdf:type Yago:EnglishExpatriatesInTheUnitedStates; a dbpedia-owl:author edge points to dbpedia:Aldous_Huxley; dbpedia:Star_Trek:_The_Next_Generation dbpedia-owl:starring dbpedia:Patrick_Stewart; dbpedia:World_Wide_Web dbpprop:inventor dbpedia:Tim_Berners-Lee
  • 69. Explorative Search. dbpedia:Michael_Collins, dbpedia:Neil_Armstrong and dbpedia:Buzz_Aldrin each dbpedia-owl:mission dbpedia:Apollo_11; dbpedia:Apollo_11 dcterms:subject category:Apollo_program; dbpedia:Apollo_13 dcterms:subject category:Apollo_program; dbpedia:Apollo_13 rdf:type yago:Space_accidents_and_incidents; dbpedia:Space_Shuttle_Challenger rdf:type yago:Space_accidents_and_incidents
  • 70. Exploratory Search and Serendipity • Find something that you were not looking for on purpose ... dbpedia:Buzz_Aldrin; dbpedia:Cookie_Monster; dbpedia:Strictly_Come_Dancing
  • 71. Semantic Search, Contents: ■ Introduction ■ Multimedia Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization
  • 72. Entity Based Search. http://www.yovisto.com/labs/autosuggestion/ • Query string refinement / extension • entity auto-suggestion • interpretation of natural language queries • linguistic ambiguities of traditional keyword based search can be avoided • enables high precision and high recall retrieval. J. Osterhoff, J. Waitelonis, H. Sack: Widen the Peepholes! Entity-Based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search, IVDW 2012
  • 73. http://mediaglobe.yovisto.com:8080/mggui-dev2/ search facets. C. Hentschel, H. Sack, et al., Open up cultural heritage in video archives with mediaglobe, I2CS 2012
  • 74. [Image-only slide]
  • 75. Exploratory Search with yovisto. http://mediaglobe.yovisto.com:8080/ J. Waitelonis, H. Sack: Towards exploratory video search using linked data, MTAP Volume 59, Number 2 (2012), 645-672
  • 76. Semantic Search, Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization. Image: Albrecht Dürer: Melancholia I, 1514
  • 77. Contact: Dr. Harald Sack, Hasso-Plattner-Institut für Softwaresystemtechnik, Universität Potsdam, Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam. Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html Blog: http://yovisto.blogspot.com/ E-Mail: harald.sack@hpi.uni-potsdam.de Twitter: lysander07 / biblionomicon / yovisto. Slides can be found at http://slideshare.com/lysander07/ ... more about Semantic Web Technologies at http://www.openhpi.de/ Thank you very much for your attention!
