02 Semantic Multimedia - Einfuehrungs-Workshop SS2012
Introductory workshop for the seminar Semantic Multimedia, summer semester 2012, Hasso-Plattner-Institut, Universität Potsdam, Dr. Harald Sack

Usage Rights

CC Attribution-NonCommercial-NoDerivs License

02 Semantic Multimedia - Einfuehrungs-Workshop SS2012 Presentation Transcript

  • 1. Master Seminar SS2012: Semantic Multimedia. Introductory workshop, 16.04.2012. Dr. Harald Sack / Nadine Steinmetz. Thursday, 3 May 2012
  • 2. Overview: building blocks
    Linked Data (introduction): RDF / OWL / SPARQL / JENA; category systems; bibliographic data; Linked Data dumps
    Text mining (introduction): POS tagging / stemming; information retrieval; NER data; disambiguation
    Introductory workshop, Master Seminar SS 2012 - Semantic Multimedia, Dr. Harald Sack / Nadine Steinmetz, Hasso-Plattner-Institut, Potsdam. Thursday, 3 May 2012
  • 3. RDF / OWL / SPARQL / JENA
    OWL:
      <owl:Class rdf:about="http://dbpedia.org/ontology/Spacecraft">
        <rdfs:label xml:lang="en">spacecraft</rdfs:label>
        <rdfs:label xml:lang="fr">vaisseau spatial</rdfs:label>
        <rdfs:subClassOf rdf:resource="http://dbpedia.org/ontology/MeanOfTransportation"/>
      </owl:Class>
    RDF:
      <http://dbpedia.org/resource/Autism> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
      <http://dbpedia.org/resource/Aristotle> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Philosopher> .
    SPARQL:
      select * where {
        <http://dbpedia.org/resource/Berlin> ?p ?o .
        ?o a <http://dbpedia.org/ontology/Person> .
      } LIMIT 100
    JENA:
      // qexecw2 is a query execution object created earlier (not shown on the slide)
      com.hp.hpl.jena.query.ResultSet result = qexecw2.execSelect();
      if (result != null) {
        while (result.hasNext()) {
          QuerySolution querysol = result.nextSolution();
          // value bound to the variable ?o in the current result row
          Object aux2 = querysol.get("o");
        }
      }
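The SPARQL query on this slide combines two triple patterns that share the variable ?o. As a rough illustration of what that join does, here is a minimal pure-Python sketch over a hand-written triple list. The triples, the resource `Example_Person`, and the function name `persons_linked_from` are assumptions made for this example; they are not real DBpedia data and not part of the workshop material.

```python
# Toy triples (hypothetical, for illustration only; not real DBpedia data).
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

TRIPLES = [
    (DBR + "Berlin", DBO + "leader", DBR + "Example_Person"),
    (DBR + "Berlin", DBO + "country", DBR + "Germany"),
    (DBR + "Example_Person", RDF_TYPE, DBO + "Person"),
    (DBR + "Germany", RDF_TYPE, DBO + "Country"),
]


def persons_linked_from(subject, triples, limit=100):
    """Emulate: select * where { <subject> ?p ?o . ?o a dbo:Person . } LIMIT limit"""
    # Second triple pattern: every resource that is typed as dbo:Person.
    persons = {s for (s, p, o) in triples if p == RDF_TYPE and o == DBO + "Person"}
    # First triple pattern, joined on ?o: keep only objects that are persons.
    rows = [(p, o) for (s, p, o) in triples if s == subject and o in persons]
    return rows[:limit]


results = persons_linked_from(DBR + "Berlin", TRIPLES)
for p, o in results:
    print(p, o)
```

A real triple store evaluates the same pattern with indexes instead of list scans, but the join on the shared variable is the same idea.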
  • 4. Category systems
    DBpedia ontology (OWL Lite):
      <http://dbpedia.org/resource/Alabama> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/AdministrativeRegion> .
      <http://dbpedia.org/resource/Alabama> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/PopulatedPlace> .
      <http://dbpedia.org/resource/Alabama> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Place> .
    Wikipedia categories (SKOS):
      <http://dbpedia.org/resource/Alabama> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Former_British_colonies> .
      <http://dbpedia.org/resource/Alabama> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Place_names_in_Alabama_of_Native_American_origin> .
      <http://dbpedia.org/resource/Alabama> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:States_and_territories_established_in_1819> .
  • 5. Category systems
    YAGO (RDFS):
      <http://dbpedia.org/resource/Alabama> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/StatesOfTheConfederateStatesOfAmerica> .
      <http://dbpedia.org/resource/Alabama> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/SouthernUnitedStates> .
      <http://dbpedia.org/resource/Alabama> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/StatesOfTheUnitedStates> .
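The two category-system slides show that one resource carries classes from several vocabularies at once, distinguishable by URI namespace. The following sketch groups a few of the Alabama statements from the slides by that namespace. The display names in `SYSTEMS` and the function name `group_by_system` are choices made for this example only.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"

# Namespace prefix -> category system (display names chosen for this sketch).
SYSTEMS = {
    "http://dbpedia.org/ontology/": "DBpedia ontology",
    "http://dbpedia.org/resource/Category:": "Wikipedia categories",
    "http://dbpedia.org/class/yago/": "YAGO",
}

# A few of the Alabama statements shown on the slides.
ALABAMA = "http://dbpedia.org/resource/Alabama"
TRIPLES = [
    (ALABAMA, RDF_TYPE, "http://dbpedia.org/ontology/Place"),
    (ALABAMA, RDF_TYPE, "http://dbpedia.org/ontology/PopulatedPlace"),
    (ALABAMA, DCT_SUBJECT, "http://dbpedia.org/resource/Category:Former_British_colonies"),
    (ALABAMA, RDF_TYPE, "http://dbpedia.org/class/yago/StatesOfTheUnitedStates"),
]


def group_by_system(triples):
    """Group the object of each statement by the category system of its namespace."""
    grouped = defaultdict(list)
    for _subject, _predicate, obj in triples:
        for prefix, system in SYSTEMS.items():
            if obj.startswith(prefix):
                # Store only the local name, without the namespace prefix.
                grouped[system].append(obj[len(prefix):])
                break
    return dict(grouped)


groups = group_by_system(TRIPLES)
for system, names in groups.items():
    print(system, names)
```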
  • 6. Bibliographic data
    www.bibsonomy.org
    http://citeseer.ist.psu.edu
    www.mendeley.com
    Dump available for download at: http://mediaglobe.yovisto.com/semmul2012/
  • 7. Bibliographic data as Linked Data
    DBLP on the Semantic Web
  • 8. Linked Data dumps & data processing
    Dump files:
      instance_types_en.nt
      freebase_links.nt
      labels_en.nt
      disambiguations_en.nt
    Processing example:
      # Field separator assumed to be a tab. Collect all types per resource into one comma-separated field.
      gawk 'BEGIN{FS="\t";} {a[$1]++; b[$1]=$3","b[$1]} END{for(i in a) print(i"\t"gensub(/http:\/\/dbpedia.org\/ontology\//,"","g",gensub(/,$/,"","g",b[i])))}' instance_types_de_wTreeDepth_sorted.txt | sed 's/Person,Person/Person/g' | sort -t $'\t' -k1,1 > instance_types_de_concat.txt
      # Join with the owl:sameAs mapping; rows without a mapping get the placeholder "null".
      join -a 1 -1 1 -2 1 -e null -o 1.1,2.2,1.2 instance_types_de_concat.txt ../owlSameAs_all_urlEncoded_sorted.txt | awk -F '\t' '{if(gsub(/ null /," null ",$2)>0) print($1"\t"$3); else print($2"\t"$3)}' > instance_types_de_concat_wOwlSameAs.txt
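The first shell pipeline on this slide groups the type statements by resource and joins the types into one comma-separated field. The same idea in Python might look like the sketch below. It assumes tab-separated input rows with the resource in the first column and the type URI in the third; the middle column is assumed to be a tree depth, guessed from the file name. The sample rows and the function name `concat_types` are made up for illustration. Unlike the gawk version, which prepends each new type, this sketch keeps the types in input order.

```python
from collections import defaultdict

ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def concat_types(lines):
    """Group tab-separated (uri, depth, type) rows by uri and join the types with commas."""
    types = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        uri, type_uri = fields[0], fields[2]
        # Drop the ontology namespace so that only the class name remains.
        types[uri].append(type_uri.replace(ONTOLOGY_NS, ""))
    # One output row per resource, sorted by resource URI.
    return [f"{uri}\t{','.join(names)}" for uri, names in sorted(types.items())]


# Hypothetical sample rows in the assumed three-column layout.
rows = [
    "http://dbpedia.org/resource/Alabama\t1\thttp://dbpedia.org/ontology/Place\n",
    "http://dbpedia.org/resource/Alabama\t2\thttp://dbpedia.org/ontology/PopulatedPlace\n",
    "http://dbpedia.org/resource/Aristotle\t2\thttp://dbpedia.org/ontology/Philosopher\n",
]
merged = concat_types(rows)
for row in merged:
    print(row)
```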
  • 9. POS tagging / stemming
    Input (German example text):
      Arjen Robben hätte der Held im Spitzenspiel bei Borussia Dortmund werden können. Stattdessen steht der Bayern-Star nach seinem vergebenen Elfmeter als der große Verlierer da.
      (English: Arjen Robben could have become the hero of the top match at Borussia Dortmund. Instead, the Bayern star stands there as the big loser after his missed penalty.)
    POS-tagged output:
      Arjen_ADJA Robben_NN hätte_VAFIN der_ART Held_NN im_APPRART Spitzenspiel_NN bei_APPR Borussia_NE Dortmund_NE werden_VAFIN können._ADJA Stattdessen_NN steht_VVFIN der_ART Bayern-Star_NN nach_APPR seinem_PPOSAT vergebenen_ADJA Elfmeter_NN als_APPR der_ART große_ADJA Verlierer_NN da._XY
    Stemmed output:
      Arjen Robb hatt der Held im Spitzenspiel bei Borussia Dortmund werd konnen. Stattdess steht der Bayern-Star nach sein vergeb Elfmet als der gross Verli da.
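The stemmed line on this slide looks like the output of a suffix-stripping stemmer; the slide does not name the tool. The sketch below is a deliberately naive stand-in, not the stemmer used in the workshop: it folds umlauts and strips at most one common German ending. The suffix list, the minimum stem length, and the name `naive_stem` are assumptions for illustration, so its output will differ from the slide for many words.

```python
# Fold umlauts and sharp s, as the stemmed text on the slide appears to do.
FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

# Common German endings, longest first (a simplification chosen for this sketch).
SUFFIXES = ("ern", "em", "en", "er", "es", "e", "s")


def naive_stem(word, min_stem=3):
    """Strip at most one suffix, keeping at least `min_stem` characters."""
    folded = word.translate(FOLD)
    for suffix in SUFFIXES:
        if folded.endswith(suffix) and len(folded) - len(suffix) >= min_stem:
            return folded[: -len(suffix)]
    return folded


for token in ["Robben", "hätte", "werden", "seinem", "große", "der"]:
    print(token, "->", naive_stem(token))
```

A production stemmer applies several stripping steps with region constraints, which is why the slide reduces some words further than a single pass like this one.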
  • 10. Information retrieval approaches
    tf*idf: weights the relevance of a word for a document with respect to a document corpus; distance measures for comparing documents
    Okapi BM25: ranking function for the relevance of documents with respect to search queries
    Latent Semantic Analysis (LSA): relationship between documents with respect to the terms they contain, or to concepts generated for those terms
    Latent Semantic Indexing (LSI): pattern detection over the terms contained in an unstructured set of text documents
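To make the tf*idf entry concrete, here is a small sketch using one common variant: term frequency as the relative count within a document, and inverse document frequency as log(N / df). The slide does not say which variant the seminar uses, so this choice, the toy documents, and the name `tf_idf` are assumptions. Documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up toy corpus for illustration.
docs = [
    "der held im spitzenspiel".split(),
    "der elfmeter im spiel".split(),
    "der mustang auf der route".split(),
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items()):
    print(term, round(weight, 3))
```

A term that occurs in every document, such as "der" here, gets weight zero, while a term unique to one document gets the highest weight in that document.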
  • 11. NER data: alternative labels
    Resource: http://dbpedia.org/resource/Ford_Mustang
    Original label: "Ford Mustang"
    Alternative labels: the ford mustang, s-197, s197, ronaele mustang, ronaele, mustnag, mustang svt cobra, mustangs, mustang, ford mustang coupe, ford mustang convertible, ford, 1972 ford mustang
    http://mediaglobe.yovisto.com:8080/semex/
  • 12. NER data: redirects
    Redirects:
      http://dbpedia.org/resource/1972_Ford_Mustang --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Ford_Mustang_GT --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Ford_Mustang_GT_Convertible --> http://dbpedia.org/resource/Ford_Mustan
      http://dbpedia.org/resource/Ford_Mustang_GT_Coupe --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Mustang_%28car%29 --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Mustang_GT --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Mustang_SVT_Cobra_R --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Ronaele --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Ronaele_Mustang --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/S-197 --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/S197 --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/The_Ford_Mustang --> http://dbpedia.org/resource/Ford_Mustang
      http://dbpedia.org/resource/Mustnag --> http://dbpedia.org/resource/Mustang
    Disambiguation pages:
      http://dbpedia.org/resource/Mustang --> http://dbpedia.org/resource/Ford_Mustang
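The redirect list above can be turned into a lookup from surface labels to entity URIs, which is one plausible way to obtain the alternative labels shown on the previous slide. The sketch below derives a label from the local name of each URI (percent-decoded, underscores replaced by spaces, lower-cased). This derivation and the names `label_from_uri` and `build_label_index` are assumptions for illustration; only a few of the redirects from the slide are included.

```python
from urllib.parse import unquote

RESOURCE_NS = "http://dbpedia.org/resource/"
FORD_MUSTANG = RESOURCE_NS + "Ford_Mustang"

# A subset of the redirects shown on the slide: (source, target).
REDIRECTS = [
    (RESOURCE_NS + "1972_Ford_Mustang", FORD_MUSTANG),
    (RESOURCE_NS + "Mustang_%28car%29", FORD_MUSTANG),
    (RESOURCE_NS + "S-197", FORD_MUSTANG),
    (RESOURCE_NS + "Mustnag", RESOURCE_NS + "Mustang"),
]


def label_from_uri(uri):
    """Derive a lower-case surface label from the local name of a resource URI."""
    local_name = unquote(uri[len(RESOURCE_NS):])
    return local_name.replace("_", " ").lower()


def build_label_index(redirects):
    """Map each label (of a redirect source or target) to the set of target URIs."""
    index = {}
    for source, target in redirects:
        index.setdefault(label_from_uri(source), set()).add(target)
        index.setdefault(label_from_uri(target), set()).add(target)
    return index


index = build_label_index(REDIRECTS)
print(index["1972 ford mustang"])
print(index["mustang (car)"])
```

A label that maps to more than one URI in such an index is exactly the case the later disambiguation slides deal with.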
  • 13. NER data: further data sources
    http://wordnet.princeton.edu/
    http://wortschatz.uni-leipzig.de/
    http://de.wiktionary.org/
  • 14. Disambiguation
    Example sentence: Steve McQueen rast mit dem Mustang die Route 101 entlang. (English: Steve McQueen speeds along Route 101 in the Mustang.)
    Step 1, entity recognition: the terms "Steve McQueen", "Mustang" and "Route 101" are recognized in the sentence.
    Step 2, candidate finding: the slide shows 3, 34 and 23 candidates for the recognized terms.
    Step 3, disambiguation: each term is mapped to one entity:
      http://dbpedia.org/resource/Steve_McQueen
      http://dbpedia.org/resource/U.S._Route_101_in_California
      http://dbpedia.org/resource/Ford_Mustang
  • 15. Disambiguation: entity candidates shown in the figure
    dbp:Mustang_(Jeans)
    dbp:Ford_Mustang
    dbp:Mustang_(horse)
    dbp:Steve_McQueen
    dbp:Steve_McQueen_(artist)
    dbp:U.S._Route_101_in_California
  • 16. Disambiguation: co-occurrence analysis
    Example on the slide: term "Mustang", entity candidate http://dbpedia.org/resource/Ford_Mustang, context tags "Route 101" and "Steve McQueen", score: 2.0
    Paper excerpt shown on the slide (parts are cut off at the slide edge; lost passages are marked […], and □ marks an operator symbol that did not survive in the transcript):
    Co-occurrence analysis: […] of all the other terms in the context of the term currently processed (subsequently, this analysis step is referred to as CA). The score for an entity candidate is calculated as follows:
      $C(t) = \{t_j\},\quad j = 1 \ldots k$
      $W(uri(t)_i) = \{w_r\},\quad r = 1 \ldots |W(uri(t)_i)|$
    t is the term currently disambiguated. C(t) is the set of terms in the context in which t has to be disambiguated. W(uri(t)_i) is the set of all terms in the Wikipedia article for the current entity candidate uri(t)_i of the term t. To calculate the CA score, the number (countercooc_i) of how often all other terms of the context occur in the article for the entity candidate is determined as:
      $\mathit{countercooc}_i = \sum_{j=1}^{k} \sum_{r=1}^{|W(uri(t)_i)|} \delta(t_j, w_r)$
      with $\delta(x, y) = 1$ if $x = y$, and $0$ otherwise.
    Finally, the CA score is calculated as follows:
      $\mathit{scoreCA}_i = \mathit{countercooc}_i \cdot \frac{|W(uri(t)_i) \times C(t)|}{|C(t)|}$
    Direct links: We count the entity candidates the processed candidate […] linked to. For link types b) and c) we also count the number of different paths between two candidates. We calculate the score for direct links as follows:
      $\mathit{counterdlinks}_i = \sum_{j=1}^{k} \sum_{l=1}^{m} |uri(t)_i \,\square\, uri(t_j)_m|$
      $\mathit{scoredlinks}_i = \frac{|t \,\square\, t_k|}{|C(t)|} \cdot \mathit{counterdlinks}_i$
    counterdlinks_i is the number of candidates the processed candidate (uri(t)_i) is linked to directly. With this calculation we achieve to get higher scores […] entity candidates that are linked to only one of the candidates of the other terms. Such candidates have fewer links […] these links are more explicit. An entity candidate […] linked to more than one of the candidates of a specific […] in the context is much less relevant, because these […] might reveal ambiguity again. The ranking we achieve […] our score calculation is shown in Fig. 3. "uri 1" is […] to one entity candidate of every term in the context […] implies, that this entity candidate is strongly related […] this context. Also, relationships of this candidate to the […] terms in the context are not ambiguous as the candi[…]
    Footnote 7 of the excerpt: http://dbpedia.org/resource/United States
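The co-occurrence counter described in this excerpt is a double sum over context terms and article terms that adds 1 for every matching pair. A minimal sketch of just that counter follows. It treats the article as a list of term occurrences so that repeated mentions count more than once, which is an assumption about how "how often" is meant. The final CA score is left out because its normalisation factor is not fully legible in the transcript. The sample article terms and the name `counter_cooc` are made up for illustration.

```python
def counter_cooc(context_terms, article_terms):
    """Double sum of delta(t_j, w_r): add 1 for every pair of equal terms."""
    return sum(
        1
        for t_j in context_terms
        for w_r in article_terms
        if t_j == w_r
    )


# Context tags from the slide; the article term list is hypothetical.
context = ["route 101", "steve mcqueen"]
article = ["ford", "steve mcqueen", "bullitt", "steve mcqueen", "route 101"]

count = counter_cooc(context, article)
print(count)
```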
  • 17. Disambiguation: link analysis
    Direct links
    Symmetric links
    Unidirectional links
  • 18. Disambiguation: link analysis
      $\mathit{scoredlinks}_i = \frac{|t \,\square\, t_k|}{|C(t)|} \cdot \mathit{counterdlinks}_i$
    (□ marks an operator symbol that did not survive in the transcript.)
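For the link analysis, the quantity that is clearly described in the slides is the direct-link counter: the number of candidates of the other context terms that the processed candidate links to directly. The sketch below computes only that counter over a small link graph. The graph contents are hypothetical, only outgoing links are considered (an assumption about link direction), and the name `counter_dlinks` is chosen for this example.

```python
def counter_dlinks(candidate, other_candidates, links):
    """Count the candidates of other context terms that `candidate` links to directly."""
    targets = links.get(candidate, set())
    return sum(
        1
        for candidates in other_candidates.values()
        for other in candidates
        if other in targets
    )


# Hypothetical link graph: page -> set of pages it links to.
LINKS = {
    "dbp:Ford_Mustang": {"dbp:Steve_McQueen", "dbp:U.S._Route_101_in_California"},
    "dbp:Mustang_(horse)": set(),
}

# Candidates of the other terms in the context, using names from the slides.
OTHER = {
    "Steve McQueen": ["dbp:Steve_McQueen", "dbp:Steve_McQueen_(artist)"],
    "Route 101": ["dbp:U.S._Route_101_in_California"],
}

ford = counter_dlinks("dbp:Ford_Mustang", OTHER, LINKS)
horse = counter_dlinks("dbp:Mustang_(horse)", OTHER, LINKS)
print(ford, horse)
```

With this toy graph the car candidate is connected to the context and the horse candidate is not, which is the kind of signal the link score is meant to capture.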
  • 19. Further possible approaches: recommender systems
    Content-based filtering:
      • Recommendation based on the properties of documents
      • Property analysis for the algorithmic comparison of documents
      • Tags vs. keywords
    Collaborative filtering:
      • Recommendation based on the behaviour of users
      • Similarity of user profiles
      • Profile based on the usage of documents
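As an illustration of the collaborative filtering column, the sketch below compares user profiles (document usage counts) with cosine similarity and recommends the documents of the most similar user that the target user has not used yet. Cosine similarity is one common choice, not something the slide prescribes. The user names, usage counts, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse profiles given as {document: count}."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical usage profiles: user -> {document: usage count}.
PROFILES = {
    "alice": {"doc1": 1, "doc2": 1},
    "bob": {"doc1": 1, "doc2": 1, "doc3": 1},
    "carol": {"doc4": 1},
}


def recommend(user, profiles):
    """Recommend unseen documents used by the most similar other user."""
    seen = profiles[user]
    _, neighbour = max(
        (cosine(seen, profile), name)
        for name, profile in profiles.items()
        if name != user
    )
    return sorted(set(profiles[neighbour]) - set(seen))


print(recommend("alice", PROFILES))
```

Content-based filtering would use the same similarity function, but on document property vectors (for example tf*idf weights) instead of user profiles.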
  • 20. Hands-on workshop
    RDF / OWL / SPARQL / JENA, or data processing, or disambiguation