BOTTARI: Location based Social Media Analysis with Semantic Web
Emanuele Della Valle
Joint work with: CEFRIEL: Irene Celino, Daniele Dell'Aglio, Marco Balduini; SALTLUX: Tony Lee, Seonho Kim; SIEMENS: Volker Tresp, Yi Huang
Watch this first :-) http://www.youtube.com/watch?v=c1FmZUz5BOo
26.10.2011 - SW Challenge 2011, ISWC 2011, Bonn, Germany
What have you seen? An augmented reality application for personalized recommendation of restaurants in Seoul.
Yet another ? Yes and no! Same use case, more “democratic”: we do “reality mining” by listening to the social media.
Architecture
[Diagram. Components: Query Rewriter, Query Evaluator, RDF2Matrix Plug-in, Streaming Linked Data Server, SOR Invoker, SOR geo-spatial KB, Social Media Crawler and Sentiment Miner. Labels: HTTP; SPARQL; androjena; out; PULL: Query Initiated; PUSH: Data Initiated.]
Sentiment Mining
Precision tests:
  • Auto-generated rules ≈ 70%
  • Manually-coded rules ≈ 90%
  • Syllable kernel ≈ 50~60%
  • Our target > 85%
[Flow chart: a micropost message is checked with "Morphologically Analyzable?". Yes: rule-based analysis (auto-generated rules). No: SVMs with a syllable kernel (learned documents). Either path outputs the sentiment of the tweet.]
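The routing in the flow chart can be sketched in a few lines of Python. This is only an illustration of the dispatch step: the slide does not give the morphological analyzer, the rules, or the SVM, so `toy_is_analyzable`, `toy_rule_based`, and `toy_fallback` below are hypothetical stand-ins, and the word lists are invented for the example.

```python
from typing import Callable

Classifier = Callable[[str], str]


def classify_sentiment(message: str,
                       is_analyzable: Callable[[str], bool],
                       rule_based: Classifier,
                       kernel_svm: Classifier) -> str:
    """Route a micropost to one of two classifiers, as in the flow chart."""
    if is_analyzable(message):
        return rule_based(message)
    return kernel_svm(message)


# Hypothetical stand-ins; none of these come from the slides.
POSITIVE_WORDS = {"good", "great", "delicious"}
NEGATIVE_WORDS = {"bad", "awful", "salty"}


def toy_is_analyzable(message: str) -> bool:
    # Assumption: a message counts as analyzable if every token is alphabetic.
    tokens = message.lower().split()
    return bool(tokens) and all(token.isalpha() for token in tokens)


def toy_rule_based(message: str) -> str:
    # Count positive and negative keywords and compare.
    tokens = set(message.lower().split())
    score = len(tokens & POSITIVE_WORDS) - len(tokens & NEGATIVE_WORDS)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def toy_fallback(message: str) -> str:
    # Placeholder for the SVM with a syllable kernel; this is not an SVM.
    if ":)" in message:
        return "positive"
    if ":(" in message:
        return "negative"
    return "neutral"
```

Passing the three components as arguments keeps the routing independent of how each classifier is built, so a real analyzer or trained model could be swapped in without touching `classify_sentiment`.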
SOR - Geo-Spatial KB
C-SPARQL and Streaming Linked Data Server
SUNS (Statistical Unit Node Sets)
A machine learning framework for inductive materialization
  • Detects interesting data patterns
  • Predicts RDF triples, i.e., which restaurant a user will tweet positively about
Characteristics
  • Capability to deal with sparse, high-dimensional and incomplete data
  • Multivariate latent space based approach
  • Modularized approach for easily integrating contextual information
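To give a feel for what a latent space approach on sparse data can look like, here is a minimal sketch that scores unobserved user/restaurant pairs with a truncated SVD. It is a generic low-rank reconstruction, not the SUNS algorithm itself, which the slide does not specify; the matrix layout (rows as users, columns as restaurants, 1.0 for an observed positive tweet) and the function names are assumptions made for the example.

```python
import numpy as np


def low_rank_scores(interactions: np.ndarray, rank: int) -> np.ndarray:
    """Reconstruct the matrix from its top `rank` singular components."""
    u, s, vt = np.linalg.svd(interactions, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def rank_unseen(scores: np.ndarray, interactions: np.ndarray, user: int) -> list:
    """Return the user's unobserved columns, best reconstructed score first."""
    unseen = np.flatnonzero(interactions[user] == 0)
    order = np.argsort(-scores[user, unseen])
    return [int(i) for i in unseen[order]]


# Toy data: user 1 has not (yet) tweeted positively about restaurant 2.
observed = np.array([[1.0, 1.0, 1.0],
                     [1.0, 1.0, 0.0]])
scores = low_rank_scores(observed, rank=1)
candidates = rank_unseen(scores, observed, user=1)
```

Because user 1 shares two restaurants with user 0, the rank-1 reconstruction assigns a positive score to the missing entry, which is the kind of predicted triple the slide describes.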
Query Processing
Annotated parts of the query: GEO-SPATIAL, PROBABILISTIC, STREAMING

    SELECT DISTINCT ?poi ?name ?lat ?long ?numPos ?prob
    WHERE {
      ?poi a ns:NamedPlace ;
           ns:name ?name ;
           geo:lat ?lat ;
           geo:long ?long .
      FILTER (f:within_distance(37.5, 126.9, ?lat, ?long, 200))
      FILTER (f:dest_point_viewing(37.5, 126.9, ?lat, ?long, 90, 200))
      {
        :someUser sioc:creator_of ?tweet .
        ?tweet twd:talksAboutPositively ?poi .
        WITH PROBABILITY ?prob
        ENSURE PROBABILITY [0.5..1)
      }
      ?poi twd:numberOfPositiveTweets ?numPos .
    }
    ORDER BY DESC(?numPos), ?prob, f:distance(37.5, 126.9, ?lat, ?long)
    LIMIT 10
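The two simplest conditions in the query, the distance filter and the probability range, can be illustrated as follows. The slide does not define `f:within_distance`, so this sketch assumes its arguments are (user latitude, user longitude, POI latitude, POI longitude, radius in metres) and that distance is great-circle distance; `distance_m`, `within_distance`, and `in_half_open_range` are hypothetical Python helpers, not the actual extension functions. `f:dest_point_viewing` is left out because its semantics are not given.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(lat1: float, lon1: float,
                    lat2: float, lon2: float, max_m: float) -> bool:
    """True if the two points are at most max_m metres apart (assumed unit)."""
    return distance_m(lat1, lon1, lat2, lon2) <= max_m


def in_half_open_range(p: float, low: float = 0.5, high: float = 1.0) -> bool:
    """Mirror of the [0.5..1) range: low is included, high is excluded."""
    return low <= p < high
```

With the user position from the query, `within_distance(37.5, 126.9, 37.501, 126.9, 200)` holds because 0.001 degrees of latitude is roughly 111 m, while a POI at latitude 37.51 is more than a kilometre away and is filtered out.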
LarKC At Work
  • Input user query is split
  • Geo-Spatial part of the query to get POIs closer to user location
  • Probabilistic part of the query to get personalized recommendations (the “for me” button in BOTTARI)
  • Streaming part of the query to get trends in users' sentiment (the “emerging” button in BOTTARI)
  • Results of the different computations are joined
[Diagram: same components and labels as on the Architecture slide.]
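The final step, joining the results of the separate computations, can be pictured as an inner join on the shared `?poi` variable. The sketch below is an assumption about the shape of the data (each partial result as a list of binding dictionaries keyed by variable name); it does not reflect how LarKC represents or joins results internally, and `join_on_poi` is a hypothetical helper.

```python
def join_on_poi(*result_sets):
    """Inner-join lists of binding dicts on their shared 'poi' key."""
    if not result_sets:
        return []
    # Start from the first result set, indexed by POI.
    joined = {row["poi"]: dict(row) for row in result_sets[0]}
    for rows in result_sets[1:]:
        by_poi = {row["poi"]: row for row in rows}
        # Keep only POIs present in both sides and merge their bindings.
        joined = {poi: {**acc, **by_poi[poi]}
                  for poi, acc in joined.items() if poi in by_poi}
    return list(joined.values())


# Toy partial results; the values are invented for illustration.
geo_part = [{"poi": "a", "lat": 37.5, "long": 126.9},
            {"poi": "b", "lat": 37.6, "long": 127.0}]
probabilistic_part = [{"poi": "a", "prob": 0.8}]
streaming_part = [{"poi": "a", "numPos": 12},
                  {"poi": "c", "numPos": 3}]

recommendations = join_on_poi(geo_part, probabilistic_part, streaming_part)
```

Only POI "a" appears in all three partial results, so it is the only row that survives the join, carrying the bindings from each part.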
Evaluation - Efficacy
[Chart. Series: random, knnItem, emerging (C-SPARQL), for me (SUNS), SUNS + C-SPARQL. Axis ticks: 5 to 30 and 0.1 to 0.7.]
Evaluation - Efficiency
Hardware: 2.66 GHz Intel Core 2 Duo with 8 GB RAM
Evaluation - Scalability
[Chart axes: Number of concurrent users; Query Latency (sec).]
Conclusions
  • End-user application: attractive and functional interface; real-world dynamic data
  • Fully based on Semantic Web technologies: RDF as common data format between heterogeneous components; SPARQL as query language
  • Rigorously evaluated: effective; high throughput for handling dynamic data; scalable in number of concurrent users
  • Commercial potential
Any questions?
Emanuele Della Valle
Joint work with: CEFRIEL: Irene Celino, Daniele Dell'Aglio, Marco Balduini; SALTLUX: Tony Lee, Seonho Kim; SIEMENS: Volker Tresp, Yi Huang
