
Interaction with Linked Data


This presentation focuses on means for exploring Linked Data. It gives an overview of current visualization tools and techniques, looking at semantic browsers and applications for presenting the data to the end user. It also describes existing search options, including faceted search, concept-based search, and hybrid search, which combines semantic information with text processing. Finally, it covers approaches to Linked Data analysis, describing how available data can be synthesized and processed in order to draw conclusions.

Published in: Technology

Interaction with Linked Data

  1. 1. Interaction with Linked Data. Presented by: Barry Norton, Michael Meier
  2. 2. Motivation: Music!
      [Architecture diagram; labels: Visualization Module; Metadata; Streaming providers; Physical Wrapper; Downloads; Data acquisition; R2R Transf.; LD Wrapper; Musical Content; Application; Analysis & Mining Module; LD Dataset Access; RDF/XML; Integrated Dataset; Interlinking; Cleansing; Vocabulary Mapping; SPARQL Endpoint; Publishing; RDFa; Other content]
      EUCLID – Interaction with Linked Data
  3. 3. Motivation: Music! (2)
      • Our aim: build a music-based portal using Linked Data technologies
      • So far, we have studied different mechanisms to consume Linked Data:
        – Executing SPARQL queries
        – Dereferencing URIs
        – Downloading RDF dumps
        – Extracting RDFa data
      • The output of these mechanisms corresponds to data in machine-readable formats
      CH 2, CH 3, CH 1
  4. 4. Motivation: Music! (3)
      Examples of machine-readable output:
  5. 5. Motivation: Music! (4)
      Visualization techniques are needed in order to transform the machine-readable data into this:
      Source:
  6. 6. Motivation: Music! (5)
      In addition, visualization techniques allow for:
      • Telling a story
      • Engaging our pattern-matching brain
      • Identifying data characteristics which cannot be directly inferred from statistical properties:
        – Anscombe's quartet: four very different datasets with nearly identical summary statistics.
      Image: Donaldson, I. and Lamere, P. Using Visualizations for Music Discovery
      Image: Chan, W., Qu, H., Mak, W. Visualizing the Semantic Structure in Classical Musical Works.
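The point about Anscombe's quartet can be checked numerically. The sketch below is not part of the slides; it uses the published quartet values (Anscombe, 1973) and compares only three summary statistics, rounded to two decimals.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Return (mean of x, sample variance of x, mean of y), rounded to 2 decimals."""
    return (round(mean(xs), 2), round(variance(xs), 2), round(mean(ys), 2))

# The four summaries coincide although the point clouds look very different when plotted.
summaries = {name: summary(xs, ys) for name, (xs, ys) in QUARTET.items()}
```

Plotting the four datasets (for example as scatter plots) is what reveals the differences that these statistics hide.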
  7. 7. Agenda
      1. Linked Data visualization
      2. Linked Data search
      3. Methods for Linked Data analysis
  8. 8. LINKED DATA VISUALIZATION
  9. 9. LD Visualization Techniques
      • Linked Data visualization techniques should provide graphical representations of the information within the LD datasets
      • Visualization techniques should be selected according to:
        – The type of data: specific types of data should be visualized in a certain way
        – The purpose of the visualization: depending on the type of analysis/application to employ
  10. 10. LD Visualization Techniques (2)
      Overview of the Linked Data Visualization process:
      RDF data → (data extraction) → Analytically extracted data → (visualization transformation) → Visualization abstraction → (visual mapping transformation) → View → (optional) User interaction
      • (Raw) RDF data: instance data, taxonomies, ontologies, vocabularies.
      • Analytically extracted data: a subset of the data called the region of interest (ROI), obtained via data extraction mechanisms, for example, SPARQL queries.
      • Visualization abstraction: obtained by applying visualization transformations to render the data into displayable information.
      • View: the final result. The visual mapping transformations obtain a graphic representation of the data using the selected visualization technique.
      • User interaction: the user interacts (click, zoom, etc.) with the visualization, which may trigger a new visualization process.
      Process partially based on: Brunetti, J.M.; Auer, S.; García, R. The Linked Data Visualization Model.
  11. 11. LD Visualization Techniques (3)
      Example of the Linked Data Visualization process
      • Data extraction (RDF data → analytically extracted data). SPARQL query: retrieve the number of releases per country of The Beatles
          SELECT ?country (COUNT(?release) AS ?releases)
          WHERE {
            <> foaf:made ?release .
            ?release a mo:Release ;
                     mo:label ?label .
            ?label foaf:based_near ?country .
          }
          GROUP BY ?country
          ORDER BY DESC(?releases)
        Result:
          country          releases
          United Kingdom   225
          United States    140
          Germany          30
          Luxembourg       29
      • Visualization transformation (→ visualization abstraction). Formatting the names of the countries:
          ?country_code2 := REPLACE(str(?country), "", "", "i")
          ?country_code := REPLACE(?country_code2, "%", "", "i")
        Result:
          country_code   releases
          GB             225
          US             140
          DE             30
          LU             29
      • Visual mapping transformation (→ view). Selecting the visualization technique (input, output):
          #widget : HeatMap | input = country_code | output = {{ releases }}
      Can be performed in a single step.
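The three stages of this example can be sketched outside the Information Workbench as plain functions. This is an illustration only: the URI prefix `http://example.org/country/` and the function names are assumptions, since the slide elides the actual country URIs and the widget implementation.

```python
# Hypothetical prefix; the real country URIs are elided on the slide.
HYPOTHETICAL_PREFIX = "http://example.org/country/"

# Stage 1 output (analytically extracted data), as a SPARQL result set might look.
extracted = [
    {"country": HYPOTHETICAL_PREFIX + "GB", "releases": 225},
    {"country": HYPOTHETICAL_PREFIX + "US", "releases": 140},
    {"country": HYPOTHETICAL_PREFIX + "DE", "releases": 30},
    {"country": HYPOTHETICAL_PREFIX + "LU", "releases": 29},
]

def to_abstraction(rows, prefix=HYPOTHETICAL_PREFIX):
    """Visualization transformation: strip the URI prefix and '%' to get country codes."""
    return [
        {"country_code": row["country"].replace(prefix, "").replace("%", ""),
         "releases": row["releases"]}
        for row in rows
    ]

def to_view_input(rows, input_key, output_key):
    """Visual mapping: pick the input and output columns, as the widget settings do."""
    return {row[input_key]: row[output_key] for row in rows}

heatmap_input = to_view_input(to_abstraction(extracted), "country_code", "releases")
```

The resulting mapping is what a heat map component would consume; rendering itself is left out of the sketch.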
  12. 12. LD Visualization Techniques (3). Example of the Linked Data Visualization process: View
  13. 13. Challenges for Linked Data Visualization
      • Enabling user interaction
        – Users must be able to navigate through the data by exploiting the connections between Linked Data resources
        – The user might edit the underlying data to enrich it by creating additional metadata, highlighting or correcting errors, and validating data
      • Supporting data reusability
        – The output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies
      • Scalability
        – Linked Data visualization techniques should support the efficient display of large amounts of data
  14. 14. Challenges for Linked Open Data Visualization
      • Extracting data from different repositories
        – A Linked Data set might be partitioned into several repositories
        – The region of interest (ROI) might include data from different data sets, requiring access to distributed repositories
      • Handling heterogeneous data
        – The same data (concepts) might be modeled differently, for example, using different vocabularies
        – Certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
      • Dealing with missing values
        – Because Linked Data is semi-structured, some instances might have missing values for certain properties
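A minimal sketch of how the last two challenges might be handled before plotting, assuming only the three date layouts named on the slide. It extracts just the year, which sidesteps the DD-MM versus MM-DD ambiguity, and maps missing or unparseable values to `None` instead of failing. The function name is illustrative.

```python
from datetime import datetime

# The three layouts mentioned on the slide, tried in order.
FORMATS = ("%d-%m-%Y", "%m-%d-%Y", "%Y")

def parse_year(value):
    """Return the year of a heterogeneous date string, or None if missing or unparseable."""
    if value is None or not value.strip():
        return None  # missing value: keep the instance, but without a year
    for fmt in FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).year
        except ValueError:
            continue  # try the next layout
    return None

years = [parse_year(v) for v in ["25-12-1969", "12-25-1969", "1969", None, "unknown"]]
```

A visualization can then decide explicitly what to do with the `None` entries (skip them, or show them in a separate "unknown" bucket).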
  15. 15. Classification of Visualization Techniques
      Task | Visualization techniques
      Comparison of attributes/values | Bar/column and pie charts; line charts; histogram
      Analysis of relationships and hierarchies | Graph; arc diagram; matrix; node-link visualizations; space-filling techniques (treemaps, icicles and sunburst, circle packing and rose diagrams)
      Analysis of temporal or geographical events | Timeline; maps
      Analysis of multi-dimensional data | Parallel coordinates; radar/star chart; scatter plot
  16. 16. Comparison of Attributes/Values
      • Bar/column chart: allows the comparison of values of different categories.
      • Pie chart: useful for comparing percentages or proportions.
      • Line chart: allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered.
      • Histogram: graphical representation of the distribution of the data.
      Image source: http://musicbrainz.fluidops.net
  17. 17. Analysis of Relationships and Hierarchies
      • Graph: the data entries are represented as nodes and the links as edges.
      • Arc diagram: the nodes are displayed in one dimension, and the arcs represent the connections.
      • Adjacency matrix diagram: the nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
      • Node-link visualizations: the data is organized in hierarchies.
      Source of images:
  18. 18. Analysis of Relationships and Hierarchies (2)
      • Treemaps: subdivide area into rectangles.
      • Icicles and sunburst: hierarchies are represented by adjacencies.
      • Circle-packing: containment is used to represent the hierarchies.
      • Rose diagrams: areas are equal angles and the data is represented by the extension of the area.
      Source of images:
  19. 19. Analysis of Temporal or Geographical Events
      • Timeline: discrete data points in time; continuous data in time
      • Maps:
        – Maps that aggregate data by geographical area
        – Location maps: display geo-points on a map
        – Dorling cartograms: aggregate data and replace each area with a circle
      Source: http://musicbrainz.fluidops.net
      Source: Google Map API
      Source: http//
  20. 20. Analysis of Multidimensional Data
      • Scatter plot: displays the values of two variables as points, which helps reveal relationships between them.
      • Radar/star chart: displays multivariate data as a two-dimensional chart. The axes correspond to the variables.
      • Parallel coordinates: allows visualizing high-dimensional data. Each vertical axis denotes a dimension, and a multidimensional point is represented as a polyline with vertices on the axes.
      Source:
  21. 21. Other Visualization Techniques
      • Text-based visualizations: tag clouds
      • Some of the previously presented techniques can be combined to produce more complex data visualizations
      Examples: Phrase Net of Beatles Lyrics; DBpedia music genres
      Source: http://www.wordle.net
      Source:
  22. 22. Applications of Linked Data Visualization Techniques
      • Getting an overview of the data
      • Identifying relevant resources, classes or properties in datasets
      • Learning about certain underlying characteristics of the data, e.g., vocabularies or ontologies
      • Detecting missing links between nodes in an RDF graph
      • Discovering new paths between nodes in an RDF graph
      • Identifying hidden patterns in the data
      • Finding errors or atypical values (outliers)
  23. 23. Linked Data Visualization Tool Requirements
      The requirements for visualization tools that consume Linked Data can be summarized as follows:
      • Data navigation and exploration capabilities in order to understand the structure and the content
      • Exploiting data structures:
        – Links to visualize hierarchies or graphs
        – Multi-dimensional
      • User interaction:
        – Basic and advanced querying
        – Filtering values
        – Interactive UI: responsive to the user input
      • Publication/syndication of the graphical representation of the data
      • Data extraction in order to export the data so that it can be reused by third parties
  24. 24. Linked Data Visualization Tool Types
      1. LD browsers with text-based representation
        • Dereference URIs to retrieve the resource description
        • Use a textual representation of LD resources
        • Display text and images appropriately
        • Mainly support exploratory browsing and knowledge discovery
      2. LD and RDF browsers with visualization options
        • Exploit pictures, graphics, images and other visual representations of the data
        • Support user interaction: allow for querying, filtering and jumping between resources
        • Suitable for browsing and knowledge discovery as well as analytic activities
  25. 25. Linked Data Visualization Tool Types (2)
      3. Visualization toolkits
        • Frameworks providing a wide range of visualization techniques
        • General toolkits support LD visualization by applying a set of transformations to the data
        • Some toolkits are specially designed to consume LD
      4. SPARQL visualization
        • These tools allow transforming the output of SPARQL queries into graphics
        • Contact SPARQL endpoints in order to evaluate the query
        • Suitable for analytical activities
  26. 26. Linked Data Visualization Tool Types (3)
      • LD browsers with text-based presentations: Sig.ma, Sindice, OpenLink RDF Browser, Marbles, Disco Hyperdata Browser, Piggy Bank (SIMILE), Zitgist DataViewer, iLOD, URI Burner, Dipper – Talis Platform Browser
      • LD and RDF browsers with visualization options: Tabulator, IsaViz, OpenLink Data Explorer, RDF Gravity, RelFinder, DBpedia Mobile, LESS, SIMILE Exhibit, Haystack, FoaF Explorer, Humboldt, LENA, Noadster
      • Visualization toolkits:
        – Linked Data tools: Information Workbench, Visual RDF (by Graves), LOD Live, LOD Visualization
        – General data: Data-Driven Documents (D3), NetworkX, Many Eyes, Tableau, Prefuse
      • SPARQL visualization (Linked Data and general data): Information Workbench, Google Visualization API, SPARQL package for R, Gruff (for AllegroGraph)
  27. 27. Linked Data Visualization Examples (1): Sig.ma
      Screenshot callouts: keyword search; information from different LD sources; displays values per predicate; displays the source for each value
      Source:
  28. 28. Linked Data Visualization Examples (2): Sig.ma
      • Per predicate: may include (redundant) information in different languages, for example: annés and anno
      • URIs are clickable, allowing navigation through RDF resources
      Summary:
      • Lists all the triples and groups them per predicate
      • Useful for browsing predicates and values within data sets
      • The meaning of the values is not evident
      Source:
  29. 29. Linked Data Visualization Examples (3): Sindice
      Screenshot callouts: keyword search; filtering per type of document; retrieves links to documents; allows accessing cached documents; allows inspecting resources
      Source:
  30. 30. Linked Data Visualization Examples (4): Sindice
      Cache triples and live triples: both interfaces display the set of triples related to the inspected resource
  31. 31. Linked Data Visualization Examples (5): Information Workbench
      • Demo available at:
      • Displays human-readable content about Linked Data resources
      • Supports visualization techniques (different types of charts, maps, timelines, etc.) to plot results from SPARQL queries
      • Allows the user to interact with the displayed data
  32. 32. Linked Data Visualization Examples (6)
      Information Workbench: Browsing a music artist. (1) Search options (2) Search results
  33. 33. Linked Data Visualization Examples (7)
      Information Workbench: Browsing a music artist. (3) Browsing the selected resource
  34. 34. Linked Data Visualization Examples (8)
      Information Workbench: Visualization techniques. (3) Browsing the selected resource
  35. 35. Linked Data Visualization Examples (9)
      Information Workbench: User interaction. LD visualizations must support navigation through the data
      Source:
  36. 36. Linked Data Visualization Examples (9)
      Information Workbench: SPARQL visualization. Implements widgets which allow:
      • Retrieving the ROI via SPARQL queries
      • Selecting the appropriate visualization technique
      • Configuring parameters of the visualization
  37. 37. Linked Data Visualization Examples (10)
      Information Workbench: SPARQL visualization
      Top ten The Beatles releases according to the sum of track durations in minutes
      SPARQL query:
          SELECT ?release ((SUM(xsd:double(?duration/60000))) AS ?avg)
          WHERE {
            <> foaf:made ?release .
            ?release mo:record ?record .
            ?record mo:track ?track .
            ?track mo:duration ?duration .
          }
          GROUP BY ?release
          ORDER BY DESC(?avg)
          LIMIT 10
      Result set:
  38. 38. Linked Data Visualization Examples (11)
      Information Workbench: SPARQL visualization
      Top ten The Beatles releases according to the sum of track durations in minutes
      Widget (visualization: bar chart):
          {{#widget: BarChart |
            query = SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE {
                <> foaf:made ?Release .
                ?Release rdf:type mo:Release .
                ?Release dc:title ?label .
              }
              GROUP BY ?label
              ORDER BY DESC(?COUNT)
              LIMIT 20
            | settings = Settings:barvertical_mb
            | asynch = true
            | input = label
            | output = COUNT
            | height = 300
          }}
  39. 39. Linked Data Visualization Examples (12)
      Information Workbench: SPARQL visualization
      Top ten The Beatles releases according to the sum of track durations in minutes
      Other visualizations of the same result set: line chart, pie chart
  40. 40. Linked Data Visualization Examples (13)
      Information Workbench: Automated Widget Suggestion
      Select a suggested visualization (bar chart, line chart, pie chart, table, pivot view); the visualization is built automatically
  41. 41. Linked Data Visualization Examples (14): Other tools
      • LOD live
        – Graph visualizations
        – Interactive UI (the graph can be expanded by clicking on the nodes)
        – Live access to SPARQL endpoints
      • LODVisualization
        – Hierarchy visualizations: treemaps and trees
        – Live access to SPARQL endpoints (supporting JSON and SPARQL 1.1)
      Source:
      Source: http://lodvisualization.appspot.com
  42. 42. Linking Open Data Cloud Visualization (1)
      “The Linking Open Data cloud diagram” by Richard Cyganiak and Anja Jentzsch
      • The nodes correspond to Linked Data sets
      • The edges represent connections between Linked Data sets
      • The size of the nodes is proportional to the number of triples in each data set
      • The datasets are categorized by knowledge domains represented with colors
      Source:
  43. 43. Linking Open Data Cloud Visualization (2)
      “Linked Open Data Cloud” generated by Gephi
      • The central cluster (green) displays DBpedia as a central focus
      • The size of the nodes reflects the size of the datasets
      • The length of the connections encodes information about the data structure
      Image source:
      Source: A. Dadzie and M. Rowe. Approaches to Visualizing Linked Data: A Survey. 2011
  44. 44. Linking Open Data Cloud Visualization (3)
      “Linked Open Data Graph” by Protovis
      • The data to be displayed are retrieved using the CKAN API
      • The nodes represent Linked Data sets available in the Data Hub “lod-cloud” group
      • The size of the nodes is proportional to the data set size
      • Edges are connections between data sets
      • The colors reflect the CKAN rating and the intensity of the color reflects the number of received ratings
      • The nodes can be clicked to go to the data set CKAN page
      Source:
  45. 45. LD Reporting
      • Visualization techniques are used in the creation of reports included in data monitoring and management solutions
      • They provide an overview of the dataset by generating a low-level descriptive analysis:
        – Quantitative information about the dataset
        – Users may interact with the data via dashboards
      • Some systems support this feature over structured data:
        – Google Webmaster Tools
        – Information Workbench
        – eCloudManager
  46. 46. Google Webmaster Tools: Structured Data Dashboard (1)
      • Provides webmasters with information about the structured data embedded in their websites (and recognized by Google)
      • The dashboard has three levels:
        i. Site-level view: aggregates the data by classes defined in the vocabulary schema
        ii. Item-type-level view: provides details per page for each type of resource
        iii. Page-level view: shows the attributes of every type of resource on a given web page
  47. 47. Google Webmaster Tools: Structured Data Dashboard (2)
      Source: view
  48. 48. Google Webmaster Tools: Structured Data Dashboard (3)
      Source: view
      Site-level view
  49. 49. LINKED DATA SEARCH
  50. 50. Semantic Search Process
      Using semantic models for the search process
      [Diagram labels: User; System; User query (e.g. keywords, NL); Query visualization (optional); Entity extraction / Semantic query analysis; Graph matching; Data graphs; Query; Presentation / Ranking; Result visualization/presentation; Analysis; Presentation; Refinement; Semantic Search; Faceted Search]
      Image based on: Tran, T., Herzig, D., Ladwig, G. SemSearchPro - Using semantics through the search process
  51. 51. Semantic Search: Example (1)
      User query (NL): “songs written by members of the beatles”
      Entity extraction: song | written by | member (of) | (the) beatles
      Query expansion: song → synonyms: track, melody, tune, …
      Entity mapping: candidates → mo:Track
      Image source: http://musicontology.com
  52. 52. Semantic Search: Example (2)
      User query (NL): “songs written by members of the beatles”
      Entity extraction: song | written by | member (of) | (the) beatles
      Query expansion: written by → synonyms: writer, composer, creator, …
      Entity mapping: candidates → mo:composer (inverse of “written by”)
      Image source: http://musicontology.com
  53. 53. Semantic Search: Example (3)
      User query (NL): “songs written by members of the beatles”
      Entity extraction: song | written by | member (of) | (the) beatles
      Query expansion: member (of)
      Entity mapping: mo:member_of, inverse of mo:member
      Image source: http://musicontology.com
  54. 54. Semantic Search: Example (4)
      User query (NL): “songs written by members of the beatles”
      Entity extraction: song | written by | member (of) | (the) beatles
      Entity mapping: (the) beatles → candidates: Beatles (Book), The Beatles (Music Group), Beatle (Animal), Beatle (Automobile)
      How to identify the right “Beatle”? Examine the context (contextual analysis)
  55. 55. Semantic Search: Example (5)
      User query (NL): “songs written by members of the beatles”
      Entity extraction: song | written by | member (of) | (the) beatles
      Entity mapping: (the) beatles → The Beatles (Music Group), dbpedia:The_Beatles
      Contextual analysis: this subgraph is part of the query
      [Subgraph labels: mo:Track; mo:composer; foaf:Agent; mo:MusicArtist rdfs:subClassOf foaf:Agent; mo:MusicGroup rdfs:subClassOf mo:MusicArtist; mo:member]
  56. 56. Semantic Search: Example (6)
      User query (NL): “songs written by members of the beatles”
      Entity extraction: song | written by | member (of) | (the) beatles
      Query: [graph pattern over ?x, ?y, mo:Track, foaf:Agent and dbpedia:The_Beatles]
      Results (answers presented to the user; the results could be ranked):
      • (I want to) Come Home
      • Angel in Disguise
      • Another Day
      • …
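One simple way to picture contextual analysis is to score each candidate entity by how many of its types also occur in the query's context. The sketch below is a toy heuristic, not the method used by any particular engine; the `ex:` types of the non-music candidates are hypothetical placeholders.

```python
# Candidate entities for "(the) beatles" with their (partly hypothetical) types.
CANDIDATES = {
    "Beatles (Book)": {"ex:Book"},
    "The Beatles (Music Group)": {"mo:MusicGroup", "mo:MusicArtist", "foaf:Agent"},
    "Beatle (Animal)": {"ex:Animal"},
    "Beatle (Automobile)": {"ex:Automobile"},
}

# Classes that appear in the query subgraph built from the other mapped terms.
QUERY_CONTEXT = {"mo:Track", "mo:MusicArtist", "mo:MusicGroup", "foaf:Agent"}

def disambiguate(candidates, context):
    """Pick the candidate whose types overlap most with the query context."""
    return max(candidates, key=lambda name: len(candidates[name] & context))

best = disambiguate(CANDIDATES, QUERY_CONTEXT)
```

A real system would typically also look at adjacent nodes in the RDF graph rather than at types alone.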
  57. 57. Semantic Search
      • Aims at understanding the meaning of the resources specified in the query
      • Different approaches to exploit semantics:
        – Query expansion using ontologies: since ontologies represent knowledge about specific domains, they can be used to expand the query by incorporating related ontology terms into the query.
        – Contextual analysis: in LD, this approach may explore the resources specified in the query and their adjacent nodes in the RDF graph. Mainly applied to disambiguate query terms.
        – Reasoning: in some cases, the answer to a specific query is not explicitly contained in the data, but it can be computed by using reasoning methods.
  58. 58. Semantic Search & Linked Data
      Semantic Search vs. SPARQL query
      Component | Semantic search | SPARQL query
      Keyword or NL / concept matching | Performs entity extraction and matching to formal concepts | Not supported
      Fuzzy concepts/relations/logics | Allows the application of fuzzy qualifiers as query constraints | Not supported
      Graph patterns | Uses the context and other semantic information to locate interesting sub-graphs | Applies pattern matching
      Path discovery | Finds new interesting links that may lead to additional information | Not supported
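Query expansion can be sketched with two lookup tables: one for synonyms and one mapping labels to ontology terms. Both tables below are tiny hand-written stand-ins based on the example slides; a real system would derive them from a thesaurus and from the ontology's labels.

```python
# Synonyms taken from the example slides.
SYNONYMS = {
    "song": ["track", "melody", "tune"],
    "written by": ["writer", "composer", "creator"],
}

# Hypothetical label index for Music Ontology terms.
TERM_INDEX = {"track": "mo:Track", "composer": "mo:composer"}

def expand(term):
    """Return the term followed by its known synonyms."""
    return [term] + SYNONYMS.get(term, [])

def map_to_ontology(term):
    """Map a query term to ontology terms via its expanded forms."""
    return [TERM_INDEX[t] for t in expand(term) if t in TERM_INDEX]

song_terms = map_to_ontology("song")
author_terms = map_to_ontology("written by")
```

Without the expansion step, neither "song" nor "written by" would match anything in the index, which is the motivation for the approach.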
  59. 59. Semantic Search: Google (1)
      Input: query in NL
      Output: list of answers
      Google performs semantic search on certain entities and queries!
  60. 60. Semantic Search: Google (2)
      Input: question in NL
      Output: list of web pages, ranked using the Google PageRank algorithm to display the most relevant pages first
  61. 61. Semantic Search: DuckDuckGo (1)
      Input: question in NL
      Output: list of answers
  62. 62. Semantic Search: DuckDuckGo (2)
      Performs disambiguation of the query terms.
      The 45 suggestions are grouped by classes according to their corresponding knowledge domain. This approach is called Faceted Search.
  63. 63. Faceted Search: Example
      Information Workbench: searching for artists in categories
      [Screenshot callouts: three facets; list of artists]
      Source:
  64. 64. Faceted Search
      • Facets = properties
      • Suitable for browsing multi-dimensional taxonomies based on the search attributes
      • Allows the user to explore the data:
        – The user submits a (keyword) query
        – The faceted system dynamically identifies the relevant facets (properties) for the given query and the constraints (values of those properties), and displays the search results
        – The user may “drill down” by applying specific constraints to the search results
      • Information can be accessed and ranked in multiple ways
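The three steps above (keyword query, facet identification with counts, drill-down) can be sketched over a small in-memory list. The records and property names below are illustrative sample data, not taken from the Information Workbench.

```python
from collections import Counter

# Illustrative sample records; "type" and "country" act as facets.
ARTISTS = [
    {"name": "The Beatles", "type": "Group", "country": "GB"},
    {"name": "Bob Dylan", "type": "Person", "country": "US"},
    {"name": "The Beach Boys", "type": "Group", "country": "US"},
    {"name": "The Kinks", "type": "Group", "country": "GB"},
]

def search(items, keyword):
    """Step 1: keyword query over the name."""
    return [item for item in items if keyword.lower() in item["name"].lower()]

def facets(items, properties):
    """Step 2: for each facet (property), count the values found in the results."""
    return {prop: Counter(item[prop] for item in items) for prop in properties}

def drill_down(items, prop, value):
    """Step 3: constrain the results to one facet value."""
    return [item for item in items if item[prop] == value]

results = search(ARTISTS, "the")
facet_counts = facets(results, ["type", "country"])
british = drill_down(results, "country", "GB")
```

Here the counts are computed exactly by scanning all results; the preview challenge discussed on the next slide arises when the result set is too large for that.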
  65. 65. Faceted Search (2)
      Challenges for supporting Faceted Search
      • Identifying which facets to surface:
        – In heterogeneous datasets, data entries may have different facets
        – Dynamically identify the most appropriate facets for each query
      • Ordering the facets depending on the relevance to the query
      • Computing previews:
        – Accurately predicting counts, without examining all the results
        – Offering facet previews to give users an idea of what to expect
      Source: Teevan, J., Dumais, S., Gutt, Z. Challenges for Supporting Faceted Search in Large, Heterogeneous Corpora like the Web
  66. 66. Faceted Search: LD Example (1)
      FacetedDBLP
      • Retrieves information from the DBLP collection
      • Shows the result set with different facets:
        – Publication years
        – Authors
        – Conferences
      • It is implemented upon the DBLP++ dataset (enhancement of DBLP including additional keywords and abstracts):
        – DBLP++ is stored in a MySQL database
        – Uses D2R server to consume RDF triples
  67. 67. Faceted Search: LD Example (2)
      FacetedDBLP. Input: “crowdsourcing”. 485 results, shown with facets
  68. 68. Classification of Search Engines
      [Diagram relating semantic search systems and faceted search systems; systems shown: Google (GKG), Bing, KIM, sig.ma, LOD cloud cache/facet, Longwell, mSpace, Exhibit (SIMILE), PoolParty Semantic Search Server, DuckDuckGo, Hakia, SenseBot, PowerSet, DeepDive, Kosmix, Factibles, Lexxe, Information Workbench]
  69. 69. Searching for Semantic Data
      Search for:
      • Ontologies
      • Vocabularies
      • RDF documents
  70. 70. Semantic Data Search Engines (1)
      Searching for ontologies: Swoogle search (keyword search)
  71. 71. Semantic Data Search Engines (2)
      Searching for vocabularies: LOV Portal
      • Allows searching for properties, classes or vocabularies in the Linked Open Vocabulary (LOV) catalog
      • The LOV search engine implements faceted search on:
        – The knowledge domain
        – The role of the resource matched from the input query
        – The vocabulary containing the resource
      • Results are ranked according to a score considering:
        – Relevancy to the query (string)
        – Element labels matched importance
        – Number of LOV vocabularies that refer to the element
  72. 72. Semantic Data Search Engines (3)
      Searching for vocabularies: LOV Portal. Input: “artist”. 84 results, shown with facets
      CH 3
  73. 73. Semantic Data Search Engines (4)
      Searching for documents: Semantic Web Search Engine Sindice, http://sindice.com
  74. 74. METHODS FOR LINKED DATA ANALYSIS
  75. 75. Features of Data Analysis
      • Data aggregation & filtering
        – One of the first steps in data analysis is pre-processing in order to select the appropriate data to study
      • Statistical analysis
        – Allows describing the data via Exploratory Data Analysis (EDA) methods
        – Includes statistical inference and prediction
      • Machine learning
        – Focuses on prediction
        – Combines Artificial Intelligence and Statistics
        – Includes supervised and unsupervised learning (not covered in this course)
      Visualization techniques can be built on top of these as part of data analysis
  76. 76. LD Data Aggregation & Filtering
      • Data aggregation refers to merging/summarizing several values into a single one
      • Filtering allows retrieving relevant data properties and selecting a particular range of data values
      • SPARQL supports these features via SELECT queries as follows:
      Feature | SPARQL capabilities
      Aggregation | Combining aggregate functions (COUNT, SUM, AVG, …) and the GROUP BY operator
      Filtering | Combining projection, FILTER and HAVING operators
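To make the distinction between the operators concrete, the sketch below mimics GROUP BY with COUNT, HAVING and FILTER on a handful of in-memory bindings. It illustrates the semantics only and is not how a SPARQL engine evaluates queries; the bindings are hypothetical.

```python
from collections import Counter

# Hypothetical (release, country) bindings, as a SELECT query might return them.
BINDINGS = [
    ("r1", "GB"), ("r2", "GB"), ("r3", "GB"),
    ("r4", "US"), ("r5", "US"),
    ("r6", "DE"),
]

def group_count(bindings):
    """GROUP BY ?country with COUNT(?release)."""
    return Counter(country for _release, country in bindings)

def having_at_least(counts, minimum):
    """HAVING (COUNT(?release) >= minimum): a condition on aggregated groups."""
    return {key: n for key, n in counts.items() if n >= minimum}

def filter_countries(bindings, allowed):
    """FILTER (?country IN (...)): a condition on individual solutions."""
    return [(release, country) for release, country in bindings if country in allowed]

frequent = having_at_least(group_count(BINDINGS), 2)
only_de = group_count(filter_countries(BINDINGS, {"DE"}))
```

The key difference is the point at which the condition applies: FILTER runs before grouping, HAVING after it.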
  77. 77. LD Statistical Analysis
      • Statistical analysis supports descriptive and predictive operations
      • SPARQL supports some descriptive operations (average, maximum, minimum) but does not offer more sophisticated statistical features like:
        – Fitting distributions
        – Linear regressions
        – Analysis of variance
        – …
      • Some approaches are able to consume data retrieved from SPARQL endpoints:
        – “R for SPARQL” by Willem Robert van Hage & Tomi Kauppinen
        – “Performing Statistical Methods on Linked Data” by Zapilko & Mathiak
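As an example of an operation that has to happen outside SPARQL, the sketch below fits a least-squares line to numeric values such as those a SELECT query might return. The function name and sample values are illustrative; in practice one would use a statistics package such as R, as the following slides do.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Illustrative values lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The function assumes at least two distinct x values; otherwise `sxx` is zero and the division fails.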
  78. 78. R – Statistical Computing
      • R is a language and environment for statistical computing
      • R provides a wide variety of statistical and graphical techniques:
        – Linear and nonlinear modeling
        – Classical statistical tests
        – Time-series analysis
        – Classification (Machine Learning)
        – Clustering (Machine Learning)
      • Extensible with further functionalities
      • R is available as Free Software (under the terms of the GNU General Public License)
  79. 79. Statistical Analysis with R
  80. 80. R for SPARQL
      • The R for SPARQL package makes it possible to:
        – Connect to a SPARQL endpoint over HTTP
        – Pose a SELECT query or an UPDATE operation (LOAD, INSERT, DELETE)
      • If given a SELECT query, it returns the results as a data frame
      • The results can directly be mapped and visualized
      • Posing requests:
        – If the parameter query is given, the input is assumed to be a SELECT query, and a GET request is performed to get the results from the URL of the endpoint
        – If the parameter update is given, the input is assumed to be an UPDATE operation, and a POST request is submitted to the URL of the endpoint. Nothing is returned
      Source:
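The GET/POST distinction described above follows the SPARQL protocol and can be sketched with Python's standard library. This is not the R package's implementation: `build_request` is a hypothetical helper, the endpoint URL in the usage lines is a placeholder, and the sketch only constructs the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_request(endpoint, query=None, update=None):
    """Build (but do not send) an HTTP request for a SPARQL query or update.

    Assumes the endpoint URL has no query string of its own.
    """
    if (query is None) == (update is None):
        raise ValueError("give exactly one of 'query' or 'update'")
    if query is not None:
        # SELECT query: GET with the query text in the 'query' URL parameter.
        return Request(endpoint + "?" + urlencode({"query": query}), method="GET")
    # UPDATE operation: POST with the update text as form-encoded body.
    body = urlencode({"update": update}).encode("utf-8")
    return Request(endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# Placeholder endpoint; nothing is sent over the network here.
select_req = build_request("http://example.org/sparql",
                           query="SELECT * WHERE { ?s ?p ?o } LIMIT 1")
update_req = build_request("http://example.org/sparql",
                           update="LOAD <http://example.org/data.rdf>")
```

Passing either object to `urllib.request.urlopen` would perform the actual call against a real endpoint.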
  81. 81. R for SPARQL: Example (1)
      1. Download the R package and load it:
          library(SPARQL)
          library(sp)  # used for plotting spatial data
      2. Define the endpoint with the triples:
          endpoint = ""
      3. Define the query:
          q = "SELECT ?cell ?row ?col ?polygon ?DEFOR_2002
               WHERE {
                 ?cell a <> ;
                       <> ?row ;
                       <> ?col ;
                       <> ?polygon .
                 <> ?DEFOR_2002 .
               }"
      Source:
  82. 82. R for SPARQL: Example (2)
      4. Link the result to an object:
          res <- SPARQL(endpoint,q)$results
      5. Handle the results:
          res$row <- -res$row
          coordinates(res) <- ~col - row
      6. Choose the graphical format and plot the results:
          spplot(res, "DEFOR_2002", col.regions=rev(heat.colors(17))[-1], at=(0:16)/100, main="relative deforestation per pixel during 2002")
      Source:
  83. 83. R for SPARQL: Example (3)
      Source:
  84. 84. Machine Learning
      • Machine Learning techniques make it possible to extract interesting information from data sources, and can be used to discover hidden patterns within datasets by generalizing from examples
      • Different ML approaches can be applied:
        – Clustering: groups similar data into data partitions called clusters
        – Association rule learning: discovers relations between variables
        – Decision tree learning: analyses observations to build a predictive model represented as a tree
        – Many others …
      • Weka is a Data Mining framework commonly used to apply ML on tabular data
  85. 85. Machine Learning on LD
      Challenges for applying Machine Learning on LD
      • LD heterogeneity introduces noise to the data:
        – Same LD resources, different URIs
        – Predicates with similar semantics, but different constraints
      • The data is not independent and identically distributed (iid):
        – It does not consist of only one type of object
        – The entities are related to each other
      • LD rarely contains the negative examples needed for ML algorithms:
        – For example, owl:differentFrom
      Source:
  86. 86. Applications of Machine Learning on LD
      • Node ranking:
        – Ranking nodes according to their relevance for a query
      • Link prediction:
        – Infer edges between LD resources
        – Predict the new edges that will be added to the RDF graph
      • Entity resolution:
        – Determine whether two URIs correspond to the same real-world object
      • Taxonomy learning:
        – Infer taxonomies or concept hierarchies from a given vocabulary or ontology
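Entity resolution is often bootstrapped from simple similarity features before any model is trained. The sketch below is such a hand-written baseline, not a learned model: it compares the labels of two resources by token overlap (Jaccard similarity). The URIs, labels and the 0.8 threshold are all hypothetical.

```python
import re

# Hypothetical labels attached to three URIs from different datasets.
LABELS = {
    "http://example.org/a/The_Beatles": "The Beatles",
    "http://example.org/b/artist/42": "Beatles, The",
    "http://example.org/b/artist/77": "The Beach Boys",
}

def tokens(label):
    """Lower-case word tokens, ignoring punctuation and word order."""
    return set(re.findall(r"\w+", label.lower()))

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b)

def same_entity(uri1, uri2, labels=LABELS, threshold=0.8):
    """Guess whether two URIs denote the same real-world object."""
    return jaccard(tokens(labels[uri1]), tokens(labels[uri2])) >= threshold

match = same_entity("http://example.org/a/The_Beatles", "http://example.org/b/artist/42")
mismatch = same_entity("http://example.org/a/The_Beatles", "http://example.org/b/artist/77")
```

In an ML setting, scores like this one would typically serve as input features alongside evidence from linked resources, rather than as the final decision.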
  87. 87. Summary
      • Linked Data visualization techniques:
        – Visualizations must be chosen according to the type of the data
        – A wide variety of tools support the visualization of SPARQL results
        – Might be used in dashboards for supporting administrative tasks
      • Linked Data search:
        – Semantic search: exploits the meaning of user queries (NL or set of keywords) to present useful results
        – Faceted search: allows browsing multi-dimensional data
      • Linked Data analysis:
        – Includes data manipulation such as aggregation & filtering
        – Applies statistical methods to get a better understanding of the data
        – Machine Learning techniques can be applied for predictive analysis
        – Visualization techniques can be built on top of the previous features
  88. 88. For exercises, quiz and further material visit our website: http://www.euclid-project.eu
      Other channels: @euclid_project, euclidproject, euclidproject
      eBook, Course
  89. 89. Acknowledgements
      • Alexander Mikroyannidis
      • Alice Carpentier
      • Andreas Harth
      • Andreas Wagner
      • Andriy Nikolov
      • Barry Norton
      • Daniel M. Herzig
      • Elena Simperl
      • Günter Ladwig
      • Inga Shamkhalov
      • Jacek Kopecky
      • John Domingue
      • Juan Sequeda
      • Kalina Bontcheva
      • Maria Maleshkova
      • Maria-Esther Vidal
      • Maribel Acosta
      • Michael Meier
      • Ning Li
      • Paul Mulholland
      • Peter Haase
      • Richard Power
      • Steffen Stadtmüller