Presented at: JIST2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: Semantic Web and Linked Open Data (LOD) are powerful technologies for knowledge management, and explicit knowledge is expected to be represented in RDF (Resource Description Framework); however, ordinary users stay away from RDF because of the technical skills it requires. Since a concept map or a node-link diagram can enhance learning from beginner to advanced level, RDF graph visualization can be a suitable tool for familiarizing users with Semantic Web technology. However, an RDF graph generated from a whole query result is not suitable for reading, because it is highly connected, like a hairball, and poorly organized. To make a graph that presents knowledge more readable, this research introduces an approach to sparsify a graph using a combination of three main functions: graph simplification, triple ranking, and property selection. These functions are largely based on interpreting RDF data as units of knowledge, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype is implemented to demonstrate the suitability and feasibility of the approach. It shows that the simple and flexible graph visualization is easy to read and makes a strong impression on users. In addition, the tool helps inspire users to appreciate the advantageous role of linked data in knowledge management.
Semantic Variation Graphs: The Case for RDF & SPARQL (Jerven Bolleman)
Presentation given to the GA4GH data working group. It starts with an introduction to what RDF is, followed by how one can model genomic variation graphs in RDF. Then we show how one can use SPARQL to query this data.
Knowledge Discovery tools using Linked Data techniques - presentation for the Linked Data 4 Knowledge Discovery Workshop at the ECML/PKDD 2015 conference - http://events.kmi.open.ac.uk/ld4kd2015/
Explicit Semantics in Graph DBs: Driving Digital Transformation With Neo4j (Connected Data World)
Dr. Jesús Barrasa's slides from his talk at Connected Data London. Jesús, a senior field engineer at Neo4j, presented how semantic web principles can be used in a graph database.
A Generic Language for Integrated RDF Mappings of Heterogeneous Data (andimou)
Despite the significant number of existing tools, incorporating data from multiple sources and different formats into the Linked Open Data cloud remains complicated. No mapping formalization exists to define how to map such heterogeneous sources into RDF in an integrated and interoperable fashion.
This paper introduces the RML mapping language, a generic language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF. Broadening R2RML's scope, the language becomes source-agnostic and extensible, while facilitating the definition of mappings of multiple heterogeneous sources. This leads to higher integrity within datasets and richer interlinking among resources.
Property Graph vs. RDF Triplestore Comparison in 2020 (Ontotext)
This presentation goes all the way from an introduction to what graph databases are, to a table comparing RDF and property graphs, plus two diagrams presenting the market circa 2020.
This invited keynote at the Social Computing Track at WI-IAT21 gives an introduction to Knowledge Graphs and how they are built collaboratively by us. It also presents a brief analysis of the links in Wikidata.
Knowledge graph embeddings are a mechanism that projects each entity in a knowledge graph to a point in a continuous vector space. It is commonly assumed that those approaches project two entities closely to each other if they are similar and/or related. In this talk, I give a closer look at the roles of similarity and relatedness with respect to knowledge graph embeddings, and discuss how the well-known embedding mechanism RDF2vec can be tailored towards focusing on similarity, relatedness, or both.
Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without having a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools that offer such statistics, providing basic information about RDF datasets and vocabularies. However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The set of criteria is extensible beyond the 32 defaults, and the component is integrated into the larger SANSA framework and employed in at least four major usage scenarios beyond the SANSA community.
Hacktoberfest 2020 'Intro to Knowledge Graph' with Chris Woodward of ArangoDB and reKnowledge. Accompanying video is available here: https://youtu.be/ZZt6xBmltz4
Linked Data Experiences at Springer Nature (Michele Pasin)
An overview of how we're using semantic technologies at Springer Nature, and an introduction to our latest product: www.scigraph.com
(Keynote given at http://2016.semantics.cc/, Leipzig, Sept 2016)
Efficient Practices for Large Scale Text Mining Process (Ontotext)
Text mining is a necessity when managing large-scale textual collections. It facilitates access to otherwise hard-to-organise unstructured and heterogeneous documents, allows for the extraction of hidden knowledge, and opens new dimensions in data exploration.
In this webinar, Ivelina Nikolova, PhD, shares best practices and text analysis examples from successful text mining processes in domains like news, financial and scientific publishing, the pharma industry, and cultural heritage.
The RDF Report Card: Beyond the Triple Count (Leigh Dodds)
My talk from the Semtech Biz conference in London.
I argued that it is time to move beyond discussing the size of datasets and to encourage a more nuanced view of their quality and utility.
The RDF Report Card is offered as one simple, high-level visualization.
Adventures in Linked Data Land (presentation by Richard Light; posted by jottevanger)
"Adventures in Linked Data Land: bringing RDF to the Wordsworth Trust" is a paper given by Richard Light (http://uk.linkedin.com/pub/richard-light/a/221/ba5) to a Linked Data meeting run by the Collections Trust in February 2010. He runs through the basics of LD, how it relates to cultural heritage, and some of his experiments with it, specifically with the data of the Wordsworth Trust, finally listing a series of challenges that museums face in trying to get on board the Linked Data bus.
The Power of Semantic Technologies to Explore Linked Open Data (Ontotext)
The presentation of Atanas Kiryakov, Ontotext’s CEO, at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in graph databases and technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext’s own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
https://www.eventbrite.com/e/talk-by-paco-nathan-graph-analytics-in-spark-tickets-17173189472
Big Brains meetup hosted by BloomReach, 2015-06-04
Case study / demo of a large-scale graph analytics project, leveraging GraphX in Apache Spark to surface insights about open source developer communities — based on data mining of their email forums. The project works with any Apache email archive, applying NLP and machine learning techniques to analyze message threads, then constructs a large graph. Graph analytics, based on concise Scala coding examples in Spark, surface themes and interactions within the community. Results are used as feedback for respective developer communities, such as leaderboards, etc. As an example, we will examine analysis of the Spark developer community itself.
Microservices, containers, and machine learningPaco Nathan
http://www.oscon.com/open-source-2015/public/schedule/detail/41579
In this presentation, an open source developer community considers itself algorithmically. This shows how to surface data insights from the developer email forums for just about any Apache open source project. It leverages advanced techniques for natural language processing, machine learning, graph algorithms, time series analysis, etc. As an example, we use data from the Apache Spark email list archives to help understand its community better; however, the code can be applied to many other communities.
Exsto is an open source project that demonstrates Apache Spark workflow examples for SQL-based ETL (Spark SQL), machine learning (MLlib), and graph algorithms (GraphX). It surfaces insights about developer communities from their email forums. Natural language processing services in Python (based on NLTK, TextBlob, WordNet, etc.) get containerized and used to crawl and parse email archives. These produce JSON data sets; then we run machine learning on a Spark cluster to find insights such as:
* What are the trending topic summaries?
* Who are the leaders in the community for various topics?
* Who discusses most frequently with whom?
This talk shows how to use cloud-based notebooks for organizing and running the analytics and visualizations. It reviews the background for how and why the graph analytics and machine learning algorithms generalize patterns within the data — based on open source implementations for two advanced approaches, Word2Vec and TextRank. The talk also illustrates best practices for leveraging functional programming for big data.
Re-using Media on the Web: Media fragment re-mixing and playoutMediaMixerCommunity
A number of novel application ideas will be introduced based on the media fragment creation, specification and rights management technologies. Semantic search and retrieval allows us to organize sets of fragments by topical or conceptual relevance. These fragment sets can then be played out in a non-linear fashion to create a new media re-mix. We look at a server-client implementation supporting Media Fragments, before allowing the participants to take the sets of media they have selected and create their own re-mix.
The world has changed, and having one huge server won’t do the job anymore. When you’re talking about vast amounts of data that keep growing all the time, the ability to scale out is your savior. Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
This lecture will be about the basics of Apache Spark and distributed computing and the development tools needed to have a functional environment.
Vocabulary for Linked Data Visualization Model - Dateso 2015 (Jiří Helmich)
There is already a vast amount of Linked Data on the web. What is missing is a convenient way of analyzing and visualizing the data that would benefit from the Linked Data principles. In our previous work we introduced the Linked Data Visualization Model (LDVM). It is a formal base that exploits the principles to ensure interoperability and compatibility of compliant components. In this paper we introduce a vocabulary for description of the components and an analytic and visualization pipeline composed of them. We demonstrate its viability on an example from the Czech Linked Open Data cloud.
Generating Executable Mappings from RDF Data Cube Data Structure Definitions (Christophe Debruyne)
Data processing is increasingly the subject of various internal and external regulations, such as GDPR which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of and for describing datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset “just in time”. We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes into account regulations to generate mappings that are compliant. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and demonstrate it using a running example. Some of the more technical aspects are also described.
Reference: Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350
Towards efficient processing of RDF data streams (Alejandro Llaves)
Presentation of short paper submitted to OrdRing workshop, held at ISWC 2014 - http://streamreasoning.org/events/ordring2014.
In the last years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Adjusting OpenMP PageRank: SHORT REPORT / NOTES (Subhajit Sahu)
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
1. RDF GRAPH VISUALIZATION
BY INTERPRETING LINKED DATA AS KNOWLEDGE
Rathachai CHAWUTHAI & Prof. Hideaki TAKEDA
National Institute of Informatics and SOKENDAI
RDF4U
JIST2015 Yichang, China 11-13 Nov 2015
4. THE ROLE OF SEMANTIC WEB IN KNOWLEDGE MANAGEMENT
[Architecture diagram: Data tier → Service tier (SPARQL, JENA, etc.) → Application/Presentation/Visualisation tier]
At Visualisation Tier,
• RDF data are transformed into
Chart, Geographic Map, etc.
and then serve users.
It’s cool, but
• Users are far from RDF data, so
they do not understand the power
of Semantic Web and do not realise
how to contribute RDF data.
For this reason,
• It could be good if users can read
RDF data directly using node-link
diagram or concept-map diagram.
5. READING FROM A QUERY GRAPH
Querying the 2-hop neighbourhood (or more hops) of a given URI
gives wider information on the topic.
[Example graph: Caffe Mocha - contains a shot of Espresso, has a layer of Chocolate, contains Sugar, topped by Milk. Espresso - type: Coffee; color: black; taste: bitter; contains caffeine (430 mg/L). Chocolate - contains cocoa. Sugar - taste: sweet; made from sugarcane. Milk - color: white; produced by a cow.]
6. PROBLEMS
1) A Query Graph is TOO Complicated to Read.
[Example hairball graphs: http://lod.ac/species/Bubo and http://dbpedia.org/resource/Tokyo]
7. PROBLEMS
2) Lack of Reading Flow in RDF Data
All triples are equal, so Background Content and Main Point are not structured in any RDF graph.
8. GOAL
we prefer…
✦ A Simply Readable Graph
✦ A Well-Reading-Flow Graph
[Diagram: a topic node with its Common Information and its Topic-Specific Information]
12. GRAPH SIMPLIFICATION
• Some well-prepared RDF repositories did reasoning on
ontologies in order to support a SPARQL service.
• One impact is that the inferred triples create giant
components in a graph.
• A closer look at the data indicates that the following
situations are commonly found in any complex RDF graph.
• equivalent or same-as instances (owl:sameAs),
• transitive properties (e.g. skos:broaderTransitive), and
• hierarchical classification (rdf:type & rdfs:subClassOf)
• Thus, this method aims to remove some redundant triples
by using the mechanism of Semantic Web rules.
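A minimal sketch of this simplification idea (illustrative only, not the RDF4U implementation; triples are assumed to be plain (subject, predicate, object) string tuples, and subclass_of is an assumed map from a class to its superclasses):

```python
# Illustrative sketch: remove triples that are re-derivable by Semantic Web
# rules. Two of the situations above are covered: transitive-property
# shortcuts and rdf:type assertions to superclasses (owl:sameAs merging
# is omitted for brevity).

TRANSITIVE = {"skos:broaderTransitive"}  # assumed set of transitive properties

def simplify(triples, subclass_of):
    """Drop redundant inferred triples from a set of (s, p, o) tuples."""
    triples = set(triples)
    redundant = set()
    # 1) Transitive shortcuts: if a->b and b->c are present, a->c is inferable.
    for p in TRANSITIVE:
        edges = {(s, o) for s, q, o in triples if q == p}
        nodes = {x for e in edges for x in e}
        for s, o in edges:
            if any((s, m) in edges and (m, o) in edges for m in nodes - {s, o}):
                redundant.add((s, p, o))
    # 2) Hierarchical classification: drop rdf:type to a superclass when a
    #    more specific type of the same subject is already asserted.
    types = {}
    for s, q, o in triples:
        if q == "rdf:type":
            types.setdefault(s, set()).add(o)
    for s, classes in types.items():
        for c in classes:
            if any(c in subclass_of.get(other, set()) for other in classes - {c}):
                redundant.add((s, "rdf:type", c))
    return triples - redundant

data = {
    ("ex:Mocha", "skos:broaderTransitive", "ex:Espresso"),
    ("ex:Espresso", "skos:broaderTransitive", "ex:Coffee"),
    ("ex:Mocha", "skos:broaderTransitive", "ex:Coffee"),   # inferred shortcut
    ("ex:Espresso", "rdf:type", "ex:Coffee"),
    ("ex:Espresso", "rdf:type", "owl:Thing"),              # inferred supertype
}
print(simplify(data, {"ex:Coffee": {"owl:Thing"}}))        # 3 triples remain
```

The shortcut and supertype triples are exactly the ones a reasoner would re-derive, so removing them sparsifies the drawing without losing knowledge.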
15. TRIPLE RANKING
Since users have different background knowledge of a specific topic, beginners may be interested in reading common information before getting to topic-specific information, while experts may prefer to read only topic-specific information.
• Concept Level (resources / properties)
• General Concepts are terms that are commonly known, such as “name”, “address”, and “class”; they are found throughout the corpus.
• Key Concepts are important terms that are frequently found in the query result but not often in the whole dataset.
• Information Level (triples)
• Common Information explains background knowledge that helps readers understand the main content (many general concepts).
• Topic-Specific Information contains specific terms that are highly relevant to the article (many key concepts).
16. TRIPLE RANKING
Step 1: Get an RDF graph.
Step 2: Identify which terms are General Concepts and which are Key Concepts.
17. TRIPLE RANKING
Step 3: Common Information - most of the nodes and links are general concepts.
Step 4: Topic-Specific Information - most of the nodes and links are key concepts.
18. TRIPLE RANKING

Weight of a URI (Concept Level; high: key concept, low: general concept):

    w(uri) = fQ(uri) / log( fD(uri) + 1 )

where fQ(uri) is the number of occurrences of the URI in the query result, and log( fD(uri) + 1 ) is a logarithmic scale of the number of occurrences of the URI in the whole dataset.

Visualization-Weight of a Triple (Information Level; high: topic-specific, low: common info):

    vw(〈s,p,o〉) = ( α⋅w(s) + β⋅w(p) + γ⋅w(o) ) / ( α + β + γ )

The coefficients are 1.0 by default (making the denominator 3), but they can be adjusted for specific purposes.
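The ranking translates directly into code; below is an illustrative Python transcription (not the authors' implementation), where fq and fd are assumed dictionaries of URI occurrence counts in the query result and in the whole dataset:

```python
import math

def w(uri, fq, fd):
    """Concept-level weight of a URI: high = key concept, low = general."""
    # URIs unseen in the dataset are assumed to have count 1.
    return fq.get(uri, 0) / math.log(fd.get(uri, 1) + 1)

def vw(triple, fq, fd, alpha=1.0, beta=1.0, gamma=1.0):
    """Visualization-weight of a triple <s, p, o>: high = topic-specific."""
    s, p, o = triple
    num = alpha * w(s, fq, fd) + beta * w(p, fq, fd) + gamma * w(o, fq, fd)
    return num / (alpha + beta + gamma)

# Hypothetical counts: rdf:type and owl:Thing are ubiquitous in the dataset
# (general concepts), dp:Diatomic_nonmetals is rare (a key concept).
fq = {"dp:Hydrogen": 40, "rdf:type": 30, "owl:Thing": 5,
      "dp:Diatomic_nonmetals": 4}
fd = {"dp:Hydrogen": 500, "rdf:type": 10000000, "owl:Thing": 8000000,
      "dp:Diatomic_nonmetals": 20}
common = vw(("dp:Hydrogen", "rdf:type", "owl:Thing"), fq, fd)
specific = vw(("dp:Hydrogen", "rdf:type", "dp:Diatomic_nonmetals"), fq, fd)
```

With these hypothetical counts the triple pointing at the rare, query-relevant class gets the higher vw, matching the common-to-specific ordering described above.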
20. TRIPLE RANKING
Subject Predicate Object vw
dp:Hydrogen rdf:type owl:Thing 5.62
dp:Hydrogen rdf:type skos:Concept 6.01
dp:Hydrogen dct:subject dp:Chemical_elements 7.31
dp:Hydrogen dct:subject dp:Airship_technology 7.35
dp:Hydrogen rdf:type dp:Diatomic_nonmetals 7.48
For example: http://dbpedia.org/resource/Hydrogen. At the Information Level, triples with a low vw carry Common information, and those with a high vw carry Topic-Specific information.
21. TRIPLE RANKING
In the case of sub-properties (likewise sub-classes): ltk:higherTaxon and ltk:mergedInto are each rdfs:subPropertyOf skos:broader, i.e. more specific than skos:broader. Raw data: (a, ltk:higherTaxon, x) and (a, ltk:mergedInto, y). Inferred data: additionally (a, skos:broader, x) and (a, skos:broader, y).
23. PROTOTYPE
http://rc.lodac.nii.ac.jp/rdf4u/
Features
• To simplify a graph by removing some inferred triples.
• To give ranking scores to triples based on common and topic-specific information.
• To filter a graph by selecting preferred properties.
• To control an interactive graph diagram.
Thanks to - Client: D3js, Bootstrap, jQuery; Server: SimpleRDF, SPARQL for PHP
bit.ly/rdf4u
24. DISCUSSION
Usefulness
• A diagram is sparser and easier for humans to read.
• Beginners can read common information first.
• Experts can read topic-specific information.
Uniqueness
• Some graph visualisation works (Motif, Gephi, RDF Gravity, Fenfire, and IsaViz) do not use the power of the Semantic Web to sparsify a graph, and do not provide different data for different user levels.
Novelty
• TF-IDF is adapted for ordering triples from the common to the topic-specific level of information.
• The degree of commonness versus specificity is calculated by evaluating the nature of the dataset with the algorithm.
Prospect
• The triple ranking can be extended by applying various algorithms in order to satisfy the diverse characteristics of data in other domains, such as Biodiversity Informatics.
• Mashup tools should consider this idea.
25. FUTURE PLAN
• To do a critical evaluation
• Survey
• Number of cut edges
• To find the precise border between common information and topic-specific information
• To find a better way to count the number of URIs (currently it always times out)
• To remove noisy triples
• To improve the triple ranking algorithm for other domains