This document provides an introduction to working with triples and datasets using the Jena API in Java. It discusses how to [1] create a dataset and named model, [2] create resources, properties, and literals that compose a triple, and [3] connect the triple components by adding them to a model. It also briefly covers serializing the model contents to file in different RDF formats like RDF/XML, TTL, and N-Triple. The goal is to demonstrate the basic steps for building, querying, and persisting RDF triples with Jena.
An Introduction to the Jena API
Table of Contents
Create a Dataset
Create a Named Model
Create a Triple
    Let's get to the code already ...
    Namespaces in RDF
    ... back to the code
    Connect the Nodes
Triple Store vs Relational Store
Saving the Results
    Serialization Types
        RDF/XML
        TTL
        N-TRIPLE
References
Craig Trim
cmtrim@us.ibm.com
9:12 PM 2/22/2013
Introduction
What's the point?
I am having a hard time here. I get triples. I get that I want to work with a collection of
triples. But what are the main, important differences between a Model, DataSource,
Dataset, Graph, and DataSetGraph? What are their lifecycles? Which are designed to be
kept alive for a long time, and which should be transient? What triples get persisted, and
when? What stays in memory? And how are triples shared (if at all) across these five things?
I'll take an RTFM answer, if there's a simple summary that contrasts these five. Most
references I have found are down in the weeds and I need the forest view1.
Create a Dataset
How do I create a dataset?
Dataset ds = TDBFactory.createDataset("/demo/model-01/");
A Dataset contains a default model. A Dataset can likewise contain zero or more named models.
A SPARQL query over the Dataset can be
1. over the default model
2. over the union of all models
3. over multiple named models
4. over a single named model
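As a concrete sketch of options 4 and 2 (not part of the original example: the model name "myModel" is illustrative, and the special union name "urn:x-arq:UnionGraph" is covered in the next section; package names assume a recent Apache Jena under org.apache.jena.*, whereas releases contemporary with this document used com.hp.hpl.jena.*):

String sparql = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }";

// Option 4: query a single named model
QueryExecution qe = QueryExecutionFactory.create(
    sparql, ds.getNamedModel("myModel"));
try {
    ResultSetFormatter.out(qe.execSelect());
} finally {
    qe.close();
}

// Option 2: query the union of all named models
QueryExecution qeUnion = QueryExecutionFactory.create(
    sparql, ds.getNamedModel("urn:x-arq:UnionGraph"));
try {
    ResultSetFormatter.out(qeUnion.execSelect());
} finally {
    qeUnion.close();
}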
A Jena Dataset wraps and extends the functionality of a DatasetGraph.
The underlying DatasetGraph2 can be obtained from a Dataset at any time, though this is not likely
to be necessary in a typical development scenario.
1 http://answers.semanticweb.com/questions/3186/model-vs-datasource-vs-dataset-vs-graph-vs-datasetgraph
2 Via this call:
<Dataset>.asDatasetGraph() : DatasetGraph
Read the "Jena Architectural Overview" for a distinction on the Jena SPI (Graph Layer) and Jena API (Model Layer)
Create a Named Model
We're going to be loading data from the IBM PTI Catalog3, so let's create a named model for this
data:
Dataset ds = TDBFactory.createDataset("/demo/model-01/");
Model model = ds.getNamedModel("pti/software");
Triples always have to be loaded into a Model. We could choose to use the "default model" through
this call:
ds.getDefaultModel() : Model
But it's good practice to use a named model. A named model functions as a "logical partition" for
data. One advantage to this approach is query efficiency: if there are multiple models, any given
model represents a subset of all available triples, and the smaller the model, the faster the query.
Named models can also be merged in a very simple operation:
model.add(<Model>) : Model
You can merge as many models as you want into a single model for querying purposes.
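For example, a hedged sketch (the model names M1 and M2 are illustrative) that merges two named models into a fresh in-memory model, leaving the source models untouched:

Model merged = ModelFactory.createDefaultModel()
    .add(ds.getNamedModel("M1"))
    .add(ds.getNamedModel("M2"));

Copying into a scratch model like this avoids mutating either source; calling add directly on a named model (as shown later) writes the merged triples into that model.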
Think of models like building blocks. A dataset might contain four models:
[Diagram: a dataset containing four models: D, 1, 2, and 3]
"D" represents the "Default Model". If you prefer not to work with named models, then you would
simply make this call:
Model defaultModel = ds.getDefaultModel();
The default model always exists in the Dataset4, even if it is not used. I recommend ignoring the
default model and focusing on creating named models.
3 http://pokgsa.ibm.com/projects/p/pti/consumer/public/test/sit/
4 The graph name is actually an RDF Node. A named graph is like a logical partition of the dataset: some triples
belong to one named graph; other triples belong to another named graph, and so on. Behind the scenes, Jena is
creating a "Quad".
Recall that a triple is: Subject, Predicate, Object
A quad is: Subject, Predicate, Object, Context
Note that a named model will not exist until you create it:
Model model1 = ds.getNamedModel("M1");
Model model2 = ds.getNamedModel("M2");
Model model3 = ds.getNamedModel("M3");
// etc ...
When you call getNamedModel, the model is located and returned; if it does not exist, it is created
first and then returned.
A query could be executed against the entire dataset:
Model union = ds.getNamedModel("urn:x-arq:UnionGraph");
[Diagram: a single union model spanning the dataset's named models]
This method call is computationally "free": Jena simply provides a view over all the triples in the
named models of the dataset (the default model is not included in the union).
A query could be executed against certain models in the dataset:
A quad functions just like a triple, with the addition of a context node. Every triple we create in this
tutorial (just one in this example) will have an associated "4th node" - the node that represents the named
graph. Note that when you write a triple to the default model, only a plain triple is created; there is no
fourth node. Quads apply only where named models are present.
Consider this triple Shakespeare authorOf Hamlet:
If we request Jena to add this triple to the named model called "plays", a Quad will be created that
looks like this: Shakespeare authorOf Hamlet +plays
If we request Jena to add this to the default model, it will look like this: Shakespeare authorOf Hamlet
Each quad is stored in the DatasetGraph. Rather than using a more complex containment strategy, this is simply a
method of "indexing" each triple with a fourth node that provides context. Note that this is an implementation of
the W3C RDF standard, and not a Jena-specific extension. This does not affect how you have to think about the
"triple store", nor does it affect how you write SPARQL queries.
The SPARQL query: ?s ?p ?o
will work the same way against a named model (quads) as it will against a default model (triples).
[Diagram: two named models merged into a single model for querying]
Model merged = ds.getNamedModel("M1").add(
ds.getNamedModel("M2"));
Such a model can also be persisted to the file system if necessary.
Let's return to our original code. Changes to the dataset (such as writing or deleting triples) are
surrounded with a try/finally pattern:
Dataset ds = TDBFactory.createDataset("/demo/model-01/");
Model model = ds.getNamedModel("pti/software");
try {
model.enterCriticalSection(Lock.WRITE);
// write triples to model
model.commit();
TDB.sync(model);
} finally {
model.leaveCriticalSection();
}
This pattern should be used when data is inserted or updated to a Dataset or Model.
There is a performance hit to this pattern, so don't use it at a granular level, wrapping each and every
update or insertion. Try to batch up inserts within a single try/finally block, as in the sketch below.
If this try/finally pattern is not used, the data will still be written to file. However, the model can be
left in an inconsistent state, and iterating over it may provoke a "NoSuchElementException" at query time.
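As a sketch of that batching advice (the rows collection and the ns prefix are assumptions for illustration, not part of the original example):

model.enterCriticalSection(Lock.WRITE);
try {
    for (String[] row : rows) { // each row: {subject, predicate, object}
        model.add(
            model.createResource(ns.concat(row[0])),
            model.createProperty(ns.concat(row[1])),
            model.createResource(ns.concat(row[2])));
    }
    model.commit();    // one commit for the whole batch
    TDB.sync(model);   // one sync for the whole batch
} finally {
    model.leaveCriticalSection();
}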
Create a Triple
[Diagram: Subject (Resource) --Predicate (Property)--> Object (RDFNode)]
When using the model layer in Jena, a triple is composed of:
1. an instance of Resource (the Subject)
2. an instance of Property (the Predicate)
3. an instance of RDFNode (the Object)
Illustration 1: Class Hierarchy
The object of a triple can be any of the three types above:
1. Resource (extends RDFNode)
2. Property (extends Resource extends RDFNode)
3. Literal5 (extends RDFNode)
5 For example:
a String value, such as a person's name
a Long value, such as a creation timestamp
an Integer value, such as a sequence
The subject of a triple is limited to either of:
1. Resource (extends RDFNode)
2. Property (extends Resource)6
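Before moving on, a small sketch of a triple whose object is a typed Literal rather than a Resource (the property name and value here are assumptions for illustration, not part of the original example):

Resource hamlet = model.createResource(ns.concat("Hamlet"));
Property yearWritten = model.createProperty(ns.concat("yearWritten"));
Literal year = model.createTypedLiteral(1600);

model.add(hamlet, yearWritten, year);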
6 It is possible to make assertions about properties in the form of a triple. For example, if we create a predicate called
"partOf" we might want to make this a transitive property. We would do so by creating the triple:
partOf rdf:type owl:TransitiveProperty
On the other hand, such an assertion could be dangerous:
finger partOf hand partOf body partOf Craig partOf IBM
which might lead one to believe:
finger partOf IBM
(perhaps this is true)
Let's get to the code already ...
The following code will create three disconnected "nodes" in Jena.
String ns = "http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#";
Resource subject = model.createResource(
ns.concat("Shakespeare"));
Property predicate = model.createProperty(
ns.concat("wrote"));
Resource object = model.createResource(
ns.concat("Hamlet"));
Namespaces in RDF
Note the use of
ns.concat("value")
Every resource in the model should be qualified by a namespace. This is pretty standard when
dealing with data, not just RDF. The reasons for qualifying a resource with a namespace in a triple
store are the same reasons for qualifying a resource with a namespace in XML: a qualified name
helps ensure uniqueness.
You might have to merge your triple store with a triple store that you found online, or from another
company. Two resources may have the same name, and may even (conceptually) have similar
meanings, but they will not necessarily be used the same way.
One developer might assert that the use of "partOf" is transitive. Another developer might assert
that the use of "partOf" is not transitive. Both properties mean the same thing, but clearly you
would want to have these properties qualified with namespaces, so that the correct property could
be used for each situation. For example, let us assume that
ns1:partOf rdf:type owl:TransitiveProperty
and that ns2:partOf is not transitive.
We could then correctly model this scenario:
finger ns1:partOf hand ns1:partOf body
ns1:partOf Craig ns2:partOf IBM
Craig is "part of" IBM and finger is "part of" Craig, but finger is not "part of" IBM.
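A hedged sketch of that scenario (the namespace URIs are invented for illustration; the RDF and OWL vocabulary classes used here ship with Jena):

String ns1 = "http://example.org/anatomy#";
String ns2 = "http://example.org/org#";

Property ns1PartOf = model.createProperty(ns1.concat("partOf"));
Property ns2PartOf = model.createProperty(ns2.concat("partOf"));

// ns1:partOf rdf:type owl:TransitiveProperty
model.add(ns1PartOf, RDF.type, OWL.TransitiveProperty);

Resource finger = model.createResource(ns1.concat("finger"));
Resource hand = model.createResource(ns1.concat("hand"));
Resource body = model.createResource(ns1.concat("body"));
Resource craig = model.createResource(ns1.concat("Craig"));
Resource ibm = model.createResource(ns2.concat("IBM"));

model.add(finger, ns1PartOf, hand);
model.add(hand, ns1PartOf, body);
model.add(body, ns1PartOf, craig);
model.add(craig, ns2PartOf, ibm); // not transitive: finger is not ns2:partOf IBM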
... back to the code
So now we've created 3 RDFNodes in our Jena named model.
Illustration 2: Three disconnected RDFNodes (Shakespeare, wrote, Hamlet) in the named model "pti/software"
If you're thinking something doesn't look right here, you're right. These nodes are disconnected.
We haven't actually created a triple yet. We've just created two Resources and a Property7.
Connect the Nodes
In order to actually connect these values as a triple, we need to call this code:
connect(subject, predicate, object);
...
private Statement connect(
    Resource subject,
    Property predicate,
    Resource object)
{
    model.add(subject, predicate, object);
7 You might not find yourself in a situation where you are creating properties at runtime. A triple store could be
initialized with an Ontology model, which would itself explicitly define the predicates and their usage. The triple
store would then reference these pre-existing properties.
However, there are valid situations where properties could be created automatically: text analytics on a large
corpus might find verbs (actions) that connect entities; the verbs could be modeled as predicates, and the results
queried once complete.
    return model.createStatement(subject, predicate, object);
}
Of course, you don't actually have to use my code above. But it is a lot easier to put a method
around these two Jena methods (add and createStatement). And of course, all of this occurs within
the context of the try/finally block discussed earlier.
And then we get this:
Illustration 3: A "Triple": Shakespeare wrote Hamlet, in the named model "pti/software"
It's perfectly valid to write resources to a model without connecting them to other resources. The
connections may occur over time.
Triple Store vs Relational Store
Relationships in a triple store can and should surprise you. You'll never design an Entity
Relationship Diagram (ERD) and use a Relational Database (RDBMS) – and wake up one morning
to find that there is a new relationship between table a and table b. This just doesn't happen.
Primary keys, Foreign keys, Alternate keys – these are all the result of foresight and careful design
of a well understood domain. The better the domain is understood, the better the relational database
will be designed. If the structure of a relational database changes, this can have a severe impact on
the consumers of the data store.
But a triple store is designed for change. If the domain is so large, and so dynamic, that it can never
be fully understood, or fully embraced – then an ERD may not be the right choice. An Ontology
and Triple Store may be better suited. As more data is added, relationships will begin to occur
between nodes, and queries that execute against the triple store will return results where the
relationships between entities in the result set may not have been anticipated.
Saving the Results
Triples are held in a dataset which is either transient (in memory) or persisted (on disk). In the
example we've just completed, the created triple was stored in Jena TDB.
The first call we looked at:
Dataset ds = TDBFactory.createDataset("/demo/model-01/");
actually creates a triple store on disk, at the location specified. If a triple store already existed at
that location, this factory method would simply return the dataset for that triple store. Database
setup doesn't get any easier than this8. And TDB is a serious triple store – suitable for enterprise
applications that require scalability9 and performance.
But what if we want to see what is actually in the triple store? Actually look at the data? We need
the equivalent of a database dump. Fortunately, the Jena API makes it quite trivial to serialize
model contents to file:
model.write(
new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream(
file, false
)
)
), "RDF/XML"
);
Notice the use of the string literal "RDF/XML" as the second parameter of the write() method.
There are multiple serialization types for RDF.
Serialization Types
Some of the more common ones are:
1. RDF/XML
2. RDF/XML-Abbrev
3. TTL (Turtle)
4. N-TRIPLE
5. N3
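Switching between these is just a matter of changing the lang string passed to write(). A hedged example (the output path is illustrative):

OutputStream out = new FileOutputStream("/demo/model-01.ttl");
try {
    model.write(out, "TTL");
} finally {
    out.close();
}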
8 The setup for RDF support in DB2 is actually pretty simple. [REFERENCE SETUP PAGE]. And DB2-RDF
outperforms TDB in many respects [SHOW LINK].
9 TDB supports ~1.7 billion triples.
TTL and N3 are among the easiest to read. RDF/XML is one of the original formats. If you cut
your teeth on RDF by reading the RDF/XML format (still very common for online examples and
tutorials) you may prefer that. But if you are new to this technology, you'll likely find TTL the most
readable of all these formats.
If we execute the above code on the triple we created, we'll end up with these serializations:
RDF/XML
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:j.0="http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#" >
<rdf:Description rdf:about="http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#Shakespeare">
<j.0:authorOf rdf:resource="http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#Hamlet"/>
</rdf:Description>
</rdf:RDF>
TTL
<http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#Shakespeare>
<http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#authorOf>
<http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#Hamlet> .
N-TRIPLE
<http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#Shakespeare>
<http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#authorOf>
<http://www.ibm.com/ontologies/2012/1/JenaDemo.owl#Hamlet> .
Note that RDF/XML-ABBREV will show nesting (similar to an XML document). Since we only
have a single triple in this demo, there's nothing to show for the serialization.
References
1. SPARQL Query Language for RDF. W3C Working Draft, 21 July 2005. Retrieved 22 February 2013.
<http://www.w3.org/TR/2005/WD-rdf-sparql-query-20050721/#rdfDataset>
2. Jena Users Group:
3. Jena/ARQ: Difference between Model, Graph and DataSets. 8 August 2011.
4. Dokuklik, Yaghob, et al. Semantic Infrastructures. Charles University in Prague, Czech Republic, 2009.