slides from the S4 webinar "On-Demand RDF Graph Databases in the Cloud"
RDF database-as-a-service running on the Self-Service Semantic Suite (S4) platform: http://s4.ontotext.com
video recording of the talk is available at http://info.ontotext.com/on-demand-rdf-graph-database
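Databases hosted this way are typically reached over the standard SPARQL 1.1 Protocol. Below is a minimal sketch of how a client could form such a request; the endpoint URL is a placeholder for illustration, not the actual S4 endpoint.

```python
# Build an HTTP GET request URL for a SPARQL query, following the
# SPARQL 1.1 Protocol. Endpoint URL below is a hypothetical placeholder.
import urllib.parse

ENDPOINT = "https://rdf.example.com/sparql"  # placeholder, not the real S4 URL

def sparql_get_url(query: str) -> str:
    """Encode a SPARQL query into a GET request URL, asking for JSON results."""
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{ENDPOINT}?{params}"

url = sparql_get_url("SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
print(url)
```

Sending the resulting URL with any HTTP client (plus an API key header, for a hosted service) returns the query results as JSON.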
slides from our talk "Low-Cost Open Data as-a-service" from the Semantic Web Developers workshop of ESWC'2015 (full paper: http://ceur-ws.org/Vol-1361/paper7.pdf)
Text Analytics & Linked Data Management As-a-Service (Marin Dimitrov)
slides from the talk on "Text Analytics & Linked Data Management As-a-Service with S4" from the ESWC'2015 workshop on Semantic Web Enterprise Adoption & Best Practices
full paper available at http://2015.wasabi-ws.org/papers/wasabi15_1.pdf
Enabling Low-cost Open Data Publishing and Reuse (Marin Dimitrov)
In the space of just a few years we’ve seen the transformational power of open data: transparency and accountability where public data is concerned, and efficiency and innovation where businesses open up private data. In its first year, institutions and individuals throughout Europe have supported public sector bodies in releasing data, and numerous start-ups, developers and SMEs in reusing this data for economic benefit.
However, we are still at the beginning of the open data movement, and there is still more that can be done to make open data simpler to use and to make it available to a wider audience.
The core goal of the DaPaaS project is to provide a Data- and Platform-as-a-Service environment, where third parties (such as governmental organisations, SMEs, developers and larger companies) can publish and host both data sets and data-intensive applications, which can then be accessed by end-user applications in a cross-platform manner. You can find out more about DaPaaS on the detailed about page.
Essentially, DaPaaS aims to make publishing, consumption, and reuse of open data, as well as deploying open data applications, easier and cheaper for SMEs and small public bodies which otherwise may not have sufficient technical expertise, infrastructure and resources required to do so.
see also http://www.slideshare.net/eswcsummerschool/wed-roman-tutopendatapub-38742186
Sigma EE: Reaping low-hanging fruits in RDF-based data integration (Richard Cyganiak)
A presentation I gave at I-Semantics 2010 on Sigma EE, an RDF-based data integration front-end.
Sigma EE is now available for download here: http://sig.ma/?page=help
The Evolution of the Fashion Retail Industry in the Age of AI with Kshitij Ku... (Databricks)
AI is fundamentally transforming how we live and work.
Zalando is a data-driven company. We deliver an optimal customer experience that drives engagement, and we continue to improve this experience by leveraging the latest technologies and machine learning techniques, such as building a cutting-edge, cloud-based infrastructure to support our operations at scale.
We provide our data scientists across Zalando with the means to implement artificial intelligence use cases, leveraging data from all parts of our company and the best machine learning techniques from across the industry. Apache Spark delivered through Databricks is at the core of this strategy.
In this keynote, I’ll share our AI journey thus far and how we are exploring ways to unify data through AI with Spark and Databricks.
When We Spark and When We Don’t: Developing Data and ML Pipelines (Stitch Fix Algorithms)
The data platform at Stitch Fix runs thousands of jobs a day to feed data products that provide algorithmic capabilities to power nearly all aspects of the business, from merchandising to operations to styling recommendations. Many of these jobs are distributed across Spark clusters, while many others are scheduled as isolated single-node tasks in containers running Python, R, or Scala. Pipelines often comprise a mix of task types and containers.
This talk will cover thoughts and guidelines on how we develop, schedule, and maintain these pipelines at Stitch Fix. We’ll discuss how we decide which portions of a pipeline run on which platform (e.g. what is important to run distributed across Spark clusters vs. in stand-alone containers) and how we get them to play well together. We’ll also provide an overview of tools and abstractions developed at Stitch Fix to facilitate the process from development, to deployment, to monitoring in production.
Narasimhan Sampath and Avinash Ramineni share how Choice Hotels International used Spark Streaming, Kafka, Spark, and Spark SQL to create an advanced analytics platform that enables business users to be self-reliant by accessing the data they need from a variety of sources to generate customer insights and property dashboards and enable data-driven decisions with minimal IT engagement. Narasimhan and Avinash highlight the architecture, lessons learned, and the challenges that were overcome on both the business and technology fronts.
The analytics platform is designed as a framework to enable self-service data intake, data processing, and report/model generation by the business users. The data-driven framework consists of a distributed hybrid-cloud data ingestor for data intake and a Cloudera CDH cluster with Spark as the distributed compute engine. The solution is built in such a way that storage and compute have been decoupled and encourages the concept of BYOC (bring your own compute). The platform uses EC2 instances to run CDH and leverages Amazon S3 as a data warehouse storage layer (data lake), Spark as an ETL engine, and Spark SQL as a distributed query engine. Results (computations/derived tables) are exposed to the end users via Spark SQL and are discovered via Tableau. The platform supports both batch and streaming use cases and is built on the following technology stack: AWS (S3, EC2, SQS, SNS), Cloudera CDH (YARN, Navigator, Sentry), Spark, Kafka, Spark SQL, and Spark Streaming.
Dmitry Lavrinenko, "Big & Fast Data for Identity & Telemetry services" (Fwdays)
- Business goal
- What is Fast Data for us
- What is Fast & Big Data solution
- Reference Architecture
- Data Science for Big Data
- Technology Stack
- Solution Architecture
- Identity & Telemetry Data Processing Facts
- Continuous Deployment
- Quality Control
Personalization allows Stitch Fix to style its clients and provide recommendations to help them find what they love. To do this, the company gathers information about a client’s preferences up front when they sign up for the service and learns more about them as they become longer-term customers. This information is important for making recommendations but also must be protected and managed with care.
The data science team at Stitch Fix is the primary owner of the recommendation systems. Backing them up is the data platform team, who maintain the data infrastructure, data warehouse, and supporting tools and services. This data warehouse has several different data sources that read and write into it. This includes a logging pipeline for events, every Spark-based ETL, and daily snapshots of structured data from Stitch Fix applications.
Neelesh Srinivas Salian explains Stitch Fix’s process to better understand the movement and evolution of data within its data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh also details how Stitch Fix built a service that helps the company understand the lineage information that is associated with each table in the data warehouse. This service helps the company understand the source, parentage, and journey of all data in the warehouse. Although Stitch Fix makes sure to anonymize and filter out sensitive information from this data, the company needs a more flexible long-term solution as the business expands.
What is Connected Data as a concept? Who is interested in Connected Data? What problems does Connected Data solve? What skills are used in Connected Data?
As of July 2017, Connected Data has been running for over a year, with a very successful conference and nine meetups held to date on a range of topics. These have included Knowledge Representation, Semantics, Linked Data, Graph Databases, Ontology development, and use cases in industry verticals including recommendations, telecoms and finance. Yet the group has never had a particularly formal terms of reference or a description defining what Connected Data actually means. Some would say this is something of an irony for a group so focused on semantics, schemas, definitions and structure!
This is an attempt (with some humour and something of a journey included in it) to achieve something resembling a definition and terms of reference for the group.
Scylla Summit 2022: Scalable and Sustainable Supply Chains with DLT and ScyllaDB (ScyllaDB)
Explore how IOTA addressed supply chain digitization challenges, including the role of data serialization formats (EPCIS 2.0), Distributed Ledgers (IOTA), and scalable, resilient databases (ScyllaDB) across specific use cases.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Simplified minimalistic workflows for the publication of Linked Open Data (Salvatore Virtuoso)
Our colleague Yuri Glikman of Fraunhofer FOKUS (LinDA partner) presented the LinDA transformation tool at the recent Samos Summit (http://samos-summit.blogspot.de/).
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G... (Shawn Jones)
In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their time and effort. How are they spending this budget? What are the top metadata categories in use? How did they grow over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. What is the growth of individual fields over time? Which fields experienced the fastest adoption? In this paper, we review 227,726 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 began a metadata renaissance as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards -- the cards generated by platforms like Twitter when one shares a URL. Once a metadata standard was established for cards in 2010, its fields were adopted by 20% of articles in the first year and reached more than 95% adoption by 2016. This rate of adoption surpasses efforts like schema.org and Dublin Core by a fair margin. When confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
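The "social card" fields the paper measures are plain HTML `<meta>` tags in an article's head. As an illustrative sketch, the standard-library snippet below extracts the Twitter card fields from a made-up article head (the sample markup and field values are invented for the example):

```python
# Extract Twitter social-card metadata from an HTML head using only
# the standard library. SAMPLE is a made-up article head for illustration.
from html.parser import HTMLParser

SAMPLE = """
<head>
  <meta name="twitter:card" content="summary_large_image">
  <meta name="twitter:title" content="Example headline">
  <meta name="twitter:description" content="One-line summary">
</head>
"""

class CardParser(HTMLParser):
    """Collect every <meta name="twitter:*"> tag into a dict."""
    def __init__(self):
        super().__init__()
        self.cards = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").startswith("twitter:"):
            self.cards[a["name"]] = a.get("content", "")

p = CardParser()
p.feed(SAMPLE)
print(p.cards)
```

A crawler over archived article snapshots, as in the paper, could count which of these fields appear in each capture year to chart adoption over time.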
PGDay.Amsterdam 2018 - Jeroen de Graaff - Step-by-step implementation of Post... (PGDay.Amsterdam)
Rijkswaterstaat is the executive agency of the Ministry of Infrastructure and Water Management in the Netherlands. During this presentation, I will share our journey to develop and apply PostgreSQL at Rijkswaterstaat. Our work is ICT-driven, and access to our data, both historical and current, is key to executing our task now and in the future.
Manage traceability with Apache Atlas, a flexible metadata repository (Synaltic Group)
Do you know where your data is?
Do you know who is responsible for a specific dataset?
Do you know from which application or task an entity was last modified last Friday?
Apache Atlas helps you manage all the metadata of your data. With Apache Atlas you can see the lineage between your datasets and the processes that use them.
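Atlas exposes this lineage information through its v2 REST API; a client fetches the lineage graph of an entity by its GUID. The sketch below only builds the request URL (the host, port, and GUID are placeholders, and a real call would add authentication):

```python
# Build the Apache Atlas v2 REST URL for an entity's lineage graph.
# Host and GUID below are hypothetical placeholders.
import urllib.parse

def lineage_url(base: str, guid: str, direction: str = "BOTH", depth: int = 3) -> str:
    """Return the GET URL for /api/atlas/v2/lineage/{guid} with query params."""
    params = urllib.parse.urlencode({"direction": direction, "depth": depth})
    return f"{base}/api/atlas/v2/lineage/{guid}?{params}"

url = lineage_url("http://atlas.example.com:21000", "e4b2-1234")
print(url)
```

Issuing this request against a running Atlas server returns a JSON graph of upstream and downstream entities and the processes linking them.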
The Business Case for Semantic Web Ontology & Knowledge Graph (Cambridge Semantics)
In this webinar Mark Wallace, Ontologist & Developer, Semantic Arts, and Thomas Cook, Director of Sales, AnzoGraph DB, Cambridge Semantics, explore the benefits of building a Semantic Knowledge Graph with RDF*, wrapping up with an airline data demo that illustrates the value of schema, inference and reasoning.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... (StampedeCon)
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
Building next generation data warehouses (Alex Meadows)
All Things Open 2016 Talk - discussing technologies used to augment traditional data warehousing. Those technologies are:
* data vault
* anchor modeling
* linked data
* NoSQL
* data virtualization
* textual disambiguation
In this session we take an in-depth look into the Apache Atlas open metadata and governance function.
Open metadata and governance is a moon-shot type of project to create a set of open APIs, types, and interchange protocols to allow all metadata repositories to share and exchange metadata. From this common base, it adds governance, discovery, and access frameworks to automate the collection, management, and use of metadata across an enterprise. The result is an enterprise catalog of data resources that are transparently assessed, governed, and used in order to deliver maximum value to the enterprise.
Apache Atlas is the reference implementation of the Open Metadata and Governance standards and framework (https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance). This function will enable an Apache Atlas server to synchronize and query metadata from any open metadata-compliant metadata repository.
In this session we will cover how Open Metadata and Governance works. This includes: (1) the key components in Atlas, (2) the different integration patterns and APIs that vendors can use to integrate their technology into the open metadata ecosystem, and (3) how common metadata use cases such as searching for data sets, managing security (through Atlas/Ranger integration), and automated metadata discovery work in the active ecosystem.
Speaker
Mandy Chessell, Distinguished Engineer, IBM
presentation from the 5th "EC Framework Programmes - funding opportunities" seminar organised by the Applied Research and Communications Fund
http://www.arcfund.net/arcartShow.php?id=16150
overview of the RDF graph database-as-a-service (GraphDB based) on the Self-Service Semantic Suite (S4)
http://s4.ontotext.com
presentation for the AKSW Group of the University of Leipzig
As software engineers we make trade-offs every day. We often need to pick between things like space vs. time, or budget vs. scope, or decide how much creative waste we can afford. And when we make the decision we need to fully understand both the upside and the downside of a particular choice. In this talk we will discuss why our organization decided to move from Python to Java. We will go over each trade-off we made and the motivation behind it.
Very often, when we want to become better backend programmers, we try to learn different programming languages and their libraries. The problem is that Rails, Express.js, Django and Zend Framework share roughly the same concepts. If we want to learn how to write code for large systems that scale well and handle failures and unexpected situations on their own, we need to master another branch of human knowledge called distributed systems. In this presentation we will see why we should dig into them and what the core principles are: consistency, availability and partition tolerance. We will also look at steps anyone can take to learn more on the topic and keep their knowledge up to date.
Dec'2013 webinar from the EUCLID project on managing large volumes of Linked Data
webinar recording at https://vimeo.com/84126769 and https://vimeo.com/84126770
more info on EUCLID: http://euclid-project.eu/
Crossing the Chasm with Semantic Technology - Marin Dimitrov
After more than a decade of active efforts towards establishing the Semantic Web, Linked Data and related standards, the verdict on whether the technology has delivered on its promise and proven itself in the enterprise is still unclear, despite the numerous existing success stories.
Every emerging technology and disruptive innovation has to overcome the challenge of “crossing the chasm” between the early adopters, who are just eager to experiment with the technology potential, and the majority of the companies, who need a proven technology that can be reliably used in mission critical scenarios and deliver quantifiable cost savings.
Succeeding with a Semantic Technology product in the enterprise is a challenging task involving both top quality research and software development practices, but most often the technology adoption challenges are not about the quality of the R&D but about successful business model generation and understanding the complexities and challenges of the technology adoption lifecycle by the enterprise.
This talk will discuss topics related to the challenge of “crossing the chasm” for a Semantic Technology product and provide examples from Ontotext’s experience of successfully delivering Semantic Technology solutions to enterprises.
"Semantic Integration Is What You Do Before The Deep Learning". dev.bg Machine Learning seminar, 13 May 2019.
It's well known that 80% of the effort of a data scientist is spent on data preparation. Semantic integration is arguably the best way to spend this effort more efficiently and to reuse it between tasks, projects and organizations. Knowledge Graphs (KG) and Linked Open Data (LOD) have become very popular recently. They are used by Google, Amazon, Bing, Samsung, Springer Nature, Microsoft Academic, Airbnb… and any large enterprise that would like to have a holistic (360 degree) view of its business. The Semantic Web (web 3.0) is a way to build a Giant Global Graph, just like the normal web is a Global Web of Documents. IEEE already talks about Big Data Semantics. We review the topic of KGs and their applicability to Machine Learning.
Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers.
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014.
Thanks to Atanas Kiryakov for this presentation, I just cut it to size.
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence... - Perficient, Inc.
Most organizations still rely on batch and offline processing of data streams to gain meaningful analysis and insight into their business. However, in our instant gratification world, real-time computation and analysis of streaming data is crucial in gaining insight into patterns and threats. A trend is emerging for real-time and instant analysis from live data streams, promoting the value of logs and a move toward functional programming.
This shift in technology is not about what and how to store the data, but what we can do with it to see emerging patterns and trends across multiple resources, applications, services and environments. Log data represents a wealth of information, yet is often sporadic, unstructured, scattered across the enterprise and difficult to track.
These slides provide insights into some of the most helpful Big Data tools used by the largest social media and data-centric organizations for competitive trends, instant analysis and feedback from large-volume data streams. We show how using the Big Data tools Storm and Elasticsearch with an elastic UI can turn application logs into real-time analytical views.
You will also learn how Big Data:
Contains data that is elastic, minimally structured, flexible and scalable
Helps process live streams into meaningful data
Promotes a move toward functional programming
Affects the enterprise data architecture
Works with real-time CEP tools like Storm for functional programming
At Data-centric Architecture Forum 2020 Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
Hadoop meets Agile! - An Agile Big Data Model - Uwe Printz
Big Data projects are a struggle, not only on the technical side but also on the organizational side. In this talk the author shares his experience and opinions from almost 5 years of Big Data projects and develops an Agile Big Data Model which reflects his ideas on how Big Data projects can be successful, even in large companies.
Talk held at the crossover meetup of the "Agile Stammtisch Rhein-Main" and the "Hadoop & Spark User Group Rhein-Main" at codecentric AG on 31.01.2017.
Simplified minimalistic workflows for the publication of Linked Open Data - LinDA_FP7
The LinDA project addresses one of the most significant challenges of the usage and publication of Linked Data: the renovation and conversion of existing data formats into structures that support the semantic enrichment and interlinking of data. The set of tools provided by LinDA will assist enterprises, especially SMEs which often cannot afford the development and maintenance of dedicated information analysis and management departments, in efficiently developing novel data analytical services that are linked to the available public data, thereby helping to improve their competitiveness and stimulating the emergence of innovative business models.
This is the project presentation from Samos 2015 Summit on ICT-enabled Governance, held on June 29 – July 3, 2015, Samos, Greece (http://samos-summit.blogspot.de/).
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora... - Mark Rittman
This talk focuses on what a data reservoir is, how it relates to the RDBMS DW, and how Big Data Discovery gives business and BI users access to it.
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Walters... - HostedbyConfluent
Are you looking for a cloud-based architecture that includes best-of-breed streaming and database technologies? In this session you will learn how to set up and configure Confluent Cloud with MongoDB Atlas. We'll start the journey learning about the basic connectivity between the two cloud services and end with a brief discovery of what you can do with data once it is in MongoDB Atlas. By the end of this session you will know how to securely set up and configure the MongoDB Atlas connectors in Confluent Cloud in both a source and sink configuration.
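As a rough sketch of the sink side of such a setup: the open-source MongoDB Kafka sink connector is driven by a small JSON configuration like the one below. The connection string, topic, database and collection names here are placeholders, and Confluent Cloud's fully managed connector exposes its own equivalent settings rather than this exact document:

```python
import json

# Hypothetical sink configuration: stream records from a Kafka topic
# into a MongoDB Atlas collection via the MongoDB Kafka connector.
sink_config = {
    "name": "atlas-sink",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": "orders",
        "connection.uri": "mongodb+srv://<user>:<password>@<cluster>/",
        "database": "demo",
        "collection": "orders",
    },
}

# In a self-managed deployment this document would be POSTed to the
# Kafka Connect REST API to create the connector.
print(json.dumps(sink_config, indent=2))
```

A source configuration looks symmetrical, using the same connector class in source mode to publish MongoDB change events back into Kafka topics.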
Measuring the Productivity of Your Engineering Organisation - the Good, the B... - Marin Dimitrov
High-performing engineering teams regularly dedicate time to measuring the performance & quality of the systems and applications they’re building, and to measuring & improving the various aspects of the development lifecycle. High-performing product companies are also data-driven when it comes to measuring the impact of new features & products in terms of business KPIs and North Star metrics.
Can a data-driven approach be applied to measuring the performance, maturity and continuous improvement of an engineering team or the whole engineering organisation? In this discussion we’ll cover various important topics related to quantifying the performance of an engineering organisation.
The career development of our teammates is among the key responsibilities of a leader, and our personal career development vision & plan plays a critical role in our long-term growth and success. Despite its importance, our career vision often does not get enough attention and detail, or is hampered by easily avoidable mistakes. In this discussion, we’ll address typical mistakes related to long-term career planning, some best practices, and practical steps for building our own long-term career development vision (or those of the teammates we are leading), so that career planning becomes a long-term journey with a clear why/how/what, rather than just a list of SMART goals.
Uber began its open source journey in 2015 when three passionate engineers decided to contribute Uber’s work back to the community. In only four years, Uber’s open source program has fostered 350+ outstanding open source projects with 2,000+ contributors worldwide delivering over 70,000 commits. Since 2017, four of Uber’s open source projects have won InfoWorld’s Best of Open Source Software Awards. In this talk, Brian Hsieh & Marin Dimitrov will share more details on Uber’s open source journey, program and best practices, and how Uber enables open innovation by fostering a healthy and collaborative open source culture.
Trust - the Key Success Factor for Teams & Organisations - Marin Dimitrov
Most leaders agree that trust is a key factor for the success of the team and the organisation, and say they are actively working to build trust. And yet, various studies imply that almost half of the teams and organisations worldwide experience low trust levels with their managers, teammates and the rest of the organisation, which leads to decreased engagement, productivity and success.
In this talk we will discuss why trust is a key success factor for every team and every organisation, some good practices for building, sustaining and rebuilding trust, as well as the most common mistakes related to trust building.
talk @ the Computer Science department of Sofia University - practical advice for career growth for students
DEV.BG event http://dev.bg/%D1%81%D1%8A%D0%B1%D0%B8%D1%82%D0%B8%D0%B5/fmi-club-%D0%BF%D1%80%D0%B0%D0%BA%D1%82%D0%B8%D1%87%D0%BD%D0%B8-%D1%81%D1%8A%D0%B2%D0%B5%D1%82%D0%B8-%D0%B7%D0%B0-%D0%BA%D0%B0%D1%80%D0%B8%D0%B5%D1%80%D0%BD%D0%BE-%D1%80%D0%B0%D0%B7%D0%B2%D0%B8%D1%82/
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... - Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Generating a custom Ruby SDK for your web service or Rails API using Smithy - g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio’s cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But, if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
On-Demand RDF Graph Databases in the Cloud
1. On-Demand RDF Graph Databases in the Cloud
A webinar with Marin Dimitrov, CTO of Ontotext
Jun 11th, 2015
On-Demand RDF Graph Databases in the Cloud, Jun 2015
2. Today’s topics
• The Self-Service Semantic Suite (S4)
• RDF graph databases
• On-demand RDF databases in the Cloud
• Demo
• Roadmap
• Q&A session
3. About Ontotext
• Provides products & solutions for content enrichment, metadata management & information discovery
– 70 employees, headquarters in Sofia (Bulgaria)
– Sales presence in London & New York
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
4. Some of our clients
5. Our vision for Smart Data management
Graph Database
• Flexible RDF graph data model
• Ontology based metadata layer
Semantic Search
• Semantic, exploratory search
• Metadata driven content
Text Mining & Interlinking
• Interlink people, locations, organisations, topics
• Discover implicit relations
• Reuse open knowledge graphs
6. Ontotext and AstraZeneca
Profile
• Global, Bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions
7. Ontotext and the Financial Times
Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services
Goals
• Create a horizontal platform for content enrichment and recommendation based on semantics
Challenges
• Critical part of the entire workflow
• Move fast from inception to production deployment
• GraphDB used not only for data, but for content storage as well
• Horizontal platform with focus on organizations, people and relations between them
• Automatic extraction of all these concepts and relationships
• Personalised recommendations of relevant content across the entire media
8. Ontotext and LMI
Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies
Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches
• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• More than 90% savings in analyst time
• Accurate results
10. What is S4?
• Capabilities for text analytics, content enrichment and smart data management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available on-demand, anytime, anywhere
– Simple RESTful services
• Simple pay-per-use pricing
– No upfront commitments
12. Benefits
• Enables quick prototyping
– Instantly available, no provisioning & operations required
– Focus on building applications, don’t worry about infrastructure
• Free tier!
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Based on enterprise semantic technology by Ontotext
13. Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
14. Text analytics with S4
• Text analytics services
– News annotation
– News categorisation
– Biomedical
– Twitter
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
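A hedged sketch of what calling such a RESTful annotation service from Python could look like. The endpoint URL, the payload fields, and the use of the API key pair as HTTP basic auth are illustrative assumptions, not the documented S4 API:

```python
import base64
import json
import urllib.request

def build_annotation_request(api_key, api_secret, text,
                             endpoint="https://text.s4.ontotext.com/v1/news"):
    """Build (but do not send) an HTTP request submitting plain text for
    annotation and asking for the simple JSON output the slides mention.
    The endpoint path and payload field names are hypothetical."""
    payload = json.dumps({"document": text, "documentType": "text/plain"}).encode("utf-8")
    credentials = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",  # API key pair as basic auth
        },
        method="POST",
    )

# Placeholders stand in for the key pair generated at s4.ontotext.com.
req = build_annotation_request("<api-key>", "<api-pass>", "Ontotext is based in Sofia.")
print(req.get_method(), req.get_full_url())
```

Sending the request with `urllib.request.urlopen(req)` would return the JSON annotations; the response schema would need to be checked against the docs at docs.s4.ontotext.com.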
16. Knowledge graphs with S4
• SPARQL query endpoint to the FactForge semantic data warehouse
– 500 million entities / 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase/WikiData, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
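Querying a SPARQL endpoint like this needs nothing more than an HTTP GET with a URL-encoded query, per the SPARQL Protocol. A minimal sketch (the endpoint URL is a placeholder; the query assumes the DBpedia data mentioned above):

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

# Example query over DBpedia data: a few cities located in Bulgaria.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:country <http://dbpedia.org/resource/Bulgaria> .
} LIMIT 10
"""
req = build_sparql_request("https://endpoint.example.org/sparql", query)
print(req.get_full_url())
```

The same request shape works against any SPARQL 1.1 Protocol endpoint, which is one of the portability benefits of the W3C standards the deck highlights.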
17. Knowledge graph query example: a SPARQL query using DBpedia data
18. RDF Graph Data Management
19. RDF for smart data management
• Schema-less data integration, easy querying of diverse data
• Standards compliance
– Based on a mature set of W3C standards: RDF/S, OWL, SPARQL
– Portability & interoperability across vendors
• Complex & exploratory queries
• Infer implicit relations in the graph
• Reuse open knowledge graphs (Linked Open Data)
20. A visual view of RDF data: sub-properties, sub-classes, transitive relations, inference
21. GraphDB by Ontotext
• High performance RDF database, 10s of billions of triples
• Full SPARQL 1.1 support
• Various reasoning profiles, including custom rules
• Efficient data integration (“sameAs” optimisations) and deletion of statements & their inferences
• Geo-spatial indexing & querying with SPARQL
• RDF Rank, full-text search, 3rd party plugins
• Connectors to Solr, ElasticSearch, NoSQL DBs
• GraphDB Workbench
22. Graph databases report by Bloor
“Despite all of this attention the market is dominated by Neo4J and Ontotext (GraphDB), which are graph and RDF database providers respectively. These are the longest established vendors in this space (both founded in 2000) so they have a longevity and experience that other suppliers cannot yet match. How long this will remain the case remains to be seen.”
Bloor Group whitepaper, Graph Databases, April 2015
http://www.bloorresearch.com/technology/graph-databases/
24. RDF database in the Cloud with S4
• Ideal for customers who are…
– still evaluating and testing RDF technology
– in the early phase of adoption / PoC
• Enterprise grade RDF database in the Cloud
– No need for upfront payments for licenses & hardware
– Pay only for what you use, when you use it
– Instantly operational within minutes
– No need for complex planning - use as many DB instances for as long as needed
– Timely upgrades to the latest version
• Self-managed and fully managed options
25. Self-managed RDF DB in the Cloud
• Available from AWS Marketplace, “1-Click” purchasing
• Variety of hardware configurations
– 2 to 8 CPU cores / 8 to 61 GB RAM
– IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
• Users take care of operations
– Backups, restores
26. Self-managed RDF DB in the Cloud
27. Fully managed RDF DB in the Cloud
• Low-cost graph DBaaS available 24/7
• Ideal for small & moderate data & query volumes
– database options: 1M, 10M, 50M, 250M & 1B triples
• Instantly deploy new databases when needed
• Zero administration
– automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
• Standard OpenRDF REST API
28. Fully managed RDF DB in the Cloud

Database type   Max triples
micro           1 million (FREE!)
XS              10 million
S               50 million
M               250 million
L               1 billion
29. Fully managed RDF DB in the Cloud
30. Use cases for an RDF DBaaS
• Evaluate the technology
• Instant deployment, faster experimentation
• Faster application development
• Data services / Open Data publishing
• Reducing TCO & risk
31. Fully managed RDF DB in the Cloud
• Cloud native architecture, running on AWS
• Designed for elasticity & high availability
– More resources added whenever needed
– Failed nodes replaced immediately
• GraphDB is the RDF DB engine
– OpenRDF REST API
• Isolation of the multi-tenant databases
– Docker containers
– Private NAS volumes (EBS) for data storage
32. OpenRDF REST API
resource                                        operations               comments
/repositories                                   GET                      Get info on DB repos
/repositories/<REPOSITORY>                      GET, POST, PUT, DELETE   Create*, delete, query a repository
/repositories/<REPOSITORY>/size                 GET                      Get the number of triples in a repository
/repositories/<REPOSITORY>/statements           GET, POST, PUT, DELETE   Add, read, update, delete statements
/repositories/<REPOSITORY>/rdf-graphs/<GRAPH>   GET, POST, PUT, DELETE   Same as above, for a named graph
/settings                                       GET, PUT                 Configure the DBaaS*
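The size endpoint from the table can be called with plain HTTP and no SDK at all. A minimal sketch using only the JDK, assuming the API key/secret pair is sent as HTTP Basic authentication (the endpoint URL, repository ID, and credential placeholders below are illustrative, not real values):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RepositorySize {

    // Build an HTTP Basic authentication header from an API key/secret pair
    static String basicAuth(String apiKey, String apiPass) {
        String credentials = apiKey + ":" + apiPass;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // URL of the size endpoint for a given repository
    static String sizeUrl(String dbaasURL, String repositoryId) {
        return dbaasURL + "/repositories/" + repositoryId + "/size";
    }

    // GET the triple count; the endpoint returns it as a plain-text number
    static long fetchSize(String dbaasURL, String repositoryId,
                          String apiKey, String apiPass) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(sizeUrl(dbaasURL, repositoryId)).openConnection();
        conn.setRequestProperty("Authorization", basicAuth(apiKey, apiPass));
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            return Long.parseLong(in.readLine().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // With live credentials: fetchSize(dbaasURL, repositoryId, apiKey, apiPass)
        System.out.println(sizeUrl("<dbaas URL>", "<repository ID>"));
    }
}
```

The network call is kept in a separate method so the request construction can be inspected without live credentials.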
36. Uploading data (Java / OpenRDF SDK)
import java.io.File;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.manager.RemoteRepositoryManager;
import org.openrdf.rio.RDFFormat;

String dbaasURL = "<dbaas URL>";
String repositoryId = "<repository ID>";
String pathToTheFile = "<pathToTheFile>";
String apiKey = "<api-key>";
String apiPass = "<api-pass>";

// The base URI against which any relative URIs in the data are resolved
String baseURI = "http://www.example.org";

// Create a RemoteRepositoryManager pointing at the DBaaS endpoint
RemoteRepositoryManager manager =
    RemoteRepositoryManager.getInstance(dbaasURL, apiKey, apiPass);

// Open a connection to the repository
Repository repository = manager.getRepository(repositoryId);
RepositoryConnection repositoryConnection = repository.getConnection();

// Upload the RDF data
File fileToUpload = new File(pathToTheFile);
repositoryConnection.add(fileToUpload, baseURI, RDFFormat.RDFXML);

// Close the connection
repositoryConnection.close();
37. Querying data (OpenRDF Workbench)
38. Querying data (OpenRDF Workbench)
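Besides the Workbench UI shown on these slides, the repository can also be queried programmatically over the REST API. A sketch using only the JDK, assuming the standard OpenRDF/Sesame protocol convention of passing the URL-encoded SPARQL string in the "query" parameter, with Basic authentication for the API key pair (the endpoint and repository placeholders are illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SparqlQuery {

    // Build the query URL: the SPARQL string goes URL-encoded into the
    // "query" parameter of the repository endpoint
    static String queryUrl(String dbaasURL, String repositoryId, String sparql)
            throws Exception {
        return dbaasURL + "/repositories/" + repositoryId
                + "?query=" + URLEncoder.encode(sparql, "UTF-8");
    }

    // Execute the query, asking for SPARQL JSON results
    static String runQuery(String dbaasURL, String repositoryId,
                           String apiKey, String apiPass, String sparql) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL(queryUrl(dbaasURL, repositoryId, sparql)).openConnection();
        conn.setRequestProperty("Accept", "application/sparql-results+json");
        String credentials = apiKey + ":" + apiPass;
        conn.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8)));
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        // With live credentials: runQuery(dbaasURL, repositoryId, apiKey, apiPass, sparql)
        String sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
        System.out.println(queryUrl("<dbaas URL>", "<repository ID>", sparql));
    }
}
```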
41. • (Create a database)
• Create a repository
• Upload sample data
• Query the data
• Explore data with a 3rd party tool
Demo scenario
42. Create a database
Micro, XS, S, M, or L
R/O access to Open Data services or open knowledge graphs
55. • Various improvements (backup & export)
• Gradually introduce XS, S, M and L databases
• Increased availability
– Cross-datacenter replication
• Integration with the GraphDB Workbench
Work in progress
58. • S4 provides an enterprise RDF DBaaS
• Free graph databases up to 1M triples
• Instantly available whenever needed
• Easy to use: OpenRDF REST services
• Zero administration: automated operations, maintenance & upgrades
• Resilient design, high availability
• Check out http://s4.ontotext.com
Key Takeaways
59. • Online documentation
– http://docs.s4.ontotext.com/
• Helpdesk
– http://support.s4.ontotext.com/
• Sample code & demos on GitHub
– https://github.com/Ontotext-AD/S4
• Twitter
– @Ontotext_S4
Additional S4 resources
60. Thank you!
On-Demand RDF Graph Databases in the Cloud
A link to the recording will be sent out shortly
Jun 11th, 2015