Applying large scale text analytics with graph databases

Copyright © 2017 Oracle and/or its affiliates and Data Ninja Services. All rights reserved.
Applying Large-Scale Text Analytics with Graph Databases
to Visualize Entity and Relationship Inferences
Trung Diep, Architect, Docomo Innovations
Ronald Sujithan, Software Architect, Docomo Innovations
Zhe Wu, Architect, Oracle

Outline
• Introduction and overview of graph technologies and graph database
• RDF semantic graph
• Property graph
• Overview of text analytics offered by Data Ninja Services
• Case Study #1: news mining application
• Case Study #2: insights from analyzing Amazon product reviews
• Summary

Relational Model vs. Graph Model
[Figure: Relational Model compared with Graph Model. Courtesy: Tom Sawyer 2016]

Two Graph Models: RDF and Property Graph
| Application Area | Graph Model | Industry Domain |
| Linked Data / Semantic Mediation | RDF Data Model: data federation, knowledge representation, inferencing | Life Sciences, Health Care, Publishing, Finance |
| Social Network Analysis | Property Graph Model: graph search & analysis, big data analytics, entity analytics | National Intelligence, Public Safety, Social Media search, Marketing - Sentiment |
Release 2 (12.2) in Oracle Cloud

World’s Fastest Big Data Graph Benchmark
1 Trillion Triple RDF Benchmark with Oracle Spatial and Graph
• World’s fastest data loading performance
• World’s fastest query performance
• World’s fastest inference performance
• Massive scalability: 1.08 trillion edges
• Platform: Oracle Exadata X4-2 Database Machine
• Source: w3.org/wiki/LargeTripleStores, 9/26/2014
Oracle Database 12c can load, query and perform inference over millions of RDF graph edges per second.
[Chart: millions of triples per second for Query, Load and Inference; bar values 1.13, 1.42 and 1.52]

What is RDF
• A graph data model for web resources and their relationships
• The graph can be serialized into
- RDF/XML, N3, N-Triples, …
• Construction unit: the triple (also called an assertion or fact)
<http://foobar> <:produces> <:mp3>
• Quads (named graphs) add context, provenance, identification, etc. to assertions
<http://foobar> <:produces> <:mp3> <:ProductGraph>
[Figure: triples drawn as subject → predicate → object: http://www.foobar.com is linked to "CA" by http://…/locatedIn and to http://www.foobar.com/products/mp3 by http://…/produce; http://www.oracle.com is linked to http://www.oracle.com/products/RDF by http://…/produce; a further http://…/uses edge also appears]
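To make the triple and quad units concrete, here is a minimal Python sketch that splits one line of N-Triples (or N-Quads, which adds an optional fourth graph term) into its parts. It is a simplified regex-based reader, not a conformant parser: it does not decode escapes or validate IRIs. The http://example.org/produces and http://example.org/ProductGraph IRIs are hypothetical full-IRI stand-ins for the slide's <:produces> and <:ProductGraph>, since N-Triples does not allow prefixed names.

```python
import re

# One RDF term: an IRI in angle brackets, a blank node label, or a quoted
# literal optionally followed by a language tag or a ^^<datatype IRI>.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'

# subject predicate object [graph] .
STATEMENT = re.compile(
    r'^' + TERM + r'\s+' + TERM + r'\s+' + TERM
    + r'(?:\s+' + TERM + r')?\s*\.$'
)


def parse_statement(line):
    """Split one N-Triples or N-Quads line into (subject, predicate, object, graph).

    graph is None for a plain triple; blank lines and comments return None.
    Simplified sketch: no escape decoding or IRI validation.
    """
    text = line.strip()
    if not text or text.startswith('#'):
        return None
    match = STATEMENT.match(text)
    if match is None:
        raise ValueError(f'not an N-Triples/N-Quads statement: {line!r}')
    return match.groups()


# Hypothetical example IRIs (example.org) standing in for the slide's prefixed names.
triple = parse_statement(
    '<http://www.foobar.com> <http://example.org/produces> '
    '<http://www.foobar.com/products/mp3> .')
quad = parse_statement(
    '<http://www.foobar.com> <http://example.org/produces> '
    '<http://www.foobar.com/products/mp3> <http://example.org/ProductGraph> .')
```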

RDF Semantic Graph Technologies Partners
[Figure: partner logos grouped as Ontology Engineering & Visualization; Open Source Frameworks (e.g. Sesame, Joseki); Standards; External Reasoners; Applications & Tools; SI / Consulting; NLP Entity Extractors]

Why Oracle Spatial & Graph for Linked Data?
• Advantages of the Oracle RDF triple store:
– Greater flexibility than single-purpose triple stores
– SPARQL and SQL interaction with relationally stored data
– Use of SQL hints, indexes and caching to improve performance
– Standard database administration: backup/recovery, replication, etc.
– PL/SQL or Java programming
– Supports large volumes of data (hundreds of billions to over a trillion triples)
– Good integration with standard RDF client tools such as Jena and Sesame
Oracle Semantic Graph in a scientific knowledge portal, 16-09-2013

GeoSPARQL Support for Spatial Data
[Figure: Web Analyst 1 and Web Analyst 2 send SPARQL/GeoSPARQL queries over REST to linked data graphs (Pop_Stat_Graph, Spatial_Graph) built with spatial vocabularies on top of enterprise data servers: a Population Statistics Database and a Spatial Database, with a relational schema and a 2D feature schema]

Enriching Text Using NLP and Domain Ontologies
[Figure: NLP and machine learning, combined with domain ontologies (e.g. Genzyme ontologies), enrich text for search, presentation, reporting, visualization and query]

Data Ninja Text Analytics Cloud Services
[Figure: unstructured data (e.g. from Oracle Social Cloud) flows through Data Ninja text analytics (semantic extractor) into structured data, an ontology (RDF) and relational tables, which Oracle Spatial and Graph uses for graph analytics and graph visualization]
New business insights, by making graph inferences that could not be queried in a relational database

RDF Graph Roadmap
• SPARQL optimization within the RDBMS kernel
• SNA: clustering, path analysis, community detection, PageRank, ...
• Manageability: Enterprise Developer integration
• R2RML enhancements: geospatial (vector) features
• Deeper RDBMS kernel: graph computation
• Standards based: OWL 2 QL
• Multi-type support: graph, relational, JSON, text, geospatial, …
• Visualization: richer graph visualization options

Property Graph

The Property Graph Data Model
• A set of vertices (or nodes)
– each vertex has a unique identifier.
– each vertex has a set of in/out edges.
– each vertex has a collection of key-value
properties.
• A set of edges (or links)
– each edge has a unique identifier.
– each edge has a head/tail vertex.
– each edge has a label denoting type of
relationship between two vertices.
– each edge has a collection of key-value
properties.
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
[Figure: example property graph with vertices 1 to 6 (name="marko", age=29; name="vadas", age=27; name="josh", age=32; name="peter", age=35; name="lop", lang="java"; name="ripple", lang="java") and edges 7 to 12 labeled "knows" or "created", each carrying a weight property between 0.2 and 1.0]
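The data model above can be sketched in a few lines of Python: vertices and edges each get a unique ID and a key-value property map, vertices track their in/out edges, and each edge records its tail, head and label. This is an illustrative in-memory model, not the Blueprints or Oracle API. The edge wiring assumes the figure is the TinkerPop "classic" sample graph, whose labels and weights it matches; treat that wiring as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)  # key-value properties
    out_edges: list = field(default_factory=list)   # IDs of outgoing edges
    in_edges: list = field(default_factory=list)    # IDs of incoming edges


@dataclass
class Edge:
    id: int
    tail: int    # ID of the vertex the edge starts from
    head: int    # ID of the vertex the edge points to
    label: str   # relationship type, e.g. "knows" or "created"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.vertices = {}  # vertex ID -> Vertex
        self.edges = {}     # edge ID -> Edge

    def add_vertex(self, vid, **properties):
        if vid in self.vertices:
            raise ValueError(f"duplicate vertex id {vid}")
        self.vertices[vid] = Vertex(vid, dict(properties))
        return self.vertices[vid]

    def add_edge(self, eid, tail, label, head, **properties):
        if eid in self.edges:
            raise ValueError(f"duplicate edge id {eid}")
        source, target = self.vertices[tail], self.vertices[head]  # KeyError if missing
        self.edges[eid] = Edge(eid, tail, head, label, dict(properties))
        source.out_edges.append(eid)
        target.in_edges.append(eid)
        return self.edges[eid]

    def out_neighbors(self, vid, label=None):
        """IDs of vertices that vid points to, optionally only via edges with `label`."""
        return [self.edges[e].head for e in self.vertices[vid].out_edges
                if label is None or self.edges[e].label == label]

    def in_neighbors(self, vid, label=None):
        """IDs of vertices that point to vid, optionally only via edges with `label`."""
        return [self.edges[e].tail for e in self.vertices[vid].in_edges
                if label is None or self.edges[e].label == label]


# Assumed wiring: the TinkerPop "classic" sample graph.
g = PropertyGraph()
for vid, props in [(1, {"name": "marko", "age": 29}), (2, {"name": "vadas", "age": 27}),
                   (3, {"name": "lop", "lang": "java"}), (4, {"name": "josh", "age": 32}),
                   (5, {"name": "ripple", "lang": "java"}), (6, {"name": "peter", "age": 35})]:
    g.add_vertex(vid, **props)
for eid, tail, label, head, weight in [(7, 1, "knows", 2, 0.5), (8, 1, "knows", 4, 1.0),
                                       (9, 1, "created", 3, 0.4), (10, 4, "created", 5, 1.0),
                                       (11, 4, "created", 3, 0.4), (12, 6, "created", 3, 0.2)]:
    g.add_edge(eid, tail, label, head, weight=weight)
```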

Graph Analysis in Business
• Product Recommendation (purchase records linking customers and items): recommend the most similar items purchased by similar people
• Influencer Identification (communication streams, e.g. tweets): find the people who are central in a given network, e.g. for influencer marketing
• Community Detection: identify groups of people who are close to each other, e.g. for target-group marketing
• Graph Pattern Matching: find all sets of entities that match a given pattern, e.g. for fraud detection
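The product recommendation idea ("the most similar items purchased by similar people") can be sketched as simple user-based collaborative filtering over the customer-item purchase graph. This is one possible technique, not necessarily what any particular product uses: customers are compared by the Jaccard similarity of their item sets, and each unseen item is scored by the summed similarity of the customers who bought it. The customers and items are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, customer, k=2):
    """Return up to k items bought by customers similar to `customer`.

    purchases maps each customer to the set of items they bought, i.e. the
    customer-item edges of the purchase graph.
    """
    mine = purchases[customer]
    scores = {}
    for other, items in purchases.items():
        if other == customer:
            continue
        similarity = jaccard(mine, items)
        if similarity == 0.0:
            continue  # no shared purchases, so no evidence
        for item in items - mine:  # only items the customer does not own yet
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase records.
purchases = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove", "sleeping bag"},
    "carol": {"tent", "kayak"},
    "dave": {"novel"},
}
```

Here bob shares two of alice's three items, so his "sleeping bag" outranks carol's "kayak".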

Architecture of Existing Property Graph Support
Oracle Big Data Spatial and Graph
• Interfaces: REST/Web Service; Java, Groovy, Python, …; Java APIs
• Graph analytics: Parallel In-Memory Graph Analytics (PGX)
• Data Access Layer: Apache Blueprints & Lucene/SolrCloud; Java SDK
• Storage: Oracle NoSQL Database, Apache HBase, Oracle Database 12.2 (Java APIs/JDBC/SQL/PL/SQL)
• Supported property graph formats: GraphML, GML, GraphSON, flat files
• Other sources: RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON), relational data sources, CSV

Support for Cytoscape Open Source Visualization

Integration with Tom Sawyer Perspectives
via property graph REST APIs

Oracle’s In-Memory Analyst vs Spark GraphX 1.1
In-Memory Analyst on 1 node is up to 2 orders of magnitude faster than Spark GraphX distributed execution on 2 to 16 nodes
[Chart: execution time in seconds (log scale) for Oracle vs Spark GraphX on 2, 4, 8 and 16 nodes, on the Twitter and Web graphs; panels: Single-Source Shortest Path and Pagerank]

Roadmap for Property Graph Support
• Interfaces: REST/Web Service; Java, Groovy, Python, …; Java APIs
• Graph analytics: Parallel In-Memory Graph Analytics (PGX); Apache Spark integration (MLlib, Spark SQL); deep learning (neural networks)
• Data Access Layer: Apache TinkerPop3 & Lucene/SolrCloud/Elasticsearch; Java SDK
• Storage: Oracle NoSQL Database, Apache HBase, Apache Cassandra, Oracle Database 12.2 (Java APIs/JDBC/SQL/PL/SQL)
• Supported property graph formats: GraphML, GML, GraphSON, flat files
• Other sources: RDF (RDF/XML, N-Triples, N-Quads, TriG, N3, JSON), relational data sources, CSV

Case Study: News Mining Application

Text Analytics API in N-Triples Format
[Figure: free-form texts (documents, messages, news) pass through text analytics to become structured data: concepts, categories, entities and sentiments]
• Cloud-based web services
• Daily updated knowledge base
• Support for customization
• Scalable performance

Text Analytics API for Constructing RDF Graphs
[Figure: free-form texts (documents, tweets, news) go through the text analytics API, which emits concepts, categories, entities and sentiments as N-Triples; these form RDF graphs linking texts, concepts, categories, entities and entity categories to an ontology]

News Mining Overview
| newsID | newsArticle | newsSource |
| 20160902_555 | A new study says that parts of Africa and the Asia-Pacific region may be vulnerable to outbreaks of the Zika virus, including some of the world's most populous countries and many with limited resources to identify and respond to the mosquito-borne disease. [more] | http://www.newkerala.com/news/2016/fullnews-113309.html |
| 20160903_1317 | Hurricane Hermine, set to cause flooding and damage when it hits Florida overnight, will make it harder for the state to fight Zika, a mosquito-borne virus shown to cause birth defects, experts in infectious diseases and mosquitoes said on Thursday. [more] | http://kelo.com/news/articles/2016/sep/01/hurricane-hermine-will-complicate-floridas-zika-fight-experts/ |
| 20160904_2209 | Singapore confirmed 26 more cases of locally transmitted Zika infections, the health ministry and National Environment Agency (NEA) said in a joint statement on Saturday, bringing the tally to 215. Of the 26 new cases, 24 were linked to existing clusters while two cases have no known links to any existing cluster, they said. [more] | https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html |
| … | … | … |
• Domain-specific, health-related news crawling
• English language only
• Worldwide coverage
• Healthcare-related keywords in news titles

RDF Graph Example of Extracted Entities
| Subject | Predicate | Object |
| http://www.newkerala.com/news/2016/fullnews-113309.html | http://dataninja.net/occurrence | urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 |
| urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/entity | http://dataninja.net/entity/Zika+virus |
| urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/sentiment | http://dataninja.net/entity/sentiment/negative |
| urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/count | "12"^^xsd:integer |
| urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/sentiment_score | "-1.0"^^xsd:float |
| urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/score | "1.0"^^xsd:float |
| urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/text_locations | "(135,145) (565,575) (777,787) (950,960) (1142,1152) (1535,1545) (1696,1706) (1755,1765) (1887,1891) (2191,2195) (2352,2362) (2376,2386)" |
(265 more triples for the same news article)

RDF Graphs for Extracted Entities (one news article)
[Figure: http://www.newkerala.com/news/2016/fullnews-113309.html links via http://dataninja.net/occurrence to the occurrence node urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33, which links via http://dataninja.net/entity to http://dataninja.net/entity/Zika+virus, via http://dataninja.net/occurrence/entity/sentiment to negative and via http://dataninja.net/occurrence/entity/count to 12; other entities from the same article include http://dataninja.net/entity/Philippines, http://dataninja.net/entity/Thailand and http://dataninja.net/entity/Nigeria]
One occurrence blank node for each extracted entity
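The occurrence pattern in the table and figure above can be generated with a short Python sketch: one fresh urn:uuid occurrence node per extracted entity, linked from the article and pointing to the shared entity URI, the sentiment, the count and the scores. The predicate URIs come from the slides. The input dict layout is a hypothetical stand-in for the Data Ninja API response, the text_locations triple is omitted for brevity, and xsd:integer/xsd:float are written as full IRIs because N-Triples has no prefixes.

```python
import uuid
from urllib.parse import quote_plus

DN = "http://dataninja.net/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_literal(value, datatype):
    """N-Triples typed literal; N-Triples needs the full XSD IRI, not xsd:."""
    return f'"{value}"^^<{XSD}{datatype}>'


def occurrence_triples(article_url, entity, occurrence_id=None):
    """Link one article to one extracted entity through a fresh occurrence node.

    `entity` is a hypothetical dict: name, sentiment, count, sentiment_score, score.
    """
    occ = f"<urn:uuid:{occurrence_id or uuid.uuid4()}>"
    entity_uri = f"<{DN}entity/{quote_plus(entity['name'])}>"  # "Zika virus" -> Zika+virus
    statements = [
        (f"<{article_url}>", f"<{DN}occurrence>", occ),
        (occ, f"<{DN}entity>", entity_uri),
        (occ, f"<{DN}occurrence/entity/sentiment>",
         f"<{DN}entity/sentiment/{entity['sentiment']}>"),
        (occ, f"<{DN}occurrence/entity/count>", typed_literal(entity["count"], "integer")),
        (occ, f"<{DN}occurrence/entity/sentiment_score>",
         typed_literal(entity["sentiment_score"], "float")),
        (occ, f"<{DN}occurrence/entity/score>", typed_literal(entity["score"], "float")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in statements]


lines = occurrence_triples(
    "http://www.newkerala.com/news/2016/fullnews-113309.html",
    {"name": "Zika virus", "sentiment": "negative", "count": 12,
     "sentiment_score": -1.0, "score": 1.0},
    occurrence_id="e47c4916-e7c1-4a3b-b650-f243e0d7ba33",
)
```

Leaving `occurrence_id` out generates a random UUID, which gives each extracted entity its own occurrence node while the entity URI itself stays shared across articles.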

RDF Graphs for Extracted Entities (multiple articles)
[Figure: occurrence nodes urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33, urn:uuid:68282cbb-b70c-4f6e-8157-5ef6b1d34d31 and urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15 from different articles, including https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-, point to shared entity nodes]
Same URI for the same entity

Ontology for Extracted Entities
[Figure: rdfs:subClassOf links among the entity categories http://dataninja.net/entity/category/Location, http://dataninja.net/entity/category/Country and http://dataninja.net/entity/category/Kingdom]
Ontology extracted for categories of entities

Ontology for Extracted Entities (with more categories)
[Figure: the categories http://dataninja.net/entity/category/Location, http://dataninja.net/entity/category/Country and http://dataninja.net/entity/category/Kingdom, joined by the additional categories http://dataninja.net/category/Southeast+Asia, http://dataninja.net/category/Regions+of+Asia and http://dataninja.net/category/Africa, all connected by rdfs:subClassOf links]
Additional categories of entities added to the ontology
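The rdfs:subClassOf links above are what make inference possible: a resource typed with a category is also an instance of every ancestor category (RDFS entailment rules rdfs9 and rdfs11). A minimal Python sketch of that closure follows. The hierarchy and type assertions are hypothetical and only loosely borrow the category names from the slide, and short strings stand in for the full dataninja.net category URIs.

```python
def superclasses(subclass_of, cls):
    """Every class that `cls` is transitively an rdfs:subClassOf (rule rdfs11)."""
    found = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:  # also protects against cycles
            found.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return found


def infer_types(asserted, subclass_of):
    """Extend each resource's rdf:type set with all superclasses (rule rdfs9)."""
    inferred = {}
    for resource, types in asserted.items():
        closure = set(types)
        for t in types:
            closure |= superclasses(subclass_of, t)
        inferred[resource] = closure
    return inferred


# Hypothetical hierarchy and type assertions, loosely based on the categories above.
subclass_of = {
    "Country": {"Location"},
    "Kingdom": {"Country"},
    "Southeast Asia": {"Regions of Asia"},
    "Regions of Asia": {"Location"},
}
asserted = {
    "Thailand": {"Kingdom", "Southeast Asia"},
    "Nigeria": {"Country"},
}
types = infer_types(asserted, subclass_of)
```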

RDF Graphs for Extracted Concepts (one news article)
[Figure: occurrence node urn:uuid:3f365159-2572-4c91-99ea-0f7ec7c0b7bc links via http://dataninja.net/concept to http://dataninja.net/concept/Zika+virus and via http://dataninja.net/occurrence/concept/score to 0.33; an owl:sameAs link involves http://dataninja.net/entity/Zika+fever]
Same URI for the same concept, but not for entities with the same name
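Because owl:sameAs is reflexive, symmetric and transitive, a set of sameAs links partitions URIs into equivalence classes, which a union-find structure computes efficiently. The Python sketch below picks the lexicographically smallest URI of each class as its canonical form. That choice, the sameAs pairs and the http://example.org/ZIKV alias are hypothetical, for illustration only.

```python
class SameAs:
    """Union-find over URIs; each owl:sameAs link merges two equivalence classes."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Return the canonical URI of uri's class, compressing the path on the way."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        while uri != root:  # path compression
            next_uri = self.parent[uri]
            self.parent[uri] = root
            uri = next_uri
        return root

    def add_same_as(self, a, b):
        """Record the statement `a owl:sameAs b`."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Keep the lexicographically smaller root, so each class's canonical
        # URI is always its smallest member.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a


# Hypothetical sameAs links.
aliases = SameAs()
aliases.add_same_as("http://dataninja.net/entity/Zika+virus",
                    "http://dataninja.net/concept/Zika+virus")
aliases.add_same_as("http://example.org/ZIKV",
                    "http://dataninja.net/entity/Zika+virus")
```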

RDF Graphs for Extracted Concepts (with categories)
[Figure: rdfs:subClassOf links connect the concepts http://dataninja.net/concept/Zika+virus and http://dataninja.net/concept/Zika+fever with the categories http://dataninja.net/category/Flaviviruses, http://dataninja.net/category/Zoonoses, http://dataninja.net/category/Viral+diseases and http://dataninja.net/category/Infectious+diseases]
More categories of concepts added to improve the richness of the ontology

RDF Graphs for Extracted Relationships
[Figure: in the article https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-, the entities http://dataninja.net/entity/Zika+virus and http://dataninja.net/entity/Singapore are connected through the relationships http://dataninja.net/relationship/Outbreak, http://dataninja.net/relationship/Mosquitoes and http://dataninja.net/relationship/Infections; owl:intersectionOf also appears]
New relationships discovered over time enrich the ontology further

Semantic Search using RDF Graphs
[Figure: news articles and other documents are analyzed into concepts, related concepts, categories, entities, entity categories, keywords and relationships, stored as an RDF graph in Oracle Spatial and Graph; queries answered with Oracle graph analytics return the relevant, matched news articles]

Case Study: Insights from analyzing Amazon Product Reviews

Amazon Product Reviews – PG Data Model
Raw JSON format:
{"reviewerID": "A3AF8FFZAZYNE5",
 "asin": "0000000078",
 "helpful": [1, 1],
 "reviewText": "…",
 "overall": 5.0,
 "summary": "Impactful!",
 "unixReviewTime": 1092182400,
 "reviewTime": "08 11, 2004"}
[Figure: reviewer vertices (A: name="John", B: name="Sue", C: name="buy1", D: name="shopper") connect to product vertices (1, 2, 3, 5; asin values include "0000078", "10467328", "00675434" and "20794378") through Review edges carrying the properties Helpful, reviewText, Overall, Summary and reviewTime]
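A minimal Python sketch of the mapping shown above: one raw JSON review becomes a reviewer vertex, a product vertex keyed by asin, and a Review edge carrying the review fields as properties. Assumptions: `helpful` is read as [helpful votes, total votes], the reviewer name falls back to reviewerID because the sample has no name field, and the dict layout and the `helpfulTotal` name are hypothetical rather than the Oracle loader format.

```python
import json
from datetime import datetime, timezone

SAMPLE = ('{"reviewerID": "A3AF8FFZAZYNE5", "asin": "0000000078", "helpful": [1, 1], '
          '"reviewText": "...", "overall": 5.0, "summary": "Impactful!", '
          '"unixReviewTime": 1092182400, "reviewTime": "08 11, 2004"}')


def review_to_graph(raw):
    """Map one raw review to (reviewer vertex, product vertex, Review edge) dicts."""
    r = json.loads(raw)
    # The sample has no reviewer name, so fall back to reviewerID (an assumption).
    reviewer = {"id": r["reviewerID"], "name": r.get("reviewerName", r["reviewerID"])}
    product = {"id": r["asin"], "asin": r["asin"]}
    helpful_votes, total_votes = r.get("helpful", [0, 0])
    edge = {
        "label": "Review",
        "tail": reviewer["id"],  # reviewer -> product
        "head": product["id"],
        "helpful": helpful_votes,
        "helpfulTotal": total_votes,  # hypothetical property name
        "reviewText": r.get("reviewText", ""),
        "overall": float(r["overall"]),
        "summary": r.get("summary", ""),
        # ISO date derived from the Unix timestamp instead of parsing "08 11, 2004".
        "reviewTime": datetime.fromtimestamp(
            r["unixReviewTime"], tz=timezone.utc).date().isoformat(),
    }
    return reviewer, product, edge


reviewer, product, edge = review_to_graph(SAMPLE)
```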

Amazon Product Reviews – Data Ninja Enrichment
[Figure: the same reviewer-product graph (A: name="John", B: name="Sue", C: name="buy1", D: name="shopper"), with each Review edge now carrying helpful, reviewText, overall, summary, reviewTime, sentiment and sentimentScore; processing pipeline: JSON Parser → Fetch Sentiment → Create Nodes → Create Relationship → Oracle Connector → Oracle NoSQL Database / Apache HBase]

Demo — Data Ninja Integration
# Please sign up at https://market.mashape.com/dataninja/smart-content
# and obtain your free Data Ninja API key.
# Alternatively, you can use the Amazon Web Services API Gateway
# (using your AWS account): https://auth.dataninja.net/cart
smartcontent_url = 'https://smartcontent.dataninja.net/smartcontent/tag'
user_name = 'YOUR_USER_NAME_HERE'  # placeholder: your Mashape user name
mashape_key = 'YOUR_API_KEY_HERE'
headers = {'Content-Type': 'application/json',
           'Accept': 'application/json',
           'X-Mashape-User': user_name,
           'X-Mashape-Key': mashape_key}

Demo — Data Ninja Integration
import json
import requests

def getSmartSentiment(text):
    # Send the text to the Data Ninja Smart Content endpoint
    payload = {'text': text}
    r = requests.post(smartcontent_url, headers=headers,
                      data=json.dumps(payload))
    r.raise_for_status()
    data = r.json()
    # Extract the sentiment and sentiment_score from the output,
    # falling back to defaults when a field is missing
    sentiment = data.get('sentiment', '')
    sentScore = data.get('sentiment_score', 0.0)
    return sentiment, sentScore

Demo — Initialization
# Log into the Oracle Big Data Lite VM, then start the Gremlin shell for Oracle NoSQL Database
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy/sh
./gremlin-opg-nosql.sh
// In the shell: configure the "aws_review" graph stored in the kvstore
// (the last argument of each add...Property call is its default value)
server = new ArrayList<String>();
server.add("localhost:5000");
cfg = GraphConfigBuilder.forPropertyGraphNosql()
    .setName("aws_review").setStoreName("kvstore")
    .setHosts(server)
    .addVertexProperty("name", PropertyType.STRING, "EMPTY_NAME")
    .addEdgeProperty("overall", PropertyType.DOUBLE, "0.0")
    .addEdgeProperty("sentimentScore", PropertyType.DOUBLE, "0.0")
    .addEdgeProperty("sentiment", PropertyType.STRING, "NO_SENTIMENT")
    .addEdgeProperty("reviewText", PropertyType.STRING, "NO_REVIEW")
    .setMaxNumConnections(2).build();

Demo — Create session
// Get a handle to the property graph stored in Oracle NoSQL Database,
// using the configuration from the previous step
opg = OraclePropertyGraph.getInstance(cfg);
// Create a new PGX session and analyst, then read the graph and its
// properties from the database into memory, so that PGQL queries and
// built-in graph algorithms run efficiently
session = Pgx.createSession("session1");
analyst = session.createAnalyst();
pgxGraph = session.readGraphWithProperties(cfg);

Demo — PGQL queries
// PGQL is a SQL-like query language for property graphs
// http://pgql-lang.org/
query1 = "SELECT n, e, e.overall, e.sentimentScore, m " +
         "WHERE (n) -[e]-> (m) LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query1);
pgxResultSet.print(10);
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| n | e | m |
===============================================================================================
| PgxVertex[ID=-7340878287527889238] | PgxEdge[ID=5762] | PgxVertex[ID=-9102601091582098129] |
| PgxVertex[ID=4519911688218637303] | PgxEdge[ID=17019] | PgxVertex[ID=-8952286227085815033] |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Demo — Aggregate queries (1)
// Example 1: disagreement in polarity: high rating but low sentiment score
query2 = "SELECT n.name, e.overall, e.sentimentScore, e.reviewText, m " +
         "WHERE (n) -[e with overall > 4.0 and sentimentScore < -0.9]-> (m) " +
         "order by e.sentimentScore LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query2);
pgxResultSet.print(10);
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| n.name | e.overall | e.sentimentScore | e.reviewText | m |
===============================================================================================================================================================================================
| Kiwi | 5.0 | -0.90514827 | She climbed out of the cockpit of her Fairey Barracuda and became instantly famo | PgxVertex[ID=1878509548385937579] |
| Gary Selikow | 5.0 | -0.90514827 | The Holocaust A History of the Jews of Europe During the Second World War , by p | PgxVertex[ID=9122138607977681669] |
| Miss Calculation "Mathbaby" | 5.0 | -0.90514827 | There I was. Probably the only one in the movie theater above the age of thirtee | PgxVertex[ID=-611636155378504919] |
| Srinivas P. Ganti "prasad" | 5.0 | -0.90514827 | In a very exhaustive account of Middle Eastern politics, Friedman narrates, base | PgxVertex[ID=7872217753950946849] |
| Bluestalking Reader "Bluestalking Reader" | 5.0 | -0.90514827 | I guess the only way to do this is just plunge right in, though of all the books | PgxVertex[ID=4467821667800686818] |
| Bonnie Brody "Book Lover and Knitter" | 5.0 | -0.90514827 | Joyce Carol Oates has written a deeply felt memoir, `A Widow's Story', following | PgxVertex[ID=4467821667800686818] |
| Stephen Frater | 5.0 | -0.90514827 | Book reviewBy STEPHEN FRATER, author of HELL ABOVE EARTHLOST IN SHANGRI-LA: | PgxVertex[ID=5830558107292558467] |
| Cy B. Hilterman "Cy. Hilterman" | 5.0 | -0.90514827 | A true historic story of survival in the jungles of New Guinea amidst natives wh | PgxVertex[ID=5830558107292558467] |
| Cy B. Hilterman "Cy. Hilterman" | 5.0 | -0.90514827 | What a delightful read! Water for Elephants has got to be one of the best reads | PgxVertex[ID=5894498295248166816] |
| John Umland | 5.0 | -0.90514827 | I read Unbroken in two days. I will summarize the story, mention the author's ef | PgxVertex[ID=-5439053811866244671] |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Demo — Aggregate queries (2)
// Example 2: disagreement in polarity: low rating but high sentiment score
query3 = "SELECT n.name, e.overall, e.sentimentScore, e.reviewText, m " +
         "WHERE (n) -[e with overall < 2.0 and sentimentScore > 0.9]-> (m) " +
         "order by e.sentimentScore LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query3);
pgxResultSet.print(10);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| n.name | e.overall | e.sentimentScore | e.reviewText | m |
=====================================================================================================================================================================================
| Amazon Customer | 1.0 | 0.90086615 | This title is deceptive-- no one knows what an "Annual" is, except for t | PgxVertex[ID=-5918987544460979951] |
| Galina | 1.0 | 0.90110934 | This book takes many, many pages to say in a remarkably roundabout and flowery w | PgxVertex[ID=-4968252386747415161] |
| Elizebeth Neumann | 1.0 | 0.90114343 | Unless you enjoy reading a book as interesting as the dictionary this book isnt | PgxVertex[ID=5415848389720693761] |
| Doug Rice | 1.0 | 0.90118825 | A dictionary should demonstrate good lexicographic technique and have an up-to-d | PgxVertex[ID=-8360052157946045560] |
| Doug Rice | 1.0 | 0.9011979 | A dictionary should demonstrate good lexicographic technique and have an up-to-d | PgxVertex[ID=-5498908216507816124] |
| Kindle Reader "Kindle Reader" | 1.0 | 0.90166533 | This was positively the most frustrating book I have ever read. Where others mi | PgxVertex[ID=-4463070554159192016] |
| Alessandro Bruno | 1.0 | 0.90180194 | I felt compelled to review this book in order to shake off that feeling of intel | PgxVertex[ID=3280190210596483762] |
| Hiwaycruzer | 1.0 | 0.9018065 | This book is a must read for all teenagers considering a career at nearby Disney | PgxVertex[ID=3280190210596483762] |
| Jackal | 1.0 | 0.9019033 | This is a boring book about traditional Russian cooking. If you want current Rus | PgxVertex[ID=2190854144543979320] |
| Amazon Customer "Sci-reader" | 1.0 | 0.90192723 | I just finished this book and I must ay that it was a spectacularly boring coll | PgxVertex[ID=-6659798236378008734] |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
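The two PGQL filters above (high rating with strongly negative text, and low rating with strongly positive text) can be mirrored in plain Python over a list of enriched reviews. This can help when prototyping thresholds before loading data into PGX. The thresholds copy the queries (overall > 4.0 and sentimentScore < -0.9; overall < 2.0 and sentimentScore > 0.9). The review records are hypothetical.

```python
def polarity_disagreements(reviews, limit=10):
    """Return (high rating / negative text, low rating / positive text) reviews.

    Mirrors the two PGQL queries: filter on overall and sentimentScore,
    order by sentimentScore ascending, keep at most `limit` rows.
    """
    praised_but_negative = sorted(
        (r for r in reviews if r["overall"] > 4.0 and r["sentimentScore"] < -0.9),
        key=lambda r: r["sentimentScore"])[:limit]
    panned_but_positive = sorted(
        (r for r in reviews if r["overall"] < 2.0 and r["sentimentScore"] > 0.9),
        key=lambda r: r["sentimentScore"])[:limit]
    return praised_but_negative, panned_but_positive


# Hypothetical enriched reviews (edge properties from the graph).
reviews = [
    {"name": "r1", "overall": 5.0, "sentimentScore": -0.95},
    {"name": "r2", "overall": 5.0, "sentimentScore": 0.80},
    {"name": "r3", "overall": 1.0, "sentimentScore": 0.92},
    {"name": "r4", "overall": 1.0, "sentimentScore": -0.70},
    {"name": "r5", "overall": 4.0, "sentimentScore": -0.99},  # 4.0 is not > 4.0
]
high_but_negative, low_but_positive = polarity_disagreements(reviews)
```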

Demo — Graph Algorithms
// Personalized PageRank
vertexSet = pgxGraph.createVertexSet();
vertex = pgxGraph.getVertex(4681900072665192241L);
vertexSet.add(vertex);
ppr = analyst.personalizedPagerank(pgxGraph, vertexSet);
it = ppr.getTopKValues(10); // iterate over the top-K values
// Community detection, Path Analysis, Clustering, …
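For intuition about what the personalizedPagerank call computes, here is a plain-Python power-iteration sketch. It is ordinary PageRank, except that the random surfer teleports back to the chosen source vertices rather than to any vertex. It is not the PGX implementation; the damping factor 0.85, the 50 iterations, the choice to send dangling-vertex mass back to the sources, and the toy graph are all assumptions.

```python
def personalized_pagerank(graph, sources, damping=0.85, iterations=50):
    """Personalized PageRank by power iteration.

    graph maps each vertex to a list of its out-neighbors; sources is the set
    of start vertices. With probability 1 - damping the surfer jumps back to a
    uniformly chosen source (plain PageRank would jump to any vertex).
    """
    vertices = set(graph)
    for targets in graph.values():
        vertices.update(targets)
    restart = {v: (1.0 / len(sources) if v in sources else 0.0) for v in vertices}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) * restart[v] for v in vertices}
        for v in vertices:
            out = graph.get(v, [])
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                # Dangling vertex: send its mass back to the sources.
                for s in sources:
                    nxt[s] += damping * rank[v] / len(sources)
        rank = nxt
    return rank


def top_k(rank, k):
    """The k highest-scoring (vertex, score) pairs, ties broken by vertex."""
    return sorted(rank.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy graph: a 3-cycle a -> b -> c -> a, plus d -> a.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ppr = personalized_pagerank(graph, {"a"})
```

Vertex d points into the cycle but is never reached from a, so its personalized score stays at zero even though plain PageRank would give it a share of the teleport mass.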

Summary
• Introduction and overview of graph technologies and graph database
• RDF Semantic Graph
• Property Graph
• Integrating text analytics with graph technologies
• Construct graphs from text using natural language understanding technologies
• Enrich graph data with text analytics
• Data Ninja Services Java client for Oracle Spatial and Graph, available with the Oracle Big Data Lite Virtual Machine
• Please try it and give us your feedback!
• Contact us at alan.wu@oracle.com or tdiep@dataninja.net

Resources
• Oracle Spatial and Graph
oracle.com/technetwork/database/options/spatialandgraph
• Oracle Big Data Spatial and Graph
oracle.com/database/big-data-spatial-and-graph/index.html
• Data Ninja Services
https://dataninja.net
• Java SDK for Oracle Spatial and Graph
https://github.com/DataNinjaAPI/dataninja-api-oracle-sdk-java

BACKUP

Semantic Alignment of Enterprise Metadata
Powering Enterprise Federation and Integration
Benefits:
– Existing relational data stays in place
and corresponding applications do not
need to change
– Use of virtual mapping eliminates
synchronization issues
– Common vocabulary helps with data
integration issues
[Figure: on the database server, the HR, Inventory and Sales schemas (HR Database, Inventory Database, Sales Database) are exposed as RDF graphs (e.g. Inventory Graph, Sales Graph) aligned through shared ontologies; Applications 1 to 3 on a mid-tier server access them through SQL and SPARQL]

The National Statistics Center (NSTAC), an incorporated administrative agency, forms part of the central statistical organization in Japan.
The IMISOS database has run on an Exadata X2-2 Half Rack since 2013, with the Active Data Guard option and Database Firewall. Oracle Japan published a customer case study.
NSTAC also bought an Exadata X3-2 Eighth Rack for the tabulation work (FY14Q4).
Another Exadata opportunity, for the population census, will close by FY15Q3.
http://www.nstac.go.jp/en/index.html

RDF Semantic Graph
RDF Views on Relational Tables
• Pattern matching on relational tables
• Supports W3C RDF & SPARQL standards
• Automatic and custom mapping
• RDF views on tables, views and SQL query results
• No duplication of data and storage
• Direct Mapping – automatic
• R2RML – express customized mappings
Employee table:
| EmpNo | Ename | Job | Mgr | DeptNo |
| 7521 | Ward | Salesman | 7698 | 10 |
| 7698 | Blake | Manager | 7839 | 10 |
| 7839 | King | President | | 30 |
Department table:
| DeptNo | LOC |
| 10 | NYC |
| 30 | CHI |
[Figure: the rows as an RDF graph: :emp7521, :emp7698 and :emp7839 carry :name (Ward, Blake, King) and :job (Salesman, Manager, President); :hasMgr links :emp7521 to :emp7698 and :emp7698 to :emp7839; :worksAt links each employee to :dept10 or :dept30; :location links :dept10 to NYC and :dept30 to CHI]
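The mapping in the figure can be sketched in Python as a function from rows to triples, using the figure's own vocabulary (:name, :job, :hasMgr, :worksAt, :location). Note the design difference: Oracle exposes these as virtual RDF views declared through Direct Mapping or R2RML, while this illustrative sketch materializes the triples in memory. The predicate and IRI naming follows the figure, not the W3C Direct Mapping scheme.

```python
# Rows of the two tables above; None stands for King's empty (NULL) Mgr.
EMPLOYEES = [
    {"EmpNo": 7521, "Ename": "Ward", "Job": "Salesman", "Mgr": 7698, "DeptNo": 10},
    {"EmpNo": 7698, "Ename": "Blake", "Job": "Manager", "Mgr": 7839, "DeptNo": 10},
    {"EmpNo": 7839, "Ename": "King", "Job": "President", "Mgr": None, "DeptNo": 30},
]
DEPARTMENTS = [{"DeptNo": 10, "LOC": "NYC"}, {"DeptNo": 30, "LOC": "CHI"}]


def employee_triples(rows):
    for row in rows:
        subject = f":emp{row['EmpNo']}"           # primary key -> subject IRI
        yield (subject, ":name", row["Ename"])     # column -> literal property
        yield (subject, ":job", row["Job"])
        if row["Mgr"] is not None:                 # NULL foreign key -> no triple
            yield (subject, ":hasMgr", f":emp{row['Mgr']}")
        yield (subject, ":worksAt", f":dept{row['DeptNo']}")  # foreign key -> link


def department_triples(rows):
    for row in rows:
        yield (f":dept{row['DeptNo']}", ":location", row["LOC"])


triples = list(employee_triples(EMPLOYEES)) + list(department_triples(DEPARTMENTS))
```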

Text Search through Apache Lucene/SolrCloud
• Integration with Apache Lucene & SolrCloud
• Supports manual and automatic indexing of graph elements
• Manual index:
• oraclePropertyGraph.createIndex("my_index", Vertex.class);
• indexVertices = oraclePropertyGraph.getIndex("my_index", Vertex.class);
• indexVertices.put("key", "value", myVertex);
• Auto index:
• oraclePropertyGraph.createKeyIndex("name", Edge.class);
• oraclePropertyGraph.getEdges("name", "*hello*world");
• Enables queries to use syntax like "*oracle* or *graph*"
