Enterprise Linked Data Clouds
Dr. Giovanni Tummarello
DERI Institute
CEO SindiceTech

An “Intense” definition

Enterprise – Linked Data – Clouds

• Enterprise: not all of them
• Linked Data: not exactly what you get when
you google it
• Cloud: has a double meaning

Knowledge Intensive Enterprises

• Those that will live and die by their ability to
incorporate new diversely structured
knowledge in their processes and products
– Examples:
• Health Care and Life Sciences (HCLS)
• Scientific and Technical Publishing
• Defense, Intelligence
• …

Example story (pharmaceutical company)
To stay competitive, pharmaceutical companies need to leverage all the data
available from inside sources as well as from the increasingly many public
HCLS data sources. Because this data is diverse in nature, format, and
quality, integration is complex. Goals:

• The ability to speed up “In silico” scientific workflows
• The ability to create large scale “data maps” or “aggregated views”
• The ability to receive recommendations and suggestions for new data
connections
• The ability to provide R&D departments with superior tools for investigating
internal knowledge: search engines and data browsing tools
• The ability to leverage the ever-increasing body of public, crowd-curated
open data

A very simple HCLS data schema

Linked Data

• Here we refer to the basic tools of the
“Semantic Web”
– RDF
– SPARQL
– Little more 
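
A minimal sketch of the two ideas named above, in plain Python rather than a real RDF library: data as subject/predicate/object triples, and a query as a pattern with wildcards. The `ex:` names, the sample data, and the helper functions `match` and `drugs_and_targets` are all hypothetical and only illustrate the model; a real deployment would use an RDF store and actual SPARQL.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data; the ex: identifiers are illustrative only.
TRIPLES: List[Triple] = [
    ("ex:aspirin", "rdf:type", "ex:Drug"),
    ("ex:aspirin", "ex:targets", "ex:COX1"),
    ("ex:ibuprofen", "rdf:type", "ex:Drug"),
    ("ex:ibuprofen", "ex:targets", "ex:COX2"),
    ("ex:COX1", "rdf:type", "ex:Protein"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None is a wildcard, like a SPARQL variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def drugs_and_targets(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Roughly equivalent to the SPARQL query:
    SELECT ?drug ?target WHERE { ?drug rdf:type ex:Drug . ?drug ex:targets ?target }
    """
    results = []
    for drug, _, _ in match(triples, p="rdf:type", o="ex:Drug"):
        # Join on ?drug: reuse the bound subject in the second pattern.
        for _, _, target in match(triples, s=drug, p="ex:targets"):
            results.append((drug, target))
    return results


if __name__ == "__main__":
    for row in drugs_and_targets(TRIPLES):
        print(row)
```

Adding a new kind of fact means adding triples with a new predicate; no schema change is needed, which is the flexibility the slides attribute to the RDF tool stack.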

Tim Berners-Lee
WWW → Semantic Web

Tim Berners-Lee, CERN, March 1989: “Information Management: A Proposal”

Data+Metadata, together.

Metadata + Data → RDF Stream
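
One way to picture "data and metadata together" is a record whose field names (the metadata) travel with the values as predicates in a stream of N-Triples lines. The sketch below assumes a hypothetical `http://example.org/` namespace and a hypothetical helper `record_to_ntriples`; it treats every value as a plain string literal and is not taken from the SindiceTech stack.

```python
from typing import Dict, Iterator

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires escaping inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id: str, fields: Dict[str, str]) -> Iterator[str]:
    """Turn one record into a stream of N-Triples lines.

    Each field name becomes a predicate IRI, so the description of the
    value is carried in the same statement as the value itself.
    """
    subject = f"<{BASE}record/{record_id}>"
    for name, value in fields.items():
        predicate = f"<{BASE}schema/{name}>"
        yield f'{subject} {predicate} "{escape_literal(value)}" .'


if __name__ == "__main__":
    for line in record_to_ntriples("42", {"name": "Aspirin", "class": "NSAID"}):
        print(line)
```

Because each line is a complete statement, the output can be concatenated, split, or processed line by line, which suits cluster-style processing.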

And this data..

• IS BIG
• Can be Fast
• IS Extremely Variable

• Gartner’s 3 Vs: Volume, Velocity, Variety

Scale is only 1 dimension

Multiple dimensions of Web data integration
• RDF tool stack → flexibility
• Cluster scalable processing → scalability
• “Cloud” Pipelines → dynamicity

How we started: a search engine for
the web of data (Sindice.com)

Web of data
650,000,000 Knowledge Graphs → 5 TB+ of “Big Knowledge
Data”

SindiceTech
• Incorporating requirements from enterprises
– Scientific and Technical content companies
– Defense
– Pharma and Biotech
• Inheriting 5 years of IP with R&D on:
– Semantic Technologies → RDF and a pragmatic
stack around it
– Handling very large amounts of Knowledge Data
• Hadoop/NoSQL
• Semantic Information Retrieval

[Architecture diagram; component labels as recovered from the slide]
• Source: RDBMS, FTP, HDFS, S3
• Adaptors / Inbox
• Integration, Transformation & Analytics Pipeline
• Loaders / Outbox
• Solr, NoSQL, RDBMS
• Pipeline Composer UI, Pivot Browser
• Semantic IR (SIREn), SparQLed
• BI / DSS Systems
• Semantic Layer (RDF)
• Event Logging (Splunk / Logstack)
• Big Data Layer (Hadoop, Hive, Pig) / Cloudera
• Other Cloud Layer (e.g. Amazon, Openstack)
• 3rd Party BI / DSS (e.g. SAS), HPA

Middleware for Big Knowledge Processing

Cloud Space Semantic Sandboxes

Full JSON-like search on Solr.
All operators supported.
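
To give a feel for what "JSON-like" search means, here is a small sketch in plain Python. It does not use SIREn's or Solr's API; the functions `matches` and `search`, the sample documents, and the field names are all hypothetical. The point it illustrates is that conditions inside a nested pattern must hold on the same nested object, which a flat keyword index cannot guarantee.

```python
from typing import Any, Dict, List


def matches(doc: Any, pattern: Any) -> bool:
    """Return True if the nested pattern is satisfied somewhere in doc.

    - dict pattern: every key must match within the same object
    - list in doc: at least one element must match the whole pattern
    - string pattern: case-insensitive keyword match on whitespace tokens
    - anything else: plain equality
    """
    if isinstance(doc, list):
        return any(matches(item, pattern) for item in doc)
    if isinstance(pattern, dict):
        if not isinstance(doc, dict):
            return False
        return all(key in doc and matches(doc[key], sub) for key, sub in pattern.items())
    if isinstance(doc, str) and isinstance(pattern, str):
        return pattern.lower() in doc.lower().split()
    return doc == pattern


def search(docs: List[Dict[str, Any]], pattern: Dict[str, Any]) -> List[Any]:
    """Return the ids of all documents that satisfy the pattern."""
    return [d["id"] for d in docs if matches(d, pattern)]


# Hypothetical sample documents.
DOCS = [
    {"id": 1, "title": "Aspirin trial",
     "authors": [{"name": "Ada Smith", "org": "DERI"}]},
    {"id": 2, "title": "Protein folding",
     "authors": [{"name": "Ada Smith", "org": "ACME"},
                 {"name": "Bob Jones", "org": "DERI"}]},
]

if __name__ == "__main__":
    # Only document 1 has a single author who is both "smith" and at "DERI".
    print(search(DOCS, {"authors": {"name": "smith", "org": "DERI"}}))
```

Document 2 contains both "Smith" and "DERI", but on different authors, so a structure-aware query rejects it while a flat keyword search would return it.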

SIREn: Semantic IR Engine

• Extension to Enterprise Search Engine Solr
• Semantic, full-text, incremental updates,
distributed search
[Diagram labels: Semantic Databases, SIREn]

Constant time

Relational Faceted Browsing. At the speed of light.
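
The general idea of relational faceted browsing can be sketched as three operations over triples: count facet values for the current result set, refine by one value, and pivot along a relation to a different kind of entity. The code below is an illustrative linear-scan sketch under assumed data; the function names `facets`, `refine`, and `pivot` and the sample triples are hypothetical, and it makes no attempt at the constant-time behaviour mentioned in the slides, nor does it reflect the patented method.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical sample data.
TRIPLES: List[Triple] = [
    ("paper1", "topic", "oncology"),
    ("paper1", "author", "alice"),
    ("paper2", "topic", "oncology"),
    ("paper2", "author", "bob"),
    ("paper3", "topic", "cardiology"),
    ("paper3", "author", "alice"),
    ("alice", "affiliation", "DERI"),
    ("bob", "affiliation", "ACME"),
]


def facets(triples: Iterable[Triple], entities: Set[str]) -> DefaultDict[str, Counter]:
    """For each property, count how often each value occurs in the current set."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for s, p, o in triples:
        if s in entities:
            counts[p][o] += 1
    return counts


def refine(triples: Iterable[Triple], entities: Set[str], prop: str, value: str) -> Set[str]:
    """Keep only the entities that have the given property value."""
    return {s for s, p, o in triples if s in entities and p == prop and o == value}


def pivot(triples: Iterable[Triple], entities: Set[str], prop: str) -> Set[str]:
    """Follow a relation from the current set to the set of related entities."""
    return {o for s, p, o in triples if s in entities and p == prop}


if __name__ == "__main__":
    papers = {"paper1", "paper2", "paper3"}
    oncology = refine(TRIPLES, papers, "topic", "oncology")
    authors = pivot(TRIPLES, oncology, "author")  # papers -> their authors
    print(dict(facets(TRIPLES, authors)["affiliation"]))
```

The pivot step is what makes the browsing "relational": after narrowing papers by topic, the user continues faceting on the authors of those papers rather than on the papers themselves.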

Patent Pending

Thank you

With the contribution of
