3. Accessing the right information is challenging
Diverse Range of Specialisations
Information Seeking Behaviour
Information is Silo’ed
Information Hierarchy
5. Building a Linked Data Warehouse demo
[Architecture diagram. Sources: Excel Reports, XML File. Components: ETL Platform, RDF Management, Triple Store, Model, SPARQL + OData4Sparql, OData, UI. Stages: Linked Data Warehouse, Data Access, Exploration.]
6. Linked Data and Model
• Traditional approaches try to identify how the data is to be “captured”
upfront.
• You can do this with the linked data model
• But we don’t…..Why?
• Always leads to “Paralysis by Analysis”
• You will miss so much.
• And take a huge amount of time doing it.
• You will find that there is a huge amount of
information and relationships you never would
have thought of if starting from the model.
• Then there are tricks you can do to add huge
value
• The data model evolves very rapidly from the
data and can be further tweaked at any time.
Let the data express itself
• Source by source, row by row let the data tell
you what it is describing.
• What it is, what relationships and metadata it
has.
• You’ll find a lot more information that you
simply couldn’t describe in an RDBMS
• Another source can add to an existing item
without you even having to think
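As a rough illustration of "source by source, row by row", the sketch below turns rows from two sources into triples and lets the second source add facts to an item the first source already created. Everything here is an assumption made for the example: the `http://example.org/` namespace, the column names, the sample rows, and the helper names `row_to_triples` and `describe`.

```python
# Hypothetical illustration: namespace, columns and rows are assumptions.
BASE = "http://example.org/"


def row_to_triples(kind, key_column, row):
    """Turn one source row into (subject, predicate, object) triples."""
    subject = f"{BASE}{kind}/{row[key_column]}"
    triples = {(subject, "rdf:type", f"{BASE}{kind}")}
    for column, value in row.items():
        if column == key_column or value in (None, ""):
            continue  # skip the key itself and empty cells
        # Each remaining column becomes a predicate; no schema is declared upfront.
        triples.add((subject, f"{BASE}{column}", value))
    return triples


# Two sources describing the same matter, each with its own columns.
time_rows = [{"matter_id": "M1", "hours": 12.5}]
finance_rows = [{"matter_id": "M1", "billed": 4000, "currency": "GBP"}]

graph = set()
for row in time_rows:
    graph |= row_to_triples("Matter", "matter_id", row)
for row in finance_rows:
    # Same subject IRI, so these facts attach to the existing item.
    graph |= row_to_triples("Matter", "matter_id", row)


def describe(subject):
    """Collect every fact known about one subject, whatever its source."""
    return {p: o for s, p, o in graph if s == subject}


facts = describe(BASE + "Matter/M1")
```

Because the graph is just a set of triples keyed by subject IRI, the second source needed no join logic or schema change to extend the matter.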
9. ETL & Linked Data Creation & Management
In4mium Talend modules
• Semantic modules ready to use through
configuration in Talend
• No API knowledge required by users
• Range of modules (over 60) for all
aspects of linked data creation and
management
• Create fully semantic apps
• Or pick and mix with traditional
aspects
• Works seamlessly with existing Talend
environment and modules
• Model driven behaviours are now
possible
• Easily add semantic technologies into
existing service architectures
• All the benefits without the hassle
10. OData4Sparql – Simplifying integration
• Brings together the strength of a ubiquitous RESTful
interface standard (OData) with the flexibility and federation
ability of RDF/SPARQL.
• SPARQL/OData Interop: a proposed W3C interoperation proxy
between OData and SPARQL (Kal Ahmed, 2013)
• Opens up many popular user-interface development
frameworks and tools such as Kendo UI, SAPUI5, etc.
• Acts as a Janus-point between application development and
data-sources.
• User interface developers are not, and do not want to be,
database developers. Therefore they want to use a
standardized interface that abstracts away the database,
even to the extent of what type of database: RDBMS,
NoSQL, or RDF/SPARQL
• By providing an OData4SPARQL server, it opens up any
SPARQL data-source to the C#/LINQ development world.
• Opens up many productivity tools, such as
Excel/Power Query and SharePoint, to be consumers of
SPARQL data such as DBpedia, ChEMBL, ChEBI, BioPAX
and any of the Linked Open Data endpoints!
• Microsoft has been joined by IBM and SAP in using OData as
their primary interface method, which means there will be many
application developers familiar with OData as the means to
communicate with a backend data source.
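To make the idea of an OData-to-SPARQL proxy concrete, here is a toy translation of two standard OData query options (`$select` and `$top`) into SPARQL text. It is not how OData4Sparql works internally. The function name `odata_to_sparql`, the `http://example.org/` namespace, and the rule that an entity set maps to an RDF class of the same name are all assumptions for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed namespace for classes and properties (hypothetical).
PREFIX = "http://example.org/"


def odata_to_sparql(url):
    """Translate a tiny subset of OData ($select, $top) into SPARQL text."""
    parts = urlsplit(url)
    # Assumption: the last path segment names an entity set that is also an RDF class.
    entity_set = parts.path.rstrip("/").split("/")[-1]
    options = parse_qs(parts.query)

    columns = options.get("$select", [""])[0].split(",")
    columns = [c.strip() for c in columns if c.strip()]

    lines = [f"SELECT ?s {' '.join('?' + c for c in columns)}".rstrip()]
    lines.append("WHERE {")
    lines.append(f"  ?s a <{PREFIX}{entity_set}> .")
    for c in columns:
        # OPTIONAL keeps rows whose property is missing, like a nullable column.
        lines.append(f"  OPTIONAL {{ ?s <{PREFIX}{c}> ?{c} . }}")
    lines.append("}")
    if "$top" in options:
        lines.append(f"LIMIT {int(options['$top'][0])}")
    return "\n".join(lines)


query = odata_to_sparql(
    "http://example.org/odata/Matter?$select=name,client&$top=10"
)
print(query)
```

The UI developer only ever sees the OData URL; the SPARQL stays behind the proxy, which is the abstraction the bullets above describe.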
11. Model Driven UI
[Figure: Linklaters Data Model and Northwind Data Model, each with a Sample Query, annotated with Things and Relationships between Things.]
13. Strings to Things to Facts
Clicking on a ‘thing’ displays a ‘Lens’ about that ‘thing’, made up of different fragments that display facts about it.
The ‘About’ fragment shows the most relevant information. Compare with the Google Knowledge Graph.
The ‘Person Involved’ fragment lists all persons involved with the matter.
The ‘Financial Summary’ fragment calculates a financial summary.
… and we can find associated deal ‘things’. If we want more details about any ‘thing’ we can now navigate to its ‘lens’.
14. Lens Discovery
Navigating through ‘Gerald Grant’, the managing partner for the Matter, takes us to his Lens.
Navigating through the associated deal takes us to that deal’s Lens.
Or show the Lens on the client of the matter.
One is not limited to facts within the application. In the case of a client we can navigate to their Companies House page (or it could have been D&B, LinkDocs, etc.).
15. Composing Questions
Advanced Searches can be selected from the list, which then displays a query in a different format that allows better control over the search.
The advanced search allows conditions to be added that link to other ‘things’ or limit the values of ‘facts’ about the associated ‘thing’. This allows much more precise searches to be executed.
16. OData integration with Excel Power Query/Pivot
OData
OData4Sparql
Power Query Data Grabber/Shaper
• Build queries and utilise expand to traverse graph
• Limited data transformation can be incorporated into
the queries
• Create multiple views
Power Pivot Self Service BI
• Integrate across Power Queries and
other sources to build ROLAP models
• Explore model with Pivot tables
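The Power Query bullet about using expand to traverse the graph corresponds to the standard OData `$expand` query option. The sketch below builds such a request URL by hand. The service root, the entity set `Matter`, the navigation properties `client` and `managingPartner`, and the helper name `odata_url` are assumptions for illustration, not names from the pilot.

```python
from urllib.parse import quote


def odata_url(service_root, entity_set, select=(), expand=(), filter_expr=None):
    """Build an OData query URL from standard system query options."""
    options = []
    if select:
        options.append("$select=" + ",".join(select))
    if expand:
        # $expand follows relationships, pulling related 'things' in one request.
        options.append("$expand=" + ",".join(expand))
    if filter_expr:
        # Percent-encode spaces but keep quotes and parentheses readable.
        options.append("$filter=" + quote(filter_expr, safe="()',"))
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return url + ("?" + "&".join(options) if options else "")


url = odata_url(
    "http://example.org/odata/",          # hypothetical service root
    "Matter",                             # hypothetical entity set
    select=["name"],
    expand=["client", "managingPartner"],  # hypothetical navigation properties
    filter_expr="status eq 'Open'",
)
print(url)
```

A tool such as Power Query generates requests of this shape itself; the point of the sketch is only that each expanded navigation property is one hop across the graph.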
[Diagram labels: Power Query, Power Pivot, Power View, Power Map, Pivots, Charts & Grids, Tableau, etc.]
18. Linked Data has delivered
• Elimination of silos through creation of logical
data warehouse that is extensible across internal
and external data sources
• Enabled “find and explore” information seeking
behaviours
• Separation of data modelling from integration
provides for easy addition of internal & external
data
• Ability to support diverse range of specialised
domain views onto data
• Introduces a Service Orientated Data
Architecture simplifying application
development
• Based on W3C web standards, providing future
proofing and protection of the firm’s IP (data
models)
19. Building a Linked Data Warehouse pilot
[Architecture diagram. Components: ETL Platform, RDF Management, Triple Store, Model, SPARQL + OData4Sparql, OData, UI. Source labels: Matter, Time, People, Financials, Deal Finder, Client Book, Client Engage, K_Docs, SAP.]
One FTE (2 x 0.5) over nine months delivered:
• Integrated 3 years and 9 months of data from 9 sources
• 24 million triples
• 62 Things (People, Matters, Clients, etc.)
• 127 Relationships between Things
• 223 Data attributes
In this picture we show just two In4mium modules being used alongside standard Talend modules.
This workflow shows filters, transformations and lookup joins before the data is converted to RDF.
It is the RDFizer that converts the standard data on the flow to RDF.
The RDF can then be managed in triple stores or, as in this case, written to files.
The RDFizer is itself model driven, as it uses an R2RML configuration file.
The Talend job can be deployed as a standalone Java executable or as a web service within your architecture.
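A minimal sketch of what "model driven" means for the filter-then-convert flow described above: the conversion code stays generic and a mapping decides which IRIs, classes and predicates are produced. The dictionary below is only a simplified stand-in that borrows R2RML ideas (a subject template with `{column}` placeholders, a class, predicate-object maps); it is not the R2RML syntax nor the RDFizer's actual configuration, and the function name `rdfise`, the namespace and the sample rows are assumptions.

```python
# Simplified, hypothetical stand-in for an R2RML-style mapping.
MAPPING = {
    "subject_template": "http://example.org/client/{client_id}",
    "class": "http://example.org/Client",
    "predicate_object_maps": [
        {"predicate": "http://example.org/name", "column": "name"},
        {"predicate": "http://example.org/country", "column": "country"},
    ],
}


def rdfise(rows, mapping, keep=lambda row: True):
    """Yield triples for each row that passes the filter, driven by the mapping."""
    for row in rows:
        if not keep(row):
            continue  # filter step that runs before conversion
        subject = mapping["subject_template"].format(**row)
        yield (subject, "rdf:type", mapping["class"])
        for pom in mapping["predicate_object_maps"]:
            value = row.get(pom["column"])
            if value is not None:  # null cells produce no triple
                yield (subject, pom["predicate"], value)


rows = [
    {"client_id": "C1", "name": "Acme Ltd", "country": "GB", "active": True},
    {"client_id": "C2", "name": "Old Co", "country": None, "active": False},
]

triples = list(rdfise(rows, MAPPING, keep=lambda r: r["active"]))
```

Changing the output vocabulary means editing `MAPPING`, not `rdfise`, which is the property the notes attribute to a configuration-driven converter.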
Foundation Platform: Talend
Gartner Magic Quadrant
Open Studio and enterprise versions
Composable visual Java development environment
Solution frameworks for
Integration, BPM, MDM, ESB, Data Quality, Big data
Configuration
1000s of modules to configure into applications
ETL, Amazon Cloud, Hadoop, BI
Modules are Java injection routines
Well supported community
Highly scalable efficient code generation
Deployable within service architectures
Adds to your existing architecture
Not a rip and replace!
BUT lacks any knowledge of semantic data handling and management