REUTERS / Danish Ismail
BUILDING A KNOWLEDGE GRAPH
DAN BENNETT - GRAPH DAY 2018
@nonodename
SEPTEMBER, 2018
AGENDA
• A little on TR
• What’s a knowledge graph?
• Quick reset on RDF - if needed
• Data engineering for our knowledge graph
• Lessons learned
• Q&A
A LITTLE ON THOMSON REUTERS
THOMSON REUTERS - THE ANSWER COMPANY
• Information, technology and
expertise for professionals
• Focus on finance, risk, media,
legal, tax and accounting
markets
• 87% recurring revenue, 93%
electronic, global footprint
• My role: big data & NLP within
central technology group
supporting market aligned
business units
REUTERS/Amit Dave
CONTENT, NOT DATA
WHAT’S A KNOWLEDGE GRAPH?
WHAT IS A KNOWLEDGE GRAPH?
• Open world representation of
information
• Every entry point is equal cost
• Underpin Cortana, Google
Assistant, Siri, Alexa
• Typically (but doesn’t have to
be) expressed in RDF
[Figure: example knowledge graph of a football game. Nodes: Game; Team (England); Team (Panama); Venue (Nizhny Novgorod Stadium); final Score (6-1); individual Score nodes (e.g. Stones, 8). Edge labels: played, playedAt, hasFinalScore, hasName, hasLogo, hasQuarter, atTime, byPlayer.]
UH OH
QUICK RESET ON RDF
SCHEMA ON WRITE
• Fixed data model
• Slow to change
• Strong enforcement
SCHEMA ON READ
• Capture everything
• Apply logic (schema) on read
• No standards
RDF: SCHEMA ON READ, OPTIONAL ON WRITE
[Figure: Venn diagram of Schema on Read and Schema on Write, with RDF in the overlap. Labels shown: Anything goes; Capture everything; Super flexible; Referential integrity on read; Accuracy; Difficult & slow to change; Triggers/Stored Procs/IDs; Referential integrity on write; Federated; Standards; Flexible; (potentially) verbose.]
HOW CAN THAT BE? (SIMPLIFIED!)
Orders
ID  Date         Amount  Customer
1   30-Aug-2016  56.84   1
2   31-Aug-2016  42.36   2
3   1-Sep-2016   98.45   1
4   1-Sep-2016   23.54   3

Customers
ID  Name
1   Barack Obama
2   Richard Nixon
3   Ronald Reagan
4   Bill Clinton

Subject Predicate Object
http://tr.com/orders/1 http://ont.tr.com/orders/order_date 20160830
http://tr.com/orders/1 http://ont.tr.com/orders/order_amount 56.84
http://tr.com/orders/1 http://ont.tr.com/orders/order_customer http://tr.com/customers/1
http://tr.com/orders/1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://ont.tr.com/order
http://tr.com/orders/2 http://ont.tr.com/orders/order_date 20160831
http://tr.com/orders/2 http://ont.tr.com/orders/order_amount 42.36
http://tr.com/orders/2 http://ont.tr.com/orders/order_customer http://tr.com/customers/2
http://tr.com/orders/2 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://ont.tr.com/order
… … …
http://tr.com/customers/1 http://ont.tr.com/customers/name Barack Obama
http://tr.com/customers/1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://ont.tr.com/customer
http://tr.com/customers/2 http://ont.tr.com/customers/name Richard Nixon
http://tr.com/customers/2 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://ont.tr.com/customer
… … …
Relational to RDF
• URI = primary key
• New column = new rows
• Sparse if row missing
• Object is a relation or a literal
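The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the conversion actually used: the function name row_to_triples and the foreign_keys parameter are assumptions, and the URI patterns simply follow the ones shown in the example triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(row, table, class_uri, foreign_keys=None):
    """Turn one relational row (a dict with an "id" key) into a list of triples.

    Hypothetical helper: the URI layout mirrors the slide's examples only.
    """
    foreign_keys = foreign_keys or {}
    subject = f"http://tr.com/{table}/{row['id']}"  # URI = primary key
    triples = [(subject, RDF_TYPE, class_uri)]
    for column, value in row.items():
        if column == "id" or value is None:
            continue  # a missing value simply produces no triple (sparse)
        predicate = f"http://ont.tr.com/{table}/{column}"
        if column in foreign_keys:
            # Object is a relation: point at the other table's URI.
            value = f"http://tr.com/{foreign_keys[column]}/{value}"
        triples.append((subject, predicate, value))  # otherwise a literal
    return triples


order = {"id": 1, "order_date": "20160830", "order_amount": 56.84, "order_customer": 1}
for triple in row_to_triples(order, "orders", "http://ont.tr.com/order",
                             foreign_keys={"order_customer": "customers"}):
    print(triple)
```

Adding a column to the source table only adds more triples per row; nothing about the existing triples has to change.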
SCHEMA, QUERY & FEDERATION
Subject Predicate Object
http://tr.com/orders/1 http://ont.tr.com/orders/order_date 20160830
http://tr.com/orders/1 http://ont.tr.com/orders/order_amount 56.84
http://tr.com/orders/1 http://ont.tr.com/orders/order_customer http://tr.com/customers/1
http://tr.com/orders/1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://ont.tr.com/order
http://tr.com/orders/1 http://ont.salesforce.com/crm/customer_spend 9856.45
http://tr.com/customers/2 http://www.w3.org/2002/07/owl#sameAs http://en.wikipedia.org/wiki/Richard_Nixon
http://en.wikipedia.org/wiki/Richard_Nixon http://owl.wikipedia.org/born 19130109
Annotations:
• Schema (Ontology): more than one can apply to a subject
• Federated data (spend from CRM): the customer_spend triple
• Relation to external data: the owl#sameAs triple
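To make the "relation to external data" idea concrete, here is a small Python sketch that follows owl:sameAs links to gather everything known about a subject across sources. It assumes triples held in memory as tuples and uses customer 2 (Richard Nixon) from the Customers table; the describe function is hypothetical, and a real store would do this through reasoning or a query.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

TRIPLES = [
    ("http://tr.com/customers/2", "http://ont.tr.com/customers/name", "Richard Nixon"),
    ("http://tr.com/customers/2", OWL_SAME_AS, "http://en.wikipedia.org/wiki/Richard_Nixon"),
    ("http://en.wikipedia.org/wiki/Richard_Nixon", "http://owl.wikipedia.org/born", "19130109"),
]


def describe(subject, triples):
    """Return (predicate, object) pairs for a subject and all its sameAs aliases."""
    aliases = {subject}
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            # sameAs is symmetric, so follow the link in both directions.
            for here, there in ((s, o), (o, s)):
                if here == node and there not in aliases:
                    aliases.add(there)
                    frontier.append(there)
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


for predicate, obj in sorted(describe("http://tr.com/customers/2", TRIPLES)):
    print(predicate, obj)
```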
• SPARQL - like SQL. Sum all orders:
SELECT (SUM(?amount) AS ?total)
WHERE {
    ?order <http://ont.tr.com/orders/order_amount> ?amount .
    ?order <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ont.tr.com/order> .
}
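For readers new to SPARQL, the query's two triple patterns amount to a join on ?order followed by an aggregate. The Python sketch below mimics that over an in-memory list of triples; it is an illustration of the semantics only (the sum_order_amounts name and the use of Decimal for amounts are assumptions), not how a triple store evaluates the query.

```python
from decimal import Decimal

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORDER_CLASS = "http://ont.tr.com/order"
ORDER_AMOUNT = "http://ont.tr.com/orders/order_amount"

TRIPLES = [
    ("http://tr.com/orders/1", ORDER_AMOUNT, Decimal("56.84")),
    ("http://tr.com/orders/1", RDF_TYPE, ORDER_CLASS),
    ("http://tr.com/orders/2", ORDER_AMOUNT, Decimal("42.36")),
    ("http://tr.com/orders/2", RDF_TYPE, ORDER_CLASS),
    ("http://tr.com/customers/1", RDF_TYPE, "http://ont.tr.com/customer"),
]


def sum_order_amounts(triples):
    """Sum order_amount over every subject that is typed as an order."""
    # Pattern 2: ?order rdf:type <order>
    orders = {s for s, p, o in triples if p == RDF_TYPE and o == ORDER_CLASS}
    # Pattern 1: ?order order_amount ?amount, joined on ?order
    amounts = (o for s, p, o in triples if p == ORDER_AMOUNT and s in orders)
    return sum(amounts, Decimal("0"))


print(sum_order_amounts(TRIPLES))
```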
WHY RDF FOR A KNOWLEDGE GRAPH?
                                    RDF DB                      Property Graph DB
Open                                Yes                         Maybe
Incremental Load                    Via named graph or SPARQL   Maybe
Federated Data                      Yes                         No
Modelling tools                     Yes                         Unlikely
Types/Classes/higher abstractions   Yes                         No
OUR ARCHITECTURE
PHYSICAL
[Figure: physical architecture. Flow: Sources (Relational, Proprietary) → ETL (Snaplogic or Hadoop) → RDF → CM-Well: RDF Store (Warehouse) → RDF → Mart/Products. Other labels: Pull, Push, Batch; REST Based Publishing; HTTPS, FTP/HTTP, JDBC; Remote Read Replicas; Full text mining, RDBMS, Web services, Filesystem; Neptune, Elastic, RDBMS, Filesystem.]
LOGICAL
SPARQL
SPARQL Triggers
As captured
• Mechanistic conversion
• Minimal validation
• Named graph for W3C
provenance & update
Target Model
• “Canonical Graph”
• Curated ontologies
• Normalized
representation
Selective
Product Models
• Slice & dice
• Store/retrieve using
whatever works
• Not necessarily graph
OUR GRAPH WAREHOUSE: CM-WELL
• NOT a triple store! Focus is on data movement
• No master node
• Linear scaling
• Stateless
• JVM isolation
• Query based subscription
• Logical replication
• Available on GitHub
[Figure: HA Proxy in front of a roaming grid of CM-Well nodes, reached over REST/HTTP. Each CM-Well node contains Cassandra, Elastic, Kafka, a web server, background processing and user workload, on top of a health control layer.]
POPULATING THE GRAPH
RELATIONAL
• Map primary keys into
own namespace (or
assign surrogate keys)
• Map dimensions to
existing entities if
possible
• Concentrate on the
relations and
attributes that matter
• Can always return to
the source for details
<https://permid.org/1-4297089638>
    a tr-org:Organization ;
    tr-common:hasPermId "4297089638"^^xsd:string ;
    tr-org:isIncorporatedIn <http://sws.geonames.org/6252001/> ;
    fibo-be-le-cb:isDomiciledIn <http://sws.geonames.org/6252001/> ;
    vcard:hasURL <https://www.tesla.com/> ;
    vcard:organization-name "Tesla Inc"^^xsd:string .
<https://permid.org/1-34421840245>
    a tr-person:Person ;
    vcard:family-name "Musk"^^xsd:string ;
    vcard:given-name "Elon"^^xsd:string .
<https://permid.org/2-497b8953cd00ec12589126c0f1116e2ca8fb484b80722
    person:hasPositionType o:1-10010134 ;
    person:hasReportedTitle "Chairman of the Board" ;
    person:isPositionIn o:1-4297089638 .
Callouts on the example:
• Surrogate Key
• Relationship with properties
• Existing ontologies
• Existing dimension
FULL TEXT
• Link to source (Retain confidence)
• Provenance in quad for updates
<https://data.tr.com/sc/4297089638_4295869694>
    a tr-sc:SupplyChainAgreement ;
    tr-sc:aggregateConfidence "0.9999976445274502"^^xs
    tr-sc:supplier <https://permid.org/1-4297089638> ;
    tr-sc:customer <https://permid.org/1-4295869694> .
<https://data.tr.com/sc/snippet/4297089638_429586969
    a tr-sc:Snippet ;
    tr-sc:snippetText "~~~Tesla~~~ is supplying electr
    tr-sc:confidence "0.999"^^xsd:float ;
    tr-sc:source "nL1N0IL11N-2013-10-31"^^xsd:string .
Callout: Article primary key
PROVENANCE IS INVALUABLE
• W3C Provenance applied by named graph
• Can also be used to model bi-temporality if needed
• Example:
Source A states <S>, <P>, "O"
Source B states <S>, <P>, <O>
Append a unique named graph on load:
<S>, <P>, "O", <G1>
<S>, <P>, <O>, <G2>
<G1>, <prov:wasGeneratedBy>, "Snaplogic"
<G1>, <prov:wasDerivedFrom>, "Database source"
etc.
• Graph URI could be a hash of S,P,O, a GUID, etc.
• Consider idempotence and determinism
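One way to get a deterministic graph URI is to hash the source name together with the sorted triples, so that reloading the same data from the same source always lands in the same named graph. The sketch below assumes that approach; the base URI http://tr.com/graphs/ and the function names are hypothetical, while the two predicates are the W3C PROV terms named above.

```python
import hashlib

GRAPH_BASE = "http://tr.com/graphs/"  # hypothetical namespace for named graphs
PROV = "http://www.w3.org/ns/prov#"


def graph_uri(source, triples):
    """Derive a named graph URI from the source and the triples it states."""
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8") + b"\x01")
    for triple in sorted(triples):  # sorting makes the hash order-independent
        digest.update(b"\x00".join(part.encode("utf-8") for part in triple))
        digest.update(b"\x01")
    return GRAPH_BASE + digest.hexdigest()


def to_quads(source, generated_by, triples):
    """Append the named graph to each triple and describe the graph itself."""
    graph = graph_uri(source, triples)
    quads = [(s, p, o, graph) for s, p, o in triples]
    provenance = [
        (graph, PROV + "wasGeneratedBy", generated_by),
        (graph, PROV + "wasDerivedFrom", source),
    ]
    return quads, provenance


quads, provenance = to_quads("Database source", "Snaplogic", [("S", "P", "O")])
print(quads[0][3] == provenance[0][0])  # data and provenance share one graph URI
```

A GUID would also be unique, but it changes on every run, which is exactly what the idempotence note warns against.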
MODELLING BI-TEMPORALITY
• Not inherently supported in RDF
• Possible solutions
• Ignore!
• Model for particular values (potentially
using blank nodes)
• Model on named graph
• Reification
• Use RDF* & SPARQL* (Reification Done
Right - only in BlazeGraph…)
Specific model approach:
Organization: Apple
• Has Name: "Apple Computer" (From: 1977-03-01, To: 2007-09-01)
• Has Name: "Apple Inc" (From: 2007-09-01)
Named Graph Approach (temporality on named graph):
org:2-xyz {
    org:1-4295905573 org:hasName "Apple Computer" .
}
org:2-xyz
    bt:effectiveFrom "1977-03-01"^^xsd:date ;
    bt:effectiveTo "2007-09-01"^^xsd:date .
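With temporality held on the named graph, an "as of" lookup becomes a filter on the graph's effective dates. The Python sketch below illustrates only the effective-date axis shown in the example. Several things in it are assumptions: the second graph id org:2-abc is made up, the interval is treated as half-open (from inclusive, to exclusive), and a missing end date means "still current".

```python
from datetime import date

# (graph, subject, predicate, object); org:2-abc is a hypothetical graph id
QUADS = [
    ("org:2-xyz", "org:1-4295905573", "org:hasName", "Apple Computer"),
    ("org:2-abc", "org:1-4295905573", "org:hasName", "Apple Inc"),
]

# graph -> (effectiveFrom, effectiveTo); None means open-ended
GRAPH_VALIDITY = {
    "org:2-xyz": (date(1977, 3, 1), date(2007, 9, 1)),
    "org:2-abc": (date(2007, 9, 1), None),
}


def value_as_of(subject, predicate, when):
    """Return the object whose named graph was effective on the given date."""
    for graph, s, p, o in QUADS:
        if (s, p) != (subject, predicate):
            continue
        start, end = GRAPH_VALIDITY[graph]
        if start <= when and (end is None or when < end):
            return o
    return None


print(value_as_of("org:1-4295905573", "org:hasName", date(2000, 1, 1)))
print(value_as_of("org:1-4295905573", "org:hasName", date(2010, 1, 1)))
```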
LESSONS LEARNED
RDF IS DIFFERENT, IA IS KING
• Early education is key
• Strong information architecture really helps
• Modeling tools
• OWL invaluable (open world), consider SHACL (closed world on top of open world)
MAPPING TO AUTHORITIES
Mapping approaches:
• Simple match
• Fuzzy match (Soundex,
Levenshtein)
• Full text search
• Normalize then search/match
• Concordance (TAMR etc)
• Ensemble of the above
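A small ensemble of the first few approaches might look like the Python sketch below: try an exact match, then a normalized match, then a Levenshtein match within a threshold. Everything here is illustrative. The suffix list, the threshold of 2 and the Apple URI are assumptions; only the Tesla PermID URI and label come from the earlier example.

```python
import re
import unicodedata

LEGAL_SUFFIXES = {"inc", "corp", "ltd", "llc", "plc"}  # assumed, not exhaustive

AUTHORITY = {
    "https://permid.org/1-4297089638": "Tesla Inc",
    "http://example.org/org/apple": "Apple Inc",  # hypothetical URI
}


def normalize(name):
    """Lower-case, strip accents and punctuation, drop legal suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(t for t in text.split() if t not in LEGAL_SUFFIXES)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def match(name, authority, max_distance=2):
    """Return (uri, method) for the first approach that succeeds, else None."""
    for uri, label in authority.items():
        if label == name:
            return uri, "simple"
    key = normalize(name)
    for uri, label in authority.items():
        if normalize(label) == key:
            return uri, "normalized"
    scored = [(levenshtein(key, normalize(label)), uri) for uri, label in authority.items()]
    if scored:
        distance, uri = min(scored)
        if distance <= max_distance:
            return uri, "fuzzy"
    return None


print(match("TESLA, INC.", AUTHORITY))
```

Returning the method alongside the URI keeps a record of how confident each mapping is, which fits the earlier point about retaining confidence.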
STILL BLEEDING EDGE
• …but now being used in real world solutions
• Have clear goals
• Be prepared to change direction & solutions
• Getting easier as vendor solutions increase and mature
DON’T OVERTHINK ETL
• Doesn’t have to be within Hadoop
• Does have to be repeatable
• Repurpose existing ETL tooling by treating RDF as a three-column table
• An RDF REST API can be sufficient
• But:
  • Has to fit with overarching IA
  • Need to accommodate idempotence & determinism (can’t be a different named graph on each run)
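Treating RDF as a three-column table means an ordinary ETL job can emit (subject, predicate, object) rows and serialize them as N-Triples at the end. The Python sketch below shows one repeatable way to do that. The rule that any object starting with http:// or https:// is a URI is a simplifying assumption, as are the function names; duplicates are dropped and lines are sorted so that two runs over the same rows produce byte-identical output.

```python
def nt_term(value):
    """Render an object as an N-Triples IRI or plain string literal."""
    if value.startswith(("http://", "https://")):  # assumed URI heuristic
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(rows):
    """Serialize three-column rows as sorted, de-duplicated N-Triples."""
    lines = {f"<{s}> <{p}> {nt_term(o)} ." for s, p, o in rows}
    return "\n".join(sorted(lines)) + "\n"


rows = [
    ("http://tr.com/customers/1", "http://ont.tr.com/customers/name", "Barack Obama"),
    ("http://tr.com/orders/1", "http://ont.tr.com/orders/order_customer",
     "http://tr.com/customers/1"),
]
print(to_ntriples(rows), end="")
```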
Dan Bennett
@nonodename
dan.bennett <at> tr.com
https://github.com/thomsonreuters/cm-well
permid.org
QUESTIONS?

SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Building a Knowledge Graph @ Graph Day 2018

  • 1. REUTERS / Danish Ismail BUILDING A KNOWLEDGE GRAPH DAN BENNETT - GRAPH DAY 2018 @nonodename SEPTEMBER, 2018
  • 2. AGENDA • A little on TR • What’s a knowledge graph? • Quick reset on RDF - if needed • Data engineering for our knowledge graph • Lessons learned • Q&A
  • 3. A LITTLE ON THOMSON REUTERS
  • 4. THOMSON REUTERS - THE ANSWER COMPANY • Information, technology and expertise for professionals • Focus on finance, risk, media, legal, tax and accounting markets • 87% recurring revenue, 93% electronic, global footprint • My role: big data & NLP within central technology group supporting market aligned business units REUTERS/Amit Dave
  • 7. WHAT IS A KNOWLEDGE GRAPH? • Open world representation of information • Every entry point is equal cost • Underpin Cortana, Google Assistant, Siri, Alexa • Typically (but doesn’t have to be) expressed in RDF
 [Diagram labels] Nodes: Score Team Team Game 6-1 Venue Panama England Nizhny Novgorod Stadium Score Score Score Stones8. Edges: hasName hasLogo hasFinalScore played hasName hasLogo playedAt hasName hasQuarter atTime byPlayer
  • 10. SCHEMA ON WRITE • Fixed data model • Slow to change • Strong enforcement
  • 11. SCHEMA ON READ • Capture everything • Apply logic (schema) on read • No standards
  • 12. RDF: SCHEMA ON READ, OPTIONAL ON WRITE
 [Diagram labels] Schema on Read; Schema on Write; Accuracy; Difficult & slow to change; Anything goes; Federated; RDF; Standards; (potentially) verbose; Triggers/Stored Procs/IDs; Referential integrity on write; Referential integrity on read; Super flexible; Capture everything; Flexible
  • 13. HOW CAN THAT BE? (SIMPLIFIED!)
 Orders
 ID | Date | Amount | Customer
 1 | 30-Aug-2016 | 56.84 | 1
 2 | 31-Aug-2016 | 42.36 | 2
 3 | 1-Sep-2016 | 98.45 | 1
 4 | 1-Sep-2016 | 23.54 | 3
 Customers
 ID | Name
 1 | Barack Obama
 2 | Richard Nixon
 3 | Ronald Reagan
 4 | Bill Clinton
 Subject | Predicate | Object
 http://tr.com/orders/1 | http://ont.tr.com/orders/order_date | 20160830
 http://tr.com/orders/1 | http://ont.tr.com/orders/order_amount | 56.84
 http://tr.com/orders/1 | http://ont.tr.com/orders/order_customer | http://tr.com/customers/1
 http://tr.com/orders/1 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://ont.tr.com/order
 http://tr.com/orders/2 | http://ont.tr.com/orders/order_date | 20160831
 http://tr.com/orders/2 | http://ont.tr.com/orders/order_amount | 42.36
 http://tr.com/orders/2 | http://ont.tr.com/orders/order_customer | http://tr.com/customers/2
 http://tr.com/orders/2 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://ont.tr.com/order
 … | … | …
 http://tr.com/customers/1 | http://ont.tr.com/customers/name | Barack Obama
 http://tr.com/customers/1 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://ont.tr.com/customer
 http://tr.com/customers/2 | http://ont.tr.com/customers/name | Richard Nixon
 http://tr.com/customers/2 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://ont.tr.com/customer
 … | … | …
 Relational / RDF • URI = primary key • New column = new rows • Sparse if row missing • Object a relation or literal
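The relational-to-RDF mapping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rows_to_triples`, the dictionary-per-row input, and the use of column names such as `order_date` as predicate suffixes are hypothetical choices made here, not something the slides prescribe. It shows the four bullet points in action: the key becomes the subject URI, each column becomes one triple per row, missing values simply produce no triple, and an object may be a literal or another URI.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(
    rows: Iterable[Dict[str, Optional[str]]],
    base: str,
    ont: str,
    type_uri: str,
    key: str = "ID",
) -> Iterator[Triple]:
    """Turn relational rows into (subject, predicate, object) triples."""
    for row in rows:
        subject = f"{base}/{row[key]}"  # URI = primary key
        for column, value in row.items():
            if column == key or value is None:
                continue  # a missing value yields no triple (sparse)
            yield (subject, f"{ont}/{column}", str(value))
        yield (subject, RDF_TYPE, type_uri)


# Hypothetical input: column names chosen to match the predicates on the slide.
orders = [
    {"ID": "1", "order_date": "20160830", "order_amount": "56.84",
     "order_customer": "http://tr.com/customers/1"},
    {"ID": "2", "order_date": "20160831", "order_amount": None,
     "order_customer": "http://tr.com/customers/2"},
]

triples = list(
    rows_to_triples(orders, "http://tr.com/orders",
                    "http://ont.tr.com/orders", "http://ont.tr.com/order")
)
```

The second order deliberately has no amount, so it contributes one triple fewer than the first.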
  • 14. SCHEMA, QUERY & FEDERATION
 Subject | Predicate | Object
 http://tr.com/orders/1 | http://ont.tr.com/orders/order_date | 20160830
 http://tr.com/orders/1 | http://ont.tr.com/orders/order_amount | 56.84
 http://tr.com/orders/1 | http://ont.tr.com/orders/order_customer | http://tr.com/customers/1
 http://tr.com/orders/1 | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://ont.tr.com/order
 http://tr.com/orders/1 | http://ont.salesforce.com/crm/customer_spend | 9856.45
 http://tr.com/customers/1 | http://www.w3.org/2002/07/owl#sameAs | http://en.wikipedia.org/wiki/Richard_Nixon
 http://en.wikipedia.org/wiki/Richard_Nixon | http://owl.wikipedia.org/born | 19130109
 Callouts: Federated data (spend from CRM); Relation to external data; Schema (Ontology): more than one can apply to a subject
 • SPARQL - like SQL. Sum all orders:
 SELECT (SUM(?amount) AS ?total)
 WHERE {
 ?order <http://ont.tr.com/orders/order_amount> ?amount .
 ?order <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ont.tr.com/order>
 }
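To make the query's semantics concrete, here is a rough Python equivalent over an in-memory list of triples. It is a sketch, not how a SPARQL engine works internally: the function name `sum_order_amounts` and the extra untyped resource in the sample data are assumptions added for illustration. The point is that the two triple patterns share the `?order` variable, so only amounts attached to something typed as an order are summed.

```python
from decimal import Decimal
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ORDER = "http://ont.tr.com/order"
AMOUNT = "http://ont.tr.com/orders/order_amount"


def sum_order_amounts(triples: Iterable[Triple]) -> Decimal:
    """Sum order_amount over every subject that is typed as an order."""
    triples = list(triples)
    # Pattern 2: ?order rdf:type <http://ont.tr.com/order>
    orders = {s for s, p, o in triples if p == RDF_TYPE and o == ORDER}
    # Pattern 1: ?order order_amount ?amount, joined on ?order
    amounts = (Decimal(o) for s, p, o in triples if p == AMOUNT and s in orders)
    return sum(amounts, Decimal("0"))


sample = [
    ("http://tr.com/orders/1", AMOUNT, "56.84"),
    ("http://tr.com/orders/1", RDF_TYPE, ORDER),
    ("http://tr.com/orders/2", AMOUNT, "42.36"),
    ("http://tr.com/orders/2", RDF_TYPE, ORDER),
    # Hypothetical resource with an amount but no order type: excluded by the join.
    ("http://tr.com/quotes/9", AMOUNT, "10.00"),
]

total = sum_order_amounts(sample)
```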
  • 15. WHY RDF FOR A KNOWLEDGE GRAPH?
 Feature | RDF DB | Property Graph DB
 Open | Yes | Maybe
 Incremental Load | Via named graph or SPARQL | Maybe
 Federated Data | Yes | No
 Modelling tools | Yes | Unlikely
 Types/Classes/higher abstractions | Yes | No
  • 17. PHYSICAL
 [Diagram labels] Snaplogic or Hadoop ETL Sources (Relational, Proprietary) RDF CM-Well: RDF Store Mart/Products Pull Push Batch REST Based Publishing HTTPS FTP/HTTP JDBC Warehouse Remote Read Replicas RDF Full text mining RDBMS Web services Neptune Elastic RDBMS Filesystem Filesystem
  • 18. LOGICAL SPARQL SPARQL Triggers
 As captured • Mechanistic conversion • Minimal validation • Named graph for W3C provenance & update
 Target Model • “Canonical Graph” • Curated ontologies • Normalized representation
 Selective Product Models • Slice & dice • Store/retrieve using whatever works • Not necessarily graph
  • 19. OUR GRAPH WAREHOUSE: CM-WELL • NOT a triple store! Focus is on data movement • No master node • Linear scaling • Stateless • JVM isolation • Query based subscription • Logical replication • Available on GitHub
 [Diagram labels] HA Proxy … REST/HTTP REST/HTTP; Roaming Grid; three CM-Well Nodes, each labelled: Cassandra Elastic Kafka Web Server Background Processing User workload Health Control Layer
  • 21. RELATIONAL • Map primary keys into own namespace (or assign surrogate keys) • Map dimensions to existing entities if possible • Concentrate on the relations and attributes that matter • Can always return to the source for details
 <https://permid.org/1-4297089638>
 a tr-org:Organization ;
 tr-common:hasPermId "4297089638"^^xsd:string ;
 tr-org:isIncorporatedIn <http://sws.geonames.org/6252001/> ;
 fibo-be-le-cb:isDomiciledIn <http://sws.geonames.org/6252001/> ;
 vcard:hasURL <https://www.tesla.com/> ;
 vcard:organization-name "Tesla Inc"^^xsd:string .
 <https://permid.org/1-34421840245>
 a tr-person:Person ;
 vcard:family-name "Musk"^^xsd:string ;
 vcard:given-name "Elon"^^xsd:string .
 <https://permid.org/2-497b8953cd00ec12589126c0f1116e2ca8fb484b80722 person:hasPositionType o:1-10010134 ;
 person:hasReportedTitle "Chairman of the Board" ;
 person:isPositionIn o:1-4297089638 .
 Callouts: Surrogate Key; Relationship with properties; Existing ontologies; Existing dimension
  • 22. FULL TEXT • Link to source (Retain confidence) • Provenance in quad for updates
 <https://data.tr.com/sc/4297089638_4295869694>
 a tr-sc:SupplyChainAgreement ;
 tr-sc:aggregateConfidence "0.9999976445274502"^^xs
 tr-sc:supplier <https://permid.org/1-4297089638>;
 tr-sc:customer <https://permid.org/1-4295869694>.
 <https://data.tr.com/sc/snippet/4297089638_429586969
 a tr-sc:Snippet ;
 tr-sc:snippetText "~~~Tesla~~~ is supplying electr
 tr-sc:confidence "0.999"^^xsd:float;
 tr-sc:source "nL1N0IL11N-2013-10-31"^^xsd:string.
 Callout: Article primary key
  • 23. PROVENANCE IS INVALUABLE • W3C Provenance applied by named graph • Can also be used to model bi-temporality if needed • Example:
 Source A states <S>, <P>, “O”
 Source B states <S>, <P>, <O>
 Append unique named graph on load:
 <S>, <P>, “O”, <G1>
 <S>, <P>, <O>, <G2>
 <G1>, <prov:wasGeneratedBy>, “Snaplogic”
 <G1>, <prov:wasDerivedFrom>, “Database source”
 etc.
 Graph URI could be hash of S,P,O or GUID, etc. Consider idempotence and determinism
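One way to get the idempotence and determinism mentioned above is to derive the graph URI from a hash rather than a random GUID. The sketch below is an assumption-laden illustration, not the author's implementation: the `GRAPH_BASE` namespace, the function names, and the decision to include the source name in the hash (so that two sources asserting the same triple land in different graphs) are all choices made here. Only the `prov:wasDerivedFrom` property comes from the W3C PROV vocabulary.

```python
import hashlib
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]

PROV = "http://www.w3.org/ns/prov#"
GRAPH_BASE = "http://tr.com/graphs/"  # hypothetical namespace for graph URIs


def graph_uri(source: str, s: str, p: str, o: str) -> str:
    """Deterministic graph URI: same source and triple always give the same URI."""
    payload = "\x1f".join((source, s, p, o)).encode("utf-8")
    return GRAPH_BASE + hashlib.sha256(payload).hexdigest()


def load(source: str, triples: Iterable[Triple]) -> Tuple[List[Quad], List[Triple]]:
    """Return (data quads, provenance triples describing each named graph)."""
    quads: List[Quad] = []
    provenance: List[Triple] = []
    for s, p, o in triples:
        g = graph_uri(source, s, p, o)
        quads.append((s, p, o, g))
        provenance.append((g, PROV + "wasDerivedFrom", source))
    return quads, provenance


batch = [("http://tr.com/orders/1", "http://ont.tr.com/orders/order_amount", "56.84")]
first_run = load("Database source", batch)
second_run = load("Database source", batch)
other_source = load("CRM", batch)
```

Because the URI depends only on the inputs, re-running the same load produces identical quads, so a store that treats quads as a set is left unchanged by a repeat run.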
  • 24. MODELLING BI-TEMPORALITY • Not inherently supported in RDF • Possible solutions • Ignore! • Model for particular values (potentially using blank nodes) • Model on named graph • Reification • Use RDF* & SPARQL* (Reification Done Right - only in BlazeGraph…)
 Specific model approach [diagram]: Organization “Apple”, Has Name: Name “Apple Computer” (From: 1977-03-01, To: 2007-09-01); Has Name: Name “Apple Inc” (From: 2007-09-01)
 Named graph approach (temporality on named graph):
 org:2-xyz {
 org:1-4295905573 org:hasName "Apple Computer" .
 }
 org:2-xyz
 bt:effectiveFrom "1977-03-01"^^xsd:date ;
 bt:effectiveTo "2007-09-01"^^xsd:date .
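The named graph approach can be read as "pick the statement whose graph was effective on the date of interest". The following sketch shows that lookup in plain Python. It covers only the effective-date axis shown on the slide, and several details are assumptions: the second graph name `org:2-abc`, the half-open interval convention (from inclusive, to exclusive), and the function name `name_as_of` are all hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

Quad = Tuple[str, str, str, str]

# Data quads: (subject, predicate, object, named graph).
quads: List[Quad] = [
    ("org:1-4295905573", "org:hasName", "Apple Computer", "org:2-xyz"),
    ("org:1-4295905573", "org:hasName", "Apple Inc", "org:2-abc"),  # hypothetical graph
]

# Temporal metadata attached to each named graph (effectiveFrom, effectiveTo).
validity: Dict[str, Tuple[date, Optional[date]]] = {
    "org:2-xyz": (date(1977, 3, 1), date(2007, 9, 1)),
    "org:2-abc": (date(2007, 9, 1), None),  # open-ended: still effective
}


def name_as_of(subject: str, when: date) -> Optional[str]:
    """Return the name whose named graph was effective on the given date."""
    for s, p, o, g in quads:
        if s != subject or p != "org:hasName":
            continue
        start, end = validity[g]
        # Assumed convention: effectiveFrom is inclusive, effectiveTo is exclusive.
        if start <= when and (end is None or when < end):
            return o
    return None
```

With this convention the changeover date itself resolves to the newer name, and dates before the first graph return nothing.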
  • 26. RDF IS DIFFERENT, IA IS KING • Early education is key • Strong information architecture really helps • Modeling tools • OWL invaluable, consider SHACL Closed world on top of open world Open world
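"Closed world on top of open world" can be illustrated with a small check in the spirit of a SHACL minimum-cardinality constraint: the graph accepts any triples, and a separate validation pass reports typed subjects that lack a required property. This is a hand-rolled sketch in plain Python, not SHACL itself; the `required` mapping, the prefixed names used as stand-ins for full URIs, the sample subjects, and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def missing_required(
    triples: Iterable[Triple], required: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Report (subject, property) pairs where a typed subject lacks a required property."""
    predicates: Dict[str, Set[str]] = {}
    classes: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        predicates.setdefault(s, set()).add(p)
        if p == RDF_TYPE:
            classes.setdefault(s, set()).add(o)
    violations = []
    for subject, types in classes.items():
        for cls in types:
            for prop in required.get(cls, []):
                if prop not in predicates[subject]:
                    violations.append((subject, prop))
    return sorted(violations)


# Hypothetical shape: every person must have both name parts.
required = {"tr-person:Person": ["vcard:family-name", "vcard:given-name"]}

people = [
    ("person:a", RDF_TYPE, "tr-person:Person"),
    ("person:a", "vcard:family-name", "Musk"),
    ("person:a", "vcard:given-name", "Elon"),
    ("person:b", RDF_TYPE, "tr-person:Person"),
    ("person:b", "vcard:family-name", "Smith"),  # given-name missing
]

violations = missing_required(people, required)
```

The load itself never rejects `person:b`; the closed-world rule is applied afterwards and only flags it.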
  • 27. MAPPING TO AUTHORITIES Mapping approaches: • Simple match • Fuzzy match (Soundex, Levenshtein) • Full text search • Normalize then search/ match • Concordance (TAMR etc) • Ensemble of the above
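Two of the approaches listed above, "normalize then match" and Levenshtein fuzzy matching, combine naturally. The sketch below is a minimal standard-library illustration, not a production matcher: the suffix list, the distance threshold of 2, the function names, and the `urn:example:` identifier are assumptions. The Tesla URI and label are the ones shown on the earlier RELATIONAL slide.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical list of legal suffixes to drop before comparing names.
SUFFIXES = {"inc", "corp", "ltd", "llc", "plc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance using a rolling row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def match(name: str, authority: Dict[str, str],
          max_distance: int = 2) -> Optional[Tuple[str, int]]:
    """Return (authority URI, distance) for the closest label within the threshold."""
    norm = normalize(name)
    best: Optional[Tuple[str, int]] = None
    for uri, label in authority.items():
        d = levenshtein(norm, normalize(label))
        if d <= max_distance and (best is None or d < best[1]):
            best = (uri, d)
    return best


authority = {
    "https://permid.org/1-4297089638": "Tesla Inc",
    "urn:example:org/apple": "Apple Inc",  # hypothetical identifier
}
```

An ensemble, as the last bullet suggests, would run several such matchers and combine their scores; this sketch shows only one member of that ensemble.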
  • 28. STILL BLEEDING EDGE • …but now being used in real world solutions • Have clear goals • Be prepared to change direction & solutions • Getting easier as vendor solutions increase and mature
  • 29. DON’T OVERTHINK ETL • Doesn’t have to be within Hadoop • Does have to be repeatable • Pervert existing ETL to treat as 3 column table • An RDF REST API can be sufficient • But • Has to fit with overarching IA • Need to accommodate idempotence & determinism (can’t be different named graph on each run)
  • 30. Dan Bennett @nonodename dan.bennett <at> tr.com https://github.com/thomsonreuters/cm-well permid.org QUESTIONS?