© Copyright 2014 TopQuadrant Inc. Slide 1
Semantic Web standards and the Variety “V” of Big Data
Bob DuCharme
August 20, 2014
Slide 2
Three Vs of Big Data
• Volume
• Velocity
• Variety
Slide 3
Gartner, September 2013
Slide 4
Which dimensions did people struggle with the most?
• Volume 35%
• Velocity 16%
• Variety 49%
Slide 5
Why is variety hard?
Furniture Inventory ? Protein Database
Customer Database ? Conference Attendees
Customer Database fields: Surname, GivenName, LastPurchase, ZipCode, Email
Conference Attendees fields: last_name, first_name, is_speaker, postal_code, email
Slide 6
Schemas
Good thing:
• Ensure data quality
• Make query writing* easier
• Add efficiency
*And essentially, all application development
Annoying thing:
• Can’t add property values someone didn’t see coming
• Changing schema (and data with it) slow and expensive
• Often tied too closely to specific implementation
Inflexibility × 3.
Slide 7
Schemaless NoSQL databases
• Can’t add property values someone didn’t see coming?
• Changing schema (and data with it) slow and expensive?
• Often tied too closely to specific implementation?
Slide 8
Schemaless: how do applications know what properties are available?
• By any means necessary
• Documentation
• Query for properties that got used
• App possibly written by same person or team
• Responsibility shifted from database (designer) to application (designer)
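A minimal sketch of the "query for properties that got used" approach, assuming the schemaless records are available as plain Python dictionaries. The function name and the sample records are hypothetical; only the field names come from the conference attendee example.

```python
from collections import Counter


def properties_in_use(records):
    """Count how often each property name appears across schemaless records."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


# Hypothetical records; nothing guarantees that every record has every field.
records = [
    {"last_name": "Smith", "first_name": "Jim", "email": "jsmith@somecompany.com"},
    {"last_name": "Jones", "postal_code": "22903", "is_speaker": True},
]

usage = properties_in_use(records)
for name, count in sorted(usage.items()):
    print(name, count)
```

The application only learns which properties exist by scanning the data, which is the shift of responsibility from the database to the application.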
Slide 9
Schema: all or nothing?
Customer Database ? Conference Attendees
Customer Database fields: Surname, GivenName, LastPurchase, ZipCode, Email
Conference Attendees fields: last_name, first_name, is_speaker, postal_code, email
ETL (Extract-Transform-Load)?
Slide 10
RDF Schema (RDFS)
• W3C Standard since 2004
• Often overshadowed by superset standard OWL
• Describes RDF, written using RDF syntaxes
(Diagram labels: Semantic Web, Linked Data)
Slide 11
RDF
• www.w3.org/RDF (second sentence!): “RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.”
Slide 12
Sample schema
@prefix cust: <http://companyX.com/ns/customer#> .
@prefix ca: <http://companyY.com/ns/confAttendees#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
cust:Surname a rdf:Property .
# or: cust:Surname rdf:type rdf:Property .
cust:GivenName a rdf:Property .
cust:ZipCode a rdf:Property .
cust:Email a rdf:Property .
ca:last_name a rdf:Property .
ca:first_name a rdf:Property .
ca:postal_code a rdf:Property .
ca:email a rdf:Property .
# LastPurchase and is_speaker: don't care (for now)!
(Diagram labels: Customer Database, Conference Attendees)
Slide 13
Relating properties
# assuming prefix declarations from previous slide
@prefix schema: <http://schema.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
cust:Surname rdfs:subPropertyOf schema:familyName .
ca:last_name rdfs:subPropertyOf schema:familyName .
cust:GivenName rdfs:subPropertyOf schema:givenName .
ca:first_name rdfs:subPropertyOf schema:givenName .
cust:Email rdfs:subPropertyOf schema:email .
ca:email rdfs:subPropertyOf schema:email .
cust:ZipCode rdfs:subPropertyOf schema:postalCode .
ca:postal_code rdfs:subPropertyOf schema:postalCode .
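To illustrate what these rdfs:subPropertyOf declarations make possible, here is a minimal sketch of the inference an RDFS-aware store would perform. It assumes triples are plain Python tuples of prefixed-name strings. The function name is hypothetical, and a real system would use a triplestore or RDF library rather than this hand-rolled loop.

```python
RDFS_SUBPROP = "rdfs:subPropertyOf"


def infer_subproperty_triples(data, schema):
    """Return data plus the triples entailed by rdfs:subPropertyOf (followed transitively)."""
    parents = {}
    for s, p, o in schema:
        if p == RDFS_SUBPROP:
            parents.setdefault(s, set()).add(o)

    def ancestors(prop):
        # Walk up the subproperty hierarchy, collecting every superproperty.
        seen, stack = set(), [prop]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    inferred = set(data)
    for s, p, o in data:
        for parent in ancestors(p):
            inferred.add((s, parent, o))
    return inferred


schema = {
    ("cust:Surname", RDFS_SUBPROP, "schema:familyName"),
    ("ca:last_name", RDFS_SUBPROP, "schema:familyName"),
}
data = {
    ("ex:cust401", "cust:Surname", "Smith"),
    ("ex:ca04395", "ca:last_name", "Smith"),
}
merged = infer_subproperty_triples(data, schema)
```

After inference, both records carry a schema:familyName value, so one query against the shared property reaches data from both sources.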
Slide 14
Using the combined data
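# SPARQL query: where should we open
# a government relations office?
# (assumes RDFS inference over the subPropertyOf declarations)
PREFIX schema: <http://schema.org/>
SELECT ?postalCode
WHERE {
  ?person schema:email ?email .
  FILTER(strends(?email, ".gov"))
  ?person schema:postalCode ?postalCode .
}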
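For readers less familiar with SPARQL, the logic of this query can be sketched in plain Python. The sketch assumes the triples have already been expressed with the shared schema.org properties. The function name and the sample triples are hypothetical, and unlike the SELECT above the sketch removes duplicate postal codes.

```python
def gov_postal_codes(triples):
    """Postal codes of every subject that has an email address ending in ".gov"."""
    gov_people = {s for s, p, o in triples
                  if p == "schema:email" and o.endswith(".gov")}
    return sorted({o for s, p, o in triples
                   if p == "schema:postalCode" and s in gov_people})


# Hypothetical triples standing in for the combined customer and attendee data.
triples = [
    ("ex:p1", "schema:email", "pat@agency.gov"),
    ("ex:p1", "schema:postalCode", "20500"),
    ("ex:p2", "schema:email", "lee@somecompany.com"),
    ("ex:p2", "schema:postalCode", "22903"),
]
print(gov_postal_codes(triples))
```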
Slide 15
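Middleware to treat RDBMS as RDF
(Diagram: the Application sends a SPARQL query to the Mapping Middleware (e.g. D2R, Ultrawrap), which sends a SQL query to the Customers database. Relational results flow back to the middleware, which returns SPARQL query results to the Application.)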
Slide 16
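Middleware to treat RDBMS as RDF
(Diagram: the Application sends a SPARQL query to the Mapping Middleware (e.g. D2R, Ultrawrap), which sends SQL queries to both the Customers and Conference Attendees databases and receives relational results from each. The middleware returns SPARQL query results to the Application. A schema metadata triplestore is also part of the picture.)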
Slide 17
Further enhancement
ex:Person a rdfs:Class .
schema:familyName rdfs:domain ex:Person .
schema:givenName rdfs:domain ex:Person .
schema:email rdfs:domain ex:Person .
schema:postalCode rdfs:domain ex:Person .
schema:postalCode rdfs:label "postal code" .
schema:postalCode rdfs:comment
"Zip code in the USA, postcode in the UK." .
Slide 18
Adding more with OWL
Equipment
equipment code   room
X1703            main kitchen
Z0439            cold storage
Room addresses
room             building
main kitchen     98 Main St.
cold storage     14 Broad St.
eq:room rdfs:subPropertyOf ex:locatedIn .
rmaddr:building rdfs:subPropertyOf ex:locatedIn .
ex:locatedIn a owl:TransitiveProperty.
rmaddr:98MainSt a ex:Building.
eq:X1703 eq:room eq:mainKitchen .
eq:mainKitchen rmaddr:building rmaddr:98MainSt .
Slide 19
Query for which building
# SPARQL query: what building is
# equipment piece X1703 in?
SELECT ?building
WHERE {
?building a ex:Building.
eq:X1703 ex:locatedIn ?building .
}
(Diagram: X1703 located in main kitchen, main kitchen located in 98 Main St.)
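The query only finds an answer if a reasoner first applies the subPropertyOf and owl:TransitiveProperty declarations. A minimal sketch of that inference, assuming triples are plain Python tuples of prefixed-name strings. The function name is hypothetical, and a real deployment would rely on an OWL-aware store rather than this loop.

```python
def located_in_closure(triples, subproperties, prop="ex:locatedIn"):
    """Pairs (x, y) such that x prop y holds after subproperty and transitive inference."""
    # Step 1: every use of a subproperty also counts as a use of prop.
    edges = {(s, o) for s, p, o in triples if p == prop or p in subproperties}
    # Step 2: transitive closure, joining pairs until nothing new appears.
    closure = set(edges)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


triples = [
    ("eq:X1703", "eq:room", "eq:mainKitchen"),
    ("eq:mainKitchen", "rmaddr:building", "rmaddr:98MainSt"),
    ("rmaddr:98MainSt", "rdf:type", "ex:Building"),
]
subprops = {"eq:room", "rmaddr:building"}

closure = located_in_closure(triples, subprops)
buildings = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Building"}
answer = sorted(y for x, y in closure if x == "eq:X1703" and y in buildings)
print(answer)
```

Neither source table says which building X1703 is in. The answer exists only after the two "located in" steps are chained.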
Slide 20
A little more OWL
schema:email a owl:InverseFunctionalProperty .
ex:cust401 cust:GivenName "James" .
ex:cust401 cust:Surname "Smith" .
ex:cust401 cust:Email "jsmith@somecompany.com" .
ex:ca04395 ca:first_name "Jim" .
ex:ca04395 ca:last_name "Smith" .
ex:ca04395 ca:email "jsmith@somecompany.com" .
# Inferred, because both records share a schema:email value:
ex:cust401 owl:sameAs ex:ca04395 .
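A minimal sketch of how an inverse functional property leads to owl:sameAs, assuming the email values have already been lifted to schema:email through the earlier subPropertyOf declarations. The function name is hypothetical, and triples are plain Python tuples.

```python
from collections import defaultdict
from itertools import combinations


def same_as_pairs(triples, inverse_functional):
    """Pairs of distinct subjects that share a value for an inverse functional property."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            subjects_by_value[(p, o)].add(s)
    pairs = set()
    for subjects in subjects_by_value.values():
        # Any two subjects with the same value must denote the same resource.
        pairs.update(combinations(sorted(subjects), 2))
    return pairs


triples = [
    ("ex:cust401", "schema:email", "jsmith@somecompany.com"),
    ("ex:ca04395", "schema:email", "jsmith@somecompany.com"),
]
pairs = same_as_pairs(triples, {"schema:email"})
print(pairs)
```

"James" and "Jim" differ, but the shared email address is enough to link the customer record and the attendee record.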
Slide 21
What OWL adds to RDFS
• RDFS gives you properties to describe your properties, classes, and instances (i.e. your resources)
• OWL gives you:
  – More properties to describe your resources
  – Classes that you can use to describe resources
  – The ability to define your own classes that you can use to describe resources
Slide 22
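Middleware to treat RDBMS as RDF
(Diagram: the Application sends a SPARQL query to the Mapping Middleware (e.g. D2R, Ultrawrap), which sends SQL queries to both the Customers and Conference Attendees databases and receives relational results from each. The middleware returns SPARQL query results to the Application. A schema metadata triplestore is also part of the picture.)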
Slide 23
Descriptive vs. Prescriptive schemas
• Not rules to follow
  – e.g. “Employee must have a first and last name!”
  – Other ways to implement constraints
• Machine-readable guides to what you’ve got to work with
  – Data types
  – Relationships to other resources and classes of resources
• Metadata!
Slide 24
Whose schemas?
• Your own schemas can describe what you need from the data you’re using
• Standardized schemas (e.g. schema.org, GoodRelations) can tie together your data with data from other sources
• Tie together your custom schemas with standardized schemas (or just the subsets of them that you’re interested in)
• Tie together different data sets from different sources (or just the subsets of them that you’re interested in)
Slide 25
Top-down or bottom-up schema development?
• Whichever you like
• I like bottom-up
  – (Hey Cyc project: good luck with that!)
• Lots of data to deal with?
  – Model just enough to drive a simple, proof-of-concept application
  – Build the model (schema) a little at a time, then add more to your application
  – Connect that model to models of (subsets of) other data sets
Slide 26
Who is doing this now?
• Pharma
• Oil and gas
• Publishing
Slide 27
TopQuadrant Products and Solutions
TopBraid Platform: Solution Engine, IDE
Solutions:
• Asset Management
• Search / Content Enrichment
• Master Data Management
• Information Discovery for Life Sciences
• Information Exchange
• Compose your own
TopQuadrant offers configurable, out-of-the-box solutions enabling organizations to evolve their information infrastructure into a semantic ecosystem
Slide 28
TopBraid Insight™ (TBI)
Connect the dots for new insights. Ease Big Data Variety
• Dynamic Interactive Exploration - Search, Query, Filter, Browse, Navigate, Visualize, Share
• Logical Data Warehouse - Flexible, Adaptive Information Structuring
© Copyright 2013 TopQuadrant Inc. Slide 29
Slide 30
TopBraid Insight: Connects the Dots
• Tames Big Data to empower businesses
• Offers on-demand integrated access to diverse data, making it possible to discover information just in time
• Delivers new levels of creativity and infrastructure flexibility
Slide 31
Photo credits
• Volume: (CC BY-NC 2.0) Fabrizio Monti
https://www.flickr.com/photos/delphaber/3514894189
• Velocity: (CC BY 2.0) Gabriel
https://www.flickr.com/photos/cod_gabriel/1332225362
• Variety: (CC BY-NC-SA 2.0) IRRI Photos
https://www.flickr.com/photos/ricephotos/4753359957
Slide 32
“A wonderful harmony is created when we join
together the seemingly unconnected.”
- Heraclitus
Bob DuCharme bducharme@topquadrant.com
Thank you!

社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
blueshagoo1
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
lzdvtmy8
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 

Recently uploaded (20)

Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdfNamma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
Namma-Kalvi-11th-Physics-Study-Material-Unit-1-EM-221086.pdf
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
 
Econ3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdfEcon3060_Screen Time and Success_ final_GroupProject.pdf
Econ3060_Screen Time and Success_ final_GroupProject.pdf
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
一比一原版格里菲斯大学毕业证(Griffith毕业证书)学历如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 

Semantic Web Standards and the Variety “V” of Big Data

  • 1. © Copyright 2014 TopQuadrant Inc. Slide 1 Semantic Web standards and the Variety “V” of Big Data Bob DuCharme August 20, 2014
  • 2. Three Vs of Big Data
       - Volume
       - Velocity
       - Variety
  • 3. Gartner, September 2013
  • 4. Which dimensions did people struggle with the most?
       - Volume: 35%
       - Velocity: 16%
       - Variety: 49%
  • 5. Why is variety hard?
       Furniture Inventory, Protein Database: ?
       Customer Database, Conference Attendees: ?
       Customer Database fields: Surname, GivenName, LastPurchase, ZipCode, Email
       Conference Attendees fields: last_name, first_name, is_speaker, postal_code, email
  • 6. Schemas
       Good thing:
       - Ensure data quality
       - Make query writing* easier
       - Add efficiency
       *And essentially, all application development
       Annoying thing:
       - Can’t add property values someone didn’t see coming
       - Changing schema (and data with it) slow and expensive
       - Often tied too closely to specific implementation
       Inflexibility × 3.
  • 7. Schemaless NoSQL databases
       - Can’t add property values someone didn’t see coming?
       - Changing schema (and data with it) slow and expensive?
       - Often tied too closely to specific implementation?
  • 8. Schemaless: how do applications know what properties are available?
       - By any means necessary
       - Documentation
       - Query for properties that got used
       - App possibly written by same person or team
       - Responsibility shifted from database (designer) to application (designer)
  • 9. Schema: all or nothing?
       Customer Database, Conference Attendees: ?
       Customer Database fields: Surname, GivenName, LastPurchase, ZipCode, Email
       Conference Attendees fields: last_name, first_name, is_speaker, postal_code, email
       ETL (Extract-Transform-Load)?
  • 10. RDF Schema (RDFS)
       - W3C Standard since 2004
       - Often overshadowed by superset standard OWL
       - Describes RDF, written using RDF syntaxes
       - Semantic Web, Linked Data
  • 11. RDF
       - www.w3.org/RDF (second sentence!): “RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.”
  • 12. Sample schema
       @prefix cust: <http://companyX.com/ns/customer#> .
       @prefix ca:   <http://companyY.com/ns/confAttendees#> .
       @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

       # Customer Database
       cust:Surname   a rdf:Property .   # or: cust:Surname rdf:type rdf:Property .
       cust:GivenName a rdf:Property .
       cust:ZipCode   a rdf:Property .
       cust:Email     a rdf:Property .

       # Conference Attendees
       ca:last_name   a rdf:Property .
       ca:first_name  a rdf:Property .
       ca:postal_code a rdf:Property .
       ca:email       a rdf:Property .

       # LastPurchase and is_speaker: don't care (for now)!
  • 13. Relating properties
       # assuming prefix declarations from previous slide
       @prefix schema: <http://schema.org/> .
       @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

       cust:Surname   rdfs:subPropertyOf schema:familyName .
       ca:last_name   rdfs:subPropertyOf schema:familyName .
       cust:GivenName rdfs:subPropertyOf schema:givenName .
       ca:first_name  rdfs:subPropertyOf schema:givenName .
       cust:Email     rdfs:subPropertyOf schema:email .
       ca:email       rdfs:subPropertyOf schema:email .
       cust:ZipCode   rdfs:subPropertyOf schema:postalCode .
       ca:postal_code rdfs:subPropertyOf schema:postalCode .
  • 14. Using the combined data
       # SPARQL query: where should we open
       # a government relations office?
       SELECT ?postalCode WHERE {
         ?person schema:email ?email .
         FILTER(strends(?email, ".gov"))
         ?person schema:postalCode ?postalCode .
       }
  • 15. Middleware to treat RDBMS as RDF
       Diagram labels: Customers; Mapping Middleware (e.g. D2R, Ultrawrap); Application; SPARQL query; SQL query; Relational results; SPARQL query results
  • 16. Middleware to treat RDBMS as RDF
       Diagram labels: Customers; Mapping Middleware (e.g. D2R, Ultrawrap); Application; SPARQL query; SQL query; Relational results; SPARQL query results; Conference Attendees; SQL query; Relational results; Schema metadata triplestore
  • 17. Further enhancement
       ex:Person a rdfs:Class .
       schema:familyName rdfs:domain ex:Person .
       schema:givenName  rdfs:domain ex:Person .
       schema:email      rdfs:domain ex:Person .
       schema:postalCode rdfs:domain ex:Person .
       schema:postalCode rdfs:label "postal code" .
       schema:postalCode rdfs:comment "Zip code in the USA, postcode in the UK." .
  • 18. Adding more with OWL
       Equipment
         code    room
         X1703   main kitchen
         Z0439   cold storage

       Room addresses
         room           building
         main kitchen   98 Main St.
         cold storage   14 Broad St.

       eq:room         rdfs:subPropertyOf ex:locatedIn .
       rmaddr:building rdfs:subPropertyOf ex:locatedIn .
       ex:locatedIn a owl:TransitiveProperty .
       rmaddr:98MainSt a ex:Building .
       eq:X1703 eq:room eq:mainKitchen .
       eq:mainKitchen rmaddr:building rmaddr:98MainSt .
  • 19. Query for which building
       # SPARQL query: what building is
       # equipment piece X1703 in?
       SELECT ?building WHERE {
         ?building a ex:Building .
         eq:X1703 ex:locatedIn ?building .
       }
       Diagram labels: located in; located in
  • 20. A little more OWL
       schema:email a owl:InverseFunctionalProperty .
       ex:cust401 cust:GivenName "James" .
       ex:cust401 cust:Surname "Smith" .
       ex:cust401 cust:Email "jsmith@somecompany.com" .
       ex:ca04395 ca:first_name "Jim" .
       ex:ca04395 ca:last_name "Smith" .
       ex:ca04395 ca:email "jsmith@somecompany.com" .
       ex:cust401 owl:sameAs ex:ca04395 .
  • 21. What OWL adds to RDFS
       - RDFS gives you properties to describe your properties, classes, and instances (i.e. your resources)
       - OWL gives you:
         - More properties to describe your resources
         - Classes that you can use to describe resources
         - The ability to define your own classes that you can use to describe resources
  • 22. Middleware to treat RDBMS as RDF
       Diagram labels: Customers; Mapping Middleware (e.g. D2R, Ultrawrap); Application; SPARQL query; SQL query; Relational results; SPARQL query results; Conference Attendees; SQL query; Relational results; Schema metadata triplestore
  • 23. Descriptive vs. Prescriptive schemas
       - Not rules to follow
         - e.g. “Employee must have a first and last name!”
         - Other ways to implement constraints
       - Machine-readable guides to what you’ve got to work with
         - Data types
         - Relationships to other resources and classes of resources
       - Metadata!
  • 24. Whose schemas?
       - Your own schemas can describe what you need from the data you’re using
       - Standardized schemas (e.g. schema.org, GoodRelations) can tie together your data with data from other sources
       - Tie together your custom schemas with (subsets that you’re interested in of) standardized schemas
       - Tie together (subsets that you’re interested in of) different data sets from different sources
  • 25. Top-down or bottom-up schema development?
       - Whichever you like
       - I like bottom-up
         - (Hey Cyc project: good luck with that!)
       - Lots of data to deal with?
         - Model just enough to drive a simple, proof-of-concept application
         - Build the model (schema) a little at a time, then add more to your application
         - Connect that model to models of (subsets of) other data sets
  • 26. Who is doing this now?
       - Pharma
       - Oil and gas
       - Publishing
  • 27. TopQuadrant Products and Solutions
       Diagram labels: Solutions: Asset Management; Solutions: Search / Content Enrichment; TopBraid Platform: Solution Engine, IDE; Solutions: Compose your own; Solutions: Master Data Management; Solutions: Information Discovery for Life Sciences; Solutions: Information Exchange
       - TopQuadrant offers configurable, out-of-the-box solutions enabling organizations to evolve their information infrastructure into a semantic ecosystem
  • 28. TopBraid Insight™ (TBI): Connect the dots for new insights. Ease Big Data Variety
       - Dynamic Interactive Exploration: Search, Query, Filter, Browse, Navigate, Visualize, Share
       - Logical Data Warehouse: Flexible, Adaptive Information Structuring
  • 29. © Copyright 2013 TopQuadrant Inc. Slide 29
  • 30. TopBraid Insight: Connects the Dots
       - Tames Big Data to empower businesses
       - Offers on-demand integrated access to diverse data, making it possible to discover information just in time
       - Delivers new levels of creativity and infrastructure flexibility
  • 31. Photo credits
       - Volume: (CC BY-NC 2.0) Fabrizio Monti https://www.flickr.com/photos/delphaber/3514894189
       - Velocity: (CC BY 2.0) Gabriel https://www.flickr.com/photos/cod_gabriel/1332225362
       - Variety: (CC BY-NC-SA 2.0) IRRI Photos https://www.flickr.com/photos/ricephotos/4753359957
  • 32. “A wonderful harmony is created when we join together the seemingly unconnected.” - Heraclitus
       Bob DuCharme, bducharme@topquadrant.com
       Thank you!
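
The property mapping on slides 12 to 14 can be sketched in plain Python to show what the rdfs:subPropertyOf statements buy you. This is only an illustration under stated assumptions: it uses tuples instead of an RDF library, the instance data (ex:cust900, ex:ca900, ex:ca901 and their values) is hypothetical, and a real deployment would rely on a triplestore and a SPARQL engine rather than hand-written inference.

```python
# Illustrative sketch only: plain Python standing in for an RDFS reasoner
# and a SPARQL engine. Triples are (subject, predicate, object) tuples.

SUBPROPERTY_OF = "rdfs:subPropertyOf"

# Schema triples taken from slide 13 (email and postal code only).
schema_triples = [
    ("cust:Email", SUBPROPERTY_OF, "schema:email"),
    ("ca:email", SUBPROPERTY_OF, "schema:email"),
    ("cust:ZipCode", SUBPROPERTY_OF, "schema:postalCode"),
    ("ca:postal_code", SUBPROPERTY_OF, "schema:postalCode"),
]

# Hypothetical instance data from the two source databases.
data_triples = [
    ("ex:cust900", "cust:Email", "jsmith@agency.gov"),
    ("ex:cust900", "cust:ZipCode", "20001"),
    ("ex:ca900", "ca:email", "jim@somecompany.com"),
    ("ex:ca900", "ca:postal_code", "22901"),
    ("ex:ca901", "ca:email", "pat@bureau.gov"),
    ("ex:ca901", "ca:postal_code", "94105"),
]


def infer_superproperties(schema, data):
    """Return the data plus every triple implied by rdfs:subPropertyOf."""
    parents = {}
    for s, p, o in schema:
        if p == SUBPROPERTY_OF:
            parents.setdefault(s, set()).add(o)
    inferred = set(data)
    frontier = list(data)
    while frontier:
        s, p, o = frontier.pop()
        for parent in parents.get(p, ()):
            triple = (s, parent, o)
            if triple not in inferred:
                inferred.add(triple)
                frontier.append(triple)  # follow chains of subproperties
    return inferred


def gov_postal_codes(triples):
    """Mimic the slide 14 query: postal codes of people with a .gov email."""
    gov_people = {
        s for s, p, o in triples
        if p == "schema:email" and o.endswith(".gov")
    }
    return sorted(
        o for s, p, o in triples
        if p == "schema:postalCode" and s in gov_people
    )


all_triples = infer_superproperties(schema_triples, data_triples)
print(gov_postal_codes(all_triples))  # ['20001', '94105']
```

The query function only mentions the schema.org property names, yet it finds records from both sources, which is the point of relating the source-specific properties to a common one.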

Editor's Notes

  1. Introduce myself, mention book.
  2. I’m going to assume that I don’t have to convince you that there’s a lot more Volume now. I could say “since you got up this morning, more data has been created than all the data created from the time the first cuneiform writing was invented up through some surprisingly recent historical event,” but we’ve all been hearing those stories a lot lately. A related issue is Velocity. One of the reasons that there’s a greater volume is that more devices are generating data, and some of them very quickly, because it’s cheaper to do so. Sensors to measure how much liquid is going through a pipe or whether a window is open are less expensive to make, so people are making them and having them send data. OPTIONAL: The classic example is a modern smartphone, which besides measuring your geolocation can also record things like the angle that you’re holding it at, not to mention the things you’re doing on the phone. When I install an app on my phone that doesn’t need permission to read or write any special data, it’s always a pleasant surprise, because the default is that so many of them do. Industrial processing and an increasing number of household devices are taking greater advantage of inexpensive devices that can record things and then pass along what they record, and because the computation and transmission is cheap, they can do it a lot, so they do. Variety: people want to learn things by combining different kinds of data and looking for patterns. With big data efforts, people often want to combine two data sets that have only one or two fields in common, and then they can use those two fields as connections to look for interesting patterns, but forging those connections is not typically very easy. I’m going to talk more about this shortly because the Variety V is really the focus of my talk.
  3. The research firm formerly known as “The Gartner Group”
  4. These are classic old-fashioned data integration problems, but they’re an issue with big data projects because people want to integrate more databases more often, sometimes just temporarily to see if anything interesting results.
  5. 1.3. Efficiency of development (see 1.2.) and execution, because you can create indexes based on schemas. 2.1 If I want to add a formerEmployer property to note that someone used to work at one of our customers… 2.3. The SQL standard does specify a way to list a database’s tables, but Oracle and DB2 don’t follow it, and have their own way. http://troels.arvin.dk/db/rdbms/
  6. 3. Many popular NoSQL database managers offer some schema-like features, like MongoDB’s data models and Neo4J’s constraints, but these are obviously very implementation-specific.
  7. 1.3 the NoSQL database is typically assembled to play a specific role in a specific database, as opposed to providing a general-purpose database.
  8. We’ve seen some advantages of using schemas and some advantages of not using them. The choice has often been this: are you going to have a description of every single database field, or are you going to go with no description of any of them? This is a tiny example to fit on the slide. What if I have 12 databases with a hundred properties each? What if I want the advantages that we saw of schema but I’m only interested in a combination of 8 fields from one database, 12 from another, and 2 from another? Do I have to choose between using the 12 entire schemas or no schemas at all? How can I use schemas as metadata to drive my use of the specific subset of data that I’m interested in? ETL? We can move this intelligence into program code, but then it’s code, as opposed to re-usable metadata: code is less re-usable than schema metadata, and it also doesn’t age well. It’s a lot easier to picture twenty-year-old data or metadata being useful today than twenty-year-old code. Plus, you’re copying data and changing it (transforming it) along the way, which introduces the possibility of errors, and you have to plan around the likely possibility of the copy becoming out of date.
  9. 4. Often associated with Semantic Web or Linked Data technologies. I’m happy to talk about those, but I’m not here to talk about them today. I’m here to talk about how RDFS (and if you like, a little OWL and the associated RDF query language SPARQL) can make it easier to flexibly deal with a variety of data.
  10. (After describing slide) We haven’t even gotten to the RDFS standard yet, and are just using standard parts of RDF. So far, so what? We’ve listed the properties that we’re interested in, in a machine-readable standardized way. For one thing, I can look at this and it can guide me in the writing of a query, because I see what the available properties are. Even better, a program that’s going to generate a form—for example, a search form for this data—can read this schema and generate just such a form. But let’s look at some more interesting things we can do.
  11. There are ways in RDFS to assert that we want to treat surname in one database the same as last name in the other, but it’s even better to relate them to a common one—a standard one if available, and here you can see that I’ve used properties from schema.org, or one that you make up for this purpose. Here we have implemented a simple little bit of data integration to deal with the variety of names in the different data sources. I can search and use the data using these property names (on the right) and it will actually use the data from these property names (on the left).
  12. With most NoSQL applications that I know of, “querying” data means writing code in a scripting language. Some of the tools have their own special query languages, but SPARQL is a standard, and a well-implemented one. SPARQL is for querying RDF triples, and our original data was not in triples. How can we query it as if it were triples?
  13. R2RML
  14. (After last build) DON’T BOTHER WITH THIS: To actually act on the schema metadata (that is, to have the application know that it should treat the customer surnames and the conference attendee last names as schema.org family names) requires an inferencing step, and there are plenty of commercial and open source tools that can do that. It can even be done with SPARQL queries. The important thing is, it’s all done with documented standards that have implementations and traction.
  15. I’m going to take this little data integration schema that I’ve been developing and enhance it even more by just adding a few more statements. Remember, schema:postalCode stands in for a full URI. rdfs:domain statements can be used by an application generating a report or an editing form.
  16. So far we’ve seen that RDFS gives us ways to list properties and classes and to say things about them in a machine-readable way so that applications can use that data. OWL lets us say more things. This shows some of the triples that a program like D2R might generate from these tables. There wasn’t a “located in” property in schema.org, so I declared one myself. Read through triples, pointing back at tables. “But if locatedIn encompasses both the room and building properties, and locatedIn is transitive, I can just query on locatedIn values to find out what building that piece of equipment is in…”
  17. …with a very simple query. I don’t have to specify any joins or look up foreign keys or anything.
  18. 2.1. We saw some RDFS ones like domain and range; OWL gives you new ones like sameAs from the previous slide. 2.2. For example, TransitiveProperty is a class, and I said that locatedIn… 2.3. I could define a class called NewCustomers as the set of all customers whose first purchase was in the last 90 days, then use that class to drive decisions about which customers get which communications from the company. This last category is where OWL can be particularly powerful, but also somewhat intimidating. There’s a lot that you can get out of the first two categories.
  19. Returning to this slide to emphasize that while mapping middleware can generate a lot of schema metadata for you, the ability to add more metadata to that, about the fields you’re interested in and only those fields, is very powerful. (build) The metadata lets you tie it all together, or just tie the bits you’re interested in together, using a documented standard with a wide choice of implementations. This is the real key to handling the variety.
  20. 1. A way to say “That data may have been created for one particular application or another, but here’s what I need it for.” 2. If I describe my products for sale using the GoodRelations schema, I can more easily combine my product data with product data from other companies and automate how I sell it using a website or app 3. One example is the way that an earlier slide said that the surname property from the customer database was a subproperty of the family name property from schema.org… 4. … and that lets me (read bullet) Which is ultimately what my presentation here is about.
  21. 2. Bottom-up was not necessarily an option 15 years ago. You planned a whole system at a high level and then filled in the details before you could do any development or take advantage of the model. 3.1 Does one data source have 25 tables with dozens of columns in each? Pick the ones that you need for your application and model those. You don’t have to start with weeks of planning. You can start prototyping at a small scale and build organically from there.
  22. 1. Research data, clinical trials, standardized and internal taxonomies. 2. Combine sets of production, exploration, and environmental data. 3. Looking for new income sources outside of printed books: combining content in different forms from different subsidiaries with different CMSs and other systems and, in the education market that is particularly important to them, lining it up with standards.
  23. In a talk like this, it’s more traditional to tell you about the company at the beginning of the talk, but I wanted to wait until the end because you have more context. When I joined the company…
  24. Because of the nature of this conference, and track, I’ve gone into some more of the geeky details about how the standards work and make this kind of integration possible. TopBraid Insight provides a front end that takes advantage of these capabilities of the standards but keeps the geekier details under the hood so that business users can take advantage of them with an intuitive interface.
  25. We have a webinar online
  26. Before I finish I wanted to be a good web citizen and credit the pictures I used on my second slide…
  27. I’d like to finish with this quote from Heraclitus, who lived in the sixth century BC, because it so nicely sums up how if we connect up things that are seemingly unconnected, we can end up with some great new possibilities.
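
The locatedIn reasoning described in notes 16 and 17 can also be sketched in plain Python. This is a hedged illustration, not how an OWL reasoner is actually implemented: the two rules below (subproperty and transitivity) are hand-coded, the function names are hypothetical, and the triples are the ones shown on the "Adding more with OWL" slide.

```python
# Illustrative sketch only: two hand-coded inference rules applied to the
# triples from the "Adding more with OWL" slide.

triples = {
    ("eq:X1703", "eq:room", "eq:mainKitchen"),
    ("eq:mainKitchen", "rmaddr:building", "rmaddr:98MainSt"),
    ("rmaddr:98MainSt", "rdf:type", "ex:Building"),
}

# eq:room and rmaddr:building are both subproperties of ex:locatedIn,
# and ex:locatedIn is declared transitive.
sub_property_of = {
    "eq:room": "ex:locatedIn",
    "rmaddr:building": "ex:locatedIn",
}
transitive = {"ex:locatedIn"}


def apply_rules(facts):
    """Apply the subproperty and transitivity rules until nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        new = set()
        # Rule 1: (s p o) and p subPropertyOf q  =>  (s q o)
        for s, p, o in facts:
            if p in sub_property_of:
                new.add((s, sub_property_of[p], o))
        # Rule 2: p transitive, (a p b) and (b p c)  =>  (a p c)
        for a, p, b in facts:
            if p in transitive:
                for b2, p2, c in facts:
                    if p2 == p and b2 == b:
                        new.add((a, p, c))
        changed = not new <= facts
        facts |= new
    return facts


def buildings_containing(item, facts):
    """Mimic the 'which building' query: locatedIn values that are Buildings."""
    buildings = {
        s for s, p, o in facts
        if p == "rdf:type" and o == "ex:Building"
    }
    return sorted(
        o for s, p, o in facts
        if s == item and p == "ex:locatedIn" and o in buildings
    )


closure = apply_rules(triples)
print(buildings_containing("eq:X1703", closure))  # ['rmaddr:98MainSt']
```

As in the notes, the lookup itself names no joins or foreign keys; the connection from equipment to room to building comes entirely from the schema statements.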