LDM Slides: Data Modeling for XML and JSON

Data Modeling for XML & JSON
Donna Burbank
Global Data Strategy Ltd.
Lessons in Data Modeling DATAVERSITY Series
Dec 6th, 2016

Global Data Strategy, Ltd. 2016
Donna is a recognized industry expert in
information management with over 20
years of experience in data strategy,
information management, data modeling,
metadata management, and enterprise
architecture.
She is currently the Managing Director at
Global Data Strategy, Ltd., an international
information management consulting
company that specialises in the alignment
of business drivers with data-centric
technology. In past roles, she has served in
a number of roles related to data modeling
& metadata:
• Metadata consultant (US, Europe, Asia,
Africa)
• Product Manager PLATINUM Metadata
Repository
• Director of Product Management,
ER/Studio
• VP of Product Marketing, Erwin
• Data modeling & data strategy
implementation & consulting
• Author of 2 books of data modeling &
contributor to 1 book on metadata
management, plus numerous articles
• OMG committee member of the
Information Management Metamodel
(IMM)
As an active contributor to the data
management community, she is a long
time DAMA International member and is
the President of the DAMA Rocky
Mountain chapter. She has worked with
dozens of Fortune 500 companies
worldwide in the Americas, Europe, Asia,
and Africa and speaks regularly at industry
conferences. She has co-authored two
books: Data Modeling for the
Business and Data Modeling Made Simple
with ERwin Data Modeler and is a regular
contributor to industry publications such
as DATAVERSITY, EM360, & TDAN. She can
be reached at
donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, USA.
Donna Burbank
2
Follow on Twitter @donnaburbank
Today’s hashtag: #LessonsDM

Lessons in Data Modeling Series
• July 28th Why a Data Model is an Important Part of your Data Strategy
• August 25th Data Modeling for Big Data
• September 22nd UML for Data Modeling – When Does it Make Sense?
• October 27th Data Modeling & Metadata Management
• December 6th Data Modeling for XML and JSON
3
This Year’s Line Up

Agenda
• Overview of XML and JSON
• Data Modeling & Metadata for XML & JSON
• Integrating XML & JSON with Databases (Relational & NoSQL)
• RDF & the Semantic Web
• Summary & Questions
4
What we’ll cover today

Assumption
• An assumption for today is that the majority of attendees are familiar with relational databases &
Entity-Relationship (E/R) modeling.
• E.g. Data Modelers, Data Architects, SQL Developers, BI Developers, etc.
• The examples are given with that bias, i.e. a comparison with the relational database world.
5
From Data Modeling for the Business by
Hoberman, Burbank, Bradley, Technics
Publications, 2009

What is XML?
• What is XML? – (Extensible Markup Language) is used to store and transport data.
• Some design principles of XML:
• Simplicity: ease of usage, interoperability & understanding
• Modular design: do one thing well
• Extensible: Ability to easily modify the structure & content
• Self-descriptive: ease of understanding
• Machine readable
• Human readable
• Embedded descriptive tags
• XML is designed for data availability, sharing & transport.
• It requires complementary technology to do anything else. i.e. Someone must write a piece of
software to send, receive, store, or display it, for example:
• HTML: Format & presentation of the data
• Web Service: Transport of the data (e.g. SOAP)
• Database: Store & integrate with other data sources
6

XML and JSON Assist with Data Exchange
7
• XML and JSON can be used to assist with data exchange (B2B, B2C, etc.)
• Companies
• Government Agencies
• Research Organizations
• Etc.
Purchase Order

Emergence & the Growth of Data Exchange
In philosophy, systems theory, science, and art, emergence is
the way complex systems and patterns arise out of a
multiplicity of relatively simple interactions.
- Wikipedia

XML uses a Hierarchical Structure
• XML uses a hierarchical, nested tree structure
• An XML tree starts at a root element and branches from the root to child elements.
• All elements can have sub elements (child elements)
9
<?xml version="1.0"?>
<shipto>
<name>John Smith</name>
<address>123 Main ST</address>
<city>Boise</city>
<country>USA</country>
</shipto>
Root
element
Child
elements

XML is Extensible
• XML is extensible, in that element can be easily added as needed.
• If the <state> element is added below, older applications using the original version will still work.
10
<shipto>
<city>Boise</city>
</shipto>
<shipto>
<city>Boise</city>
<state>ID</state>
</shipto>

XML is Self-Describing
• XML is self-describing (sort of) with the use of element tags
• Human-readable format
• Tags describe the content of the element (sort of)
11
<shipto>
<city>Boise</city>
</shipto>
From reading the tags, it’s
pretty clear that we’re
talking about a “Ship To”
address that contains the
name, address, city &
country.
But it doesn’t provide full metadata, e.g.:
• What’s the data type?
• What’s the business definition?
• Is <name> a required field?

XML Metadata – the XML Schema
• Similar to DDL, an XML Schema (XSD) defines the structure & format of data
12
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="shiporder">
<xs:complexType>
<xs:sequence>
<xs:element name="orderperson" type="xs:string"/>
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute name="orderid" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema> XSD
Metadata
Ship to:
John Smith
123 Main ST
Boise
USA
………………………………………
………………………………………
Order Shipment
Data
<shipto>
<city>Boise</city>
</shipto>
XML
Data

Graphical Models of XML Schemas
13
• XML Schemas can be shown graphically as well as via text.
* Source: Altova

XML Metadata – the XML Schema
• Although the XML Schema does provide some physical structural metadata, full metadata
descriptions are incomplete, e.g.
• Is the name field required?
• What’s the business definition for each field?
• Are there code values and/or reference data that can be used?
• Can a complex data type be used?
• Etc.
14

Levels of Data Modeling
15
Conceptual
Logical
Physical
Purpose
Communication & Definition of
Business Terms & Rules
Clarification & Detail
of Business Rules &
Data Structures
Technical Implementation
with a Physical Database
or Structure
Audience
Business Stakeholders
Data Architecture
Business Analysts
DBAs
Developers
Business Concepts
Data Entities
Physical Tables
XML Schema defines some physical
metadata
But limited or no business metadata

Metadata & Context
From Data Modeling for the Business by Hoberman, Burbank,
Bradley, Technics Publications, 2009
Is this Customer a:
• Premier Customer
• Lapsed Customer
• High Risk Customer?
Can a Customer have
more than one Account?
Is the Ship To Address
related to the Customer
or the Account?
What are the valid state
codes for the Ship To
Address?

XML Assists with Data Exchange
17
• XML and JSON can be used to assist with data exchange (B2B, B2C, etc.)
• Remember modularity, simplicity, etc.
Purchase Order
Dude-all that other stuff
isn’t my job. I’m just
sending the PO!

Integrating XML with Relational Databases
• XML is often used in conjunction with relational databases for permanent storage and integration
with other operational, reporting, and reference data.
18
Purchase Order
Oracle SQL Server

• XML can be translated into relational databases, and vice-versa
19
XML Schema DDL
* Source: Altova

20
• XML can be translated into relational databases, and vice-versa
XML Model Diagram Relational Model Diagram
* Source: Altova

What is JSON?
• What is JSON? – (JavaScript Object Notation) is a minimal, readable format for structuring data. It
is used primarily to transmit data between a server and web application, as an alternative to XML.
• It is similar to XML in that it is:
• "self describing" & human readable
• hierarchical
• simple & interoperable
21
• It differs from XML in that it is:
• can be parsed with standard JavaScript notation
• uses arrays
• can be simpler & shorter to read & write.
{"employees":[
{"firstName":“Shannon", "lastName":“Kempe"},
{"firstName":"Anita", "lastName":“Kress"},
{"firstName":“Tony", "lastName":“Shaw"}
]}
<employees>
<employee>
<firstName>Shannon</firstName>
<lastName>Kempe</lastName>
</employee>
<employee>
<firstName>Anita</firstName>
<lastName>Kress</lastName>
</employee>
<employee>
<firstName>Tony</firstName>
<lastName>Shaw</lastName>
</employee>
</employees>
JSON XML

JSON Metadata – The JSON Schema
22
• The JSON schema offers a richer set of metadata.
{
"id": 127849,
“brand": “Super Cooler",
"price": 12.50,
"tags": [“camping", “sports"]
}
Example Product in the API
Data
• Can the ID contain letters?
• What is a brand?
• Is a price required?
• Etc.
Context Needed
(i.e. Metadata)
For example, assume we have a JSON based product catalog. This catalog has a product which has an id, a brand,
a price, and an optional set of tags.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Product",
"description": "A retail product from Acme's online catalog",
"type": "object",
"properties": {
"id": {
"description": "The unique identifier for a product",
"type": "integer"
},
“brand": {
"description": “The brand name of the product as shown in the online catalogue",
"type": "string"
},
"price": {
"type": "number",
},
"tags": {
"type": "array",
"items": {
"type": "string"
},
"minItems": 1,
}
},
"required": ["id", “brand", "price"]
}
JSON Schema
Metadata

Integrating JSON with Document Databases
• JSON is often used with document databases, such as MongoDB, which uses JSON documents in
order to store records
• Document databases are popular ways to store unstructured information in a flexible way (e.g.
multimedia, social media posts, etc. )
23
• Each Collection can contain numerous Documents which could all contain
different fields.
{type: “Artifact”,
medium: “Ceramic”
country: “China”,
}
{type: “Book”,
title: “Ancient China”
country: “China”,
}

The Semantic Web & RDF
• The RDF (Resource Description Framework) model from the World Wide Web Consortium (W3C) provides a
way to link resources on the web (people, places, things). It provides a common framework for applications to
share information without losing meaning.
• Search Engines
• Exchanging data between datasets
• Sharing information with applications / APIs
• Building social networks
• Etc.
• The goal is to move from a web of documents to a web of data.
• The Framework is a simple way to express relationships between resources.
• IRIs (International Resource Identifiers) (e.g. URI) identify resources
• Simple triples relate objects together in the format: <subject> <predicate> <object>
• These relationships create a connected Graph
• There are several serialization formats, with RDF XML being a common one. For example:
• Turtle is a human-friendly format
• RDF/XML
• JSON-LD
• Schemas define the vocabularies used to describe the objects
• Dublin Core and Schema.org are two common ones
24
Subject Object
Predicate
ACME
Publishing
RDF is
Easy
Is Publisher Of

Creating a Web of Data
25
@type: Place
Sheraton San Diego Hotel & Marina
1380 Harbor Island Drive
San Diego, California 92101 USA
"@context": "http://schema.org",
“location": {
"@type": "Place",
"name": "Sheraton San Diego Hotel & Marina",
"address": {
"@type": "PostalAddress",
"streetAddress": "1380 Harbor Island Drive",
"addressLocality": "San Diego",
"addressRegion": "CA",
"postalCode": "92101"
},
"telephone" : "+1-877-734-2726",
"image":
"http://edw2016.dataversity.net/uploads/ConfSiteAssets/72/im
age/sheraton.jpg",
"url":"http://edw2016.dataversity.net/travel.cfm"
},
"@context": "http://schema.org",
"location": {
"@type": "Place",
"name": "Sheraton San Diego Hotel & Marina",
"address": {
"@type": "PostalAddress",
"streetAddress": "1380 Harbor Island Drive",
"addressLocality": "San Diego",
"addressRegion": "CA",
"postalCode": "92101"
},
"telephone" : "+1-877-734-2726",
"image": “http://mysite.com/edw16photo.jpg",
"url":“http://mysite.com/myphotos"
},
* Script provided by: Eric Franzon, eric@smartdataconsultants.com
*

Dublin Core Metadata Initiative
• The Dublin Core Metadata Initiative provides a common metadata standards for resources such as
media, library books, etc.
• It defines standards for information such as:
26
http://dublincore.org
Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
 Resources can be described using:
 Text
 HTML
 XML
 RDF XML
Sample Metadata
Format="video/mpeg; 5 minutes“
Language="en"
Publisher=“Kats Online, LLC"
Title=“My Favorite Cat Video“
Subject=“Cats“
Description=“A short video of a black cat playing with string."

Schema.org
• Schema.org is a vocabulary that webmasters can use to mark-up Web pages for the Semantic
Web, so that search engines understand what the pages are about .
• Created by a group of search providers (e.g. Google, Microsoft, Yahoo and Yandex).
• Vocabularies are developed by an open community process
• Through GitHub (https://github.com/schemaorg/schemaorg)
• Using the public-schemaorg@w3.org mailing list
• The schemas are a set of 'types', each associated with a set of properties. The types are arranged
in a hierarchy. There are currently over 570 types, including:
• Creative works
• Organization
• Person
• Place, LocalBusiness, Restaurant
• Product, Offer, AggregateOffer
• Etc.
• There are also extensions for particular industries such as:
• auto.schema.org
• health-lifesci.schema.org
27
 Resources can be described using:
 JSON-LD
 RDFa
 Etc.

There are Many Other Common Schemas & Vocabularies
• The Dublin Core and Schema.org are two popular schemas, but many more exist for particular
subject areas, industries, etc.
• The Linked Open Vocabularies site (LOV) provides a helpful listing
28http://lov.okfn.org/dataset/lov/
Dublin Core
Schema.org
Friend of a Friend

Summary
• XML and JSON are used for transport and interoperability of data
• They offer a variety of benefits
• Simplicity: ease of usage, interoperability & understanding
• Modular design: do one thing well
• Extensible: Ability to easily modify the structure & content
• Self-descriptive: ease of understanding
• Integration with Databases allows for broader enterprise sharing & storage
• Translation to Relational databases
• Storage for Document databases
• Graphical Models can be used across technologies for an intuitive way to visualize hierarchies &
relationships
• The Semantic Web is a powerful way to support the internet as a “web of data”

About Global Data Strategy, Ltd
• Global Data Strategy is an international information management consulting company that specializes
in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and
information.
• Our core values center around providing solutions that are:
• Business-Driven: We put the needs of your business first, before we look at any technology solution.
• Clear & Relevant: We provide clear explanations using real-world examples.
• Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s
size, corporate culture, and geography.
• High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of
technical expertise in the industry.
30
Data-Driven Business Transformation
Business Strategy
Aligned With
Data Strategy
Visit www.globaldatastrategy.com for more information

Contact Info
• Email: donna.burbank@globaldatastrategy.com
• Twitter: @donnaburbank
@GlobalDataStrat
• Website: www.globaldatastrategy.com
• Company Linkedin: https://www.linkedin.com/company/global-data-strategy-ltd
• Personal Linkedin: https://www.linkedin.com/in/donnaburbank
31

DATAVERSITY Training Center
• Learn the basics of Metadata Management and practical tips on how to apply metadata
management in the real world. This online course hosted by DATAVERSITY provides a series of six
courses including:
• What is Metadata
• The Business Value of Metadata
• Sources of Metadata
• Metamodels and Metadata Standards
• Metadata Architecture, Integration, and Storage
• Metadata Strategy and Implementation
• Purchase all six courses for $399 or individually at $79 each.
Register here
• Other courses available on Data Governance & Data Quality
32
Online Training Courses
New Metadata Management Course
Visit: http://training.dataversity.net/lms/

Lessons in Data Modeling Series - 2017
• January 26th How Data Modeling Fits into an Overall Enterprise Architecture
• February 23rd Data Modeling & Business Intelligence
• March 23rd Conceptual Data Models - How to Get the Attention of Business Users
(for a Technical Audience)
• April 27th The Evolving Role of the Data Architect – What Does it Mean for Your Career?
• May 25th Data Modeling & Metadata Management
• June 22nd Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling –
how do they fit together?
• July 27th Data Modeling & Metadata for Graph Databases
• August 24th Data Modeling & Data Integration
• September 28th Data Modeling & MDM
• October 26th Agile & Data Modeling – How can they work together?
• December 5th Data Modeling, Data Governance, & Data Quality
33
Next Year’s Line Up

Questions?
34
Thoughts? Ideas?

LDM Slides: Data Modeling for XML and JSON

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to LDM Slides: Data Modeling for XML and JSON

Similar to LDM Slides: Data Modeling for XML and JSON (20)

More from DATAVERSITY

More from DATAVERSITY (20)

Recently uploaded

Recently uploaded (20)

LDM Slides: Data Modeling for XML and JSON