SlideShare a Scribd company logo
If you can't Beat 'em, Join 'em!
Integrating NoSQL data elements into a relational model
This presentation and SQL file is on SlideShare and PGCon2015
http://www.slideshare.net/jamesphanson/pg-no-sqlbeatemjoinemv10sql
Jamey Hanson
jhanson@freedomconsultinggroup.com
jamesphanson@yahoo.com
@jamey_hanson
Freedom Consulting Group
http://www.freedomconsultinggroup.com
PGCon 2015,
Ottawa, ON CA
19-Jun-2015
NoSQL is magic a panacea 42 a hot mess
really useful in some situations, not applicable in
other situations - and here to stay.
Q: How can PostgreSQL thrive in a mixed
NoSQL environment?
A: By integrating NoSQL data types and features
plus understanding where PostgreSQL is - and is
not - a good fit.
NoSQL hype check
PGCon 2015 Ottawa, ON CA 19-Jun-20152
Stages of NoSQL acceptance …
PGCon 2015 Ottawa, ON CA 19-Jun-20153
NoSQL
RDBMS
Given that I have PostgreSQL,
how can I leverage NoSQL data?
Given that I have NoSQL data,
how can I leverage PostgreSQL?
Reverse the traditional approach
PGCon 2015 Ottawa, ON CA 19-Jun-20154
The framework and approach come from
PGCon 2015 Ottawa, ON CA 19-Jun-20155
Martin Fowler's book NoSQL Distilled
and his term
Polyglot Persistence
Jamey Hanson
jhanson@freedomconsultinggroup.com
jamesphanson@yahoo.com
Manage a team for Freedom Consulting Group migrating
applications from Oracle to Postgres Plus Advanced Server and
PostgreSQL in the government space. We are subcontracting to
EnterpriseDB
Overly certified: PMP, CISSP, CSEP, OCP in 5 versions of Oracle,
Cloudera developer & admin. Used to be NetApp admin and
MCSE. I teach PMP and CISSP at the Univ. MD training center
Alumnus of multiple schools and was C-130 aircrew
About the author
PGConf US, NYC 26-Mar-20156
}  Document store (MongoDB)
}  Wide column store (Cassandra)*
a.k.a. Column family database
}  Key-value store (Redis)
}  Graph DBMS (Neo4j)
}  Search engine (Solr)**
Categories adapted from db-engines.com and
Martin Fowler, martinfowler.com/nosql.html
* Not covered in this presentation.
** See PGConf 2015 NYC "Full Text Search with Ranked Results"
What is NoSQL … for the next 45 min?
PGCon 2015 Ottawa, ON CA 19-Jun-20157
}  Too much data for a single machine to process.
RDBMS is a "single-or-few" machine architecture.*
NoSQL has expectations of sharding across large cluster
while RDBMS does not.*
}  Shard – divide database into aggregates of related business data
and spread the entire database across a cluster.
Sharding incorporates (changes to) application design.
* Sharded RDBMSs have been developed but they are more
difficult than NoSQL sharding and have not been as successful.
Why NoSQL (vs. RDBMS/PostgreSQL)? 1
PGCon 2015 Ottawa, ON CA 19-Jun-20158
}  Because RDBMSs store data in very small pieces in lots of
different places, which does not match object-oriented
methodology and is inconvenient for developers.
}  A.k.a. Object-relational Impedance Mismatch, which is handled
by Object Relational Mapping (ORM) tools such as Hibernate.
Why NoSQL (vs. RDBMS/PostgreSQL)? 2
PGCon 2015 Ottawa, ON CA 19-Jun-20159
RDBMS
NoSQL
}  Because RDBMS data models are difficult to modify as
data structures and business needs change.
}  RDBMS models must be consistent for all the data in a table. It
is not possible to have legacy data use on structure, new data
use a different structure and keep them all in the same table(s).
Why NoSQL (vs. RDBMS/PostgreSQL)? 3
PGCon 2015 Ottawa, ON CA 19-Jun-201510
}  Because RDBMS is the incumbent.
}  Installed everywhere, widely understood, mature technology
that still has active development – such as this conference.
}  Because RDBMS have transactions and a consistent view
of the data.
}  Sometimes you need to change a small piece of data and you
need every connection to see that change instantly.
}  Because most data sets are not Google-sized.
}  A single machine easily can process terabytes of data.
}  Because RDBMS are better at finding relationships* and
enforcing data integrity. (*Graph databases are an exception.)
}  Sometimes you want to stop bad-data from loading.
Why RDBMS/PostgreSQL (vs. NoSQL)?
PGCon 2015 Ottawa, ON CA 19-Jun-201511
}  A: With Polyglot Persistence*.
Organizations will have relational and NoSQL databases
… our job is to match the business needs + data to the
technology.
* Polyglot Persistence was coined by Martin Fowler.
It refers to using multiple database tools and architectures.
}  This presentation is about identifying where PostgreSQL
is a great fit and demonstrating how to integrate NoSQL
data into PostgreSQL's relational model.
PostgreSQL can thrive – not just survive –
in a world that includes NoSQL.
Q: Where does this leave us?
PGCon 2015 Ottawa, ON CA 19-Jun-201512
PostgreSQL NoSQL data sweet spot
PGCon 2015 Ottawa, ON CA 19-Jun-201513
Amount of data
Sizeofbusinesstransactions
doesnotfitonafewmachines
doesnotfitononemachine
document sized
geographic region
need to change individual data elements
PostgreSQL's NoSQL
sweet spot!
NoSQL tools
}  Scenario: You are given ~ 1 million JSON files and need to
decide how to handle them. (i.e. Do I need MongoDB?)
}  Q: Can you process this data on a single/few servers?
A: Easily!
}  Q: How large are the business transactions?
(element-level, document-level or other?)
A: I have no idea.*
}  Perfect for PostgreSQL integrating NoSQL data.
* PostgreSQL can handle most answers.
NoSQL can (generally) only handle document-level or larger transactions.
On to the technical parts …
PGCon 2015 Ottawa, ON CA 19-Jun-201514
}  JavaScript Object Notation:
A widely accepted, human readable open standard for
transmitting optionally-nested key-value pairs.
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY"
"postalCode": "10021-3100"
}...
What is JSON?
PGCon 2015 Ottawa, ON CA 19-Jun-201515
Also applies to XML … but it's not as cool
PGCon 2015 Ottawa, ON CA 19-Jun-201516
}  10,000 files from the Million Song Database (MSD)
}  http://labrosa.ee.columbia.edu/millionsong/lastfm
}  http://labrosa.ee.columbia.edu/millionsong/sites/default/files/
lastfm/lastfm_subset.zip
}  Each file includes the song's:
}  track_id
}  artist
}  title
}  0-N key-value pairs of similar track_id's and weights.
}  0-N key-value pairs of song tags and weights.
What's in our JSON files?
PGCon 2015 Ottawa, ON CA 19-Jun-201517
}  Create a table with JSONB* data type.
CREATE TABLE j_songs (
id SERIAL PRIMARY KEY,
song JSONB
);
}  Use COPY command to load each file.
COPY j_songs (song) FROM '/NoSQL/TRAAFD.json'
CSV QUOTE e'x01' DELIMITER e'x01';
NOTE: The e'x01' parameter handles embedded quotes.
*There are very few reasons to use JSON over JSONB
How do I load JSON files into PostgreSQL?
PGCon 2015 Ottawa, ON CA 19-Jun-201518
}  Requires some Linux work … but not too bad.
}  Extract JSON files into OS postgres's ~/NoSQL
$ unzip ~/NoSQL/lastfm_subset.zip
}  Create a symbolic link for each JSON file from ~/NoSQL
to $PGDATA/ExtFiles
$ find ~/NoSQL/lastfm_subset -name *.json|
xargs -i ln -s {} $PGDATA/ExtFiles/
How do I load JSON files into PostgreSQL?
PGCon 2015 Ottawa, ON CA 19-Jun-201519
}  Use pg_ls_dir* to generate the file-loading SQL
SELECT 'COPY nosql.j_songs(song) FROM
''ExtFiles/' || pg_ls_dir('ExtFiles') ||
''' CSV QUOTE e''x01''
DELIMITER e''x02'';';
* pg_ls_dir can list directory contents under $PGDATA.
This is why we created symbolic links in under $PGDATA
We could also have used COPY command in the files' original location.
How do I load JSON files into PostgreSQL?
PGCon 2015 Ottawa, ON CA 19-Jun-201520
Switch to SQL interactive
Prepare and loading JSON files.
PGCon 2015 Ottawa, ON CA 19-Jun-201521
NOTE: The SQL statements are in the file PG_NOSQL_BeatEmJoin_vXX.sql,
which is loaded in SlideShare and the PGCon web page.
}  Return tags
SELECT DISTINCT jsonb_object_keys(song)...
}  Return values
SELECT
song ->> 'title' AS title, -- return TEXT
song -> 'artist' AS artist, -- return JSON
FROM j_songs ...
}  Match tags
WHERE song @> '{"artist":"Arctic Monkeys"}'::JSONB
Exploring JSON data
PGCon 2015 Ottawa, ON CA 19-Jun-201522
Switch to SQL interactive
Exploring JSON data and indexing
PGCon 2015 Ottawa, ON CA 19-Jun-201523
}  Some interfaces – and a lot of existing code – require an
RDBMS structure.
}  JPA (a.k.a. Hibernate) SQL cannot interact with non-RDBMS
structures.
}  Present JSON data as a view or materialized view.
CREATE OR REPLACE VIEW v_songs AS
SELECT
song ->> 'track_id' AS track_id,
song ->> 'artist' AS artist,
song ->> 'title'
Present JSON as an RDBMS relation
PGCon 2015 Ottawa, ON CA 19-Jun-201524
Switch to SQL interactive
Presenting JSON as a view and/or
materialized view
PGCon 2015 Ottawa, ON CA 19-Jun-201525
}  The tags and similars JSON elements contain
arrays of key-value-pairs.
}  Convert them to HSTORE
}  track_id, artist and title are present in every
JSON file – so we can turn them into columns.
CREATE TABLE h_songs (
track_id TEXT PRIMARY KEY,
artist TEXT,
title TEXT,
tags HSTORE,
similars HSTORE);
Transform tags and similars to HSTORE
PGCon 2015 Ottawa, ON CA 19-Jun-201526
}  Return elements in the arrays with
jsonb_array_elements
jsonb_array_elements (song -> 'tags') ->> 0
AS tag_key,
}  Build the HSTORE column with HSTORE and
array_agg operators.
HSTORE (array_agg(tag_key), array_agg(tag_value))
Use JSON and HSTORE operators to convert
PGCon 2015 Ottawa, ON CA 19-Jun-201527
Switch to SQL interactive
Convert from JSON to HSTORE
PGCon 2015 Ottawa, ON CA 19-Jun-201528
}  Select records with a specific tag and value.
SELECT
artist,
title,
tags -> 'latin'
FROM h_songs
WHERE
tags ? 'latin'
and (tags -> 'latin')::INTEGER > 67;
HSTORE also has operators and indexes
PGCon 2015 Ottawa, ON CA 19-Jun-201529
Switch to SQL interactive
Explore and index HSTORE key-value pairs
PGCon 2015 Ottawa, ON CA 19-Jun-201530
}  The similars element (key-value pairs of
similar_song => weight) ~ edges on a graph.
}  Neo4j is the leading NoSQL graph database.
Can also explore songs data as a graph
PGCon 2015 Ottawa, ON CA 19-Jun-201531
Neo4j-sized graph PostgreSQL-sized graph
}  Example adapted from PostgreSQL documentation.
}  Transform songs data format into:
}  track_id
}  similar_track_id (a.k.a. link)
}  weight
}  Filter our data set to 'rock' songs with relatively strong links
… because recursive queries are expensive.
}  Answer the burning question:
Is there a path of related songs from Lady Gaga Poker Face to
Justin Timberlake What Goes Around … Comes Around?
Use recursive query to find path
PGCon 2015 Ottawa, ON CA 19-Jun-201532
Switch to SQL interactive
Explore recursive query and graphing
PGCon 2015 Ottawa, ON CA 19-Jun-201533
Graph song results
PGCon 2015 Ottawa, ON CA 19-Jun-201534
0.280
0.214
0.190
0.301
}  Assertion: NoSQL is here to stay.
our task is to thrive within Polyglot Persistence world.
}  5-ish types of NoSQL databases.
PostgreSQL plays nice with 4 of them:
}  Document store (MongoDB)
}  Wide column store (Cassandra)
}  Key-value store (Redis)
}  Graph DBMS (Neo4j)
}  Search engine (Solr)*
*See "Full Test Search with Ranked Results" from PGConf 2015.
Summary (1)
PGCon 2015 Ottawa, ON CA 19-Jun-201535
}  PostgreSQL can load, interact with and present
NoSQL data in a relational structure.
}  PostgreSQL's sweet spot:
}  Data volume that fits well on one or a few servers.
}  Transaction boundaries from element to document level.
}  Want to enforce (some) referential integrity.
}  Want to find relations within data.
NOTE: This is what most organizations call "my real data".
}  Leave edge cases to NoSQL tools.
Summary (2)
PGCon 2015 Ottawa, ON CA 19-Jun-201536
}  Database architecture rules are more subtle and
complex now.
}  It is too simplistic to think If it's not in at least 3NF –
it's wrong.
}  Know your business needs.
}  Know your data.
}  Know the strengths and limitations of the relational
model plus NoSQL.
Seek to thrive in a world of Polyglot Persistence.
Summary (3)
PGCon 2015 Ottawa, ON CA 19-Jun-201537
Are
there
any
Questions or
follow up?
PGConf US, NYC 26-Mar-201538
LIFE. LIBERTY. TECHNOLOGY.
Freedom Consulting Group is a talented, hard-working,
and committed partner, providing hardware, software
and database development and integration services
to a diverse set of clients.
“
”
POSTGRES
innovation
ENTERPRISE
reliability
24/7
support
Services
& training
Enterprise-class
features, tools &
compatibility
Indemnification
Product
road-map
Control
Thousands
of developers
Fast
development
cycles
Low cost
No vendor
lock-in
Advanced
features
Enabling commercial
adoption of Postgres

More Related Content

What's hot

Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
ArangoDB Database
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
ArangoDB Database
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
ArangoDB Database
 
Introduction to DGraph - A Graph Database
Introduction to DGraph - A Graph DatabaseIntroduction to DGraph - A Graph Database
Introduction to DGraph - A Graph Database
Knoldus Inc.
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
Marin Dimitrov
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Olaf Hartig
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Sergio Fernández
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
Rim Moussa
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoDB Database
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
Juan Sequeda
 
Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
Akmal Chaudhri
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Christophe Debruyne
 
Graph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage EnginesGraph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage Engines
Pere Urbón-Bayes
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
Accumulo Summit
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
Thamme Gowda
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache Spark
Thamme Gowda
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data Stack
Martin Voigt
 
Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
Akmal Chaudhri
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
Giorgos Santipantakis
 
Doc store
Doc storeDoc store
Doc store
Mysql User Camp
 

What's hot (20)

Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
 
Introduction to DGraph - A Graph Database
Introduction to DGraph - A Graph DatabaseIntroduction to DGraph - A Graph Database
Introduction to DGraph - A Graph Database
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
 
Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
 
Graph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage EnginesGraph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage Engines
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
 
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style SimilarityIEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
IEEE IRI 16 - Clustering Web Pages based on Structure and Style Similarity
 
Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache Spark
 
ROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data StackROI in Linking Content to CRM by Applying the Linked Data Stack
ROI in Linking Content to CRM by Applying the Linked Data Stack
 
Considerations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT projectConsiderations for using NoSQL technology on your next IT project
Considerations for using NoSQL technology on your next IT project
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Doc store
Doc storeDoc store
Doc store
 

Similar to Pg no sql_beatemjoinem_v10

Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Jamey Hanson
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
Rim Moussa
 
If NoSQL is your answer, you are probably asking the wrong question.
If NoSQL is your answer, you are probably asking the wrong question.If NoSQL is your answer, you are probably asking the wrong question.
If NoSQL is your answer, you are probably asking the wrong question.
Lukas Smith
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
Martin Toshev
 
My 2019 SQL Saturday In San Diego presentation
My 2019 SQL Saturday In San Diego presentationMy 2019 SQL Saturday In San Diego presentation
My 2019 SQL Saturday In San Diego presentation
Steve Rezhener, MBA
 
Ontotext's GraphDB Connectors
Ontotext's GraphDB ConnectorsOntotext's GraphDB Connectors
Ontotext's GraphDB Connectors
logomachy
 
NoSQL Database: Classification, Characteristics and Comparison
NoSQL Database: Classification, Characteristics and ComparisonNoSQL Database: Classification, Characteristics and Comparison
NoSQL Database: Classification, Characteristics and Comparison
Mayuree Srikulwong
 
PostgreSQL versus MySQL - What Are The Real Differences
PostgreSQL versus MySQL - What Are The Real DifferencesPostgreSQL versus MySQL - What Are The Real Differences
PostgreSQL versus MySQL - What Are The Real Differences
All Things Open
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
Zaid Shabbir
 
GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016
Joshua Bae
 
NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSON
Mario Beck
 
Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
oscon2007
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
StreamNative
 
.NET Development for SQL Server Developer
.NET Development for SQL Server Developer.NET Development for SQL Server Developer
.NET Development for SQL Server Developer
Marco Parenzan
 
SQL vs NoSQL deep dive
SQL vs NoSQL deep diveSQL vs NoSQL deep dive
SQL vs NoSQL deep dive
Ahmed Shaaban
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
Marcin Bielak
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
Gezim Sejdiu
 
How do You Graph
How do You GraphHow do You Graph
How do You Graph
Ben Krug
 
Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...
Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...
Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...
Ahsan Javed Awan
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
Shamima Yeasmin Mukta
 

Similar to Pg no sql_beatemjoinem_v10 (20)

Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoDElephants vs. Dolphins:  Comparing PostgreSQL and MySQL for use in the DoD
Elephants vs. Dolphins: Comparing PostgreSQL and MySQL for use in the DoD
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
If NoSQL is your answer, you are probably asking the wrong question.
If NoSQL is your answer, you are probably asking the wrong question.If NoSQL is your answer, you are probably asking the wrong question.
If NoSQL is your answer, you are probably asking the wrong question.
 
Semantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12cSemantic Technology In Oracle Database 12c
Semantic Technology In Oracle Database 12c
 
My 2019 SQL Saturday In San Diego presentation
My 2019 SQL Saturday In San Diego presentationMy 2019 SQL Saturday In San Diego presentation
My 2019 SQL Saturday In San Diego presentation
 
Ontotext's GraphDB Connectors
Ontotext's GraphDB ConnectorsOntotext's GraphDB Connectors
Ontotext's GraphDB Connectors
 
NoSQL Database: Classification, Characteristics and Comparison
NoSQL Database: Classification, Characteristics and ComparisonNoSQL Database: Classification, Characteristics and Comparison
NoSQL Database: Classification, Characteristics and Comparison
 
PostgreSQL versus MySQL - What Are The Real Differences
PostgreSQL versus MySQL - What Are The Real DifferencesPostgreSQL versus MySQL - What Are The Real Differences
PostgreSQL versus MySQL - What Are The Real Differences
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
 
GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016GDBinSV_Meetup_DBMS_Trends_10062016
GDBinSV_Meetup_DBMS_Trends_10062016
 
NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSON
 
Os Lonergan
Os LonerganOs Lonergan
Os Lonergan
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
.NET Development for SQL Server Developer
.NET Development for SQL Server Developer.NET Development for SQL Server Developer
.NET Development for SQL Server Developer
 
SQL vs NoSQL deep dive
SQL vs NoSQL deep diveSQL vs NoSQL deep dive
SQL vs NoSQL deep dive
 
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
IoT databases - review and challenges - IoT, Hardware & Robotics meetup - onl...
 
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
How do You Graph
How do You GraphHow do You Graph
How do You Graph
 
Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...
Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...
Micro-architectural Characterization of Apache Spark on Batch and Stream Proc...
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
 

Recently uploaded

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 

Recently uploaded (20)

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 

Pg no sql_beatemjoinem_v10

  • 1. If you can't Beat 'em, Join 'em! Integrating NoSQL data elements into a relational model This presentation and SQL file is on SlideShare and PGCon2015 http://www.slideshare.net/jamesphanson/pg-no-sqlbeatemjoinemv10sql Jamey Hanson jhanson@freedomconsultinggroup.com jamesphanson@yahoo.com @jamey_hanson Freedom Consulting Group http://www.freedomconsultinggroup.com PGCon 2015, Ottawa, ON CA 19-Jun-2015
  • 2. NoSQL is magic a panacea 42 a hot mess really useful in some situations, not applicable in other situations - and here to stay. Q: How can PostgreSQL thrive in a mixed NoSQL environment? A: By integrating NoSQL data types and features plus understanding where PostgreSQL is - and is not - a good fit. NoSQL hype check PGCon 2015 Ottawa, ON CA 19-Jun-20152
  • 3. Stages of NoSQL acceptance … PGCon 2015 Ottawa, ON CA 19-Jun-20153 NoSQL RDBMS
  • 4. Given that I have PostgreSQL, how can I leverage NoSQL data? Given that I have NoSQL data, how can I leverage PostgreSQL? Reverse the traditional approach PGCon 2015 Ottawa, ON CA 19-Jun-20154
  • 5. The framework and approach come from PGCon 2015 Ottawa, ON CA 19-Jun-20155 Martin Fowler's book NoSQL Distilled and his term Polyglot Persistence
  • 6. Jamey Hanson jhanson@freedomconsultinggroup.com jamesphanson@yahoo.com Manage a team for Freedom Consulting Group migrating applications from Oracle to Postgres Plus Advanced Server and PostgreSQL in the government space. We are subcontracting to EnterpriseDB Overly certified: PMP, CISSP, CSEP, OCP in 5 versions of Oracle, Cloudera developer & admin. Used to be NetApp admin and MCSE. I teach PMP and CISSP at the Univ. MD training center Alumnus of multiple schools and was C-130 aircrew About the author PGConf US, NYC 26-Mar-20156
  • 7. }  Document store (MongoDB) }  Wide column store (Cassandra)* a.k.a. Column family database }  Key-value store (Redis) }  Graph DBMS (Neo4j) }  Search engine (Solr)** Categories adapted from db-engines.com and Martin Fowler, martinfowler.com/nosql.html * Not covered in this presentation. ** See PGConf 2015 NYC "Full Text Search with Ranked Results" What is NoSQL … for the next 45 min? PGCon 2015 Ottawa, ON CA 19-Jun-20157
  • 8. }  Too much data for a single machine to process. RDBMS is a "single-or-few" machine architecture.* NoSQL has expectations of sharding across large cluster while RDBMS does not.* }  Shard – divide database into aggregates of related business data and spread the entire database across a cluster. Sharding incorporates (changes to) application design. * Sharded RDBMSs have been developed but they are more difficult than NoSQL sharding and have not been as successful. Why NoSQL (vs. RDBMS/PostgreSQL)? 1 PGCon 2015 Ottawa, ON CA 19-Jun-20158
  • 9. }  Because RDBMSs store data in very small pieces in lots of different places, which does not match object-oriented methodology and is inconvenient for developers. }  A.k.a. Object-relational Impedance Mismatch, which is handled by Object Relational Mapping (ORM) tools such as Hibernate. Why NoSQL (vs. RDBMS/PostgreSQL)? 2 PGCon 2015 Ottawa, ON CA 19-Jun-20159 RDBMS NoSQL
  • 10. }  Because RDBMS data models are difficult to modify as data structures and business needs change. }  RDBMS models must be consistent for all the data in a table. It is not possible to have legacy data use on structure, new data use a different structure and keep them all in the same table(s). Why NoSQL (vs. RDBMS/PostgreSQL)? 3 PGCon 2015 Ottawa, ON CA 19-Jun-201510
  • 11. }  Because RDBMS is the incumbent. }  Installed everywhere, widely understood, mature technology that still has active development – such as this conference. }  Because RDBMS have transactions and a consistent view of the data. }  Sometimes you need to change a small piece of data and you need every connection to see that change instantly. }  Because most data sets are not Google-sized. }  A single machine easily can process terabytes of data. }  Because RDBMS are better at finding relationships* and enforcing data integrity. (*Graph databases are an exception.) }  Sometimes you want to stop bad-data from loading. Why RDBMS/PostgreSQL (vs. NoSQL)? PGCon 2015 Ottawa, ON CA 19-Jun-201511
  • 12. }  A: With Polyglot Persistence*. Organizations will have relational and NoSQL databases … our job is to match the business needs + data to the technology. * Polyglot Persistence was coined by Martin Fowler. It refers to using multiple database tools and architectures. }  This presentation is about identifying where PostgreSQL is a great fit and demonstrating how to integrate NoSQL data into PostgreSQL's relational model. PostgreSQL can thrive – not just survive – in a world that includes NoSQL. Q: Where does this leave us? PGCon 2015 Ottawa, ON CA 19-Jun-201512
  • 13. PostgreSQL NoSQL data sweet spot PGCon 2015 Ottawa, ON CA 19-Jun-201513 Amount of data Sizeofbusinesstransactions doesnotfitonafewmachines doesnotfitononemachine document sized geographic region need to change individual data elements PostgreSQL's NoSQL sweet spot! NoSQL tools
  • 14. }  Scenario: You are given ~ 1 million JSON files and need to decide how to handle them. (i.e. Do I need MongoDB?) }  Q: Can you process this data on a single/few servers? A: Easily! }  Q: How large are the business transactions? (element-level, document-level or other?) A: I have no idea.* }  Perfect for PostgreSQL integrating NoSQL data. * PostgreSQL can handle most answers. NoSQL can (generally) only handle document-level or larger transactions. On to the technical parts … PGCon 2015 Ottawa, ON CA 19-Jun-201514
  • 15. }  JavaScript Object Notation: A widely accepted, human readable open standard for transmitting optionally-nested key-value pairs. { "firstName": "John", "lastName": "Smith", "isAlive": true, "age": 25, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY" "postalCode": "10021-3100" }... What is JSON? PGCon 2015 Ottawa, ON CA 19-Jun-201515
  • 16. Also applies to XML … but it's not as cool PGCon 2015 Ottawa, ON CA 19-Jun-201516
  • 17. }  10,000 files from the Million Song Database (MSD) }  http://labrosa.ee.columbia.edu/millionsong/lastfm }  http://labrosa.ee.columbia.edu/millionsong/sites/default/files/ lastfm/lastfm_subset.zip }  Each file includes the song's: }  track_id }  artist }  title }  0-N key-value pairs of similar track_id's and weights. }  0-N key-value pairs of song tags and weights. What's in our JSON files? PGCon 2015 Ottawa, ON CA 19-Jun-201517
  • 18. }  Create a table with JSONB* data type. CREATE TABLE j_songs ( id SERIAL PRIMARY KEY, song JSONB ); }  Use COPY command to load each file. COPY j_songs (song) FROM '/NoSQL/TRAAFD.json' CSV QUOTE e'x01' DELIMITER e'x01'; NOTE: The e'x01' parameter handles embedded quotes. *There are very few reasons to use JSON over JSONB How do I load JSON files into PostgreSQL? PGCon 2015 Ottawa, ON CA 19-Jun-201518
  • 19. }  Requires some Linux work … but not too bad. }  Extract JSON files into OS postgres's ~/NoSQL $ unzip ~/NoSQL/lastfm_subset.zip }  Create a symbolic link for each JSON file from ~/NoSQL to $PGDATA/ExtFiles $ find ~/NoSQL/lastfm_subset -name *.json| xargs -i ln -s {} $PGDATA/ExtFiles/ How do I load JSON files into PostgreSQL? PGCon 2015 Ottawa, ON CA 19-Jun-201519
  • 20. }  Use pg_ls_dir* to generate the file-loading SQL SELECT 'COPY nosql.j_songs(song) FROM ''ExtFiles/' || pg_ls_dir('ExtFiles') || ''' CSV QUOTE e''x01'' DELIMITER e''x02'';'; * pg_ls_dir can list directory contents under $PGDATA. This is why we created symbolic links in under $PGDATA We could also have used COPY command in the files' original location. How do I load JSON files into PostgreSQL? PGCon 2015 Ottawa, ON CA 19-Jun-201520
  • 21. Switch to SQL interactive Prepare and loading JSON files. PGCon 2015 Ottawa, ON CA 19-Jun-201521 NOTE: The SQL statements are in the file PG_NOSQL_BeatEmJoin_vXX.sql, which is loaded in SlideShare and the PGCon web page.
  • 22. }  Return tags SELECT DISTINCT jsonb_object_keys(song)... }  Return values SELECT song ->> 'title' AS title, -- return TEXT song -> 'artist' AS artist, -- return JSON FROM j_songs ... }  Match tags WHERE song @> '{"artist":"Arctic Monkeys"}'::JSONB Exploring JSON data PGCon 2015 Ottawa, ON CA 19-Jun-201522
  • 23. Switch to SQL interactive Exploring JSON data and indexing PGCon 2015 Ottawa, ON CA 19-Jun-201523
  • 24. }  Some interfaces – and a lot of existing code – require an RDBMS structure. }  JPA (a.k.a. Hibernate) SQL cannot interact with non-RDBMS structures. }  Present JSON data as a view or materialized view. CREATE OR REPLACE VIEW v_songs AS SELECT song ->> 'track_id' AS track_id, song ->> 'artist' AS artist, song ->> 'title' Present JSON as an RDBMS relation PGCon 2015 Ottawa, ON CA 19-Jun-201524
  • 25. Switch to SQL interactive Presenting JSON as a view and/or materialized view PGCon 2015 Ottawa, ON CA 19-Jun-201525
  • 26. }  The tags and similars JSON elements contain arrays of key-value-pairs. }  Convert them to HSTORE }  track_id, artist and title are present in every JSON file – so we can turn them into columns. CREATE TABLE h_songs ( track_id TEXT PRIMARY KEY, artist TEXT, title TEXT, tags HSTORE, similars HSTORE); Transform tags and similars to HSTORE PGCon 2015 Ottawa, ON CA 19-Jun-201526
  • 27. }  Return elements in the arrays with jsonb_array_elements jsonb_array_elements (song -> 'tags') ->> 0 AS tag_key, }  Build the HSTORE column with HSTORE and array_agg operators. HSTORE (array_agg(tag_key), array_agg(tag_value)) Use JSON and HSTORE operators to convert PGCon 2015 Ottawa, ON CA 19-Jun-201527
  • 28. Switch to SQL interactive Convert from JSON to HSTORE PGCon 2015 Ottawa, ON CA 19-Jun-201528
  • 29. }  Select records with a specific tag and value. SELECT artist, title, tags -> 'latin' FROM h_songs WHERE tags ? 'latin' and (tags -> 'latin')::INTEGER > 67; HSTORE also has operators and indexes PGCon 2015 Ottawa, ON CA 19-Jun-201529
  • 30. Switch to SQL interactive Explore and index HSTORE key-value pairs PGCon 2015 Ottawa, ON CA 19-Jun-201530
  • 31. }  The similars element (key-value pairs of similar_song => weight) ~ edges on a graph. }  Neo4j is the leading NoSQL graph database. Can also explore songs data as a graph PGCon 2015 Ottawa, ON CA 19-Jun-201531 Neo4j-sized graph PostgreSQL-sized graph
  • 32. }  Example adapted from PostgreSQL documentation. }  Transform songs data format into: }  track_id }  similar_track_id (a.k.a. link) }  weight }  Filter our data set to 'rock' songs with relatively strong links … because recursive queries are expensive. }  Answer the burning question: Is there a path of related songs from Lady Gaga Poker Face to Justin Timberlake What Goes Around … Comes Around? Use recursive query to find path PGCon 2015 Ottawa, ON CA 19-Jun-201532
  • 33. Switch to SQL interactive Explore recursive query and graphing PGCon 2015 Ottawa, ON CA 19-Jun-201533
  • 34. Graph song results PGCon 2015 Ottawa, ON CA 19-Jun-201534 0.280 0.214 0.190 0.301
  • 35. }  Assertion: NoSQL is here to stay. our task is to thrive within Polyglot Persistence world. }  5-ish types of NoSQL databases. PostgreSQL plays nice with 4 of them: }  Document store (MongoDB) }  Wide column store (Cassandra) }  Key-value store (Redis) }  Graph DBMS (Neo4j) }  Search engine (Solr)* *See "Full Test Search with Ranked Results" from PGConf 2015. Summary (1) PGCon 2015 Ottawa, ON CA 19-Jun-201535
  • 36. }  PostgreSQL can load, interact with and present NoSQL data in a relational structure. }  PostgreSQL's sweet spot: }  Data volume that fits well on one or a few servers. }  Transaction boundaries from element to document level. }  Want to enforce (some) referential integrity. }  Want to find relations within data. NOTE: This is what most organizations call "my real data". }  Leave edge cases to NoSQL tools. Summary (2) PGCon 2015 Ottawa, ON CA 19-Jun-201536
  • 37. }  Database architecture rules are more subtle and complex now. }  It is too simplistic to think If it's not in at least 3NF – it's wrong. }  Know your business needs. }  Know your data. }  Know the strengths and limitations of the relational model plus NoSQL. Seek to thrive in a world of Polyglot Persistence. Summary (3) PGCon 2015 Ottawa, ON CA 19-Jun-201537
  • 39. LIFE. LIBERTY. TECHNOLOGY. Freedom Consulting Group is a talented, hard-working, and committed partner, providing hardware, software and database development and integration services to a diverse set of clients. “ ”
  • 40. POSTGRES innovation ENTERPRISE reliability 24/7 support Services & training Enterprise-class features, tools & compatibility Indemnification Product road-map Control Thousands of developers Fast development cycles Low cost No vendor lock-in Advanced features Enabling commercial adoption of Postgres