Building a Hybrid Data Cluster with MongoDB and Postgres
A solution based on PostgreSQL’s Foreign Data Wrapper
27 April 2015
Context and Customer Scenario
Customer Requirements for Hybrid Cluster
- More and more unstructured data is being generated
- NoSQL databases are increasingly used and required because of
  - their usage scenarios
  - their ability to scale horizontally
- Challenges
  - Many administrators and developers still prefer SQL as an easy and intuitive tool for querying the available data
  - Few NoSQL databases support complex queries the way SQL does, e.g. JOINs and subqueries
Real-Life Use Cases
- NoSQL as an archive store for an RDBMS
  - The RDBMS stores the operational and transactional data
  - while NoSQL acts as an archive store for historical data
- NoSQL for receiving a write stream
  - NoSQL databases accumulate data from various sources with high write throughput across multiple shards
  - while the RDBMS stores the filtered data after it has been transformed into proper structures
  - The RDBMS makes it easier for users to query the data using SQL and JOINs
A Hybrid Data Cluster is the ‘need of the hour’
PostgreSQL
- Most advanced open source database
- Supports the relational model of storing data
- Supports ACID transactions
- Multi-Version Concurrency Control (MVCC)
- Write-Ahead Logging (WAL)
- Scalability with tablespaces and partitions/child tables
- Supports unstructured data types (JSON, JSONB, HSTORE) and full-text search
MongoDB
- Most popular NoSQL database for a vast set of workloads
- Best suited to storing unstructured data
- Horizontal scalability through sharding
- Secondary indexes
- Aggregation and map-reduce features
Foreign Data Wrappers of PostgreSQL
- Get the best of both worlds
- Based on SQL/MED (Management of External Data)
- Allow you to create FOREIGN TABLEs that map to external entities
- These entities could be
  - a table in an RDBMS
  - a collection in MongoDB
  - the respective entities in HDFS or a file system
- More about FDWs in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers
FDW for MongoDB
MongoDB FDW
- Started by CitusDB and then forked by EnterpriseDB
- More details: https://github.com/EnterpriseDB/mongo_fdw
- The example discussed here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw
- Let’s go through the demo
Preparing the MongoDB
Prepare for a MongoDB Cluster
- Platform: Windows 7
- Create the directories that you will need
  - cd d:\mongodb
  - mkdir a0
  - mkdir b0
  - mkdir c0
  - mkdir c1
  - mkdir c2
  - mkdir d0
  - mkdir d1
  - mkdir d2
  - mkdir cfg0
  - mkdir cfg1
  - mkdir cfg2
Create the services for MongoDB Cluster: Config Server
mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0
net start new_mongod_cfg0
mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1
net start new_mongod_cfg1
mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2
net start new_mongod_cfg2
Create the services for MongoDB Cluster: Create Shards
mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0
net start new_mongod_shrd_a0
mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0
net start new_mongod_shrd_b0
mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0
net start new_mongod_shrd_c0
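The config-server and shard services are all installed with the same command pattern, so the command lines can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the data and log files live under d:\mongodb as in the demo. The helper names (`config_server_command`, `shard_command`) are illustrative only, and the flags are simply the ones shown on the slides; `--smallfiles` in particular belongs to older MongoDB releases and newer ones may reject it.

```python
from pathlib import PureWindowsPath

# Assumption: every data directory and log file sits under this folder.
BASE = PureWindowsPath(r"d:\mongodb")


def _install_command(role_flags, name, port, service):
    """Join the flags for one `mongod --install` call into a command line."""
    parts = [
        "mongod", *role_flags,
        "--dbpath", str(BASE / name),
        "--logpath", str(BASE / f"{name}.log"),
        "--port", str(port),
        "--install",
        "--serviceName", service,
        "--serviceDisplayName", service,
    ]
    return " ".join(parts)


def config_server_command(index):
    """Command for config server number `index` (ports start at 26050)."""
    name = f"cfg{index}"
    return _install_command(["--configsvr"], name, 26050 + index,
                            f"new_mongod_{name}")


def shard_command(replset, member, port):
    """Command for one member of the replica set backing a shard."""
    name = f"{replset}{member}"
    flags = ["--shardsvr", "--replSet", replset,
             "--smallfiles", "--oplogSize", "50"]
    return _install_command(flags, name, port, f"new_mongod_shrd_{name}")


if __name__ == "__main__":
    for i in range(3):
        print(config_server_command(i))
    for replset, port in (("a", 27000), ("b", 27100), ("c", 27200)):
        print(shard_command(replset, 0, port))
```

The script only prints the commands; each one still has to be run in an elevated Windows command prompt and followed by the matching `net start`.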
Create the services for MongoDB Cluster: Optionally Create the Replicas
- For simplicity, this demo skips adding further replica set members, but you can add them, e.g.
  - mkdir a1
  - mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1
  - net start new_mongod_shrd_a1
Initiate the Mongos
- mongos --configdb sameer:26050,sameer:26051,sameer:26052 --install --serviceName new_mongos_svc0 --serviceDisplayName new_mongos_svc0 --logpath d:\mongodb\mongos0.log --port 26060
- net start new_mongos_svc0
Initiate the Replica Set
- I am going to initiate a one-member replica set for each of my shards
Set Up Sharding
- Shard A
mongo --port 27000
> rs.initiate()
a:OTHER> rs.conf()
a:PRIMARY> exit
- Shard B
mongo --port 27100
> rs.initiate()
b:OTHER> rs.conf()
b:PRIMARY> exit
- Shard C
mongo --port 27200
> rs.initiate()
c:OTHER> rs.conf()
c:PRIMARY> exit
mongo --port 26060 test
mongos> sh.addShard("sameer:27100")
mongos> sh.addShard("sameer:27200")
mongos> sh.addShard("sameer:27000")
mongos> sh.enableSharding("db")
mongos> sh.shardCollection("db.warehouse",{warehouse_created:1},true)
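The shard key {warehouse_created:1} is a ranged key: the collection is split into chunks that each cover a contiguous range of warehouse_created values, with the lower bound inclusive and the upper bound exclusive. The Python sketch below is a toy model of that routing idea only. The split points and the chunk-to-shard assignment are hypothetical values chosen for illustration; in a real cluster MongoDB chooses and rebalances chunks itself.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical split points on the shard key (warehouse_created).
SPLIT_POINTS = [
    datetime(2013, 12, 1, tzinfo=timezone.utc),
    datetime(2014, 6, 1, tzinfo=timezone.utc),
]

# Hypothetical owner of each chunk, in key order:
#   (-inf, 2013-12-01) -> a, [2013-12-01, 2014-06-01) -> b, [2014-06-01, +inf) -> c
CHUNK_OWNERS = ["a", "b", "c"]


def shard_for(warehouse_created):
    """Return the shard whose chunk range contains the given key value."""
    # bisect_right counts the split points <= key, which is the chunk index
    # when lower bounds are inclusive and upper bounds exclusive.
    chunk = bisect_right(SPLIT_POINTS, warehouse_created)
    return CHUNK_OWNERS[chunk]


if __name__ == "__main__":
    sample = datetime(2014, 12, 12, 7, 12, 10, tzinfo=timezone.utc)
    print(shard_for(sample))
```

Because neighbouring key values land in the same chunk, a monotonically increasing key such as a creation timestamp tends to send new writes to the chunk at the top of the range.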
Set Up Users and Security
mongos> use db
mongos> db.createUser(
... {
...   user: "superuser",
...   pwd: "password",
...   roles: [ { role: "root", db: "admin" } ]
... }
... )
Creating the FDW Extension in Postgres
Build MongoDB FDW
- Download the MongoDB FDW from GitHub
- Installation is quite easy when you use autogen.sh
  - cd $PATH_WHERE_FDW_IS_EXTRACTED
  - ./autogen.sh
  - It automatically installs the required components
    - libbson
    - libmongoc
- Once that is done, build and install the FDW
  - make -f Makefile.meta && make -f Makefile.meta install
Features of mongo_fdw
- Can be built against the legacy branch or the master branch of MongoDB's driver
- Read and write capability for foreign tables
- Connection pooling: the same MongoDB connection is reused for queries in the same session
- Build with MongoDB's legacy branch driver
  - autogen.sh --with-legacy
- Build with MongoDB's master branch driver
  - autogen.sh --with-master
Using mongo_fdw
- Create the mongo_fdw extension in the PostgreSQL database
  - You may create it in the template database
- Create a foreign data server
- Create a user mapping from a Postgres user to a MongoDB user
- Create a foreign table that maps to a MongoDB collection
Create Foreign Server: Example
- psql=# CREATE EXTENSION mongo_fdw;
- psql=# CREATE SERVER mongo_server
         FOREIGN DATA WRAPPER mongo_fdw
         OPTIONS (address '192.168.160.1', port '26060');
- psql=# CREATE USER MAPPING FOR postgres
         SERVER mongo_server
         OPTIONS (username 'superuser', password 'password');
Create Foreign Table: Example
- psql=# CREATE FOREIGN TABLE warehouse(
             _id NAME,
             warehouse_id int,
             warehouse_name text,
             warehouse_created timestamptz)
         SERVER mongo_server
         OPTIONS (database 'db', collection 'warehouse');
‘_id’ column of MongoDB
- It stores a unique object ID
- By default, if you skip this field, MongoDB inserts a 12-byte BSON ObjectId
- When inserting data into MongoDB, you may choose the value of this field yourself
- In mongo_fdw you have to define the _id column with the data type NAME
- mongo_fdw ignores the value inserted into the _id column and lets MongoDB generate it
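The 12-byte ObjectId that MongoDB generates starts with a 4-byte big-endian timestamp (seconds since the Unix epoch), so the creation time of a document can be read back from its _id. Here is a minimal Python sketch, assuming the id is available as its usual 24-character hex string. `objectid_created_at` is an illustrative helper and not part of mongo_fdw or any MongoDB driver, and the id used in the usage line is just a sample value.

```python
from datetime import datetime, timezone


def objectid_created_at(hex_id):
    """Return the UTC creation time encoded in an ObjectId hex string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    # The first 4 bytes are the creation time in seconds, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Sample ObjectId value, used only for illustration.
    print(objectid_created_at("507f1f77bcf86cd799439011").isoformat())
```

The remaining 8 bytes make the id unique among documents created within the same second, which is why the value passed from Postgres for this column is not needed.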
DML on Foreign Tables
- INSERT INTO warehouse values (0, 1, 'UPS', '2014-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 2, 'EMS', '2013-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 3, 'ASX', '2013-11-12T07:12:10Z');
- UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;
Operations on MongoDB
- Connect to MongoDB
  - mongo --port 26060 --username superuser --password password
- Check the data in the collection
  - db.warehouse.find()
Operations in Postgres on Foreign Data
- You can run ANALYZE on the foreign table to collect statistics
- You can run queries with a WHERE clause
- You can run JOIN queries against other foreign tables or native PostgreSQL tables
Live walkthrough of the Hybrid Cluster
Leverage complex SQL with sharded MongoDB
Benefits of this Setup
- Build a sharded MongoDB cluster with an SQL interface
- Query MongoDB data using SQL
- Join MongoDB collections with each other or with tables in Postgres
- Combine and process MongoDB data with data from other sources through their respective FDWs, e.g. Hadoop, Oracle, MySQL
- Add more shards on the go
- Add replicas for MongoDB on the go
- Use Postgres as a front end to insert/update/delete data in MongoDB using SQL
Send us your suggestions and questions
success@ashnik.com
Stay Tuned!
Website: www.ashnik.com

Building Hybrid data cluster using PostgreSQL and MongoDB

  • 1. Building a Hybrid Data Cluster with MongoDB and Postgres A solution based on PostgreSQL’s Foreign Data Wrapper 27 April 2015
  • 3. Customer Requirements for Hybrid Cluster - More and more unstructured data being generated - Increasing use and requirements of noSQL databases – because of - usage scenario - ability to scale horizontally - Challenges - A lot of Admin and Developer still prefer SQL as easy and intutive tool to query information out of available data - Not many noSQL databases support complex queries as SQL does e.g. JOINs, Sub-query etc 3
  • 4. Real Life Use Cases - noSQL as Archive store of RDBMS - RDBMS being used to store the operational and transactional data - while noSQL may act as an archive store for historical data - noSQL for receiving write stream - noSQL databases being used to accumulate data from various sources with high write throughput across multiple shards - while RDBMS is used to store the filtered data after it has been transformed into proper structures - RDBMS makes it easier for the users to query data using SQLs and JOINs 4
  • 5. Hybrid Data Cluster is the ‘need of hour’
  • 6. - Most Advanced Open Source Database - Supports Relational model of storing database - Supports ACID features of Transactions - Multi Version Concurrency Control - Write Ahead WAL files - Scalability with Tablespaces and Partitions/child tables - Supports unstructured data-types (JSON, JSONB, HSTORE) and full text search features PostgreSQL 6
  • 7. - Most popular noSQL Database for vast set of workloads - Best for storing un-structured data - Horizontal Scalability with sharding capability - Provision for secondary indexes - Aggregation and Map-reduce features MongoDB 7
  • 8. - Get the best out of both the worlds - Based on SQL/MED – Management of External Data - Allows you to create FOREIGN TABLES which maps to external entities - These entities could be - Table in RDBMS - collection in MongoDB - Or can be mapped respective entities in HDFS or File System - More about FDW in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers Foreign Data Wrappers of PostgreSQL 8
  • 10. MongoDB FDW
      - Started by CitusDB and then forked by EnterpriseDB
      - More details: https://github.com/EnterpriseDB/mongo_fdw
      - The example discussed here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw
      - Let’s go through the demo
  • 12. Prepare for a MongoDB Cluster
      - Platform: Windows 7
      - Create the directories that you will need
          - cd d:\mongodb
          - mkdir a0
          - mkdir b0
          - mkdir c0
          - mkdir c1
          - mkdir c2
          - mkdir d0
          - mkdir d1
          - mkdir d2
          - mkdir cfg0
          - mkdir cfg1
          - mkdir cfg2
  • 13. Create the services for MongoDB Cluster: Config Server
      mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0
      net start new_mongod_cfg0
      mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1
      net start new_mongod_cfg1
      mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2
      net start new_mongod_cfg2
  • 14. Create the services for MongoDB Cluster: Create Shards
      mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0
      net start new_mongod_shrd_a0
      mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0
      net start new_mongod_shrd_b0
      mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0
      net start new_mongod_shrd_c0
  • 15. Create the services for MongoDB Cluster: Optionally Create the Replicas
      - For simplicity we have skipped creating additional replica set members here, but you can add them
      - e.g.
          - mkdir a1
          - mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1
          - net start new_mongod_shrd_a1
  • 16. Initiate the Mongos
      - mongos --configdb sameer:26050,sameer:26051,sameer:26052 --install --serviceName new_mongos_svc0 --serviceDisplayName new_mongos_svc0 --logpath d:\mongodb\mongos0.log --port 26060
      - net start new_mongos_svc0
  • 17. Initiate the Replica Set
      - I am going to initiate a one-member replica set for each of my shards
      - Shard A
          mongo --port 27000
          > rs.initiate()
          a:OTHER> rs.conf()
          a:PRIMARY> exit
      - Shard B
          mongo --port 27100
          > rs.initiate()
          b:OTHER> rs.conf()
          b:PRIMARY> exit
      - Shard C
          mongo --port 27200
          > rs.initiate()
          c:OTHER> rs.conf()
          c:PRIMARY> exit
  • 18. Setup Sharding
      mongo --port 26060 test
      mongos> sh.addShard("sameer:27100")
      mongos> sh.addShard("sameer:27200")
      mongos> sh.addShard("sameer:27000")
      mongos> sh.enableSharding("db")
      mongos> sh.shardCollection("db.warehouse",{warehouse_created:1},true)
  • 19. Setup Users and Security
      mongos> use db
      mongos> db.createUser(
      ...   {
      ...     user: "superuser",
      ...     pwd: "password",
      ...     roles: [ { role: "root", db: "admin" } ]
      ...   }
      ... )
  • 21. Build MongoDB FDW
      - Download MongoDB FDW from GitHub
      - Installation is quite easy when you use autogen.sh
          - cd $PATH_WHERE_FDW_IS_EXTRACTED
          - ./autogen.sh
      - It will automatically install all the required components
          - libbson
          - libmongoc
      - Once that is done, you can make and install
          - make -f Makefile.meta && make -f Makefile.meta install
  • 22. Features of mongo_fdw
      - Allows you to build with the legacy driver or the master branch driver
      - Has read and write capability for the foreign table
      - Connection pooling, which reuses the same MongoDB connection for queries in the same session
      - Build with MongoDB's legacy branch driver
          - ./autogen.sh --with-legacy
      - Build with MongoDB's master branch driver
          - ./autogen.sh --with-master
  • 23. Using mongo_fdw
      - Create the mongo_fdw extension in the PostgreSQL database
          - You may create the extension in a template database
      - Create a foreign data server
      - Create a user mapping that maps a Postgres user to a MongoDB user
      - Create a foreign table which maps to a MongoDB collection
  • 24. Create Foreign Server: Example
      - psql=# CREATE EXTENSION mongo_fdw;
      - psql=# CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw OPTIONS (address '192.168.160.1', port '26060');
      - psql=# CREATE USER MAPPING FOR postgres SERVER mongo_server OPTIONS (username 'superuser', password 'password');
  • 25. Create Foreign Table: Example
      - psql=# CREATE FOREIGN TABLE warehouse( _id NAME, warehouse_id int, warehouse_name text, warehouse_created timestamptz) SERVER mongo_server OPTIONS (database 'db', collection 'warehouse');
  • 26. ‘_id’ column of MongoDB
      - It stores a unique Object ID
      - If you skip this column, MongoDB by default inserts a 12-byte BSON Object ID
      - While inserting data into MongoDB you may choose the value of this field
      - In mongo_fdw you have to define the _id column with its data type as “NAME”
      - mongo_fdw will ignore the value inserted in the _id column and let MongoDB
  • 27. DML on Foreign Tables
      - INSERT INTO warehouse values (0, 1, 'UPS', '2014-12-12T07:12:10Z');
      - INSERT INTO warehouse values (0, 2, 'EMS', '2013-12-12T07:12:10Z');
      - INSERT INTO warehouse values (0, 3, 'ASX', '2013-11-12T07:12:10Z');
      - UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;
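A small follow-up sketch, assuming the `warehouse` foreign table and the three rows inserted on this slide. Reading `_id` back should show the Object ID that MongoDB generated rather than the placeholder 0 passed in the INSERT, and a DELETE goes through the same foreign table. No output is shown because the Object IDs differ on every run.

```sql
-- _id should hold the MongoDB-generated Object ID, not the 0 from the INSERT.
SELECT _id, warehouse_id, warehouse_name
FROM warehouse
ORDER BY warehouse_id;

-- Remove one document from the collection through the foreign table.
DELETE FROM warehouse WHERE warehouse_id = 3;
```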
  • 28. Operations on MongoDB
      - Connect to MongoDB
          - mongo --port 26060 --username superuser --password password
      - Check the data in the collection
          - db.warehouse.find()
  • 29. Operations in Postgres on Foreign Data
      - You can run ANALYZE on the foreign table to collect statistics
      - You can run queries with a WHERE clause
      - You can run JOIN queries against other foreign tables or native PostgreSQL tables
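The three operations on this slide might look as follows. This is a sketch that assumes the `warehouse` foreign table from the earlier slides; the native `shipment` table is hypothetical and only there to have something to join against.

```sql
-- Hypothetical native table holding operational data in PostgreSQL.
CREATE TABLE shipment (
    shipment_id  serial PRIMARY KEY,
    warehouse_id int  NOT NULL,
    shipped_on   date NOT NULL
);

-- Collect planner statistics for the foreign table.
ANALYZE warehouse;

-- Filter MongoDB data with an ordinary WHERE clause.
SELECT warehouse_id, warehouse_name
FROM warehouse
WHERE warehouse_created >= '2014-01-01';

-- Join the MongoDB collection with the native table.
SELECT w.warehouse_name, count(*) AS shipments
FROM shipment s
JOIN warehouse w ON w.warehouse_id = s.warehouse_id
GROUP BY w.warehouse_name;
```

Prefixing any of these queries with EXPLAIN shows how PostgreSQL plans the foreign scan.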
  • 30. Live walkthrough of the Hybrid Cluster: leverage complex SQL with sharded MongoDB
  • 31. Benefits of this Setup
      - Build a sharded MongoDB cluster with a SQL interface
      - Query MongoDB data using SQL
      - Join MongoDB collections with each other or with tables in Postgres
      - Combine and process MongoDB data with data from other data sources with the help of the respective FDW, e.g. Hadoop, Oracle, MySQL
      - Add more shards on the go
      - Add replicas for MongoDB on the go
      - Use Postgres as a front end to insert/update/delete data in MongoDB using SQL
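One way the archive-store use case could look with this setup, as a sketch only: current rows live in a native table, historical rows live in the MongoDB-backed foreign table, and a view presents both through one SQL interface. The `warehouse_live` table and the `warehouse_all` view are hypothetical names; `warehouse` is the foreign table defined earlier.

```sql
-- Hypothetical native table for current, transactional data.
CREATE TABLE warehouse_live (
    warehouse_id      int PRIMARY KEY,
    warehouse_name    text,
    warehouse_created timestamptz
);

-- One view over live rows in PostgreSQL and archived documents in MongoDB.
CREATE VIEW warehouse_all AS
SELECT warehouse_id, warehouse_name, warehouse_created, 'live'::text AS source
FROM warehouse_live
UNION ALL
SELECT warehouse_id, warehouse_name, warehouse_created, 'archive'::text AS source
FROM warehouse;

-- Applications query the view without knowing where each row is stored.
SELECT source, count(*) AS row_count
FROM warehouse_all
GROUP BY source;
```

The view keeps the storage split invisible to applications, while adding shards or replicas on the MongoDB side needs no change on the Postgres side.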
  • 32. Send us your suggestions and questions: success@ashnik.com. Stay tuned! Website: www.ashnik.com