3. Customer Requirements for Hybrid Cluster
- More and more unstructured data is being generated
- Growing use of, and demand for, NoSQL databases because of
  - their usage scenarios
  - their ability to scale horizontally
- Challenges
  - Many administrators and developers still prefer SQL as an easy and intuitive tool for querying the available data
  - Few NoSQL databases support complex queries the way SQL does, e.g. JOINs and subqueries
4. Real Life Use Cases
- NoSQL as an archive store for an RDBMS
  - The RDBMS stores the operational and transactional data
  - while NoSQL acts as an archive store for historical data
- NoSQL for receiving a write stream
  - NoSQL databases accumulate data from various sources with high write throughput across multiple shards
  - while the RDBMS stores the filtered data after it has been transformed into a proper structure
  - The RDBMS makes it easier for users to query the data using SQL and JOINs
6. PostgreSQL
- Most advanced open source database
- Supports the relational model of storing data
- Supports ACID transactions
- Multi-Version Concurrency Control (MVCC)
- Write-ahead log (WAL) files
- Scalability with tablespaces and partitions (child tables)
- Supports unstructured data types (JSON, JSONB, HSTORE) and full-text search
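As a rough illustration of the unstructured data types mentioned above, the following sketch stores and queries JSONB documents in plain PostgreSQL (9.4 or later). The table name shipment_events and its contents are hypothetical and are not part of the original demo.

```sql
-- Hypothetical table for illustration; not part of the original slides.
CREATE TABLE shipment_events (
    id      serial PRIMARY KEY,
    payload jsonb NOT NULL
);

INSERT INTO shipment_events (payload) VALUES
    ('{"carrier": "UPS", "status": "delivered", "tags": ["express"]}'),
    ('{"carrier": "EMS", "status": "in_transit"}');

-- A GIN index speeds up containment (@>) queries on the documents.
CREATE INDEX shipment_events_payload_idx
    ON shipment_events USING gin (payload);

-- Find documents containing the given key/value pair,
-- and extract one field as text with ->>.
SELECT id, payload->>'status' AS status
FROM shipment_events
WHERE payload @> '{"carrier": "UPS"}';
```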
7. MongoDB
- Most popular NoSQL database for a wide range of workloads
- Best for storing unstructured data
- Horizontal scalability through sharding
- Support for secondary indexes
- Aggregation and map-reduce features
8. Foreign Data Wrappers of PostgreSQL
- Get the best of both worlds
- Based on SQL/MED (Management of External Data)
- Allows you to create FOREIGN TABLEs that map to external entities
- These entities could be
  - a table in an RDBMS
  - a collection in MongoDB
  - the corresponding entities in HDFS or a file system
- More about FDW in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers
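To make the idea of a foreign table concrete, here is a minimal sketch that maps a file on the file system using file_fdw, a wrapper that ships with PostgreSQL as a contrib module. The server name, file path, and columns are assumptions made up for illustration.

```sql
-- file_fdw ships with PostgreSQL as a contrib module.
CREATE EXTENSION file_fdw;

-- file_fdw needs a server object, but it takes no connection options.
CREATE SERVER local_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical CSV file and columns, for illustration only.
CREATE FOREIGN TABLE carriers_csv (
    carrier_code text,
    carrier_name text
)
SERVER local_files
OPTIONS (filename '/tmp/carriers.csv', format 'csv', header 'true');

-- The file can now be queried like any other table.
SELECT carrier_code, carrier_name FROM carriers_csv;
```

Other wrappers follow the same pattern of server plus foreign table; only the OPTIONS differ from wrapper to wrapper.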
10. MongoDB FDW
- Started by CitusDB and then forked by EnterpriseDB
- More details: https://github.com/EnterpriseDB/mongo_fdw
- The example we will discuss here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw
- Let’s go through the demo
12. Prepare for a MongoDB Cluster
- Platform: Windows 7
- Create the directories that you will need
  - cd d:\mongodb
  - mkdir a0
  - mkdir b0
  - mkdir c0
  - mkdir c1
  - mkdir c2
  - mkdir d0
  - mkdir d1
  - mkdir d2
  - mkdir cfg0
  - mkdir cfg1
  - mkdir cfg2
13. Create the services for MongoDB Cluster: Config Server
mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0
net start new_mongod_cfg0
mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1
net start new_mongod_cfg1
mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2
net start new_mongod_cfg2
14. Create the services for MongoDB Cluster: Create Shards
mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0
net start new_mongod_shrd_a0
mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0
net start new_mongod_shrd_b0
mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0
net start new_mongod_shrd_c0
15. Create the services for MongoDB Cluster: Optionally Create the Replicas
- For simplicity we have skipped creating additional replica set members here, but you can add them
- e.g.
  - mkdir a1
  - mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1
  - net start new_mongod_shrd_a1
17. Initiate the Replica Set
- I am going to initiate a one-member replica set for each of my shards
- Shard A
mongo --port 27000
> rs.initiate()
a:OTHER> rs.conf()
a:PRIMARY> exit
- Shard B
mongo --port 27100
> rs.initiate()
b:OTHER> rs.conf()
b:PRIMARY> exit
- Shard C
mongo --port 27200
> rs.initiate()
c:OTHER> rs.conf()
c:PRIMARY> exit
21. Build MongoDB FDW
- Download MongoDB FDW from GitHub
- Installation is quite easy when you use autogen.sh
  - cd $PATH_WHERE_FDW_IS_EXTRACTED
  - ./autogen.sh
- It will automatically install all the required components
  - libbson
  - libmongoc
- Once that is done, you can build and install the extension
  - make -f Makefile.meta && make -f Makefile.meta install
22. Features of mongo_fdw
- Allows you to build with either the legacy driver or the master branch driver
- Has read and write capability for the foreign table
- Connection pooling: queries in the same session reuse the same MongoDB connection
- Build with MongoDB's legacy branch driver
  - autogen.sh --with-legacy
- Build with MongoDB's master branch driver
  - autogen.sh --with-master
23. Using mongo_fdw
- Create the mongo_fdw extension in the PostgreSQL database
  - You may create the extension in the template database
- Create a foreign data server
- Create a user mapping that maps a Postgres user to a MongoDB user
- Create a foreign table that maps to a MongoDB collection
24. Create Foreign Server: Example
- psql=# CREATE EXTENSION mongo_fdw;
- psql=# CREATE SERVER mongo_server
         FOREIGN DATA WRAPPER mongo_fdw
         OPTIONS (address '192.168.160.1', port '26060');
- psql=# CREATE USER MAPPING FOR postgres
         SERVER mongo_server
         OPTIONS (username 'superuser', password 'password');
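With the server and user mapping in place, the remaining step is the foreign table itself. The sketch below is one plausible definition for the warehouse collection used in the rest of the demo. The column list is inferred from the DML statements in these slides; the database name 'db' and the column name warehouse_created are assumptions, so adjust them to your own collection.

```sql
-- Sketch only: the database name and the timestamp column name are assumptions.
CREATE FOREIGN TABLE warehouse (
    _id               name,         -- mongo_fdw expects _id to be declared as NAME
    warehouse_id      int,
    warehouse_name    text,
    warehouse_created timestamptz   -- assumed column name
)
SERVER mongo_server
OPTIONS (database 'db', collection 'warehouse');
```

The database and collection options tell mongo_fdw which MongoDB collection backs the table; the address and port come from the server definition.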
26. ‘_id’ column of MongoDB
- It stores a unique Object ID
- By default, if you skip this field, MongoDB will insert a 12-byte BSON ObjectId
- While inserting data into MongoDB you may choose the value of this field
- In mongo_fdw you have to define the _id column with the data type NAME
- mongo_fdw will ignore the value inserted into the _id column and let MongoDB generate it
27. DML on Foreign Tables
- INSERT INTO warehouse values (0, 1, 'UPS', '2014-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 2, 'EMS', '2013-12-12T07:12:10Z');
- INSERT INTO warehouse values (0, 3, 'ASX', '2013-11-12T07:12:10Z');
- UPDATE warehouse set warehouse_name = 'UPS_NEW' where warehouse_id = 1;
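Deletes go through the foreign table in the same way as the inserts and the update. The following is a minimal sketch, assuming a mongo_fdw build with write support and the warehouse foreign table used in the statements above.

```sql
-- Remove one of the documents inserted above through the foreign table.
DELETE FROM warehouse WHERE warehouse_id = 3;

-- Confirm what is left; the rows are read back from MongoDB.
SELECT warehouse_id, warehouse_name
FROM warehouse
ORDER BY warehouse_id;
```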
28. Operations on MongoDB
- Connect to MongoDB
  - mongo --port 26060 --username superuser --password password
- Check the data in the collection
  - db.warehouse.find()
29. Operations in Postgres on Foreign Data
- You can run ANALYZE on the foreign table to collect statistics
- You can run queries with a WHERE clause
- You can run JOIN queries against other foreign tables or native PostgreSQL tables
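The three operations listed above could look like the following sketch. It assumes the warehouse foreign table from the earlier slides; the native shipments table and its columns are hypothetical and exist only to give the join something to work with.

```sql
-- Collect statistics on the foreign table.
ANALYZE warehouse;

-- Filter with a WHERE clause.
SELECT warehouse_id, warehouse_name
FROM warehouse
WHERE warehouse_id = 1;

-- Hypothetical native PostgreSQL table to join against.
CREATE TABLE shipments (
    shipment_id  serial PRIMARY KEY,
    warehouse_id int NOT NULL,
    shipped_on   date NOT NULL
);

-- Join MongoDB-backed rows with native rows.
SELECT w.warehouse_name, count(s.shipment_id) AS shipments
FROM warehouse AS w
LEFT JOIN shipments AS s ON s.warehouse_id = w.warehouse_id
GROUP BY w.warehouse_name
ORDER BY w.warehouse_name;
```

The join and aggregation run in PostgreSQL; MongoDB only has to supply the rows of the collection.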
30. Live walkthrough of the Hybrid Cluster
Leverage complex SQL with sharded MongoDB
31. Benefits of this Setup
- Build a sharded MongoDB cluster with a SQL interface
- Query MongoDB data using SQL
- Join MongoDB collections with each other or with tables in Postgres
- Combine and process MongoDB data with data from other sources with the help of the respective FDWs, e.g. Hadoop, Oracle, MySQL
- Add more shards on the go
- Add replicas for MongoDB on the go
- Use Postgres as a front end to insert, update, and delete data in MongoDB using SQL
32. Send us your suggestions and questions
success@ashnik.com
Stay Tuned!
Website: www.ashnik.com