MongoDB World 2018: Bumps and Breezes: Our Journey from RDBMS to MongoDB
1. Bumps and Breezes
Our Journey From RDBMS To MongoDB
Presenters: Ajit Oke and Harsha Undapalli
Organization:
2. Agenda
Background of Pre-MongoDB (RDBMS) Environment
Why MongoDB?
Evolution of MongoDB Environment
Database – Bumps and Breezes
MongoDB Application Layer – Bumps and Breezes
Benchmarking Results
Next Steps and Help Needed
Our Journey From RDBMS to MongoDB
3. Pre-MongoDB Environment
To ensure quality, every Intel product is thoroughly tested, both electrically and functionally, before it reaches the customer.
Currently, this multi-terabyte test data is managed by an RDBMS-based Decision Support System (DSS).
Intel CPU products typically have a die attached to a substrate using thousands of tiny balls. When the DSS team received a request to store ball-level test data in the RDBMS, the team started facing performance and storage challenges.
After a literature review and proof-of-concept (POC) analysis, the team decided to use MongoDB for ball-level test data.
[Figure: die attached to substrate]
5. Evolution of MongoDB Environment
Over the last year and a half, our MongoDB environment has continued to evolve with business needs.
This journey was not easy. We had a few bumps and many breezes along the way.
7. Breezes
Small Install Footprint
MongoDB binaries have a small footprint.
The lean installation package makes MongoDB easy to install, even on virtual servers, desktops, or laptops.
Our current RDBMS platform takes several gigabytes of storage space; in comparison, MongoDB is very lightweight in its storage needs.
8. Breezes
Ease of creating MongoDB databases and database objects
Compared to an RDBMS, creating databases and database objects is very easy in MongoDB.
Index creation is very intuitive and similar to an RDBMS.
The config file is in YAML format, organized into sections.
MongoDB offers concepts similar to an RDBMS without the overhead of one.
9. Breezes
Data Compression
WiredTiger data compression is built into MongoDB.
Collection stats: db.getCollection('<collection-name>').stats()
Example: we are getting almost 3x compression through WiredTiger, which is a huge saving on large-volume databases.
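One way to put a number on the compression figure is to compare two fields of the stats() output: `size` (uncompressed data size in bytes) and `storageSize` (bytes allocated on disk for the documents). The sketch below is a minimal illustration under that assumption; the sample values and the namespace are hypothetical, not taken from the presenters' system, and `storageSize` excludes indexes and may include reusable free space.

```python
def compression_ratio(coll_stats):
    """Return uncompressed data size divided by on-disk storage size."""
    data_size = coll_stats["size"]            # uncompressed size of all documents, in bytes
    storage_size = coll_stats["storageSize"]  # bytes allocated on disk for the documents
    if storage_size <= 0:
        raise ValueError("storageSize must be positive")
    return data_size / storage_size


# Hypothetical stats() output, trimmed to the fields used here.
sample_stats = {
    "ns": "MDB_CSV.MC_CSV",
    "size": 3_000_000_000,
    "storageSize": 1_000_000_000,
}

ratio = compression_ratio(sample_stats)
print(f"compression ratio: {ratio:.1f}x")
```

With a driver such as PyMongo, the same dictionary could be obtained from the `collStats` command instead of being typed in by hand.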
10. Breezes
TTL index
Eliminates the need for a separate data purger utility.
https://docs.mongodb.com/manual/core/index-ttl/
Command to change the expiration value: collMod
https://docs.mongodb.com/manual/reference/command/collMod/#dbcmd.collMod
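As a rough sketch of the two steps above, the following builds the database command documents for creating a TTL index (`createIndexes` with `expireAfterSeconds`) and for changing its expiration later (`collMod` with `keyPattern`). The collection name reuses one from this deck; the date field `loaded_at`, the index name, and the 90-day and 30-day retention periods are assumptions made for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def create_ttl_index_command(collection, date_field, expire_after_days):
    """Command document that creates a TTL index on a date field."""
    return {
        "createIndexes": collection,
        "indexes": [
            {
                "key": {date_field: 1},
                "name": f"{date_field}_ttl",  # hypothetical index name
                "expireAfterSeconds": expire_after_days * SECONDS_PER_DAY,
            }
        ],
    }


def change_ttl_command(collection, date_field, expire_after_days):
    """Command document that changes the expiration of an existing TTL index."""
    return {
        "collMod": collection,
        "index": {
            "keyPattern": {date_field: 1},
            "expireAfterSeconds": expire_after_days * SECONDS_PER_DAY,
        },
    }


# Hypothetical field name and retention periods.
create_cmd = create_ttl_index_command("MC_CSV", "loaded_at", 90)
modify_cmd = change_ttl_command("MC_CSV", "loaded_at", 30)
print(create_cmd)
print(modify_cmd)
```

Each dictionary keeps the command name as its first key, so it could be passed to a driver's generic command runner (for example PyMongo's `db.command(...)`). TTL expiry only applies when the indexed field holds a date value.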
11. Breezes
Sharding capability
As a database grows in size and usage, MongoDB's sharding capability provides an excellent mechanism for horizontal scaling on commodity hardware.
In an RDBMS, vertical scalability is achieved by upgrading server hardware, and horizontal scalability is offered through shared-disk storage systems. Both options are very expensive compared to MongoDB sharding.
Creating a sharding environment was a formidable task initially, but after taking a few MongoDB University online courses and learning from the experiences of the MongoDB user community, we were able to create a foolproof process for shard creation.
Choosing a hashed shard key on a high-cardinality field gave us excellent balancing, both while loading data and while storing it.
[Figure: Scalability Technology Shift over the years]
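The balancing effect of a hashed shard key can be illustrated with a small simulation. This is a simplified sketch, not MongoDB's actual mechanism: MongoDB assigns ranges of hash values (chunks) to shards, whereas the code below uses SHA-256 and a modulo purely for illustration. The shard count and key space are hypothetical. It compares where the most recently loaded keys land when keys increase monotonically.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4        # hypothetical cluster size
TOTAL_KEYS = 10_000   # hypothetical key space: unit IDs 0..9999


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Pick a shard from a hash of the key (simplified stand-in for hashed sharding)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def ranged_shard(key, num_shards=NUM_SHARDS, total_keys=TOTAL_KEYS):
    """Pick a shard by splitting the key space into equal contiguous ranges."""
    return key * num_shards // total_keys


# The most recent 1,000 inserts of a monotonically increasing key.
recent_keys = range(9_000, 10_000)
ranged_counts = Counter(ranged_shard(k) for k in recent_keys)
hashed_counts = Counter(hashed_shard(k) for k in recent_keys)

print("ranged:", dict(ranged_counts))
print("hashed:", dict(sorted(hashed_counts.items())))
```

In this toy setup every recent insert goes to the last shard under range partitioning, while the hashed variant spreads the same inserts across all four shards.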
13. Bumps and Key Learnings
Usage of some config settings
bindIpAll parameter: changes in MongoDB 3.6
keyFile parameter for authentication
cacheSize parameter for performance
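For orientation, the three settings map to these YAML config options: `net.bindIpAll` (added in MongoDB 3.6, when the default changed to binding to localhost only), `security.keyFile`, and `storage.wiredTiger.engineConfig.cacheSizeGB`. The sketch below renders a config fragment containing them. The key file path, port, and the 100 GB cache value are placeholder assumptions, not the presenters' actual settings.

```python
def render_mongod_conf(key_file, cache_size_gb, bind_ip_all=True):
    """Render a minimal mongod YAML config fragment with the three settings."""
    lines = [
        "net:",
        "  port: 27017",
        f"  bindIpAll: {'true' if bind_ip_all else 'false'}",
        "security:",
        f"  keyFile: {key_file}",
        "storage:",
        "  wiredTiger:",
        "    engineConfig:",
        f"      cacheSizeGB: {cache_size_gb}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
conf_text = render_mongod_conf("/etc/mongodb/keyfile", 100)
print(conf_text)
```

A real deployment would carry further sections (for example `systemLog`, `storage.dbPath`, and `replication`), which are omitted here.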
14. Bumps and Key Learnings
Replica set/sharding with authentication and key file creation
Creating a key file was a challenge at the time because we could not find proper documentation. Write-ups from other users referred to third-party tools for creating a hexadecimal key, and so on.
In the end, we realized that we could easily create a key file using a text editor. Since our servers are well secured behind a firewall and only authorized users can access them, we are OK with the key file containing readable text.
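As an alternative to typing the key by hand, the key text can be generated randomly. This is a sketch, not the presenters' method: it mirrors the `openssl rand -base64 756` approach from the MongoDB documentation, relying on the rule that a key file holds between 6 and 1024 base64 characters. The file name and temporary directory are hypothetical.

```python
import base64
import os
import secrets
import stat
import tempfile


def generate_key_text(num_random_bytes=756):
    """Return random key text made up only of base64 characters."""
    return base64.b64encode(secrets.token_bytes(num_random_bytes)).decode("ascii")


def write_key_file(path, key_text):
    """Write the key and restrict the file to its owner."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(key_text + "\n")
    # Owner read/write only; on Windows, chmod can only toggle the read-only flag.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


key_text = generate_key_text()
demo_path = os.path.join(tempfile.mkdtemp(), "mongo-keyfile")  # hypothetical location
write_key_file(demo_path, key_text)
print(len(key_text), "characters written to", demo_path)
```

The same file would then be copied to every member of the replica set or sharded cluster and referenced by the keyFile setting.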
16. Software Stack in our MongoDB Application
Software components are developed on the .NET platform in C# using MongoDB drivers:
CSV Importer
Custom File Loader
Standard Library
The Standard Library provides reusable functionality: connecting to the Mongo client and inserting, updating, deleting, and finding documents in a Mongo collection.
Loaders are highly configurable in terms of connection options, database, collection, user credentials, and the number of parallel instances to run.
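The idea of a loader that is configurable by database, collection, and number of parallel instances can be sketched as follows. This is a Python illustration, not the presenters' C# code: `LoaderConfig`, `partition`, `run_loaders`, and `fake_load_file` are all hypothetical names, threads stand in for separate loader processes, and the per-file load function is a stub that pretends to insert 100 documents.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderConfig:
    """Settings a loader instance needs (hypothetical shape)."""
    connection_string: str
    database: str
    collection: str
    parallel_instances: int = 4


def partition(files, instances):
    """Split the file list round-robin into one batch per loader instance."""
    return [files[i::instances] for i in range(instances)]


def run_loaders(config, files, load_file):
    """Run one worker per batch and return the total number of documents loaded."""
    def load_batch(batch):
        return sum(load_file(config, path) for path in batch)

    batches = partition(files, config.parallel_instances)
    with ThreadPoolExecutor(max_workers=config.parallel_instances) as pool:
        return sum(pool.map(load_batch, batches))


def fake_load_file(config, path):
    """Stand-in for a real loader; pretends each file yields 100 documents."""
    return 100


config = LoaderConfig("mongodb://localhost:27017", "MDB_CSV", "MC_CSV", parallel_instances=4)
files = [f"test_{i:04d}.CSV" for i in range(10)]
total = run_loaders(config, files, fake_load_file)
print("documents loaded:", total)
```

Passing the load function in as a parameter keeps the scheduling logic testable without a database connection.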
19. Breezes
Flexible schema structure: dynamic document creation with flexible fields in the same collection.
20. Breezes
Support for parallel loading
As MongoDB supports document-level concurrency, we were able to insert several thousand documents simultaneously into the same collection using our parallel loaders, which run with a configurable number of processes.
[Chart: Load Performance. Load time in seconds by number of loader instances: 1 instance = 85, 4 instances = 18, 8 instances = 10.5.]
[Chart: Load Rate (GB/Hour). Load rate by number of loader instances: 1 instance = 12.3, 4 instances = 52.4, 8 instances = 70.]
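The load-time figures from the chart (85, 18, and 10.5 seconds for 1, 4, and 8 loader instances) can be turned into speedup and parallel efficiency with a few lines of arithmetic. The function name and output layout are illustrative; only the three timings come from the slide.

```python
# Load times in seconds by number of loader instances, as read from the chart.
load_times = {1: 85.0, 4: 18.0, 8: 10.5}


def speedup_and_efficiency(times):
    """Return {instances: (speedup vs. 1 instance, speedup per instance)}."""
    baseline = times[1]
    return {n: (baseline / t, baseline / t / n) for n, t in sorted(times.items())}


results = speedup_and_efficiency(load_times)
for instances, (speedup, efficiency) in results.items():
    print(f"{instances} instance(s): speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

An efficiency near 1.0 means each added loader instance contributes roughly a full instance's worth of throughput.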
21. Breezes
MongoDB Documentation -
https://docs.mongodb.com/
MongoDB University -
https://university.mongodb.com/courses
23. Bumps and Key Learnings
Mongo Import options
Capture the success/error message from the import command's screen output.
Define the fields and their datatypes in each document (headerline vs. no headerline):
mongoimport --host svr1:27017 --db MDB_CSV --collection MC_CSV --username readWriteUser --password readwrite$ --authenticationDatabase MDB_CSV --type CSV --columnsHaveTypes --headerline --file E:RawDataLoadCSVtest_0034_WW13.CSV --numInsertionWorkers 4 --ignoreBlanks --parseGrace stop
mongoimport --host localhost:27017 --db MDB_CSV --collection MC_CSV --username readWriteUser --password readwrite$ --authenticationDatabase MDB_CSV --type CSV --columnsHaveTypes --fieldFile D:MongoFieldFilesfields.txt --file E:RawDataLoadCSVtest_0034_WW14.CSV --numInsertionWorkers 4 --ignoreBlanks --parseGrace stop
Connect to a replica set:
mongoimport --host RS_NAME/svr1:27017,svr2:27017,svr3:27017 --db MDB_CSV --collection MC_CSV --username readWriteUser --password readwrite$ --authenticationDatabase MDB_CSV --type CSV --columnsHaveTypes --fieldFile D:MongoFieldFilesfields.txt --file E:RawDataLoadCSVtest_0034_WW15.CSV --numInsertionWorkers 4 --ignoreBlanks --parseGrace stop
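One way to capture the import command's messages programmatically is to launch mongoimport from a script and collect its output streams. The sketch below assembles the argument list from the options shown above and captures both stdout and stderr. The function names are hypothetical, the file names are placeholders, and `run_import` is defined but not called here because it needs mongoimport on the PATH.

```python
import subprocess


def build_mongoimport_args(hosts, db, collection, username, password, auth_db,
                           csv_file, replica_set=None, field_file=None, workers=4):
    """Assemble a mongoimport argument list for a typed CSV import."""
    host = ",".join(hosts)  # no spaces between hosts
    if replica_set:
        host = f"{replica_set}/{host}"
    args = [
        "mongoimport", "--host", host, "--db", db, "--collection", collection,
        "--username", username, "--password", password,
        "--authenticationDatabase", auth_db,
        "--type", "csv", "--columnsHaveTypes",
    ]
    # Field names and types come either from a field file or from the CSV header line.
    if field_file:
        args += ["--fieldFile", field_file]
    else:
        args.append("--headerline")
    args += ["--file", csv_file, "--numInsertionWorkers", str(workers),
             "--ignoreBlanks", "--parseGrace", "stop"]
    return args


def run_import(args):
    """Run mongoimport and return (exit code, combined output text)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


# Placeholder file names for illustration.
args = build_mongoimport_args(
    ["svr1:27017", "svr2:27017", "svr3:27017"], "MDB_CSV", "MC_CSV",
    "readWriteUser", "readwrite$", "MDB_CSV", "test.CSV",
    replica_set="RS_NAME", field_file="fields.txt",
)
print(" ".join(args))
```

Passing the arguments as a list avoids shell quoting problems with characters such as `$` in the password.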
24. Bumps and Key Learnings
InsertMany vs. BulkWrite
InsertMany:
No easy way to capture the insertion count.
Internally uses BulkWrite.
Cannot handle multiple types of operations (Insert/Update/Delete) at once.
25. Bumps and Key Learnings
Replica set connection through the C# driver: build the connection string programmatically.
mongodb://<username>:<password>@<host1:port1>,..,<hostN:portN>/<authDB>?<connectOptions>
Various Ways of Replica set Connection
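Building the connection string from its parts can be sketched as below. This is a Python illustration of the URI template, not the presenters' C# code; the function name is hypothetical, and the host names, user, and replica set name are the example values used elsewhere in this deck. Credentials are percent-encoded so that characters such as `$` or `@` do not break the URI.

```python
from urllib.parse import quote_plus, urlencode


def build_connection_string(username, password, hosts, auth_db, options=None):
    """Build mongodb://user:pass@host1:port1,...,hostN:portN/authDB?options."""
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    uri = f"mongodb://{credentials}@{host_list}/{auth_db}"
    if options:
        uri += "?" + urlencode(options)
    return uri


uri = build_connection_string(
    "readWriteUser",
    "readwrite$",
    [("svr1", 27017), ("svr2", 27017), ("svr3", 27017)],
    "MDB_CSV",
    {"replicaSet": "RS_NAME"},
)
print(uri)
```

Listing several members lets the driver discover the current primary even when one of the listed hosts is down.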
26. Data Access Patterns
Aggregation queries that roll up the ball data – $group, $project, $match
3D Scatter plots of ball X and Y positions along with ball data values, to identify certain error patterns
Clustering or commonality analysis of ball data
[Chart: Cluster Distribution. X-axis: cluster ID (1 to 10). Y-axis: total units (0 to 30).]
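A roll-up query of the kind listed above might look like the pipeline below, which filters with $match, summarizes per ball with $group, and shapes the output with $project. The field names (`lot_id`, `ball_id`, `value`) and the sample documents are assumptions, since the deck does not show the schema. The second function reproduces the same roll-up in plain Python so the result can be checked without a database.

```python
from collections import defaultdict


def ball_rollup_pipeline(lot_id):
    """Aggregation pipeline: filter one lot, then summarize values per ball."""
    return [
        {"$match": {"lot_id": lot_id}},
        {"$group": {
            "_id": "$ball_id",
            "avg_value": {"$avg": "$value"},
            "max_value": {"$max": "$value"},
            "sample_count": {"$sum": 1},
        }},
        {"$project": {"_id": 0, "ball_id": "$_id", "avg_value": 1,
                      "max_value": 1, "sample_count": 1}},
    ]


def rollup_in_memory(documents, lot_id):
    """Plain-Python equivalent of the pipeline, sorted by ball ID for stable output."""
    groups = defaultdict(list)
    for doc in documents:
        if doc["lot_id"] == lot_id:
            groups[doc["ball_id"]].append(doc["value"])
    return [
        {"ball_id": ball_id, "avg_value": sum(values) / len(values),
         "max_value": max(values), "sample_count": len(values)}
        for ball_id, values in sorted(groups.items())
    ]


# Hypothetical ball-level test documents.
sample_docs = [
    {"lot_id": "L1", "ball_id": "A1", "value": 1.0},
    {"lot_id": "L1", "ball_id": "A1", "value": 3.0},
    {"lot_id": "L1", "ball_id": "B7", "value": 5.0},
    {"lot_id": "L2", "ball_id": "A1", "value": 9.0},
]
rollup = rollup_in_memory(sample_docs, "L1")
print(rollup)
```

Against a live collection, the pipeline list would be passed to the driver's aggregate call. $group does not guarantee output order, so a $sort stage would be needed for ordered results.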
28. Performance Benchmarks: Multi-User Scalability
[Chart: Performance on RDBMS vs. MongoDB. Query time in seconds by number of concurrent users.]
Number of concurrent users: 2, 5, 10, 20, 40
RDBMS performance in seconds: 400, 410, 450, 1100, 1900
MongoDB performance in seconds: 130, 131, 135, 150, 190
MongoDB handles up to 40 concurrent users within acceptable query performance time.
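The gap between the two series can be expressed as a ratio per user count. The short calculation below assumes the first series in the chart (400 to 1900 seconds) is the RDBMS and the second (130 to 190 seconds) is MongoDB, following the legend order.

```python
concurrent_users = [2, 5, 10, 20, 40]
rdbms_seconds = [400, 410, 450, 1100, 1900]    # assumed to be the RDBMS series
mongodb_seconds = [130, 131, 135, 150, 190]    # assumed to be the MongoDB series

# How many times longer the RDBMS query takes at each concurrency level.
ratios = {
    users: rdbms / mongo
    for users, rdbms, mongo in zip(concurrent_users, rdbms_seconds, mongodb_seconds)
}

for users, ratio in ratios.items():
    print(f"{users:>2} users: RDBMS takes {ratio:.1f}x as long as MongoDB")
```

Under that reading, the ratio grows from about 3x at 2 users to 10x at 40 users, because the RDBMS times rise sharply beyond 10 users while the MongoDB times stay under 200 seconds.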
29. Query Performance - Data vs Cache Size Correlation
[Chart: DB Query Performance vs Cache Size Correlation. Y-axis: time to query in minutes (0 to 180). X-axis: default cache size vs. 100 GB, 200 GB, and 300 GB cache sizes, grouped by DB size (1 TB, 2 TB, 3 TB). Series: Query For 10K Units, Query For 20K Units, Query For 100K Units.]
30. Next Steps and Help Needed
Our Next Steps
MongoDB 4.0
ACID compliance, including transactions across multiple documents. This will allow us to develop applications and schemas with the advantages of an RDBMS in a NoSQL environment.
Aggregation pipeline enhancements.
Sharding
We will explore sharding further to combine multiple replica-set-based MongoDB deployments into a few giant sharded MongoDB environments.
Help Needed from MongoDB
More built-in analytical functions (e.g. percentile, rank, median) in the aggregation pipeline would help with advanced data analysis.
Row-level security would help with better access control.
Improved aggregate query performance in sharded environments.