MONGODB
WAREHOUSE AND AGGREGATOR OF EVENTS
Kyiv Big Data & BI User Group
May 14, 2015
INTRO
Big data is a broad term for data sets so large or complex that
traditional data processing applications are inadequate
@wikipedia
Small Data is when is fit in RAM
Big Data is when is crash because is not fit in RAM
@devops_borat
DESIGNATION
Collect, aggregate and store events from different sources
Provide load balancing, failover and disaster recovery within
geographically distributed infrastructure
CONDITIONS
Constantly growing event rate
Random intensive access with strict response time (OLTP)
Strict retention period
Existing infrastructure
WHERE IS BIGDATA?
Huge number and variety of event sources
Events are concentrated in "one place"
Query response time is strictly limited
Returned data must be fully consistent
SOLUTIONS
E-L-K SOLUTION
Events → LogStash → ElasticSearch → Kibana
PLUS
M-L-F SOLUTION
Events → LogStash → MongoDB → Flask (REST API)
COMPARISON
ELASTICSEARCH VS. MONGODB
Search Engine              | Document Store
Java                       | C++
9+ supported languages     | 25+ supported languages (R as one of them)
–                          | Server-side scripting
RESTful API/JSON API       | –
–                          | MapReduce
–                          | Security features
ELASTICSEARCH VS. MONGODB
Number of shards defined at        | Shards can be added dynamically
index creation                     |
Replicas synchronized with the     | Secondaries synchronized with the
primary node                       | primary node
Replicas can be used for data      | Secondaries can be used for data
retrieval                          | retrieval
DECISION
ElasticSearch is a search engine, while MongoDB is a document
store, which is more applicable here
Custom REST API is required
Easier infrastructure integration for MongoDB
Overhead in rebuilding indexes on ElasticSearch due to
inserts/removes
MongoDB can connect with ElasticSearch for full-featured text
search if required
OVERVIEW
MONGODB
UPTIME
Availability %               Downtime/year   Downtime/month   Downtime/week
90% ("one nine")             36.5 d          72 h             16.8 h
95%                          18.25 d         36 h             8.4 h
99.999% ("five nines")       5.26 m          25.9 s           6.05 s
99.9999% ("six nines")       31.5 s          2.59 s           604.8 ms
99.9999999% ("nine nines")   31.5569 ms      2.6297 ms        0.6048 ms
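The figures above follow directly from the availability percentage. A minimal sketch of the arithmetic (assuming a 365.25-day year and a 30-day month, which reproduces the table's values):

```python
# Seconds in each reporting period (365.25-day year, 30-day month).
SECONDS = {"year": 365.25 * 24 * 3600, "month": 30 * 24 * 3600, "week": 7 * 24 * 3600}

def downtime_seconds(availability_pct, period):
    """Seconds of allowed downtime per period at a given availability %."""
    return (1 - availability_pct / 100) * SECONDS[period]

# Five nines: roughly 5.26 minutes of downtime per year,
# about 6.05 seconds per week.
print(downtime_seconds(99.999, "year") / 60)
print(downtime_seconds(99.999, "week"))
```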
MONGODB CLUSTER
DATA DISTRIBUTION
* Purpose of Sharding
RANGE BASED SHARDING
MongoDB divides the data set into ranges determined by the
shard key values to provide range based partitioning.
* Range Based Sharding
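The routing idea behind range-based sharding can be sketched with a sorted list of split points; the split values and shard names below are hypothetical, not MongoDB internals:

```python
import bisect

# Hypothetical chunk boundaries on a numeric shard key; each chunk
# covers [lower, upper) and is assigned to one shard.
split_points = [100, 200, 300]
chunk_shards = ["s0", "s1", "s2", "s3"]  # one shard per chunk

def chunk_for(key):
    """Route a document by locating its shard-key value in the ranges."""
    return chunk_shards[bisect.bisect_right(split_points, key)]

print(chunk_for(42), chunk_for(250))  # s0 s2
```

Range-based routing keeps nearby key values together, which makes range queries efficient but can hot-spot a single shard under monotonically increasing keys.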
HASH BASED SHARDING
MongoDB computes a hash of a field’s value, and then uses these
hashes to create chunks.
* Hash Based Sharding
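A rough analogue of the hashed shard key, assuming an MD5-based 64-bit hash (a simplification: MongoDB hashes the BSON representation of the field, not a UTF-8 string):

```python
import hashlib
import struct

def hashed_key(value):
    """MD5 the value and truncate to a signed 64-bit integer."""
    digest = hashlib.md5(str(value).encode()).digest()
    return struct.unpack("<q", digest[:8])[0]

# Adjacent key values map to unrelated hashes, so chunks built over the
# hash space spread monotonically increasing keys evenly across shards.
print(hashed_key(1), hashed_key(2))
```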
HIGH AVAILABILITY
* Primary with Two Secondary Members
HIGH AVAILABILITY
* Primary with Two Secondary Members
HIGH AVAILABILITY
Members   Majority to Elect a New Primary   Fault Tolerance
3         2                                 1
4         3                                 1
5         3                                 2
6         4                                 2
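The table follows from simple majority arithmetic, which is why even member counts add no fault tolerance:

```python
def election_facts(members):
    """Majority needed to elect a primary, and how many members can fail
    while a majority still remains (fault tolerance)."""
    majority = members // 2 + 1
    return majority, members - majority

for n in (3, 4, 5, 6):
    print(n, *election_facts(n))
```

Going from 3 to 4 members raises the majority from 2 to 3 but leaves fault tolerance at 1, so odd-sized replica sets are the economical choice.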
ESTIMATION
WORKING SET
50 events per second and 0.5KB each
Retention period is 90 days
Index factor is 40%
Backup factor is 50%
(affects disk size only)
WORKING SET
273 GB for 90 days
(500 B * 50 events/s * 90 * 24 * 60 * 60 s = 194.4 GB, plus 40% for indexes)
91 GB for 30 days
46 GB for 15 days
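The working-set figures can be reproduced with the stated inputs (50 events/s, 0.5 KB each, 40% index factor; sizes rounded up to whole GB, taking 1 GB = 10^9 bytes):

```python
import math

def working_set_gb(events_per_sec=50, event_kb=0.5, days=90, index_factor=0.4):
    """Raw event bytes over the retention period, plus the index overhead."""
    raw_bytes = events_per_sec * event_kb * 1000 * days * 86400
    return math.ceil(raw_bytes * (1 + index_factor) / 1e9)

print([working_set_gb(days=d) for d in (90, 30, 15)])  # [273, 91, 46]
```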
DATA IN RAM
MongoDB tries to keep data in RAM (especially indexes)
For events it is hard to predict which data will be requested.
The only safe assumption: older events will be in less demand.
RAM & SHARDS
RAM     90 days (273 GB)   30 days (91 GB)   15 days (46 GB)
8 GB    35 shards          12 shards         5 shards
16 GB   18 shards          6 shards          3 shards
32 GB   9 shards           3 shards          2 shards
64 GB   5 shards           2 shards          1 shard
RAM & SERVERS
Days   8 GB   16 GB   32 GB   64 GB
90     175    90      45      25
15     25     15      10      5
30     60     30      15      10
* for 5-member Replica Sets
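Both tables reduce to two formulas: enough shards that each shard's slice of the working set fits in RAM, times five servers per shard for the 5-member replica sets. A sketch (the deck rounds one or two cells slightly differently, e.g. 46 GB on 8 GB nodes):

```python
import math

REPLICA_SET_SIZE = 5  # chosen in this deck for DR and failover

def shards_needed(working_set_gb, ram_gb):
    """Shards so each holds a working-set slice that fits in RAM (approx.)."""
    return math.ceil(working_set_gb / ram_gb)

def servers_needed(shards):
    """Data servers: every shard is a full replica set."""
    return shards * REPLICA_SET_SIZE

# 90-day working set on 16 GB nodes: 18 shards, 90 servers.
print(shards_needed(273, 16), servers_needed(shards_needed(273, 16)))  # 18 90
```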
RAM & SHARDS
Shards process queries in parallel
Each shard costs 3+ servers
More RAM means fewer shards
GOLDEN MEAN
5-member Replica Sets: disaster recovery and failover
30 days of most recent events: the latest events are in the highest demand
16 GB RAM servers: infrastructure limitation
30 data servers: a lot of servers, but we have to pay the price ...
PERFORMANCE
DISK IO & RAM
4 GB RAM, 3 nodes
EVENTS LIFE CYCLE
EVENTS FLOW
Received (LogStash)
Buffered (Redis)
Modified (LogStash / MongoDB)
Stored (MongoDB)
Requested (User / REST API)
Processed (REST API / MongoDB)
Returned (REST API)
MUTATIONS
Done by LogStash
1. Inputs (rabbitmq, network, syslog, etc.)
2. Codecs (json, multiline, etc.)
3. Filters (json, csv, drop, etc.)
4. Outputs (mongodb, elasticsearch, email, file, etc.)
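The four LogStash stages above can be sketched as plain functions; the `debug` field and the list standing in for the mongodb output are assumptions for illustration, not LogStash APIs:

```python
import json

def codec_json(raw):
    """Codec stage: decode a raw line into an event dict (json codec analogue)."""
    return json.loads(raw)

def filter_drop(event, field="debug"):
    """Filter stage: drop events carrying a given field (drop filter analogue)."""
    return None if field in event else event

def output_collect(event, sink):
    """Output stage: a list stands in here for the mongodb output."""
    sink.append(event)

sink = []
for raw in ['{"msg": "ok"}', '{"msg": "noise", "debug": true}']:
    event = filter_drop(codec_json(raw))
    if event is not None:
        output_collect(event, sink)

print(sink)  # [{'msg': 'ok'}]
```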
SUMMARY
MongoDB scales easily
99.999% uptime is achievable, plus built-in security features
Smooth infrastructure integration
Customizability of components
Reasonable IO and hardware requirements
Out-of-the-box features & tools (aggregation, map-reduce, MMS &
OpsManager)
USEFUL LINKS
1. MongoDB Multi-Datacenter Deployments
2. LogStash (events and logs manager)
3. Motor (async Python driver for Tornado and MongoDB)
4. Mongoosastic (The Power of MongoDB & Elasticsearch together)
5. 10gen Mongo-Connector
QUESTIONS
THANK YOU (: