MONGODB
WAREHOUSE AND AGGREGATOR OF EVENTS
Kyiv Big Data & BI User Group
May 14, 2015
INTRO
Big data is a broad term for data sets so large or complex that
traditional data processing applications are inadequate
@wikipedia
Small Data is when is fit in RAM
Big Data is when is crash because is not fit in RAM
@devops_borat
DESIGNATION
Collect, aggregate and store events from different sources
Provide load balancing, failover and disaster recovery within
geographically distributed infrastructure
CONDITIONS
Constantly growing event rate
Intensive random access with strict response time (OLTP)
Strict retention period
Existing infrastructure
WHERE IS BIGDATA?
Huge number and variety of event sources
Events are concentrated in "one place"
Query response time is strictly limited
Returned data should be fully consistent
SOLUTIONS
E-L-K SOLUTION
Events -> LogStash -> ElasticSearch -> Kibana
PLUS
M-L-F SOLUTION
Events -> LogStash -> MongoDB -> Flask (REST API)
COMPARISON
ELASTICSEARCH VS. MONGODB
ElasticSearch             MongoDB
Search Engine             Document Store
Java                      C++
9+ supported languages    25+ supported languages (R as one of them)
–                         Server-side scripting
RESTful API/JSON API      –
–                         MapReduce
–                         Security features
ELASTICSEARCH VS. MONGODB
ElasticSearch                                 MongoDB
Number of shards defined on index creation    Shards can be added dynamically
Replicas synchronized with Primary node       Secondaries synchronized with Primary node
Replicas can be used for data retrieval       Secondaries can be used for data retrieval
DECISION
ElasticSearch is a search engine, whereas MongoDB is a document
store, which is a better fit for this task
Custom REST API is required
Easier infrastructure integration for MongoDB
Overhead in rebuilding indexes on ElasticSearch due to
inserts/removes
MongoDB can connect with ElasticSearch for full-featured text
search if required
OVERVIEW
MONGODB
UPTIME
Availability %                 Downtime per year   Downtime per month   Downtime per week
90% ("one nine")               36.5 d              72 h                 16.8 h
95%                            18.25 d             36 h                 8.4 h
99.999% ("five nines")         5.26 min            25.9 s               6.05 s
99.9999% ("six nines")         31.5 s              2.59 s               604.8 ms
99.9999999% ("nine nines")     31.5569 ms          2.6297 ms            0.6048 ms
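The figures in the uptime table follow from simple arithmetic. A minimal sketch, assuming a 365-day year, a 30-day month and a 7-day week (the period lengths and the function name are assumptions for illustration, not taken from the slides):

```python
# Period lengths in seconds (assumed: 365-day year, 30-day month, 7-day week).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
SECONDS_PER_MONTH = 30 * 24 * 60 * 60
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def downtime_seconds(availability_percent, period_seconds):
    """Return the maximum downtime, in seconds, allowed by an availability target."""
    return period_seconds * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (90, 95, 99.999, 99.9999):
        per_year = downtime_seconds(target, SECONDS_PER_YEAR)
        per_month = downtime_seconds(target, SECONDS_PER_MONTH)
        per_week = downtime_seconds(target, SECONDS_PER_WEEK)
        print(f"{target}%: {per_year:.2f} s/year, "
              f"{per_month:.2f} s/month, {per_week:.3f} s/week")
```

For "five nines" this gives roughly 315 seconds per year, which is the 5.26 minutes listed in the table.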
MONGODB CLUSTER
DATA DISTRIBUTION
* Purpose of Sharding
RANGE BASED SHARDING
MongoDB divides the data set into ranges determined by the
shard key values to provide range based partitioning.
* Range Based Sharding
HASH BASED SHARDING
MongoDB computes a hash of a field’s value, and then uses these
hashes to create chunks.
* Hash Based Sharding
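The difference between the two strategies can be illustrated with a toy sketch. It is not MongoDB's internal implementation: the split points, the number of chunks, the MD5-based hash and the modulo placement are all assumptions chosen to keep the example short.

```python
import bisect
import hashlib

# Hypothetical split points dividing the shard key space into four ranges.
SPLIT_POINTS = [25, 50, 75]


def range_chunk(shard_key):
    """Range based: the chunk is the range that the key value falls into."""
    return bisect.bisect_right(SPLIT_POINTS, shard_key)


def hash_chunk(shard_key, chunks=4):
    """Hash based: hash the key value first, then map the hash to a chunk."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % chunks


if __name__ == "__main__":
    for key in (10, 11, 12, 60, 61):
        print(key, "range chunk:", range_chunk(key), "hash chunk:", hash_chunk(key))
```

Neighbouring key values always share a range based chunk, which helps range queries but can concentrate writes of monotonically growing keys on one shard. Hashing spreads such keys more evenly at the cost of range locality.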
HIGH AVAILABILITY
* Primary with Two Secondary Members
HIGH AVAILABILITY
Number of Members   Majority Required to Elect a New Primary   Fault Tolerance
3                   2                                          1
4                   3                                          1
5                   3                                          2
6                   4                                          2
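The table rows can be derived from the member count alone. A minimal sketch (the function names are illustrative, not part of any MongoDB API):

```python
def majority(members):
    """Votes required to elect a new primary: more than half of the members."""
    return members // 2 + 1


def fault_tolerance(members):
    """Members that can be lost while a majority is still reachable."""
    return members - majority(members)


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, majority(n), fault_tolerance(n))
```

An even number of members raises the required majority without raising fault tolerance, which is why 4 members tolerate no more failures than 3.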
ESTIMATION
WORKING SET
50 events per second, 0.5 KB each
Retention period is 90 days
Index factor is 40%
Backup factor is 50% (affects disk size only)
WORKING SET
273 GB for 90 days
500 B * 50 events/s * 90 d * 24 * 60 * 60 s = 194.4 GB, plus 40% for indexes
91 GB for 30 days
46 GB for 15 days
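The estimate can be reproduced with a short calculation. A sketch under the assumptions visible in the formula above: an event is counted as 500 bytes, 1 GB is taken as 10^9 bytes, and results are rounded up to whole gigabytes. The constant and function names are illustrative.

```python
import math

EVENT_SIZE_BYTES = 500      # 0.5 KB per event, counted as 500 bytes
EVENTS_PER_SECOND = 50
INDEX_FACTOR = 0.40         # indexes add 40% on top of the raw data
SECONDS_PER_DAY = 24 * 60 * 60


def raw_data_gb(days):
    """Raw event volume for the retention period, in GB (10**9 bytes)."""
    return EVENT_SIZE_BYTES * EVENTS_PER_SECOND * days * SECONDS_PER_DAY / 1e9


def working_set_gb(days):
    """Raw data plus index overhead, in GB."""
    return raw_data_gb(days) * (1 + INDEX_FACTOR)


if __name__ == "__main__":
    for days in (90, 30, 15):
        print(days, "days:", math.ceil(working_set_gb(days)), "GB")
```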
DATA IN RAM
MongoDB tries to keep data in RAM (especially indexes)
For events it is hard to predict which data will be requested most.
The only assumption that can be made is that
older events will be in less demand.
RAM & SHARDS
RAM      90 days (273 GB)   30 days (91 GB)   15 days (46 GB)
8 GB     35 shards          12 shards         5 shards
16 GB    18 shards          6 shards          3 shards
32 GB    9 shards           3 shards          2 shards
64 GB    5 shards           2 shards          1 shard
RAM & SERVERS
Days   8 GB   16 GB   32 GB   64 GB
90     175    90      45      25
30     60     30      15      10
15     25     15      10      5
* for 5-member replica sets
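Both tables appear to follow one rule: the working set is divided by the RAM of a single server and rounded up to get the number of shards, and each shard is a replica set of 5 members. A sketch under that assumption (the function names are illustrative). Plain rounding up gives 6 shards for 46 GB on 8 GB servers, where the slide lists 5.

```python
import math


def shards_needed(working_set_gb, ram_gb):
    """Shards required so that each shard's slice of the working set fits in RAM."""
    return math.ceil(working_set_gb / ram_gb)


def servers_needed(shards, members_per_replica_set=5):
    """Data servers required when every shard is a replica set."""
    return shards * members_per_replica_set


if __name__ == "__main__":
    for ram_gb in (8, 16, 32, 64):
        shards = shards_needed(91, ram_gb)  # 30 days of events
        print(f"{ram_gb} GB RAM: {shards} shards, {servers_needed(shards)} servers")
```

For 30 days of events on 16 GB servers this yields 6 shards and 30 data servers.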
RAM & SHARDS
Shards process queries in parallel
Each shard costs 3+ servers
More RAM, fewer shards
GOLDEN MEAN
5-member replica sets            Disaster recovery and failover
30 days of most recent events    latest events are in more demand
16 GB RAM servers                infrastructure limitation
30 data servers                  a lot of servers, but we should pay the price ...
PERFORMANCE
DISK IO & RAM
4 GB RAM, 3 nodes
EVENTS LIFE CYCLE
EVENTS FLOW
Received (LogStash)
Buffered (Redis)
Modified (LogStash / MongoDB)
Stored (MongoDB)
Requested (User / REST API)
Processed (REST API / MongoDB)
Returned (REST API)
MUTATIONS
Done by LogStash
1. Inputs (rabbitmq, network, syslog, etc)
2. Codecs (json, multiline, etc)
3. Filters (json, csv, drop, etc)
4. Outputs (mongodb, elasticsearch, email, file, etc)
SUMMARY
MongoDB scales easily
99.999% uptime level and security
Smooth infrastructure integration
Customizability of components
Reasonable IO and hardware requirements
Out-of-the-box features & tools (aggregation, map-reduce, MMS &
OpsManager)
USEFUL LINKS
1. MongoDb Multi-Datacenter Deployments
2. LogStash (events and logs manager)
3. Motor (async Python driver for Tornado and MongoDB)
4. Mongoosastic (The Power of MongoDb & Elasticsearch together)
5. 10gen Mongo-Connector
QUESTIONS
THANK YOU (:

MongoDB - Warehouse and Aggregator of Events

  • 1. MONGODB WAREHOUSE AND AGGREGATOR OF EVENTS Kyiv Big Data & BI User Group May 14, 2015
  • 3. Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate @wikipedia
  • 4. Small Data is when is fit in RAM Big Data is when is crash because is not fit in RAM @devops_borat
  • 5. DESIGNATION: Collect, aggregate and store events from different sources; provide load balancing, failover and disaster recovery within a geographically distributed infrastructure
  • 6. CONDITIONS: Constantly growing event rate; random intensive access with strict response time (OLTP); strict retention period; existing infrastructure
  • 7. WHERE IS BIGDATA? Huge number and variety of event sources; events are concentrated in "one place"; query response time is strictly limited; returned data should be totally consistent
  • 9. E-L-K SOLUTION: Events → LogStash → ElasticSearch → Kibana
  • 10. PLUS
  • 11. M-L-F SOLUTION: Events → LogStash → MongoDB → Flask (REST API)
  • 13. ELASTICSEARCH VS. MONGODB
      ElasticSearch          | MongoDB
      Search Engine          | Document Store
      Java                   | C++
      9+ supported languages | 25+ supported languages (R as one of them)
      –                      | Server-side scripting
      RESTful API/JSON API   | –
      –                      | MapReduce
      –                      | Security features
  • 14. ELASTICSEARCH VS. MONGODB
      ElasticSearch                              | MongoDB
      Number of shards defined on index creation | Shards can be added dynamically
      Replicas synchronized with Primary node    | Secondaries synchronized with Primary node
      Replicas can be used for data retrieval    | Secondaries can be used for data retrieval
  • 16. DECISION: ElasticSearch is a search engine, whereas MongoDB is a document store, which is more applicable; a custom REST API is required; easier infrastructure integration for MongoDB; overhead in rebuilding indexes on ElasticSearch due to inserts/removes; MongoDB can connect with ElasticSearch for full-featured text search if required
  • 19. UPTIME
      Availability %             | Downtime per year | Downtime per month | Downtime per week
      90% ("one nine")           | 36.5 d            | 72 h               | 16.8 h
      95%                        | 18.25 d           | 36 h               | 8.4 h
      99.999% ("five nines")     | 5.26 min          | 25.9 s             | 6.05 s
      99.9999% ("six nines")     | 31.5 s            | 2.59 s             | 604.8 ms
      99.9999999% ("nine nines") | 31.5569 ms        | 2.6297 ms          | 0.6048 ms
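The downtime figures in the uptime table follow directly from the availability percentage. A minimal Python sketch, assuming a 365-day year and a 30-day month (the names `downtime_seconds` and `PERIOD_SECONDS` are illustrative and not from the slides; a different year or month length shifts the last digits):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed period lengths; the slides do not state which convention they use.
PERIOD_SECONDS = {
    "year": 365 * SECONDS_PER_DAY,
    "month": 30 * SECONDS_PER_DAY,
    "week": 7 * SECONDS_PER_DAY,
}


def downtime_seconds(availability_percent: float, period: str) -> float:
    """Return the allowed downtime in seconds for one period."""
    unavailable_fraction = (100 - availability_percent) / 100
    return PERIOD_SECONDS[period] * unavailable_fraction


if __name__ == "__main__":
    for availability in (90, 95, 99.999, 99.9999):
        row = ", ".join(
            f"{period}: {downtime_seconds(availability, period):.3f} s"
            for period in PERIOD_SECONDS
        )
        print(f"{availability}% -> {row}")
```

Dividing the yearly result by `SECONDS_PER_DAY` gives the figure in days, which is how the 90% row arrives at 36.5 d.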
  • 22. RANGE BASED SHARDING: MongoDB divides the data set into ranges determined by the shard key values to provide range based partitioning. (Figure: Range Based Sharding)
  • 23. HASH BASED SHARDING: MongoDB computes a hash of a field’s value, and then uses these hashes to create chunks. (Figure: Hash Based Sharding)
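The difference between the two sharding strategies can be illustrated with a toy Python sketch. It is not MongoDB's implementation: the chunk boundaries, the four-chunk layout and the use of SHA-256 as the hash are all assumptions chosen for illustration. It shows that monotonically increasing keys (such as event timestamps) pile up in one range-based chunk, while hashing spreads them across chunks.

```python
import bisect
import hashlib

# Hypothetical boundaries for a range-sharded numeric key, giving 4 chunks:
# (-inf, 1000), [1000, 2000), [2000, 3000), [3000, +inf)
RANGE_BOUNDARIES = [1000, 2000, 3000]


def range_chunk(key: int) -> int:
    """Pick the chunk whose key range contains the key."""
    return bisect.bisect_right(RANGE_BOUNDARIES, key)


def hash_chunk(key: int, chunks: int = 4) -> int:
    """Hash the key to 64 bits, then split the hash space into equal ranges."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    hashed = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    return hashed * chunks // 2**64


# A burst of "recent" keys, e.g. increasing event identifiers.
recent_keys = range(3000, 4000)
range_chunks = {range_chunk(k) for k in recent_keys}
hash_chunks = {hash_chunk(k) for k in recent_keys}

if __name__ == "__main__":
    print("range-based chunks hit:", sorted(range_chunks))
    print("hash-based chunks hit:", sorted(hash_chunks))
```

The trade-off in this sketch is the usual one: range partitioning keeps neighbouring keys together, which suits range queries, while hashing trades that locality for an even write distribution.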
  • 24. HIGH AVAILABILITY (Figure: Primary with Two Secondary Members)
  • 25. HIGH AVAILABILITY (Figure: Primary with Two Secondary Members)
  • 26. HIGH AVAILABILITY
      Number of Members | Majority Required to Elect a New Primary | Fault Tolerance
      3                 | 2                                        | 1
      4                 | 3                                        | 1
      5                 | 3                                        | 2
      6                 | 4                                        | 2
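The rows of the table above follow from a strict-majority rule: more than half of the members must be reachable to elect a primary, and the fault tolerance is whatever remains. A small Python sketch (function names are illustrative):

```python
def majority(members: int) -> int:
    """Smallest number of votes that is more than half of the members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Members that can be lost while a majority can still be formed."""
    return members - majority(members)


if __name__ == "__main__":
    for members in (3, 4, 5, 6):
        print(members, majority(members), fault_tolerance(members))
```

It also makes visible why an even member count adds no fault tolerance over the odd count just below it: 4 members tolerate one failure, the same as 3.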
  • 28. WORKING SET: 50 events per second at 0.5 KB each; retention period is 90 days; index factor is 40%; backup factor is 50% (affects disk size only)
  • 29. WORKING SET: 500 B * 50 events/s * 90 * 24 * 60 * 60 s = 194.4 GB of raw events for 90 days; adding the 40% index factor gives 273 GB for 90 days (rounded up), 91 GB for 30 days, 46 GB for 15 days
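The working set arithmetic can be reproduced with a short Python sketch. It assumes, as the slide's own multiplication does, that 0.5 KB means 500 bytes and that 1 GB is 10^9 bytes; the function name and the rounding up to whole gigabytes are illustrative choices that happen to match the slide's figures.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def working_set_gb(events_per_second: int, event_bytes: int,
                   retention_days: int, index_factor: float) -> float:
    """Raw event volume plus index overhead, in decimal gigabytes."""
    raw_bytes = event_bytes * events_per_second * retention_days * SECONDS_PER_DAY
    return raw_bytes * (1 + index_factor) / 1e9


if __name__ == "__main__":
    for days in (90, 30, 15):
        size = working_set_gb(50, 500, days, 0.40)
        print(f"{days} days: {size:.2f} GB, rounded up to {math.ceil(size)} GB")
```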
  • 30. DATA IN RAM: MongoDB tries to keep data in RAM (especially indexes). For events it is hard to predict which data will be requested; the only assumption that can be made is that older events will be in less demand.
  • 31. RAM & SHARDS
      RAM   | 90 days (273 GB) | 30 days (91 GB) | 15 days (46 GB)
      8 GB  | 35 shards        | 12 shards       | 5 shards
      16 GB | 18 shards        | 6 shards        | 3 shards
      32 GB | 9 shards         | 3 shards        | 2 shards
      64 GB | 5 shards         | 2 shards        | 1 shard
  • 32. RAM & SERVERS
      Days | 8 GB | 16 GB | 32 GB | 64 GB
      90   | 175  | 90    | 45    | 25
      30   | 60   | 30    | 15    | 10
      15   | 25   | 15    | 10    | 5
      * for a 5-member Replica Set
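The two RAM tables appear to be derived by dividing the working set by the RAM per server and multiplying by the replica set size. The Python sketch below assumes ceiling division, which is an inference and not stated on the slides; it reproduces every cell except 8 GB / 15 days, where it yields 6 shards where the slide shows 5.

```python
import math


def shards_needed(working_set_gb: float, ram_gb: float) -> int:
    """Shards required so that each shard's slice of the working set fits in RAM."""
    return math.ceil(working_set_gb / ram_gb)


def servers_needed(working_set_gb: float, ram_gb: float,
                   replica_set_members: int = 5) -> int:
    """Each shard is a replica set, so servers = shards * members."""
    return shards_needed(working_set_gb, ram_gb) * replica_set_members


if __name__ == "__main__":
    working_sets = {90: 273, 30: 91, 15: 46}  # days -> GB, from the slides
    for days, size_gb in working_sets.items():
        for ram_gb in (8, 16, 32, 64):
            print(days, ram_gb,
                  shards_needed(size_gb, ram_gb),
                  servers_needed(size_gb, ram_gb))
```

For 30 days of events on 16 GB servers this gives 6 shards and 30 servers.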
  • 33. RAM & SHARDS: Shards process queries in parallel; each shard costs 3+ servers; more RAM means fewer shards
  • 34. GOLDEN MEAN: 5-member Replica Sets (disaster recovery and fail-over); 30 days of most recent events (latest events are in more demand); 16 GB RAM servers (infrastructure limitation); 30 data servers (a lot of servers, but we should pay the price ...)
  • 36. DISK IO & RAM 4 GB RAM, 3 nodes
  • 38. EVENTS FLOW: Received (LogStash) → Buffered (Redis) → Modified (LogStash / MongoDB) → Stored (MongoDB) → Requested (User / REST API) → Processed (REST API / MongoDB) → Returned (REST API)
  • 39. MUTATIONS: Done by LogStash: 1. Inputs (rabbitmq, network, syslog, etc.) 2. Codecs (json, multiline, etc.) 3. Filters (json, csv, drop, etc.) 4. Outputs (mongodb, elasticsearch, email, file, etc.)
  • 41. MongoDB can scale simply; 99.999% level of uptime and security; smooth infrastructure integration; customizability of components; reasonable IO and hardware requirements; out-of-box features & tools (aggregation, map-reduce, MMS & OpsManager)
  • 42. USEFUL LINKS: 1. 2. LogStash (events and logs manager) 3. Motor (async Python driver for Tornado and MongoDB) 4. Mongoosastic (The Power of MongoDb & Elasticsearch together) 5. MongoDb Multi-Datacenter Deployments 10gen Mongo-Connector