In this deep dive, we'll look under the hood at how the MongoDB storage engine works to give you greater insight into both performance and data safety. You'll learn about storage layout, indexes, memory mapping, journaling, and fragmentation. This is a session intended for those who already have a basic understanding of MongoDB.
HDFS Federation allows HDFS to scale beyond the limitations of a single namenode by federating the namespace and block management across multiple independent namenodes. This simplifies the design and implementation compared to a distributed namenode approach. Existing single namenode deployments are not impacted and can continue running as is. Federation preserves the robustness of individual namenodes while scaling to more namenodes. It generalizes the block storage layer to allow multiple namenodes to share the same block storage. This improves isolation and availability while providing a simpler way to scale HDFS in the near term.
The document provides an overview of the Hadoop Distributed File System (HDFS). It describes HDFS's master-slave architecture with a single NameNode master and multiple DataNode slaves. The NameNode manages filesystem metadata and data placement, while DataNodes store data blocks. The document outlines HDFS components like the SecondaryNameNode, DataNodes, and how files are written and read. It also discusses high availability solutions, operational tools, and the future of HDFS.
The document provides an overview of the Hadoop Distributed File System (HDFS). It describes key HDFS concepts including its design goals, block and rack awareness, file write and read processes, checkpointing, and safe mode operation. HDFS allows for reliable storage of very large files across commodity hardware and provides high throughput access to application data.
The document starts with an introduction to Hadoop and covers the Hadoop 1.x / 2.x services (HDFS / MapReduce / YARN).
It also explains the architecture of Hadoop, the workings of the Hadoop Distributed File System, and the MapReduce programming model.
The document provides an overview of the Hadoop Distributed File System (HDFS). It describes HDFS design goals of handling hardware failures, large data sets, and streaming data access. It explains key HDFS concepts like blocks, replication, rack awareness, the read/write process, and the roles of the NameNode and DataNodes. It also covers topics like permissions, safe mode, quotas, and commands for interacting with HDFS.
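Since the summarized deck ends with commands for interacting with HDFS, here is a minimal sketch of driving those same `hdfs dfs` commands from Python; it assumes a configured Hadoop client on the machine, and the paths and file names are invented for illustration.

```python
# Minimal sketch: interact with HDFS via the standard `hdfs dfs` CLI.
# Assumes a working Hadoop client configuration; paths are example values.
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` subcommand and return its stdout."""
    result = subprocess.run(["hdfs", "dfs", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout

hdfs("-mkdir", "-p", "/user/demo")                            # create a directory
hdfs("-put", "-f", "local_data.csv", "/user/demo/data.csv")   # upload a local file
print(hdfs("-ls", "/user/demo"))                              # replication factor and size per file
print(hdfs("-cat", "/user/demo/data.csv")[:200])              # first 200 characters read back
```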
Storage solutions for High Performance Computing (gmateesc)
This document discusses storage infrastructure for high-performance computing. It begins by introducing data-intensive science and the need for parallel storage systems. It then discusses several parallel file systems used in HPC like GPFS, Lustre, and PanFS. Key concepts covered include data striping, scale-out NAS, parallel file systems, and IO acceleration techniques. The document also discusses challenges of data growth, bottlenecks in scaling storage, and architectures of various parallel file systems.
Accelerating HBase with NVMe and bucket cache (David Grier)
This set of slides describes some initial experiments we have designed to discover performance improvements in Hadoop technologies using NVMe technology.
Francesc Alted (UberResearch GmbH), “New Trends In Storing And Analyzing Large Data Silos With Python”.
Bio: Teacher, developer, and consultant in a wide variety of business applications. Particularly interested in the field of very large databases, with special emphasis on squeezing the last drop of performance out of the computer as a whole, i.e. not only the CPU but also the memory and I/O subsystems.
The document discusses HDFS architecture and components. It describes how HDFS uses NameNodes and DataNodes to store and retrieve file data in a distributed manner across clusters. The NameNode manages the file system namespace and regulates access to files by clients. DataNodes store file data in blocks and replicate them for fault tolerance. The document outlines the write and read workflows in HDFS and how NameNodes and DataNodes work together to manage data storage and access.
This document provides an overview and summary of BlueStore, a new storage backend for Ceph that provides faster performance compared to the previous FileStore backend. BlueStore uses a key/value database (RocksDB) to store metadata and writes data directly to block devices. It addresses issues with the transactional consistency of FileStore by avoiding double writes and using more efficient data structures. BlueStore aims to provide more natural transaction atomicity, efficient enumeration of objects, and optimal I/O patterns for different storage devices.
This document provides an overview of big data concepts and Hadoop. It discusses the four V's of big data - volume, velocity, variety, and veracity. It then describes how Hadoop uses MapReduce and HDFS to process and store large datasets in a distributed, fault-tolerant and scalable manner across commodity hardware. Key components of Hadoop include the HDFS file system and MapReduce framework for distributed processing of large datasets in parallel.
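To make the MapReduce model concrete, here is a generic Hadoop Streaming style word count (not taken from the summarized document): the mapper emits `word<TAB>1` pairs and the reducer sums the counts it receives, relying on the framework to sort mapper output by key.

```python
# wordcount_streaming.py -- a minimal mapper/reducer pair in the Hadoop Streaming style.
# Both read stdin and write tab-separated key/value pairs to stdout, which is the
# contract Hadoop Streaming expects; run locally with `sort` between the two stages.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.strip().lower().split():
            print(f"{word}\t1")

def reducer(lines):
    # The framework delivers mapper output sorted by key, so consecutive lines
    # for the same word can be summed with groupby.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper(sys.stdin) if sys.argv[1] == "map" else reducer(sys.stdin)
```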
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012 (Big Data Spain)
Session presented at Big Data Spain 2012 Conference
16th Nov 2012
ETSI Telecomunicacion UPM Madrid
www.bigdataspain.org
More info: http://www.bigdataspain.org/es-2012/conference/memory-efficient-applications/francesc-alted
From http://wiki.directi.com/x/hQAa - This is a fairly detailed presentation I made at BarCamp Mumbai on building large storage networks and different SAN topologies. It covers the fundamentals of selecting hard drives, RAID levels, and the performance of various storage architectures. This is Part I of a 3-part series.
The document provides an evaluation report of DaStor, a Cassandra-based data storage and query system. It summarizes the testbed hardware configuration including 9 nodes with 112 cores and 144GB RAM. It also describes the DaStor configuration, data schema for call detail records (CDR), storage architecture with indexing scheme, and benchmark results showing a throughput of around 80,000 write operations per second for the cluster.
Facebook's Approach to Big Data Storage Challenge (DataWorks Summit)
Facebook's data warehouse cluster stores more than 100 PB of data, with 500+ terabytes entering the clusters every day. To meet the capacity requirements of future data growth, storing data in a cost-effective way has become a top priority for the Facebook data infrastructure team. This talk will present the various solutions we use to reduce our warehouse cluster's data footprint: (1) smart retention: history-based Hive table retention control; (2) increasing the RCFile compression ratio through clever sorting; (3) HDFS file-level raiding to reduce the default replication factor of 3 to a lower ratio; (4) attacking the small-file raiding problem through directory-level raiding and raid-aware compaction.
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis (Sameer Tiwari)
There is a plethora of storage solutions for big data, each having its own pros and cons. The objective of this talk is to delve deeper into specific classes of storage types like Distributed File Systems, in-memory Key Value Stores, Big Table Stores and provide insights on how to choose the right storage solution for a specific class of problems. For instance, running large analytic workloads, iterative machine learning algorithms, and real time analytics.
The talk will cover HDFS, HBase and brief introduction to Redis
The document provides examples of commands for using the Navisphere CLI to manage various aspects of an EMC storage system, such as:
1. Listing front-end port speeds, rebooting the SP, getting disk and RAID group information, setting cache parameters, creating RAID groups, binding and modifying LUNs.
2. Creating storage groups, adding LUNs to storage groups and connecting hosts to storage groups.
3. Summarizing how to calculate the stripe size of a LUN based on the RAID type and number of disks in the RAID group.
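As a worked version of that stripe-size calculation (with made-up but typical numbers, since the presentation's own figures are not reproduced here): a 5-disk RAID 5 group has 4 data disks, so with a 64 KB stripe element the full stripe is 4 x 64 KB = 256 KB. A hedged sketch of the arithmetic:

```python
# Hedged sketch of LUN stripe-size arithmetic: only data disks count toward the stripe,
# so parity (RAID 5/6) or mirror copies (RAID 1/0) are subtracted first.
# The 64 KB element size and the disk counts below are example values.
def stripe_size_kb(raid_type, disks, element_kb=64):
    if raid_type == "raid10":
        data_disks = disks // 2                      # mirrored pairs: half the spindles hold data
    else:
        parity = {"raid0": 0, "raid5": 1, "raid6": 2}[raid_type]
        data_disks = disks - parity                  # striped RAID: subtract parity drives
    return data_disks * element_kb

print(stripe_size_kb("raid5", 5))    # 4 data disks * 64 KB = 256 KB
print(stripe_size_kb("raid6", 8))    # 6 data disks * 64 KB = 384 KB
print(stripe_size_kb("raid10", 8))   # 4 data disks * 64 KB = 256 KB
```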
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks (Cloudera, Inc.)
Scalability of the NameNode has been a key issue for HDFS clusters. Because the entire file system metadata is stored in memory on a single NameNode, and all metadata operations are processed on this single system, the NameNode both limits the growth in size of the cluster and makes the NameService a bottleneck for the MapReduce framework as demand increases. This presentation will describe the features and implementation of HDFS Federation scheduled for release with Hadoop-0.23.
Mass storage structure pre-final-formatting (marangburu42)
This document provides an overview of mass storage structures including magnetic disks, solid-state disks, disk structure, disk attachment methods like host-attached storage, network-attached storage, and storage area networks. It discusses disk scheduling algorithms like FCFS, SSTF, SCAN, C-SCAN and LOOK. It also covers disk management topics such as disk formatting, the boot block, bad blocks, swap space management, and RAID structures.
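To make two of those policies concrete, the sketch below computes total head movement for FCFS and SSTF on the classic textbook request queue with the head starting at cylinder 53; the numbers are illustrative and not drawn from this document.

```python
# Compare total head movement (in cylinders) for FCFS vs. SSTF disk scheduling.
def fcfs(start, requests):
    total, pos = 0, start
    for r in requests:                 # serve requests strictly in arrival order
        total += abs(r - pos)
        pos = r
    return total

def sstf(start, requests):
    pending, total, pos = list(requests), 0, start
    while pending:                     # always serve the closest outstanding request next
        nxt = min(pending, key=lambda r: abs(r - pos))
        total += abs(nxt - pos)
        pos = nxt
        pending.remove(nxt)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print("FCFS:", fcfs(53, queue))   # 640 cylinders
print("SSTF:", sstf(53, queue))   # 236 cylinders
```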
ASPLOS2011 workshop RESoLVE "Effect of Disk Prefetching of Guest OS" (Kuniyasu Suzaki)
1) Disk prefetching techniques in guest operating systems may not work well for virtual storage devices due to restrictions of virtual devices.
2) The document proposes adjusting the guest OS behavior to recognize virtual device features, such as increasing data locality to improve storage deduplication performance.
3) An evaluation shows that reorganizing data blocks with a tool called "ext-optimizer" based on access profiles can increase occupancy for deduplicated storage and improve performance of disk prefetching for a guest OS booting on virtualized storage.
AOS Lab 10: File system -- Inodes and beyond (Zubair Nabi)
This document provides a summary of file system concepts in the xv6 operating system, including:
1) Inodes are data structures that represent files and provide metadata and pointers to file data blocks. On-disk inodes are read into memory inodes when files are accessed.
2) Directories are represented by special directory inodes containing directory entries with names and pointers to other inodes.
3) The file system layout divides the disk into sections for the boot sector, superblock, inodes, bitmap, data blocks, and log for atomic transactions.
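As a small illustration of that layout, the sketch below maps an inode number to the disk block and offset that hold it, assuming xv6-like constants (512-byte blocks, 64-byte on-disk inodes, and an inode area whose starting block is recorded in the superblock); the constants are assumptions for illustration rather than a definitive description.

```python
# Rough sketch: locate an on-disk inode in an xv6-style layout.
# Assumed constants: 512-byte blocks and 64-byte on-disk inodes -> 8 inodes per block.
BLOCK_SIZE = 512
DINODE_SIZE = 64
INODES_PER_BLOCK = BLOCK_SIZE // DINODE_SIZE   # 8

def inode_location(inum, inode_start_block):
    """Return (block number, byte offset in that block) for inode `inum`."""
    block = inode_start_block + inum // INODES_PER_BLOCK
    offset = (inum % INODES_PER_BLOCK) * DINODE_SIZE
    return block, offset

# Example: if the inode area starts at block 32, inode 10 lives in block 33 at offset 128.
print(inode_location(10, inode_start_block=32))   # (33, 128)
```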
This document provides an overview of Hadoop and its core components HDFS and MapReduce. It describes how HDFS uses a master/slave architecture with a single NameNode master and multiple DataNode slaves to store and retrieve data in a fault-tolerant manner. The NameNode manages the filesystem namespace and monitors data replication, while DataNodes store data blocks and perform read/write operations. It also discusses high availability techniques for the NameNode and core functions like block placement, garbage collection and stale replica detection in HDFS.
The document summarizes the development of the Fast File System (FFS) which improved upon the original UNIX file system. FFS optimized for disk geometry by dividing disks into cylinder groups and placing related data and metadata nearby to reduce seek times. It used larger block sizes, fragmentation, and rotational optimization to improve throughput by 10x over the original file system. FFS also introduced features like longer file names and quotas.
The CASPUR Staging System II is a disk and tape storage solution that uses a staging process to migrate files between disks and tapes. It consists of three main components - a stager, movers, and user interface commands. The stager monitors disk usage and migrates files to tape to maintain optimal disk occupancy based on configurable policies. Movers handle the actual data transfer between disks and tapes. Users can control files and view statuses using commands.
The document summarizes different types of computer hardware including auxiliary storage devices, input/output architecture, and interfaces. It describes magnetic tape, disks, floppy disks, optical disks, and semiconductor disks. It also covers RAID configurations, input/output control methods like bus, DMA, and different interfaces like serial, parallel, SCSI, and USB.
DWDM-RAM: DARPA-Sponsored Research for Data Intensive Service-on-Demand Advan... (Tal Lavian Ph.D.)
DWDM-RAM
An architecture for data-intensive Grids enabled by next-generation dynamic optical networks, incorporating new methods for lightpath provisioning. DWDM-RAM is designed to meet the networking challenges of extremely large scale Grid applications. Traditional network infrastructure cannot meet these demands, especially the requirements for intensive data flows.
This document discusses key performance indicator (KPI) examples for measuring safety and job performance. It provides links to resources on KPIs including lists of sample KPIs, performance appraisal templates, and methods. It also outlines best practices for developing KPIs such as linking them to objectives, focusing on 3-5 key result areas, and designing them to empower employees. Finally, it describes different types of KPIs like leading indicators, lagging indicators, qualitative and quantitative measures.
The document discusses using Elasticsearch, D3.js, Angular.js, and Google Refine to create a full stack data visualization of open data from Bordeaux, France. It focuses on data from the CAPC contemporary art museum, importing the data into Elasticsearch for scalable search and then using D3.js, Angular.js, and Yeoman to build the front-end visualization with JavaScript. The goal is to make the data more accessible and understandable through interactive visualization.
The document discusses Linked Data and RDF, describing how data from different sources on the web can be connected using URIs, HTTP, and structured data formats like RDF. It provides examples of retrieving and representing data from DBpedia in RDF format using Ruby tools and libraries. It also discusses publishing RDF from a Rails application by adding MIME types and generating RDF representations.
This document discusses graphical analysis of PV plant data. It begins by noting challenges with PV plant data and the importance of displaying data in useful ways. It then provides examples of common charts for questions about plant status, energy production, and sensor performance. A case study examines using multiple irradiance sensors. The document emphasizes focusing on essential questions, using context, choosing quick to understand formats, and filtering out unnecessary details. It concludes by inviting comments and discussion on an online blog.
Data vizualisation: d3.js + sinatra + elasticsearch (Mathieu Elie)
Live screencast on my tech blog (in French):
http://www.mathieu-elie.net/screencast-video-d3-js-sinatra-elasticsearch-capucine/
other tech slides at my blog: http://www.mathieu-elie.net
Applying Safety Data and Analysis to Performance-based Transportation Planning (RPO America)
During the 2016 National Regional Transportation Conference, Nicole Waldheim and Danena Gaines (Cambridge Systematics) provided information on techniques to analyzing information regarding transportation safety to the transportation planning process.
More info at http://www.mathieu-elie.net/eventmachine-introduction-pres-rubybdx-screencast-fr
Ruby EventMachine is a really good option for building scalable real-time servers and more...
An analyst's perspective on measuring safety performance, discussing reactive and proactive indicators, ideas on developing proactive indicators, and a balanced scorecard approach to safety metrics
MongoDB stores data in files on disk that are broken into variable-sized extents containing documents. These extents, as well as separate index structures, are memory mapped by the operating system for efficient read/write. A write-ahead journal is used to provide durability and prevent data corruption after crashes by logging operations before writing to the data files. The journal adds roughly 5-30% write overhead, which can be reduced by placing it on a separate drive. Data fragmentation over time can be addressed using the compact command or by adjusting the schema.
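A minimal sketch of looking at that on-disk picture from the driver and asking the server to defragment a collection; it assumes a locally running mongod and an existing `mydb.events` collection, and the exact fields returned by `collStats` vary by storage engine and version.

```python
# Hedged sketch: inspect a collection's storage footprint, then run compact on it.
# Assumes mongod on localhost and a collection named mydb.events.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").mydb

stats = db.command("collStats", "events")
print("data size (bytes):   ", stats["size"])           # logical size of the documents
print("storage size (bytes):", stats["storageSize"])    # space allocated on disk
print("index size (bytes):  ", stats["totalIndexSize"])

# compact rewrites the collection and its indexes to reclaim fragmented space;
# it blocks writes to the affected database/collection, so schedule it off-peak.
print(db.command("compact", "events"))
```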
This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory and disk configuration parameters, designing schemas and queries efficiently, and leveraging caching strategies.
This document provides an overview of five steps to improve PostgreSQL performance: 1) hardware optimization, 2) operating system and filesystem tuning, 3) configuration of postgresql.conf parameters, 4) application design considerations, and 5) query tuning. The document discusses various techniques for each step such as selecting appropriate hardware components, spreading database files across multiple disks or arrays, adjusting memory parameters, effective indexing, and caching queries and data.
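As a small companion to the configuration step, the sketch below reads back a few postgresql.conf parameters such decks typically cover (shared_buffers, effective_cache_size, work_mem); it assumes a reachable PostgreSQL instance and only inspects current values rather than recommending settings.

```python
# Hedged sketch: inspect a handful of commonly tuned PostgreSQL settings with psycopg2.
# Connection details are placeholders; adjust for your environment.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres")
cur = conn.cursor()

for setting in ("shared_buffers", "effective_cache_size", "work_mem",
                "maintenance_work_mem", "wal_buffers"):
    cur.execute("SHOW " + setting)   # SHOW takes an identifier, so no bind parameter here
    print(f"{setting} = {cur.fetchone()[0]}")

cur.close()
conn.close()
```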
The document describes MongoDB's directory layout and data structures. Key points:
- MongoDB stores each database in its own set of preallocated files containing extents.
- Data is memory mapped from files into virtual memory for fast access.
- The journal logs all operations before writing to data files for durability.
- Collections are divided into fixed-size extents containing records with BSON documents.
- Indexes are also stored in extents containing index records that point to documents.
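To see that preallocation on disk, here is a quick hedged sketch that lists the files under a dbpath with their sizes; the `/data/db` path and the `mydb.ns` / `mydb.0` / `mydb.1` naming pattern are typical of the MMAP-era engine described above, but your path and file names may differ.

```python
# Hedged sketch: list MongoDB data files under an assumed dbpath.
# With the MMAP-era engine, each database appears as mydb.ns plus mydb.0, mydb.1, ...
# preallocated at doubling sizes, and journal/ holds the write-ahead journal files.
import os

DBPATH = "/data/db"   # assumption: default dbpath; adjust to your deployment

for name in sorted(os.listdir(DBPATH)):
    path = os.path.join(DBPATH, name)
    if os.path.isfile(path):
        print(f"{name:20s} {os.path.getsize(path) / (1024 * 1024):8.1f} MB")
```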
The document discusses best practices for deploying MongoDB including sizing hardware with sufficient memory, CPU and I/O; using an appropriate operating system and filesystem; installing and upgrading MongoDB; ensuring durability with replication and backups; implementing security, monitoring performance with tools, and considerations for deploying on Amazon EC2.
Rick Copeland is a consultant who previously worked as a software engineer and wrote books on SQLAlchemy and Python. He discusses how MongoDB can scale better than relational databases by avoiding joins, transactions, and normalization. Some scaling techniques for MongoDB include using documents to improve data locality, optimizing indexes, being aware of working data sets, scaling disks, replication for fault tolerance, and sharding for further read and write scaling.
This document discusses MongoDB best practices for deploying MongoDB in AWS. It begins with terminology comparing MongoDB and relational databases. It then shows an example data model in SQL and how that same data would be modeled in MongoDB. The document discusses concepts like cursors, indexing, and sharding in MongoDB. It emphasizes the importance of sizing RAM and disk appropriately based on working set size and data access patterns. Finally, it covers replication in MongoDB and different replication set topologies that can be used in AWS for high availability and disaster recovery.
This document provides guidance on monitoring and optimizing MongoDB deployments. It discusses collecting metrics on operations, memory usage, page faults, locks and B-tree accesses. Key considerations for deployment include profiling usage, monitoring, capacity planning, high availability and disaster recovery. The document also offers tips on warming the cache, using separate disks for journaling and data files, and dynamically changing the log level.
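A hedged sketch of sampling a few of those metrics through `serverStatus` with the Python driver; it assumes a local mongod, and which fields are present (page-fault counters, for example) depends on the server version and platform.

```python
# Hedged sketch: pull a few serverStatus metrics worth graphing over time.
# Field names below exist on typical deployments but vary by version and storage engine.
from pymongo import MongoClient

status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")

print("opcounters:         ", status["opcounters"])     # ops since startup
print("resident memory MB: ", status["mem"].get("resident"))
print("page faults:        ", status.get("extra_info", {}).get("page_faults"))
print("current connections:", status["connections"]["current"])
```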
The document discusses the Google File System (GFS), which was developed by Google to handle large files across thousands of commodity servers. It provides three main functions: (1) dividing files into chunks and replicating chunks for fault tolerance, (2) using a master server to manage metadata and coordinate clients and chunkservers, and (3) prioritizing high throughput over low latency. The system is designed to reliably store very large files and enable high-speed streaming reads and writes.
Oracle Open World 2014: Lies, Damned Lies, and I/O Statistics [CON3671] (Kyle Hailey)
The document discusses analyzing I/O performance and summarizing lessons learned. It describes common tools used to measure I/O like moats.sh, strace, and ioh.sh. It also summarizes the top 10 anomalies encountered like caching effects, shared drives, connection limits, I/O request consolidation and fragmentation over NFS, and tiered storage migration. Solutions provided focus on avoiding caching, isolating workloads, proper sizing of NFS parameters, and direct I/O.
The document provides guidance on deploying MongoDB in production environments. It discusses sizing hardware requirements for memory, CPU, and disk I/O. It also covers installing and upgrading MongoDB, considerations for cloud platforms like EC2, security, backups, durability, scaling out, and monitoring. The focus is on performance optimization and ensuring data integrity and high availability.
The Parquet Format and Performance Optimization Opportunities (Databricks)
The Parquet format is one of the most widely used columnar storage formats in the Spark ecosystem. Given that I/O is expensive and that the storage layer is the entry point for any query execution, understanding the intricacies of your storage format is important for optimizing your workloads.
As an introduction, we will provide context around the format, covering the basics of structured data formats and the underlying physical data storage model alternatives (row-wise, columnar and hybrid). Given this context, we will dive deeper into specifics of the Parquet format: representation on disk, physical data organization (row-groups, column-chunks and pages) and encoding schemes. Now equipped with sufficient background knowledge, we will discuss several performance optimization opportunities with respect to the format: dictionary encoding, page compression, predicate pushdown (min/max skipping), dictionary filtering and partitioning schemes. We will learn how to combat the evil that is ‘many small files’, and will discuss the open-source Delta Lake format in relation to this and Parquet in general.
This talk serves both as an approachable refresher on columnar storage as well as a guide on how to leverage the Parquet format for speeding up analytical workloads in Spark using tangible tips and tricks.
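To ground a couple of the format features mentioned above, here is a small pyarrow sketch that writes a Parquet file with dictionary encoding and page compression enabled and an explicit row-group size, then reads back a single column; the file name and data are arbitrary examples.

```python
# Hedged sketch: write and read Parquet with pyarrow, exercising dictionary encoding,
# page compression, row-group sizing, and column pruning on read.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "country": ["NL", "NL", "DE", "FR", "DE", "NL"],   # low cardinality -> dictionary-friendly
    "amount":  [10.0, 12.5, 7.0, 99.9, 3.3, 15.0],
})

pq.write_table(
    table,
    "sales.parquet",
    compression="snappy",   # page-level compression
    use_dictionary=True,    # dictionary-encode columns where it pays off
    row_group_size=2,       # unrealistically small, just to show the knob
)

# Reading one column only touches that column's chunks (columnar I/O).
print(pq.read_table("sales.parquet", columns=["amount"]))
```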
Big data technologies are needed because traditional relational databases are not well-suited for storing and analyzing the large, diverse, and fast-growing unstructured data being generated. Hadoop is an open-source framework that allows for the distributed processing of large data sets across clusters of computers. It uses HDFS for storage and MapReduce as a programming model for distributed computing. Hadoop can handle large volumes and varieties of data at high velocities to help solve both new and old problems in better ways compared to traditional databases and data warehouses.
- The document provides guidance on deploying MongoDB including sizing hardware, installing and upgrading MongoDB, configuration considerations for EC2, security, backups, durability, scaling out, and monitoring. Key aspects discussed are profiling and indexing queries for performance, allocating sufficient memory, CPU and disk I/O, using 64-bit OSes, ext4/XFS filesystems, upgrading to even version numbers, and replicating for high availability and backups.
Hadoop is an open-source framework that processes large datasets in a distributed manner across commodity hardware. It uses a distributed file system (HDFS) and MapReduce programming model to store and process data. Hadoop is highly scalable, fault-tolerant, and reliable. It can handle data volumes and variety including structured, semi-structured and unstructured data.
High Performance, Scalable MongoDB in a Bare Metal Cloud (MongoDB)
This document summarizes the results of performance testing MongoDB deployments on bare metal cloud instances compared to public cloud instances. Small, medium, and large tests were conducted using different hardware configurations and data set sizes. The bare metal cloud instances consistently outperformed the public cloud instances, achieving higher operations per second, especially at higher concurrency levels. The document attributes the performance differences to the dedicated, tuned hardware resources of the bare metal instances compared to the shared resources of public cloud virtual instances.
Accelerating HBase with NVMe and Bucket Cache (Nicolas Poggi)
The Non-Volatile Memory express (NVMe) standard promises an order of magnitude faster storage than regular SSDs, while at the same time being more economical than regular RAM in TB/$. This talk evaluates the use cases and benefits of NVMe drives for use in Big Data clusters with HBase and Hadoop HDFS.
First, we benchmark the different drives using system level tools (FIO) to get maximum expected values for each different device type and set expectations. Second, we explore the different options and use cases of HBase storage and benchmark the different setups. And finally, we evaluate the speedups obtained by the NVMe technology for the different Big Data use cases from the YCSB benchmark.
In summary, while the NVMe drives show up to 8x speedup in best case scenarios, testing the cost-efficiency of new device technologies is not straightforward in Big Data, where we need to overcome system level caching to measure the maximum benefits.
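As a rough idea of the FIO baseline step described above, the sketch below drives a 4 KiB random-read job from Python and parses the JSON output; the target file, queue depth, and runtime are example values, it assumes fio is installed, and it deliberately targets a scratch file rather than a raw device.

```python
# Hedged sketch: run a simple fio random-read baseline and report IOPS/bandwidth.
# All job parameters are illustrative; adjust target, size, and runtime for real tests.
import json
import subprocess

cmd = [
    "fio", "--name=randread-baseline",
    "--filename=/tmp/fio-testfile", "--size=1G",
    "--rw=randread", "--bs=4k", "--iodepth=32", "--ioengine=libaio",
    "--direct=1", "--runtime=30", "--time_based",
    "--output-format=json",
]
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
job = json.loads(result.stdout)["jobs"][0]

print("read IOPS:             ", job["read"]["iops"])
print("read bandwidth (KiB/s):", job["read"]["bw"])
```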
How to approach a problem from a performance standpoint. A small real world application is used as a case study.
I've presented "High Performance With Java" at Codebits'2008, held from 13 to 15 November 2008.
(*) Codebits is a programming contest held in Portugal in the spirit of Yahoo Hack! Day.
This document discusses tools for making NumPy and Pandas code faster and able to run in parallel. It introduces the Dask library, which allows users to work with large datasets in a familiar Pandas/NumPy style through parallel computing. Dask implements parallel DataFrames, Arrays, and other collections that mimic their Pandas/NumPy counterparts. It can scale computations across multiple cores on a single machine or across many machines in a cluster. The document provides examples of using Dask to analyze large CSV and text data in parallel through DataFrames and Bags. It also discusses scaling computations from a single laptop to large clusters.
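A minimal sketch of that Pandas-style parallelism with Dask: the same groupby you would write in Pandas, spread across CSV partitions and evaluated lazily until `.compute()`; the file pattern and column names are placeholders.

```python
# Hedged sketch: a Pandas-like aggregation over many CSV files with Dask.
import dask.dataframe as dd

# Each matching file becomes one or more partitions that can be processed in parallel.
df = dd.read_csv("events-2024-*.csv")

# Nothing executes yet; Dask builds a lazy task graph.
mean_latency = df.groupby("endpoint")["latency_ms"].mean()

# .compute() runs the graph (on local threads/processes or a distributed cluster).
print(mean_latency.compute())
```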
Similar to Webinar: Understanding Storage for Performance and Data Safety
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replicasets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance (see the bucketing sketch after this list).
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
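As one concrete, hedged illustration of the schema-design point in the list above, the sketch below uses the common bucketing pattern: one document per sensor per hour, with readings pushed into an array, which keeps document and index counts far lower than one document per reading. The collection and field names are invented for the example.

```python
# Hedged sketch of the time-series bucketing pattern: one bucket per (sensor, hour).
from datetime import datetime, timezone
from pymongo import MongoClient

buckets = MongoClient("mongodb://localhost:27017").iot.sensor_buckets

def record(sensor_id, value, ts):
    hour = ts.replace(minute=0, second=0, microsecond=0)
    buckets.update_one(
        {"sensor_id": sensor_id, "hour": hour},
        {
            "$push": {"readings": {"t": ts, "v": value}},   # append the raw reading
            "$inc":  {"count": 1},                          # maintain per-bucket summaries
            "$min":  {"min_v": value},
            "$max":  {"max_v": value},
        },
        upsert=True,   # the first reading in an hour creates the bucket document
    )

record("sensor-42", 21.7, datetime(2020, 5, 1, 10, 15, tzinfo=timezone.utc))
```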
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
The MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
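A small sketch of that Equality-Sort-Range ordering with the Python driver, using an invented orders collection: equality on `status`, sort on `order_date`, range on `qty`.

```python
# Hedged sketch of the Equality-Sort-Range (E-S-R) index ordering on an example collection.
from pymongo import MongoClient, ASCENDING, DESCENDING

orders = MongoClient("mongodb://localhost:27017").shop.orders

orders.create_index([
    ("status", ASCENDING),        # E: matched with equality (status == "shipped")
    ("order_date", DESCENDING),   # S: supplies a non-blocking sort, newest first
    ("qty", ASCENDING),           # R: range predicate scans last (qty >= 10)
])

cursor = (orders
          .find({"status": "shipped", "qty": {"$gte": 10}})
          .sort("order_date", DESCENDING))
print(cursor.explain()["queryPlanner"]["winningPlan"])   # expect an IXSCAN on the new index
```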
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
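To illustrate the "outputting your data to existing collections" part, here is a hedged pipeline sketch that rolls daily order totals into a summary collection with `$merge` (available from MongoDB 4.2); the collection and field names are made up.

```python
# Hedged sketch: roll up daily totals and upsert them into a summary collection via $merge.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").shop

db.orders.aggregate([
    {"$match": {"status": "complete"}},
    {"$group": {
        "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$order_date"}},
        "revenue": {"$sum": "$total"},
        "orders":  {"$sum": 1},
    }},
    # $merge writes results into daily_sales, replacing matched documents and inserting
    # new ones -- effectively an incrementally refreshable materialized view.
    {"$merge": {"into": "daily_sales",
                "whenMatched": "replace",
                "whenNotMatched": "insert"}},
])
```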
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app... (MongoDB)
... to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning... (MongoDB)
It has never been easier to order online and have goods delivered in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8,000 billion.
Data is well known in the Supply Chain world (routes, information about goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise with Data Science, Upply is redefining the fundamentals of the Supply Chain, enabling every player to overcome the volatility and inefficiency of the market.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
4. Directory Layout
drwxr-xr-x 4 antoine wheel 136 Nov 19 10:12 journal
-rw------- 1 antoine wheel 16777216 Oct 25 14:58 test.0
-rw------- 1 antoine wheel 134217728 Mar 13 2012 test.1
-rw------- 1 antoine wheel 268435456 Mar 13 2012 test.2
-rw------- 1 antoine wheel 536870912 May 11 2012 test.3
-rw------- 1 antoine wheel 1073741824 May 11 2012 test.4
-rw------- 1 antoine wheel 2146435072 Nov 19 10:14 test.5
-rw------- 1 antoine wheel 16777216 Nov 19 10:13 test.ns
5. Directory Layout
• Each database has one or more data files, all in the same folder (e.g. test.0, test.1, …)
• Aggressive preallocation (always 1 spare file)
• Those files get larger and larger, up to 2GB
• There is one namespace file per db, which holds 24000 entries by default; a namespace is a collection or an index (see the quick check after this list)
• The journal folder contains the journal files
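As a quick way to see how much of the namespace file is in use, the system.namespaces collection (present with the MMAPv1 storage engine on pre-3.0 servers; treat this as a version-dependent sketch) lists every collection and index in the database:
// Hypothetical check on the "test" db: each collection and each index
// consumes one namespace entry out of the ~24000 available by default.
use test
db.system.namespaces.count()          // total namespace entries currently used
db.system.namespaces.find().limit(5)  // e.g. { "name" : "test.large" }, { "name" : "test.large.$_id_" }, ...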
6. Tuning with options
• Use --directoryperdb to separate dbs into their own folders, which allows using different volumes (isolation, performance)
• Use --nopreallocate to prevent preallocation
• Use --smallfiles to keep data files smaller
• If using many databases, use --nopreallocate and --smallfiles to reduce storage size
• If using thousands of collections & indexes, increase namespace capacity with --nssize (a combined startup example follows)
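Taken together, a startup line for a many-database, many-collection deployment might look like the sketch below; the path is a placeholder and the flags apply to the MMAPv1-era mongod:
# Hypothetical invocation: isolate each db in its own folder, keep files small,
# and allow more namespace entries (--nssize is in MB, default 16).
mongod --dbpath /data/db --directoryperdb --smallfiles --nssize 64
# Note: on 2.x binaries the preallocation switch is spelled --noprealloc.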
8. Internal File Format
• Files on disk are broken into extents, which contain the documents
• A collection has 1 to many extents
• Extents grow exponentially in size, up to 2GB
• Namespace entries in the ns file point to the first extent for that collection (see the validation sketch below)
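To see the extent chain from the shell, collection validation on MMAPv1 reports it; field names vary a bit by version, so treat this as an illustrative sketch:
// Full validation scans the extents and records of a collection (can be slow).
db.large.validate(true)
// On MMAPv1 the result includes extent details such as "firstExtent",
// "lastExtent" and "extentCount", matching the chain described above.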
11. Extents and Records
[Diagram: an Extent header (length, xNext, xPrev, firstRecord, lastRecord) followed by a doubly linked list of Records; each Record carries a length, rNext/rPrev pointers, and the document itself, e.g. { _id: "foo", ... } and { _id: "bar", ... }]
13. Indexes
• Indexes are BTree structures serialized to disk
• They are stored in the same files as data, but using their own extents
14. Index Extents
[Diagram: a B-tree with keys 4 and 9 at the root and leaves 1 3 / 5 6 8 / A B; index extents use the same Extent header (length, xNext, xPrev, firstRecord, lastRecord), but each Record holds a Bucket (parent pointer, numKeys, keys K) whose key nodes point to the indexed documents, e.g. { Document }]
15. the db stats
> db.stats()
{
"db" : "test",
"collections" : 22,
"objects" : 17000383, ## number of documents
"avgObjSize" : 44.33690276272011,
"dataSize" : 753744328, ## size of data
"storageSize" : 1159569408, ## size of all containing extents
"numExtents" : 81,
"indexes" : 85,
"indexSize" : 624204896, ## separate index storage size
"fileSize" : 4176478208, ## size of data files on disk
"nsSizeMB" : 16,
"ok" : 1
}
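The byte counts above get hard to read quickly; db.stats() accepts a scale argument, so a megabyte-scaled view of the same "test" db is one line:
// Scale all size fields to MB instead of bytes.
db.stats(1024 * 1024)
// e.g. dataSize, storageSize, indexSize and fileSize are now reported in MB.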
16. the collection stats
> db.large.stats()
{
"ns" : "test.large",
"count" : 5000000, ## number of documents
"size" : 280000024, ## size of data
"avgObjSize" : 56.0000048,
"storageSize" : 409206784, ## size of all containing extents
"numExtents" : 18,
"nindexes" : 1,
"lastExtentSize" : 74846208,
"paddingFactor" : 1, ## amount of padding
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 162228192, ## separate index storage size
"indexSizes" : {
"_id_" : 162228192
},
"ok" : 1
}
18. Memory Mapped Files
• All data files are memory mapped to RAM by the OS
• Mongo just reads / writes to RAM in the filesystem cache
• OS takes care of the rest!
• Mongo calls fsync every 60 seconds to flush changes to disk (see the flush example below)
• Virtual process size = total file size + overhead (connections, heap)
• If journal is on, the virtual size will be roughly doubled
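The 60-second flush corresponds to the syncdelay server parameter, and a flush can also be forced by hand. A small sketch, assuming admin privileges:
// Force an immediate flush of dirty pages to the data files.
db.adminCommand({ fsync: 1 })
// The default background flush interval is 60s; it can be tuned at runtime,
// e.g. down to 30 seconds (only if you understand the I/O trade-off).
db.adminCommand({ setParameter: 1, syncdelay: 30 })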
19. Virtual Address Space
• 32-bit system: 2^32 = 4GB of address space; minus ~1GB for the kernel and ~0.5GB for binaries, stack, etc., that leaves about 2.5GB for data (BAD)
• 64-bit system: 2^64 = 1.7 x 10^10 GB (16EB) of address space; user space spans prefixes 0x00–0x7F and the kernel 0xF0–0xFF, giving 2^47 = 128TB for data (GOOD)
20. Virtual Address Space
[Diagram: the process virtual address space from 0x0 (NULL) at the bottom up to 0x7fffffffffff (kernel boundary) at the top: the mongod binary, its heap, the memory-mapped data files (test.ns, test.0, test.1, …), shared libraries, and the stack. A document stored on disk appears directly in the mapped region as { … }.]
21. Memory map, love it or hate it
• Pros:
– No complex memory / disk code in MongoDB, huge win!
– The OS is very good at caching for any type of storage
– Pure Least Recently Used behavior
– Cache stays warm between Mongo restarts
• Cons:
– RAM is affected by disk fragmentation
– RAM is affected by high read-ahead
– LRU behavior does not prioritize things (like indices)
22. How much data is in RAM?
• Resident memory is the best indicator of how much data is in RAM
• Resident is: process overhead (connections, heap) + FS pages in RAM that were accessed
• This means it resets to 0 upon restart even though the data is still in RAM due to the FS cache
• Use the free command to check on the FS cache size (see the serverStatus sketch below)
• Can be affected by fragmentation and read-ahead
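From the shell, the mem section of serverStatus shows resident alongside the mapped and virtual sizes (values in MB); a quick sketch:
// Resident, virtual and mapped memory of the mongod process, in MB.
var mem = db.serverStatus().mem
printjson(mem)  // e.g. { bits: 64, resident: ..., virtual: ..., mapped: ..., ... }
// Remember: resident drops to ~0 after a restart even though the FS cache
// still holds the data, so also check the OS-level `free` output.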
24. The problem
• A single insert/update involves writing to many places (the record, indexes, ns details...)
• What if the electricity goes out? Corruption…
25. Solution – use a journal
• Data gets written to a journal before making it to the data files
• Operations are written to a journal buffer in RAM that gets flushed every 100ms or 100MB
• Once the journal is written to disk, data is safe unless the hardware fails entirely
• The journal prevents corruption and allows durability
• Can be turned off, but don’t!
26. Journal Format
[Diagram: a journal file begins with a JHeader. Each group commit forms a section: a JSectHeader carrying the LSN, one or more DurOps, and a JSectFooter; a section is applied all-or-nothing. An Op_DbContext DurOp sets the database context for the operations that follow; a write DurOp records the length, offset, fileNo and data[length] of the region that changed.]
27. Can I lose data on a hard crash?
• Maximum data loss is 100ms (journal flush). This can be reduced with --journalCommitInterval
• For durability (data is on disk when ack’ed) use the JOURNAL_SAFE write concern (the “j” option)
• Note that replication can reduce the data loss further. Use the REPLICAS_SAFE write concern (the “w” option)
• As write guarantees increase, latency increases. To maintain performance, use more connections! (See the write-concern sketch below.)
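In the 2012-era shell those guarantees map to getLastError options; newer shells and drivers express the same thing as a writeConcern document. A hedged sketch using the db.foo collection from the notes:
// Classic form: issue the write, then request acknowledgement with the
// journal ("j") and replication ("w") guarantees you need.
db.foo.insert({ _id: "important", v: 1 })
db.runCommand({ getLastError: 1, j: true, w: 2, wtimeout: 5000 })

// The journal group-commit interval itself can be lowered at runtime
// (milliseconds); smaller values mean less potential loss but more I/O.
db.adminCommand({ setParameter: 1, journalCommitInterval: 50 })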
28. What is the cost of the journal?
• On read-heavy systems, no impact
• Write performance is reduced by 5-30%
• If using a separate drive for the journal, as low as 3%
• For apps that are write-heavy (1000+ writes per server) there can be a slowdown due to the mix of journal and data flushes. Use a separate drive!
30. Fragmentation
• Files can get fragmented over time if remove() and update() are issued
• It gets worse if documents have varied sizes
• Fragmentation wastes disk space and RAM
• It also makes writes scattered and slower
• Fragmentation can be checked by comparing size to storageSize in the collection’s stats (see the example below)
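A quick way to eyeball this from the shell, using the test.large collection from earlier; a ratio well above 1 suggests fragmentation or padding overhead:
// Compare logical data size to the space actually allocated on disk.
var s = db.large.stats()
print("size:        " + s.size)
print("storageSize: " + s.storageSize)
print("ratio:       " + (s.storageSize / s.size).toFixed(2))  // ~1.0 is tight, >>1 is fragmented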
31. What it looks like
[Diagram: an extent holding documents interleaved with holes (X) left by deleted or relocated documents; the wasted space is present both on disk and in RAM!]
32. How to combat fragmentation?
• compact command (maintenance op)
• Normalize the schema more (documents don’t grow)
• Pre-pad documents (documents don’t grow)
• Use separate collections over time, then use collection.drop() instead of collection.remove(query)
• The usePowerOf2Sizes option makes disk buckets more reusable (see the sketch below)
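The first and last bullets look like this from the shell; note that compact is a blocking maintenance operation on MMAPv1, so run it on secondaries or during a maintenance window (sketch against the test.large collection):
// Rewrite the collection's extents and rebuild its indexes in place.
db.runCommand({ compact: "large" })

// Allocate record space in power-of-two sizes so freed slots are reusable
// (collMod flag on 2.2+; it became the default behaviour in later releases).
db.runCommand({ collMod: "large", usePowerOf2Sizes: true })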
33. Conclusion
• Understand disk layout and footprint
• See how much data is actually in RAM
• Memory mapping is cool
• Answer how much data is ok to lose
• Check on fragmentation and avoid it
We’ll cover how data is laid out in more detail later, but at a very high level this is what’s contained in each data file. Extents are allocated within files as needed until the space inside the file is fully consumed by extents, then another file is allocated. As more and more data is inserted into a collection, the size of the allocated extents grows exponentially up to 2GB. If you’re writing to many small collections, extents will be interleaved within a single database file, but as a collection grows, eventually a single 2GB file will be devoted to data for that single collection.
An extent is a contiguous region of a file that is allocated to a specific namespace (a namespace being either an index or a collection). Extents are allocated within files as needed until the space inside the file is fully consumed by extents, then another file is allocated. The extents within a namespace are chained to each other with the xNext and xPrev pointers. It is important to note that a file can have extents of different namespaces interleaved. Extents are sized progressively larger, from 4k up to 2GB. An extent is comprised of records, and these are chained together with a doubly linked list referenced from the extent. So if you were to do a table scan such as db.foo.find(), we would first look up the first extent from the ns details, and from that access the firstRecord and traverse the list of records until the query was satisfied.
Imagine your data laid out in these (possibly) sparse extents like a backbone, with linked lists of data records hanging off of each of them. This is the table scan logic: $natural order (sort({$natural: -1}) for the reverse). When executing a find() with no parameters, the database returns objects in forward natural order. To do a TABLE SCAN, you would start at the first extent indicated in the NamespaceDetails and then traverse the records within. (A quick shell illustration follows.)
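As a concrete shell illustration of natural order, a sketch on the db.foo collection used above:
// Forward natural order: walk the extents/records as laid out on disk.
db.foo.find()
// Reverse natural order: start from the last record and walk backwards.
db.foo.find().sort({ $natural: -1 })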
Variable length keys, fixed size bucket. Constraints: 1k per index key limit; 8k bucket size. KeyNodes (sorted, pointing to a DataRecord and a key with an offset), KeyObjects (unsorted and variable length). Key names are not stored in the index; i.e., if your index spec was on CityName and EmailAddress, we won’t store the key name, but we would store that key value as a document with a zero length key. As of 2.0, in the v1 indexes we remove the extra overhead of keeping the document structure and store just the keys themselves. The B-tree uses standard split/merge, unless we detect a monotonically increasing key value, in which case we do a 90/10 split. So in the case that you are inserting timestamp data or an ObjectID, your buckets will stay more optimally filled. This also has the effect that if you are only interested in things that happened recently, the old stuff will fall out of memory and won’t need to be paged in.
64-BIT: You may think that with 64 bits of addressable memory you have most of that for data files, but in fact you’re only allowed to address 48 bits of memory, and the top bit is reserved for kernel memory. This leaves about 128TB for your data (per process). How many plan to store > 128TB on a single server? Remember, this is just per-process; if you had a sharded system of 100 dbs, you could spread 128TB to each server.
You can use pmap to view this layout on Linux. So here I have a single document located on disk and show how it’s mapped into memory. Let’s look at how we go about actually accessing this document once it’s loaded into memory.
Without journaling, the approach is quite straightforward: there is a one-to-one mapping of data files to memory, and when either the OS or an explicit fsync flushes, your data is safe on disk. With journaling we do some tricks. It is a write-ahead log, that is, we write the data to the journal before we update the data itself. Each file is mapped twice, once to a private view which is marked copy-on-write, and once to the shared view (shared in the sense that the disk has access to this memory). Every time we do a write, we keep a list of the regions of memory that were written to. These are batched into group commits, compressed, and appended to a special journal file on disk. Once that data has been written to disk, we then do a remapping phase which copies the changes into the shared view, at which point those changes can then be synced to disk. Once that data is synced to disk it is safe (barring hardware failure). If there is a failure before the shared/storage view is written to disk, we simply need to apply all the changes, in order, to the data files since the last time they were synced, and we get back to a consistent view of the data.
*** The journal is compressed using snappy to reduce the write volume. The LSN (Last Sequence Number) is written to disk as a marker of which section of the journal was last flushed.