SlideShare a Scribd company logo
ATechnical Introduction to WiredTiger
info@wiredtiger.com
Presenter
•  Keith Bostic
•  Co-architect WiredTiger
•  Senior Staff Engineer MongoDB
Ask questions as we go, or
keith.bostic@mongodb.com
3
This presentation is not…
•  How to write stand-alone WiredTiger apps
– contact info@wiredtiger.com
•  How to configure MongoDB with WiredTiger for your workload
4
WiredTiger
•  Embedded database engine
– general purpose toolkit
– high performing: scalable throughput with low latency
•  Key-value store (NoSQL)
•  Schema layer
– data typing, indexes
•  Single-node
•  OO APIs
– Python, C, C++, Java
•  Open Source
5
Deployments
•  Amazon AWS
•  ORC/Tbricks: financial trading solution
And, most important of all:
•  MongoDB: next-generation document store
You may have seen this:
or this…
8
MongoDB’s Storage Engine API
•  Allows different storage engines to "plug-in"
– different workloads have different performance characteristics
– mmapV1 is not ideal for all workloads
– more flexibility
•  mix storage engines on same replica set/sharded cluster
•  Opportunity to innovate further
– HDFS, encrypted, other workloads
•  WiredTiger is MongoDB’s general-purpose workhorse
Topics
Ø  WiredTiger Architecture
•  In-memory performance
•  Record-level concurrency
•  Compression
•  Durability and the journal
•  Future features
10
Motivation for WiredTiger
•  Traditional engines struggle with modern hardware:
– lots of CPU cores
– lots of RAM
•  Avoid thread contention for resources
– lock-free algorithms, for example, hazard pointers
– concurrency control without blocking
•  Hotter cache, more work per I/O
– big blocks
– compact file formats
11
WiredTiger Architecture
WiredTiger Engine
Schema &
Cursors
Python API C API Java API
Database
Files
Transactions
Page
read/write
Logging
Column
storage
Block
management
Row
storage
Snapshots
Log Files
Cache
12
Column-store, LSM
•  Column-store
– implemented inside the B+tree
– 64-bit record number keys
– valued by the key’s position in the tree
– variable-length or fixed-length
•  LSM
– forest of B+trees (row-store or column-store)
– bloom filters (fixed-length column-store)
•  Mix-and-match
– sparse, wide table: column-store primary, LSM indexes
Topics
ü  WiredTiger Architecture
Ø  In-memory performance
•  Record-level concurrency
•  Compression
•  Durability and the journal
•  Future features
14
Trees in cache
non-resident
child
ordinary pointer
root page
internal
page
internal
page
root page
leaf page
leaf page leaf page leaf page
15
Hazard Pointers
non-resident
child
ordinary pointer
root page
internal
page
internal
page
root page
leaf page
leaf page leaf page leaf page
1 memory
flush
16
Pages in cache
cache
data files
page images
on-disk
page
image
index
clean
page on-disk
page
image
indexdirty
page
updates
17
Skiplists
•  Updates stored in skiplists
– ordered linked lists with forward “skip” pointers
•  William Pugh, 1989
– simpler, as fast as binary-search, less space
– likely binary-search performance plus cache prefetch
– more space for an existing data set
•  Implementation
– insert without locking
– forward/backward traversal without locking, while inserting
– removal requires locking
18
In-memory performance
•  Cache trees/pages optimized for in-memory access
•  Follow pointers to traverse a tree
•  No locking to read or write
•  Keep updates separate from initial data
– updates are stored in skiplists
– updates are atomic in almost all cases
•  Do structural changes (eviction, splits) in background threads
Topics
ü  WiredTiger Architecture
ü  In-memory performance
Ø  Record-level concurrency
•  Compression
•  Durability and the journal
•  Future features
20
Multiversion Concurrency Control (MVCC)
•  Multiple versions of records maintained in cache
•  Readers see most recently committed version
– read-uncommitted or snapshot isolation available
– configurable per-transaction or per-handle
•  Writers can create new versions concurrent with readers
•  Concurrent updates to a single record cause write conflicts
– one of the updates wins
– other generally retries with back-off
•  No locking, no lock manager
21
Pages in cache
cache
data files
page images
on-disk
page
image
index
clean
page on-disk
page
image
indexdirty
page
updates
skiplist
22
MVCC In Action
on-disk
page
image
index
update1
(txn, value)
on-disk
page
image
index
update2
(txn, value)
update1
(txn, value)
update
on-disk
page
image
index
update
Topics
ü  WiredTiger Architecture
ü  In-memory performance
ü  Record-level concurrency
Ø  Compression
•  Durability and the journal
•  Future features
24
Block manager
•  Block allocation
– fragmentation
– allocation policy
•  Checksums
– block compression is at a higher level
•  Checkpoints
– involved in durability guarantees
•  Opaque address cookie
– stored as internal page key’s “value”
•  Pluggable
25
Write path
cache
data files
page images
on-disk
page
image
index
clean
page on-disk
page
image
indexdirty
page
updates
reconciled
during write
26
In-memory Compression
•  Prefix compression
– index keys usually have a common prefix
– rolling, per-block, requires instantiation for performance
•  Huffman/static encoding
– burns CPU
•  Dictionary lookup
– single value per page
•  Run-length encoding
– column-store values
27
On-disk Compression
•  Compression algorithms:
– snappy [default]: good compression, low overhead
– LZ4: good compression, low overhead, better page layout
– zlib: better compression, high overhead
– pluggable
•  Optional
– compressing filesystem instead
28
Compression in Action
Flights database
Topics
ü  WiredTiger Architecture
ü  In-memory performance
ü  Record-level concurrency
ü  Compression
Ø  Durability and the journal
•  Future features
30
Journal and Recovery
•  Write-ahead logging (aka journal) enabled by default
•  Only written at transaction commit
– only write redo records
•  Log records are compressed
•  Group commit for concurrency
•  Automatic log archival / removal
– bounded by checkpoint frequency
•  On startup, find a consistent checkpoint in the metadata
– use the checkpoint to figure out how much to roll forward
31
Durability without Journaling
•  MongoDB’s MMAP storage requires the journal for consistency
– running with “nojournal” is unsafe
•  WiredTiger is a no-overwrite data store
– with “nojournal”, updates since the last checkpoint may be lost
– data will still be consistent
– checkpoints every N seconds by default
•  Replication can guarantee durability
– the network is generally faster than disk I/O
Topics
ü  WiredTiger Architecture
ü  In-memory performance
ü  Record-level concurrency
ü  Compression
ü  Durability and the journal
Ø  Future features
33
34
What’s next for WiredTiger?
•  Our Big Year of Tuning
– applications doing “interesting” things
– stalls during checkpoints with 100GB+ caches
– MongoDB capped collections
•  Encryption
•  Advanced transactional semantics
– updates not stable until confirmed by replica majority
35
WiredTiger LSM support
•  Random insert workloads
•  Data set much larger than cache
•  Query performance less important
•  Background maintenance overhead acceptable
•  Bloom filters
36
37
Benchmarks
Mark Callaghan at Facebook: http://smalldatum.blogspot.com/
Thanks!
Questions?
Keith Bostic
keith.bostic@mongodb.com

More Related Content

What's hot

RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
MIJIN AN
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
confluent
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
Derek Stainer
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Altinity Ltd
 
mongodb와 mysql의 CRUD 연산의 성능 비교
mongodb와 mysql의 CRUD 연산의 성능 비교mongodb와 mysql의 CRUD 연산의 성능 비교
mongodb와 mysql의 CRUD 연산의 성능 비교Woo Yeong Choi
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
Dremio Corporation
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
Norberto Leite
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
Thejas Nair
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Databricks
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
University of California, Santa Cruz
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Altinity Ltd
 
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Spark Summit
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
Databricks
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
Michael Keane
 
Galera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction SlidesGalera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction Slides
Severalnines
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
Ontico
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
Alkin Tezuysal
 

What's hot (20)

RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
mongodb와 mysql의 CRUD 연산의 성능 비교
mongodb와 mysql의 CRUD 연산의 성능 비교mongodb와 mysql의 CRUD 연산의 성능 비교
mongodb와 mysql의 CRUD 연산의 성능 비교
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
How Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for PerformanceHow Netflix Tunes EC2 Instances for Performance
How Netflix Tunes EC2 Instances for Performance
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
Apache Carbondata: An Indexed Columnar File Format for Interactive Query with...
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Galera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction SlidesGalera cluster for MySQL - Introduction Slides
Galera cluster for MySQL - Introduction Slides
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...Understanding and tuning WiredTiger, the new high performance database engine...
Understanding and tuning WiredTiger, the new high performance database engine...
 
My first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdfMy first 90 days with ClickHouse.pdf
My first 90 days with ClickHouse.pdf
 

Similar to A Technical Introduction to WiredTiger

MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage EngineMongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB
 
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
MongoDB
 
WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0
MongoDB
 
Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化
YUCHENG HU
 
Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0
Norberto Leite
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Alluxio, Inc.
 
What'sNnew in 3.0 Webinar
What'sNnew in 3.0 WebinarWhat'sNnew in 3.0 Webinar
What'sNnew in 3.0 Webinar
MongoDB
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)
Ontico
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Lars Marowsky-Brée
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
Andriy Zabavskyy
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Community
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
MongoDB
 
9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf
sreedb2
 
InnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter ZaitsevInnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter Zaitsev
Fuenteovejuna
 
MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)
MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)
MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)
MongoDB
 
Beyond the Basics 1: Storage Engines
Beyond the Basics 1: Storage Engines	Beyond the Basics 1: Storage Engines
Beyond the Basics 1: Storage Engines
MongoDB
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)
Lars Marowsky-Brée
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
marvin herrera
 
071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen
Steve Feldman
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2
ScribbleLive
 

Similar to A Technical Introduction to WiredTiger (20)

MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage EngineMongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
 
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
https://docs.google.com/presentation/d/1DcL4zK6i3HZRDD4xTGX1VpSOwyu2xBeWLT6a_...
 
WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0WiredTiger & What's New in 3.0
WiredTiger & What's New in 3.0
 
Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化Innodb 和 XtraDB 结构和性能优化
Innodb 和 XtraDB 结构和性能优化
 
Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0Let the Tiger Roar - MongoDB 3.0
Let the Tiger Roar - MongoDB 3.0
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
What'sNnew in 3.0 Webinar
What'sNnew in 3.0 WebinarWhat'sNnew in 3.0 Webinar
What'sNnew in 3.0 Webinar
 
InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)InnoDB architecture and performance optimization (Пётр Зайцев)
InnoDB architecture and performance optimization (Пётр Зайцев)
 
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
Modeling, estimating, and predicting Ceph (Linux Foundation - Vault 2015)
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
Ceph Day Amsterdam 2015: Measuring and predicting performance of Ceph clusters
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 
9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf9.6_Course Material-Postgresql_002.pdf
9.6_Course Material-Postgresql_002.pdf
 
InnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter ZaitsevInnoDB Architecture and Performance Optimization, Peter Zaitsev
InnoDB Architecture and Performance Optimization, Peter Zaitsev
 
MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)
MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)
MongoDB 3.0 and WiredTiger (Event: An Evening with MongoDB Dallas 3/10/15)
 
Beyond the Basics 1: Storage Engines
Beyond the Basics 1: Storage Engines	Beyond the Basics 1: Storage Engines
Beyond the Basics 1: Storage Engines
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)
 
Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014Colvin exadata mistakes_ioug_2014
Colvin exadata mistakes_ioug_2014
 
071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen071410 sun a_1515_feldman_stephen
071410 sun a_1515_feldman_stephen
 
High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2High Scalability Toronto: Meetup #2
High Scalability Toronto: Meetup #2
 

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 

Recently uploaded (20)

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 

A Technical Introduction to WiredTiger

  • 1. ATechnical Introduction to WiredTiger info@wiredtiger.com
  • 2. Presenter •  Keith Bostic •  Co-architect WiredTiger •  Senior Staff Engineer MongoDB Ask questions as we go, or keith.bostic@mongodb.com
  • 3. 3 This presentation is not… •  How to write stand-alone WiredTiger apps – contact info@wiredtiger.com •  How to configure MongoDB with WiredTiger for your workload
  • 4. 4 WiredTiger •  Embedded database engine – general purpose toolkit – high performing: scalable throughput with low latency •  Key-value store (NoSQL) •  Schema layer – data typing, indexes •  Single-node •  OO APIs – Python, C, C++, Java •  Open Source
  • 5. 5 Deployments •  Amazon AWS •  ORC/Tbricks: financial trading solution And, most important of all: •  MongoDB: next-generation document store
  • 6. You may have seen this:
  • 8. 8 MongoDB’s Storage Engine API •  Allows different storage engines to "plug-in" – different workloads have different performance characteristics – mmapV1 is not ideal for all workloads – more flexibility •  mix storage engines on same replica set/sharded cluster •  Opportunity to innovate further – HDFS, encrypted, other workloads •  WiredTiger is MongoDB’s general-purpose workhorse
  • 9. Topics Ø  WiredTiger Architecture •  In-memory performance •  Record-level concurrency •  Compression •  Durability and the journal •  Future features
  • 10. 10 Motivation for WiredTiger •  Traditional engines struggle with modern hardware: – lots of CPU cores – lots of RAM •  Avoid thread contention for resources – lock-free algorithms, for example, hazard pointers – concurrency control without blocking •  Hotter cache, more work per I/O – big blocks – compact file formats
  • 11. 11 WiredTiger Architecture WiredTiger Engine Schema & Cursors Python API C API Java API Database Files Transactions Page read/write Logging Column storage Block management Row storage Snapshots Log Files Cache
  • 12. 12 Column-store, LSM •  Column-store – implemented inside the B+tree – 64-bit record number keys – valued by the key’s position in the tree – variable-length or fixed-length •  LSM – forest of B+trees (row-store or column-store) – bloom filters (fixed-length column-store) •  Mix-and-match – sparse, wide table: column-store primary, LSM indexes
  • 13. Topics ü  WiredTiger Architecture Ø  In-memory performance •  Record-level concurrency •  Compression •  Durability and the journal •  Future features
  • 14. 14 Trees in cache non-resident child ordinary pointer root page internal page internal page root page leaf page leaf page leaf page leaf page
  • 15. 15 Hazard Pointers non-resident child ordinary pointer root page internal page internal page root page leaf page leaf page leaf page leaf page 1 memory flush
  • 16. 16 Pages in cache cache data files page images on-disk page image index clean page on-disk page image indexdirty page updates
  • 17. 17 Skiplists •  Updates stored in skiplists – ordered linked lists with forward “skip” pointers •  William Pugh, 1989 – simpler, as fast as binary-search, less space – likely binary-search performance plus cache prefetch – more space for an existing data set •  Implementation – insert without locking – forward/backward traversal without locking, while inserting – removal requires locking
  • 18. 18 In-memory performance •  Cache trees/pages optimized for in-memory access •  Follow pointers to traverse a tree •  No locking to read or write •  Keep updates separate from initial data – updates are stored in skiplists – updates are atomic in almost all cases •  Do structural changes (eviction, splits) in background threads
  • 19. Topics ü  WiredTiger Architecture ü  In-memory performance Ø  Record-level concurrency •  Compression •  Durability and the journal •  Future features
  • 20. 20 Multiversion Concurrency Control (MVCC) •  Multiple versions of records maintained in cache •  Readers see most recently committed version – read-uncommitted or snapshot isolation available – configurable per-transaction or per-handle •  Writers can create new versions concurrent with readers •  Concurrent updates to a single record cause write conflicts – one of the updates wins – other generally retries with back-off •  No locking, no lock manager
  • 21. 21 Pages in cache cache data files page images on-disk page image index clean page on-disk page image indexdirty page updates skiplist
  • 22. 22 MVCC In Action on-disk page image index update1 (txn, value) on-disk page image index update2 (txn, value) update1 (txn, value) update on-disk page image index update
  • 23. Topics ü  WiredTiger Architecture ü  In-memory performance ü  Record-level concurrency Ø  Compression •  Durability and the journal •  Future features
  • 24. 24 Block manager •  Block allocation – fragmentation – allocation policy •  Checksums – block compression is at a higher level •  Checkpoints – involved in durability guarantees •  Opaque address cookie – stored as internal page key’s “value” •  Pluggable
  • 25. 25 Write path cache data files page images on-disk page image index clean page on-disk page image indexdirty page updates reconciled during write
  • 26. 26 In-memory Compression •  Prefix compression – index keys usually have a common prefix – rolling, per-block, requires instantiation for performance •  Huffman/static encoding – burns CPU •  Dictionary lookup – single value per page •  Run-length encoding – column-store values
  • 27. 27 On-disk Compression •  Compression algorithms: – snappy [default]: good compression, low overhead – LZ4: good compression, low overhead, better page layout – zlib: better compression, high overhead – pluggable •  Optional – compressing filesystem instead
  • 29. Topics ü  WiredTiger Architecture ü  In-memory performance ü  Record-level concurrency ü  Compression Ø  Durability and the journal •  Future features
  • 30. 30 Journal and Recovery •  Write-ahead logging (aka journal) enabled by default •  Only written at transaction commit – only write redo records •  Log records are compressed •  Group commit for concurrency •  Automatic log archival / removal – bounded by checkpoint frequency •  On startup, find a consistent checkpoint in the metadata – use the checkpoint to figure out how much to roll forward
  • 31. 31 Durability without Journaling •  MongoDB’s MMAP storage requires the journal for consistency – running with “nojournal” is unsafe •  WiredTiger is a no-overwrite data store – with “nojournal”, updates since the last checkpoint may be lost – data will still be consistent – checkpoints every N seconds by default •  Replication can guarantee durability – the network is generally faster than disk I/O
  • 32. Topics ü  WiredTiger Architecture ü  In-memory performance ü  Record-level concurrency ü  Compression ü  Durability and the journal Ø  Future features
  • 33. 33
  • 34. 34 What’s next for WiredTiger? •  Our Big Year of Tuning – applications doing “interesting” things – stalls during checkpoints with 100GB+ caches – MongoDB capped collections •  Encryption •  Advanced transactional semantics – updates not stable until confirmed by replica majority
  • 35. 35 WiredTiger LSM support •  Random insert workloads •  Data set much larger than cache •  Query performance less important •  Background maintenance overhead acceptable •  Bloom filters
  • 36. 36
  • 37. 37 Benchmarks Mark Callaghan at Facebook: http://smalldatum.blogspot.com/

Editor's Notes

  1. Feel free to ask questions as we go, hopefully there will be a few minutes for Q&A at the end. Also happy to discuss by email, let me know how we can help.
  2. Build a toolkit: one path is to build special-purpose engines to handle specific workloads, another path is to handle complex/changing workloads.
  3. This kind of positive feedback isn’t common in engineering groups.
  4. The structure for the rest of the talk.
  5. Traditional storage engine designs struggle with modern hardware. I/O, in relationship to memory, a worse outcome than ever before: trade CPU for I/O wherever possible. Big block I/O: if we have to do I/O, bring in a lot of data.
  6. Moderately complex inside. Outside APIs, handle + method Top-level schema layer, where every table and associated indexes Operations are transactionally protected, implemented in terms of in-memory snapshots. Operations are to a row-store engine (ordered key/value pair), column-store (key is a 64-bit record number) Block management is intended to be pluggable itself. On disk, key/value pair files and log files.
  7. From now on, I’m going to mostly be talking about “trees” in-memory, without distinguishing what type of tree – here’s the overview, after which it’s just a page in-memory.
  8. WiredTiger focuses on in-memory performance: I/O means you’ve lost the war, you’re only trying to retreat gracefully.
  9. WiredTiger does have root pages and internal pages, with leaf pages at the bottom: binary search of each page yields the child page for the subsequent search. Importantly, pointers in memory are not disk offsets (in many engines in-memory objects find each other use disk offsets, so, for example, a transition from an internal page to a leaf page implies giving the cache subsystem a disk offset, and the cache returns the in-memory pointer, possibly after a read. The WiredTiger in-memory tree is exactly that, it’s an in-memory optimized tree. This is good (fast in-memory operations), this is bad (we have a translation step after reading, and before writing, disk blocks). We knew we wanted to modify our in-memory representation without changing our on-disk format (avoid upgrades!), and we knew a lot of our compression algorithms would require translation before writing anyway (for example, our on-disk pages have no indexing information, saves about 15-20% of the disk footprint in some workloads).
  10. To make this efficient, pointers need to be protected: once data is larger than cache, there needs to be a check to ensure a pointer is valid. There’s a background thread doing eviction of pages. Hazard pointers can be thought of as micro-logging. Readers/writers publishes the memory address of a page it wants on a non-shared cache line; after that publish, if the pointer is still valid, can proceed. Eviction threads must check those locations to ensure a page is not currently in use: eviction bears the burden, readers/writers go fast. Design principal: application threads never wait, shift work from application threads to system-internal threads.
  11. Different in-memory representation vs on-disk – opportunities to optimize plus some interesting features. Read on-disk page into cache, generally a read-only image. Add indexing information (binary search) on top of that image. Updates are layered on top of that image, including new key/value pairs inserted between existing keys. if the page grows too large, background threads will deal with it. Lots of magic in traversal, threads must go back-and-forth between the original image and the updated image. Writing a page: Combine the previous page image with in-memory changes If the result is too big, split Allocate new space in the file Always write multiples of the allocation size
  12. To summarize: Note we haven’t talked about locking at all: application threads can retrieve and update data without every acquiring a lock. Justin Levandoski’s BW-Tree work: they’ve avoided taking pages out of circulation during splits, interesting.
  13. Readers don’t block writers, writers don’t block readers, writers don’t block writers, again, no locks have been acquired.
  14. If there have been no updates, then it’s easy, the on-page item is the correct item for any query. If there are updates, each update has associated with it a transaction ID, and that transaction ID combined with the transaction’s snapshot, determines the correct value.
  15. Index references the original page image, once updates installed readers/writers have to check for updates. The first update in the list is generally the one we want, if it’s not yet visible, other updates are checked. if no update is correct, the value on the original page must be the one we want. All updates done by atomic updates, swapping a new pointer into place, readers concurrent to updates. Writing the page to disk requires processing all of this information, and determining the values to write.
  16. Page-Write: transform the in-memory version: selecting values to write, page-splitting, all sorts of compression, checksums. Checkpoints are simply another “snapshot reader”, so they run concurrently with other readers and writers. Writing a page: Combine the previous page image with in-memory changes Allocate new space in the file If the result is too big, split Always write multiples of the allocation size Can configure direct I/O to keep reads and writes out of the filesystem cache
  17. All of these apply in-memory and on disk, so we save both on disk and in the cache.
  18. in the same code paths, we compress the data.
  19. In WiredTiger, compact file sizes, and that certain types of compression cannot be turned off, WiredTiger without “compression”, is 50%.
  20. Storage engines are all about not losing your stuff. Pretty standard WAL implementation: before a commit is visible, a log record with all of the changes in the transaction has been flushed to stable storage. Group commit: concurrent log writes are done with a single storage flush. Started with “Scalability of write-ahead logging on multicore and multisocket hardware by Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis and Anastasia Ailamaki.”, and there’s lots more engineering there.
  21. Checkpoints move from one stable point to another one. Our goal was to build a single-node engine, and for that reason, we had to run without a log without losing durability.
  22. Lookaside tables.
  23. A very large data-set over time: blue: a btree tails off over time as the probability of a page being found in cache decreases (that’s why the random nature of the insert matters). red/green flatter: only maintain the recent updates in cache, and merge the updates in the background. LSM is write-optimized, though: the reason the btree is primary is that read-mostly workloads generally behave better in a btree than in LSM. What we want to do eventually, is enable the conversion of an LSM tree into a btree (if you think of a forest of btrees collapsing into a single btree, that matches nicely with the typical workload of inserting a lot of data and then processing that data. Ideally, we’d also be able to reverse that process on demand.