Making Big Data Roar
Data Centers are expensive
Company               Location              Data Center Cost  Data Center Size (MW)
NSA                   Camp Williams, UT     $2B               133
Apple                 Maiden, NC            $1B               67
Internet Villages     Annandale, Scotland   $1.6B             107
Lockerbie DC          Lockerbie, Scotland   $1.5B             100
Social Security       Baltimore, MD         $400M             27
Next Generation Data  Wales, UK             $300M             20
Facebook              Prineville, OR        $215M             15
WiredTiger Mission 
WiredTiger is rethinking data 
management for modern hardware 
with a focus on multi-core scalability 
and maximizing the value of every 
byte of RAM.
Database/Storage Ecosystem
A New Data Management Engine 
● Architected for modern computer systems 
● Scalable and able to handle big data 
● High throughput, consistent low latency 
● Row-store, column-store, log-structured merge 
● ACID transactions, standard isolation levels 
● Checkpoint and fine-grained durability 
● Supporting columns, indices, projections 
● Production quality, fully supported 
● NoSQL, Open Source
Flexible Storage 
● Access methods tailored to workload 
o Row store (read mostly of all columns) 
o Column store (read mostly of some columns) 
o Log-structured merge trees (mostly random writes) 
● Compact storage format 
o RLE, key-prefix, dictionary and static compression 
o Stream compression 
● Adapt workload to storage (RAM, SSD, HDD)
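
To make two of the compact storage formats above concrete, here is a minimal Python sketch of run-length encoding and key-prefix compression. It is a toy illustration only, not WiredTiger's on-disk format; the function names (`rle_encode`, `prefix_encode`, and their decoders) are hypothetical and chosen for this example.

```python
def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def prefix_encode(sorted_keys):
    """Store each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded = []
    prev = ""
    for key in sorted_keys:
        n = 0
        limit = min(len(prev), len(key))
        while n < limit and prev[n] == key[n]:
            n += 1
        encoded.append((n, key[n:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the full keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    prev = ""
    for n, suffix in encoded:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys
```

Both schemes pay off when neighbouring items resemble each other: repeated column values for RLE, and sorted keys with long common prefixes for key-prefix compression.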
Flexible Configuration 
● API offers a simple key/value store, or 
● A complete schema layer 
o Specify data types 
o Map columns to files 
o Automatically maintain indices 
o Queries only read required columns 
o Projections, index-only scans 
● Checkpoint or fine-grained durability
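
The idea that queries only read the required columns can be sketched with a toy column-oriented table. This is an illustrative model under simple assumptions (each column lives in its own Python list, standing in for a separate file); the `ColumnStore` class and its methods are hypothetical and are not the WiredTiger API.

```python
class ColumnStore:
    """Toy table that keeps each column in its own list (a stand-in for a file)."""

    def __init__(self, columns):
        self.columns = {name: [] for name in columns}
        self.reads = {name: 0 for name in columns}   # cells read per column
        self.nrows = 0

    def insert(self, row):
        """Append one row, splitting it across the per-column lists."""
        for name in self.columns:
            self.columns[name].append(row[name])
        self.nrows += 1

    def project(self, wanted):
        """Return rows restricted to `wanted`, touching only those columns."""
        result = []
        for i in range(self.nrows):
            record = {}
            for name in wanted:
                record[name] = self.columns[name][i]
                self.reads[name] += 1
            result.append(record)
        return result
```

A projection over `id` and `name` never touches a wide `bio` column, which is the saving a column store offers for queries that need only some columns.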
Improved Efficiency 
● Higher CPU Utilization 
o Multi-core scalability 
o Minimize contention between threads 
o Non-locking algorithms 
o Hazard pointers 
● Lower Power Costs 
● Flash Optimized Block Layout
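
For readers unfamiliar with hazard pointers, the bookkeeping behind them can be sketched as follows. This is a simplified, single-threaded Python model of the general technique, not WiredTiger's implementation: a real version needs atomic operations, memory barriers, and a re-check of the pointer after publishing it, all of which are omitted here. The `HazardDomain` class and its method names are hypothetical.

```python
class HazardDomain:
    """Tracks which nodes readers are using so that retired nodes are freed safely."""

    def __init__(self, num_threads):
        self.hazards = [None] * num_threads   # one published pointer per thread
        self.retired = []                     # unlinked nodes awaiting reclamation
        self.freed = []                       # stand-in for actually freeing memory

    def protect(self, thread_id, node):
        """Reader publishes the node it is about to dereference."""
        self.hazards[thread_id] = node
        return node

    def clear(self, thread_id):
        """Reader is done with its node."""
        self.hazards[thread_id] = None

    def retire(self, node):
        """Writer has unlinked the node; it may still be in use by readers."""
        self.retired.append(node)

    def scan(self):
        """Free every retired node that no reader currently protects."""
        in_use = {id(h) for h in self.hazards if h is not None}
        still_retired = []
        for node in self.retired:
            if id(node) in in_use:
                still_retired.append(node)    # a reader holds it; try again later
            else:
                self.freed.append(node)
        self.retired = still_retired
```

Readers never take a lock: they only write to their own slot, and the cost of checking for readers is shifted to the thread that reclaims memory.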
Consistent High Performance 
● In-cache or I/O bound 
● Workload Configuration 
o Efficient sparse data (column-store) 
o Bounded queries and updates (row-store) 
o Write-optimized (LSM) 
● Data structures for access at RAM speed
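
The write-optimized (LSM) option trades cheap writes for more work on reads. The following toy Python sketch shows that trade-off under simplifying assumptions: an in-memory buffer is flushed into immutable sorted runs, reads check the newest data first, and a compaction step merges runs. The `TinyLSM` class is hypothetical and omits everything a real engine needs (durability, bloom filters, deletes, background merging).

```python
import bisect


class TinyLSM:
    """Toy log-structured merge tree: a write buffer plus sorted immutable runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # recent writes, absorbed without any search
        self.runs = []                # newest run first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the write buffer into a new sorted run."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the buffer, then every run from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value for each key."""
        merged = {}
        for run in reversed(self.runs):   # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

A write touches only the buffer, while a read may have to search several runs, which is why reads are slower until compaction reduces the number of runs.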
Consistent Low Latency 
● Non-locking algorithms 
● Multi-versioned data 
● Optimistic concurrency control 
● Deadlock-free transactions 
● I/O shifted to background threads
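
How multi-versioned data and optimistic concurrency control combine to avoid deadlocks can be sketched in a few lines of Python. This is a generic textbook-style model (snapshot reads plus a first-committer-wins check at commit), not a description of WiredTiger's internals; the names `MVStore`, `Txn`, and `ConflictError` are hypothetical.

```python
class ConflictError(Exception):
    """Raised when another transaction committed a conflicting write first."""


class MVStore:
    """Keeps every committed version of each key, tagged with a commit timestamp."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # timestamp of the most recent commit

    def begin(self):
        return Txn(self, self.clock)


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts   # this transaction sees commits up to here
        self.writes = {}                 # private, uncommitted updates

    def get(self, key):
        """Read own writes first, then the newest version visible in the snapshot."""
        if key in self.writes:
            return self.writes[key]
        for ts, value in reversed(self.store.versions.get(key, [])):
            if ts <= self.snapshot_ts:
                return value
        return None

    def put(self, key, value):
        self.writes[key] = value         # no lock is taken

    def commit(self):
        """Validate at commit time: abort if any written key changed since the snapshot."""
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
```

Because no transaction ever waits on a lock held by another, there is no wait cycle and therefore no deadlock; a conflicting transaction fails at commit and can be retried.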
Cost Effective 
Metric                     WiredTiger  InnoDB
iiBench run cost           $6.44       $12.88
Cost per billion inserts*  $20.30      $40.60
● WiredTiger provides a 50% cost savings for the same AWS workload 
● More details on this benchmark are available here.
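
The 50% figure can be checked directly from the numbers on this slide. The short Python sketch below assumes that the lower figure in each row is WiredTiger's and the higher one is InnoDB's, which is what the 50% savings claim implies; the helper names are hypothetical.

```python
def savings(cheaper, dearer):
    """Fraction of cost saved by the cheaper option relative to the dearer one."""
    return 1 - cheaper / dearer


def billions_of_inserts(run_cost, cost_per_billion):
    """Size of the workload implied by a run cost and a per-billion-insert cost."""
    return run_cost / cost_per_billion


run_saving = savings(6.44, 12.88)                  # iiBench run cost
per_billion_saving = savings(20.30, 40.60)         # cost per billion inserts
inserts_low = billions_of_inserts(6.44, 20.30)     # workload behind the cheaper run
inserts_high = billions_of_inserts(12.88, 40.60)   # workload behind the dearer run
```

Both rows give a saving of one half, and both runs work out to the same workload of roughly 0.32 billion inserts, consistent with the claim that the comparison is for the same AWS workload.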
Customers
Management Team 
Keith Bostic is a founder and architect at WiredTiger. He was a founder of Sleepycat Software 
(acquired by Oracle Corp. in 2006), and one of the architects of Berkeley DB, the most widely used 
embedded data management software in the world. 
Mr. Bostic was one of the architects of the University of California, Berkeley, 2.10BSD and 4BSD releases, 
where he led the 4BSD release Open Source effort. He is the recipient of a USENIX Association 
Lifetime Achievement Award (The Flame), which recognizes singular contributions to the UNIX 
community. 
Dr. Michael Cahill is a founder and architect at WiredTiger. He was an architect of Berkeley DB at 
Sleepycat Software and Oracle Corp., responsible for design and implementation of multiversion 
concurrency control, as well as SQL interfaces and programming language APIs. Previously, Dr. 
Cahill was CTO at Bullant Technology, which grew tenfold and raised over US$30 million from 
investors including Intel Capital and JP Morgan during his three-year tenure. 
Dr. Cahill’s PhD from the University of Sydney is in the area of transaction processing and 
concurrency control. His work on a new algorithm for implementing serializable isolation received an 
ACM SIGMOD Best Paper award and was added to PostgreSQL 9.1.
Summary and Next Steps 
We’d like to discuss how we could help you 
with your solution. 
Thanks! Questions? info@wiredtiger.com

WiredTiger Overview


Editor's Notes

  • #3 The best number available to estimate the cost of a data center is the number of power supplies: that number determines heating and cooling costs, as well as hardware and software (license units) costs. While the number of CPUs per power supply continues to increase, CPUs are no longer getting faster, and at the data center level we need to look at software efficiencies to gain further scale beyond what the hardware can deliver. For the foreseeable future, multi-core scaling is key to better performance and increased efficiency. Common indexing technology in use today was written for computer architectures of the early 1990s, so better software efficiency yields huge benefits.
  • #4 WiredTiger is focused on single-node data management in service of high-end applications, improving application scalability and efficiency via software innovation.
  • #5 WiredTiger is entirely focused on single-node resource cost per transaction. WiredTiger does not include data distribution or other horizontal scaling software. WiredTiger is intended for applications running on a single node which require the maximum possible performance from the indexing technology, or as a storage technology for applications supporting their own horizontal scaling solutions.
  • #7 Row-store is a traditional database object, where keys are byte strings and all columns of a row are stored together, best for read-mostly workloads where all columns are equally valuable. Column-store groups columns in storage and only the necessary columns are read to satisfy a query. Log-structured merge trees (LSM) support high-speed random inserts, at the cost of slower reads. WiredTiger supports all three access methods and the access methods can be combined (for example, a sparse, wide table configured with a column-store primary, where indexes are stored in an LSM tree). WiredTiger supports a large number of compression algorithms:
    o RLE: run-length encoding when columns repeat
    o Key-prefix: Btree key-prefix compression
    o Dictionary: unique columns only stored once per write block
    o Static: Huffman encoding
    o Stream: pluggable stream compression (for example, snappy or zlib); because WiredTiger supports variable-length blocks, stream compression can be applied in all cases, unlike engines where compression must operate in block-sized units.
  • #9 Unlike other indexing technologies, for example LevelDB and InnoDB, WiredTiger scales linearly as additional cores are added.
  • #10 iiBench is a standard benchmark used to measure MySQL performance. Compared to InnoDB, WiredTiger showed consistently better query rates . . .
  • #11 . . . and much more consistent latency as you scale rows in the data-store.
  • #12 The ultimate benefit to the customer is reduced cost. This chart shows the cost of a billion inserts on an Amazon Web Services instance for the popular engine InnoDB versus WiredTiger: WiredTiger returns twice the performance on a typical AWS instance.