Denis Nelubin, "Thumbtack"

HighLoad++ 2013

    Denis Nelubin, "Thumbtack": Presentation Transcript

    • NoSQL Database Performance Testing, Denis Nelubin
    • Thumbtack Technology Inc.
    • Aerospike
      A.K.A.: Citrusleaf
      Creator: Aerospike, August 2012
      License: Proprietary, Community edition
      Category: Key-value, Complex data types + Secondary indexes (from v.3.0)
    • Couchbase
      A.K.A.: CouchDB + Membase
      Creator: Couchbase, Inc. (CouchOne + Membase), January 2012
      License: Apache 2.0, Proprietary (Enterprise edition)
      Category: Key-value, Document + Secondary indexes (from v.2.0)
    • Cassandra
      A.K.A.: Apache Cassandra
      Creator: Facebook, July 2008
      License: Apache 2.0
      Category: Key-value, BigTable, Column-oriented
    • MongoDB
      A.K.A.: Mongo
      Creator: 10gen (MongoDB, Inc.), March 2010
      License: AGPL, Commercial license
      Category: Document-oriented
    • YCSB
      A.K.A.: Yahoo! Cloud Serving Benchmark
      Creator: Yahoo! Research, June 2010
      License: Apache 2.0
      Category: NoSQL benchmark
    • YCSB
      Data set
        ● key: "user" + 64-bit Fowler-Noll-Vo hash
        ● value: 10 fields of random data
      Load
        ● insert N records
      Run
        ● update and read N records by key
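A minimal sketch of how such a data set could be generated. It assumes the FNV-1a variant of the 64-bit Fowler-Noll-Vo hash, hashes the record number as 8 little-endian bytes, and uses 10-byte fields named `field0` to `field9`; the helper names `make_key` and `make_record` are hypothetical, and none of these details are confirmed by the slides, so the keys need not match the ones YCSB itself produces.

```python
import random
import string

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
    return h


def make_key(record_number: int) -> str:
    """Hypothetical helper: "user" + hash of the record number.

    Assumption: the record number is hashed as 8 little-endian bytes.
    """
    return "user" + str(fnv1a_64(record_number.to_bytes(8, "little")))


def make_record(rng: random.Random, field_count: int = 10, field_length: int = 10) -> dict:
    """Hypothetical helper: field_count fields of random ASCII letters.

    Assumption: 10 bytes per field, so 10 fields give a 100-byte value.
    """
    return {
        "field%d" % i: "".join(rng.choice(string.ascii_letters) for _ in range(field_length))
        for i in range(field_count)
    }


if __name__ == "__main__":
    rng = random.Random(42)
    for n in range(3):
        print(make_key(n), make_record(rng))
```

Hashing the record number spreads consecutive inserts across the key space, so load is not concentrated on one shard or partition.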
    • YCSB
    • YCSB Does NOT do, does NOT check: ● join ● secondary index ● where clause ● partial update
    • Why YCSB? ● applicable to any database ● popular ● de-facto standard
    • Hardware
      Servers: 4 * (8 * Xeon + 32GB RAM + 4 * 120GB SSD)
      Clients: 8 * (4 * i5 + 4GB RAM)
      A single client is not enough
    • Hardware: CPU
      8 Xeon cores ≈ 4 i5 cores*
      * unproven
    • Hardware: Network
      1 Gbps is not enough
        1 Gbit/sec / 1 KB of data ≈ 100 000 ops/sec
      Single IO queue on single CPU is not enough
        # cat /proc/interrupts | grep eth
        90:         0 0 0 0 IR-PCI-MSI-edge eth0
        91: 275107859 0 0 0 IR-PCI-MSI-edge eth0-TxRx-0
        92: 227858040 0 0 0 IR-PCI-MSI-edge eth0-TxRx-1
        93: 242082684 0 0 0 IR-PCI-MSI-edge eth0-TxRx-2
        94: 230651008 0 0 0 IR-PCI-MSI-edge eth0-TxRx-3
        95: 217273950 0 0 0 IR-PCI-MSI-edge eth0-TxRx-4
        96: 240149262 0 0 0 IR-PCI-MSI-edge eth0-TxRx-5
        97: 194736879 0 0 0 IR-PCI-MSI-edge eth0-TxRx-6
        98: 270089080 0 0 0 IR-PCI-MSI-edge eth0-TxRx-7
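The two network observations can be checked with a short sketch. The first function computes the raw upper bound on operations per second for a link, ignoring protocol overhead (an assumption: the slide's ≈ 100 000 figure presumably allows for such overhead). The second sums interrupt counts per CPU column from `/proc/interrupts`-style text; the sample lines are copied from the slide, and the function names are hypothetical.

```python
def max_ops_per_sec(link_bits_per_sec: float, bytes_per_op: int) -> float:
    """Raw bound: link bandwidth in bytes/sec divided by payload size per operation."""
    return link_bits_per_sec / 8 / bytes_per_op


# Sample taken from the slide: four per-CPU count columns, eight TxRx queues.
SAMPLE = """\
90: 0 0 0 0 IR-PCI-MSI-edge eth0
91: 275107859 0 0 0 IR-PCI-MSI-edge eth0-TxRx-0
92: 227858040 0 0 0 IR-PCI-MSI-edge eth0-TxRx-1
93: 242082684 0 0 0 IR-PCI-MSI-edge eth0-TxRx-2
94: 230651008 0 0 0 IR-PCI-MSI-edge eth0-TxRx-3
95: 217273950 0 0 0 IR-PCI-MSI-edge eth0-TxRx-4
96: 240149262 0 0 0 IR-PCI-MSI-edge eth0-TxRx-5
97: 194736879 0 0 0 IR-PCI-MSI-edge eth0-TxRx-6
98: 270089080 0 0 0 IR-PCI-MSI-edge eth0-TxRx-7
"""


def per_cpu_totals(text: str, n_cpus: int) -> list:
    """Sum the interrupt counts of every IRQ line, one total per CPU column."""
    totals = [0] * n_cpus
    for line in text.splitlines():
        fields = line.split()
        # Expect "<irq>:" followed by n_cpus numeric columns.
        if len(fields) < n_cpus + 1 or not fields[0].endswith(":"):
            continue
        for cpu in range(n_cpus):
            totals[cpu] += int(fields[1 + cpu])
    return totals


if __name__ == "__main__":
    print("1 Gbit/s, 1 KB per op:", round(max_ops_per_sec(1e9, 1024)), "ops/sec at most")
    print("interrupts per CPU:", per_cpu_totals(SAMPLE, 4))
```

In the sample every non-zero count sits in the first column, which is the pattern the slide warns about: all NIC queues are serviced by a single CPU.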
    • Hardware: SSD
      Overprovisioning
        ● hdparm
        ● fdisk
      http://en.wikipedia.org/wiki/Write_amplification
    • OS (GNU/Linux)
      ulimit
        ● nofile > 4k
      RAID (RAID 0?)
        ● mdadm
        ● lvm
      Read-ahead
        ● minimal
      http://upload.wikimedia.org/wikipedia/commons/a/a4/Gnu-linux-on-white.png
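The open-files limit can be verified from a client or test harness before a run. This is a small sketch using Python's standard `resource` module (Unix only); the threshold of 4096 is an interpretation of the slide's "nofile > 4k", and the function names are hypothetical.

```python
import resource

MIN_NOFILE = 4096  # assumption: "4k" on the slide read as 4096 descriptors


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def nofile_is_sufficient(soft_limit: int, minimum: int = MIN_NOFILE) -> bool:
    """True if the soft limit is unlimited or strictly above the minimum."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit > minimum


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("nofile soft=%s hard=%s ok=%s" % (soft, hard, nofile_is_sufficient(soft)))
```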
    • Test
      Data sets:
        ● RAM: 50M * 100 byte ≈ 5GB
        ● SSD: 500M * 100 byte ≈ 50GB
        ● replication factor = 2
      Workloads:
        ● Heavy Write: 50% update / 50% read
        ● Mostly Read: 5% update / 95% read
      Consistency:
        ● Sync replication
        ● Async replication
      http://kushsrivastava.files.wordpress.com/2012/11/test.gif
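The data set sizes above follow from simple arithmetic, sketched below. The calculation counts only the raw record payload and ignores keys, indexes, and per-record storage overhead (an assumption; real on-disk and in-memory sizes will be larger). The even spread across the four servers from the hardware slide is also an assumption, and the function names are hypothetical.

```python
def data_set_bytes(records: int, record_bytes: int = 100, replication_factor: int = 1) -> int:
    """Raw payload size: record count * record size * number of copies."""
    return records * record_bytes * replication_factor


def per_node_bytes(total_bytes: int, nodes: int = 4) -> float:
    """Payload per server, assuming data is spread evenly across the nodes."""
    return total_bytes / nodes


if __name__ == "__main__":
    GB = 10 ** 9
    for name, records in (("RAM", 50_000_000), ("SSD", 500_000_000)):
        raw = data_set_bytes(records)
        replicated = data_set_bytes(records, replication_factor=2)
        print("%s: raw %.0f GB, replicated %.0f GB, per node %.1f GB"
              % (name, raw / GB, replicated / GB, per_node_bytes(replicated) / GB))
```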
    • Insert, RAM
    • Insert, SSD
    • Heavy Update, RAM
    • Heavy Update, SSD
    • Heavy Update, Latency
    • Mostly Read, RAM
    • Mostly Read, SSD
    • Speed
      Insert: Couchbase*, Aerospike, Cassandra, MongoDB
      Update: Couchbase*, Aerospike, Cassandra, MongoDB
      Read: Aerospike, Couchbase*, MongoDB, Cassandra
      * in memory or on smaller data set
    • Failover test
      ● 50%, 75%, 100% of max throughput
      ● Heavy Update
      ● 10min warmup
      ● kill -9
      ● 10min without one node
      ● service start
      ● 20min after restore
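The failover procedure can be written down as a timeline. The sketch below only builds the schedule and the target load levels from the numbers on the slide; it does not kill or restart anything, and the names `Phase`, `failover_schedule`, and `target_rates` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    start_min: int        # minutes from the beginning of the test
    duration_min: int
    action_at_start: str  # what the operator does when the phase begins


def failover_schedule():
    """Three phases from the slide: warmup, one node down, after restore."""
    steps = [
        ("warmup", 10, "start Heavy Update load"),
        ("one node down", 10, "kill -9 on one node"),
        ("after restore", 20, "service start on that node"),
    ]
    phases, start = [], 0
    for name, duration, action in steps:
        phases.append(Phase(name, start, duration, action))
        start += duration
    return phases


def target_rates(max_throughput: int, fractions=(0.5, 0.75, 1.0)):
    """Load levels to test at: 50%, 75%, and 100% of the measured maximum."""
    return [int(round(max_throughput * f)) for f in fractions]


if __name__ == "__main__":
    for phase in failover_schedule():
        print("t=%2d min: %s (%d min) -> %s"
              % (phase.start_min, phase.name, phase.duration_min, phase.action_at_start))
    print(target_rates(100_000))
```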
    • Aerospike, Sync, SSD, 50%
    • Cassandra, Async, SSD, 50%
    • Couchbase, Async, RAM, 100%
    • MongoDB, Async, SSD, 50%
    • Replication MongoDB Cassandra Couchbase/Aerospike
    • Data storage reliability
      Cassandra: archive
      MongoDB: live data
      Aerospike: fast cache, eviction
      Couchbase: cache (async only)
    • Capacity
      Cassandra: packed archive
      MongoDB: unpacked live data
      Aerospike: indexes in RAM + SSD
      Couchbase*: metadata and cache in RAM
      * was able to take only 200M records
    • Deployment
      Couchbase: four clicks
      Aerospike: powerful config
      Cassandra: config+config+calculator
      MongoDB: shards of replica-sets
    • Managing
      Couchbase: superduperwebconsole
      MongoDB: commands and docs *
      Cassandra: exists **
      Aerospike: raw ***
      * use MMS (MongoDB Management Service)
      ** use DataStax products
      *** try AMC (Aerospike Monitoring Console)
    • Unique features
      Aerospike
        ● SSD support, speed
      Couchbase
        ● good web console, easy deployment
      Cassandra
        ● writes faster than reads ;)
      MongoDB
        ● documents
    • Trouble spots
      Aerospike
        ● eviction
        ● secret config options
        ● slow startup
      Couchbase
        ● big data
        ● strange client behaviour
        ● slow startup
        ● slow shutdown
      http://www.spreadshirt.com/herecomes-trouble-women-s-t-shirtsC3376A9069098
    • Trouble spots
      Cassandra
        ● need to think about the config ;)
      MongoDB
        ● mongos has to be restarted
        ● replica-set is too surviving ;)
      http://www.spreadshirt.com/herecomes-trouble-women-s-t-shirtsC3376A9069098
    • When to use: Aerospike
      Big Fast Cache
      http://x-celestia-x.deviantart.com/art/I-am-the-best-Rainbow-Dash-358472521
    • When to use: Couchbase
      In-memory Cache with Persistence
      http://zutheskunk.deviantart.com/art/MLP-Resource-Shadowbolt-Female-02-238973870
    • When to use: Cassandra
      Big-Data Archive
      http://www.deviantart.com/art/Zecora-324988216
    • When to use: MongoDB
      Universal DB for Web
      http://www.deviantart.com/art/Trixie-221583239
    • Not only YCSB
      ● From scratch, inspired by YCSB
      ● More tests
        ○ Secondary indexes (cardinality, overhead)
        ○ Aggregation (average value)
        ○ Collection data types (stack, array, wide row)
    • Thanks
      Denis Nelubin <dnelubin@gmail.com>
      Alexey Remnev <aremnev@thumbtack.net>