Sergey Sverchkov
Project Manager
sergey.sverchkov@altoros.com


Sergey Sverchkov - Evaluating NoSQL Solutions: Which Database Fits Your System

Published at: IT_Share, Highload 2.0 (Business, Technology)

Abstract: Often referred to as NoSQL, non-relational databases feature elasticity and scalability in combination with a capability to store big data and work with cloud computing systems, all of which make them extremely popular. NoSQL data management systems are inherently schema-free (with a flexible data model and no excessive complexity) and eventually consistent (complying with BASE rather than ACID). They have a simple API, serve huge amounts of data, and provide high throughput. In 2013, the number of NoSQL products reached 150+, and the figure is still growing. That variety makes it difficult to select the best tool for a particular case. Database vendors usually measure the performance of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions.

Most NoSQL databases differ from relational databases in their data model. These systems are classified into four groups.

A. Key-value stores. Key-value stores are similar to maps or dictionaries, where data is addressed by a unique key.

B. Document stores. Document stores encapsulate key-value pairs in JSON or JSON-like documents. Within documents, keys have to be unique. In contrast to key-value stores, values are not opaque to the system and can be queried as well, so complex data structures like nested objects can be handled more conveniently. Storing data in interpretable JSON documents has the additional advantage of supporting data types, which makes document stores very developer-friendly.

C. Column-family stores. Column-family stores are also known as column-oriented stores, extensible record stores, and wide columnar stores.

D. Graph databases. Key-value stores, document stores, and column-family stores have in common that they store denormalized data in order to gain advantages in distribution. In contrast to relational databases and the key-oriented NoSQL databases introduced above, graph databases specialize in the efficient management of heavily linked data.

NoSQL databases differ strongly in the query functionality they offer. Besides considering the supported data model and how it influences queries on specific attributes, it is necessary to take a closer look at the offered interfaces in order to find a suitable database for a specific use case. If a simple, language-independent API is required, REST interfaces can be a suitable solution, especially for web applications, whereas performance-critical queries should be exchanged over language-specific APIs, which are available for nearly every common programming language such as Java. Query languages offer a higher abstraction level that reduces complexity, so they are very helpful when more complicated queries have to be handled. If computation-intensive queries over large datasets are required, MapReduce frameworks should be used.

Multiversion concurrency control (MVCC) relaxes strict consistency in favor of performance: concurrent access is not managed with locks but by keeping many unmodifiable, chronologically ordered versions. In order to support transactions without reserving multiple datasets for exclusive access, many stores provide optimistic locking: before changed data is committed, each transaction checks whether another transaction has made any conflicting modifications to the same datasets.

NoSQL databases also differ in the way they distribute data across multiple machines. Since the data models of key-value stores, document stores, and column-family stores are key-oriented, the two common partitioning strategies are based on keys, too. The first strategy distributes datasets by the range of their keys: a routing server splits the whole keyset into blocks and allocates these blocks to different nodes. Afterwards, one node is responsible for storage and request handling of its specific key ranges, and in order to find a certain key, clients have to contact the routing server to obtain the partition table. Higher availability and a much simpler cluster architecture can be achieved with the second distribution strategy, called consistent hashing [27]. In this shared-nothing architecture there is no single point of failure. In contrast to range-based partitioning, keys are distributed using hash functions: since every server is responsible for a certain hash region, the node holding a given key can be computed very quickly. Good hash functions distribute keys evenly, so an additional load balancer is not required (a minimal sketch of this scheme follows below).

In addition to better read performance through load balancing, replication also brings better availability and durability, because failing nodes can be replaced by other servers. Since distributed databases should be able to cope with temporary node and network failures, only full availability or full consistency can be guaranteed at any one time in a distributed system. If all replicas of a master server were updated synchronously, the system would not be available until all slaves had committed a write operation; if messages got lost due to network problems, the system would not be available for a longer period of time. For platforms that rely on high availability, this solution is not suitable, because even a few milliseconds of latency can have a big influence on user behavior.

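To make the consistent-hashing scheme above concrete, here is a minimal Java sketch, not taken from any of the benchmarked databases: nodes are placed on a hash ring and each key is routed to the next node clockwise. The node names, the FNV-1a hash, and the number of virtual points are illustrative assumptions.

    import java.nio.charset.StandardCharsets;
    import java.util.SortedMap;
    import java.util.TreeMap;

    /** Minimal consistent-hashing ring: a key belongs to the first node clockwise. */
    public class HashRing {
        private final TreeMap<Long, String> ring = new TreeMap<>();

        // FNV-1a 64-bit hash; any well-distributed hash function would do.
        private static long hash(String s) {
            long h = 0xcbf29ce484222325L;
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                h ^= (b & 0xffL);
                h *= 0x100000001b3L;
            }
            return h;
        }

        /** Place a node on the ring with several virtual points for smoother balance. */
        public void addNode(String node, int virtualPoints) {
            for (int i = 0; i < virtualPoints; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }

        /** Route a key to the node that owns the next hash position clockwise. */
        public String nodeFor(String key) {
            SortedMap<Long, String> tail = ring.tailMap(hash(key));
            return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
        }

        public static void main(String[] args) {
            HashRing ring = new HashRing();
            ring.addNode("node-a", 100);   // hypothetical node names
            ring.addNode("node-b", 100);
            ring.addNode("node-c", 100);
            System.out.println("user234123 -> " + ring.nodeFor("user234123"));
        }
    }

Adding or removing a node only remaps the keys of its own hash regions, which is why this scheme avoids both the routing server and the single point of failure mentioned above.
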
For benchmarking, we used the Yahoo! Cloud Serving Benchmark (YCSB), which consists of the following components: a framework with a workload generator and a set of workload scenarios. The workload defines the data that will be loaded into the database during the loading phase and the operations that will be executed against the data set during the transaction phase. Typically, a workload is a combination of a workload Java class (a subclass of com.yahoo.ycsb.Workload) and a parameter file (in the Java Properties format). Because the properties of the dataset must be known during the loading phase (so that the proper kind of record can be constructed and inserted) and during the transaction phase (so that the correct record ids and fields can be referred to), a single set of properties is shared between both phases, and the parameter file is therefore used in both phases. The workload Java class uses those properties either to insert records (loading phase) or to execute transactions against those records (transaction phase). We measured database performance under certain types of workloads. A workload was defined by different distributions assigned to the two main choices: which operation to perform and which record to read or write. Operations against a data store were selected randomly and could be of the following types:

  - Insert: inserts a new record.
  - Update: updates a record by replacing the value of one field.
  - Read: reads a record, either one randomly selected field or all fields.
  - Scan: scans records in order, starting at a randomly selected record key; the number of records to scan is also selected randomly, from the range between 1 and 100.

A hedged sketch of such a parameter set is shown below.

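The exact parameter files used in this test are not part of the transcript, so the following Java snippet is only an illustration of such a parameter set; the property names come from YCSB's CoreWorkload, while the values mirror the numbers quoted in these notes.

    import java.util.Properties;

    /** Example YCSB-style workload parameters; the values are illustrative. */
    public class WorkloadParams {
        public static Properties updateHeavy() {
            Properties p = new Properties();
            p.setProperty("workload", "com.yahoo.ycsb.workloads.CoreWorkload");
            p.setProperty("recordcount", "100000000");   // records created in the load phase
            p.setProperty("operationcount", "10000000"); // transaction-phase operations (example)
            p.setProperty("fieldcount", "10");           // 10 fields per record
            p.setProperty("fieldlength", "100");         // 100 bytes per field, ~1 KB per record
            p.setProperty("readproportion", "0.5");      // Workload A: 50% reads
            p.setProperty("updateproportion", "0.5");    // Workload A: 50% updates
            p.setProperty("requestdistribution", "zipfian");
            return p;
        }

        public static void main(String[] args) {
            updateHeavy().forEach((k, v) -> System.out.println(k + "=" + v));
        }
    }
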
Each workload was targeted at a table of 100,000,000 records; each record was 1,000 bytes in size and contained 10 fields. Each record was identified by a primary key, which was a string such as "user234123". Each field was named field0, field1, and so on, and the values in each field were random strings of ASCII characters, 100 bytes each. Database performance was defined by the speed at which a database performed basic operations. A basic operation is an action performed by the workload executor, which drives multiple client threads. Each thread executes a sequential series of operations by making calls to the database interface layer, both to load the database (the load phase) and to execute the workload (the transaction phase). The threads throttle the rate at which they generate requests, so that we can directly control the offered load against the database. In addition, the threads measure the latency and achieved throughput of their operations and report these measurements to the statistics module. A sketch of a record built to this layout follows below.

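As a small illustration of the record layout just described (not YCSB's own generator), the sketch below builds one record with ten 100-byte fields of random printable ASCII characters.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Random;

    /** Builds a ~1 KB record: 10 fields of 100 random printable ASCII characters each. */
    public class RecordGenerator {
        private static final Random RANDOM = new Random();

        static String randomAscii(int length) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append((char) ('!' + RANDOM.nextInt(94))); // printable ASCII range '!'..'~'
            }
            return sb.toString();
        }

        static Map<String, String> buildRecord() {
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < 10; i++) {
                record.put("field" + i, randomAscii(100)); // field0..field9, 100 bytes each
            }
            return record;
        }

        public static void main(String[] args) {
            String key = "user" + RANDOM.nextInt(100_000_000); // primary key such as "user234123"
            System.out.println(key + " -> fields: " + buildRecord().keySet());
        }
    }
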
For every benchmark, we define what to test (the database client) and how to test it: the target throughput, that is, how many operations to run per second; the number of concurrent threads running on the YCSB client side; and how many operations to execute against the particular database. Every client thread reports its progress to the statistics module, which prints the test output to the console where the benchmark was started. A sketch of how a single thread can hold such a target rate is shown below.

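The sketch below shows one simple way a client thread could throttle itself to a per-thread target rate while recording latency; it illustrates the mechanism described above and is not the YCSB implementation itself.

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.LockSupport;

    /** Runs operations at roughly a fixed per-thread rate and reports average latency. */
    public class ThrottledWorker implements Runnable {
        private final int targetOpsPerSec;
        private final long totalOps;

        public ThrottledWorker(int targetOpsPerSec, long totalOps) {
            this.targetOpsPerSec = targetOpsPerSec;
            this.totalOps = totalOps;
        }

        private void doOperation() {
            // Placeholder for one insert/read/update/scan issued through the database client.
        }

        @Override
        public void run() {
            long intervalNanos = TimeUnit.SECONDS.toNanos(1) / targetOpsPerSec;
            long nextDeadline = System.nanoTime();
            long latencySumNanos = 0;

            for (long i = 0; i < totalOps; i++) {
                long start = System.nanoTime();
                doOperation();
                latencySumNanos += System.nanoTime() - start;

                nextDeadline += intervalNanos;         // schedule the next operation
                long sleepNanos = nextDeadline - System.nanoTime();
                if (sleepNanos > 0) {
                    LockSupport.parkNanos(sleepNanos); // hold the offered load steady
                }
            }
            System.out.printf("avg latency: %.3f ms%n", latencySumNanos / 1e6 / totalOps);
        }

        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(new ThrottledWorker(1000, 10_000)); // example: 1,000 ops/sec
            worker.start();
            worker.join();
        }
    }
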
The Cassandra configuration lives in conf/cassandra.yaml; the fragment relevant to these tests sets the row cache size:

    # Maximum size of the row cache in memory.
    # NOTE: if you reduce the size, you may not get your hottest keys loaded on startup.
    # Default value is 0, to disable row caching.
    row_cache_size_in_mb: 6096

The HBase configuration files are located in /etc/hbase (when installed from the RPM) and in /hbase/config.

Workload A is an update-heavy scenario that simulates the database work during which typical actions of an e-commerce solution user are recorded. Settings for the workload: a 50/50 read/update ratio and a Zipfian request distribution.

Workload B is a read-mostly workload with a 95/5 read/update ratio. It recaps content tagging, where adding a tag is an update, but most operations involve reading tags.

Workload C is a read-only workload that simulates a data caching layer, for example a user profile cache.

Workload D has a 95/5 read/insert ratio. The workload simulates access to the latest data, such as user status updates or working with inbox messages first.

Workload E is a scan-short-ranges workload with a 95/5 scan/insert proportion. It corresponds to threaded conversations that are clustered by a thread ID; each scan is performed over the posts of a given thread.

Workload F has read-modify-write and read operations in a 50/50 proportion. It simulates access to a user database, where user records are read and modified by the user; user activity is also recorded to this database.

Workload G has a 10/90 read/insert ratio. It simulates a data migration process or highly intensive data creation. A sketch of how such operation ratios can drive an operation mix follows below.

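How exactly the generator turns these ratios into a stream of operations is not spelled out in the notes, so the following sketch is an assumption about the mechanism: it picks an operation type according to configured proportions, shown here for the 50/50 mix of Workload A.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Random;

    /** Chooses an operation type according to configured proportions. */
    public class OperationChooser {
        enum Op { READ, UPDATE, INSERT, SCAN, READ_MODIFY_WRITE }

        private final Map<Op, Double> proportions = new LinkedHashMap<>();
        private final Random random = new Random();

        public OperationChooser add(Op op, double proportion) {
            proportions.put(op, proportion);
            return this;
        }

        public Op next() {
            double total = proportions.values().stream().mapToDouble(Double::doubleValue).sum();
            double r = random.nextDouble() * total;
            double cumulative = 0;
            for (Map.Entry<Op, Double> e : proportions.entrySet()) {
                cumulative += e.getValue();
                if (r < cumulative) {
                    return e.getKey();
                }
            }
            return proportions.keySet().iterator().next(); // guard against rounding at the edge
        }

        public static void main(String[] args) {
            // Workload A from the notes: 50% reads, 50% updates.
            OperationChooser workloadA = new OperationChooser()
                    .add(Op.READ, 0.5)
                    .add(Op.UPDATE, 0.5);
            for (int i = 0; i < 5; i++) {
                System.out.println(workloadA.next());
            }
        }
    }
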
HBase: the lowest write performance was expected, because AutoFlush was enabled; this makes it possible to achieve strong consistency, but it noticeably affects write performance. AutoFlush is an option that sends data to the server immediately after a put is made on the client. When the option is disabled, a write requires either an explicit call to the flush() method, or it is triggered as the client-side buffer fills up (the buffer size is also configurable). Cassandra and Couchbase: Cassandra updates data in memory and synchronously writes the transaction log to disk, whereas Couchbase writes to memory and queues writes to disk, so disk I/O happens asynchronously. A hedged sketch of the AutoFlush behavior follows below.

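The sketch below illustrates the AutoFlush difference against the classic HTable client API of that HBase generation; the method names are an assumption tied to the old (pre-1.0) client, and the table and column names are arbitrary except for "usertable", YCSB's default table name.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Contrasts auto-flushed and client-buffered writes with the old HTable client. */
    public class HBaseWriteModes {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "usertable"); // YCSB's default table name

            // Benchmark setting from the notes: flush every put to the region server.
            // Strongly consistent, durable writes, but the load phase slows down noticeably.
            table.setAutoFlush(true);
            Put put = new Put(Bytes.toBytes("user234123"));
            put.add(Bytes.toBytes("family"), Bytes.toBytes("field0"), Bytes.toBytes("value"));
            table.put(put); // sent to the server immediately because auto-flush is on

            // Alternative: buffer puts on the client and flush explicitly (faster, less strict).
            table.setAutoFlush(false);
            table.setWriteBufferSize(2 * 1024 * 1024); // client-side buffer, also configurable
            table.put(put);        // only buffered at this point
            table.flushCommits();  // pushed to the server here, or when the buffer fills

            table.close();
        }
    }
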
As you can see, there is no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks. For example, a database can demonstrate excellent performance, but once the amount of records exceeds a certain limit, the speed falls dramatically. It means that this particular solution can be good for moderate data loads and extremely fast computations, but it would not be suitable for jobs that require a lot of reads and writes. In addition, database performance also depends on the capacity of your hardware.

    1. Sergey Sverchkov, Project Manager, sergey.sverchkov@altoros.com
    2. (bullet text not captured in the transcript)
    3. (bullet text not captured in the transcript)
    4. Workload is defined by different distributions; operations of the following types (bullet items not captured in the transcript).
    5. (bullet text not captured in the transcript)
    6. (no text captured in the transcript)
    7. Test environment:
        - Single availability zone eu-west-1b, Ireland region
        - Single security group with all required ports opened
        - 4 m1.xlarge 64-bit instances for cluster nodes: 16 GB RAM, 4 vCPU, 8 ECU, high-performance network
        - 1 c1.xlarge 64-bit instance for the YCSB client: 7 GB RAM, 8 vCPU, 20 ECU, high-performance network
        - 2 additional c1.medium 64-bit instances for mongo routers: 1.7 GB RAM, 2 vCPU, 5 ECU, moderate network
       Storage:
        - 4 EBS volumes of 25 GB each in RAID0
        - EBS-optimized volumes, no Provisioned IOPS
    8. (no text captured in the transcript)
    9. Cassandra:
        - partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        - key_cache_size_in_mb: 1024
        - row_cache_size_in_mb: 6096
        - JVM heap size: 6 GB
        - Snappy compressor
        - Replication factor 1
       MongoDB:
        - 2 c1.medium nodes with the mongo router process (mongos)
        - Replication factor 1
        - Sharding by the internal key "_id"
    10. Couchbase:
        - Replication factor 1
        - Memory + disk mode
       HBase:
        - JVM heap size 12 GB
        - Replication factor 1
        - Snappy compressor
    11. Performance of the systems was evaluated under different workloads.
    12. Chart: Load phase, 100,000,000 records * 1 KB, [INSERT]. Average latency (ms) vs. throughput (ops/sec) for hbase, cassandra, couchbase, and mongodb.
    13. Chart: Workload A: Update (Update 50%, Read 50%).
    14. Chart: Workload A: Read (Update 50%, Read 50%).
    15. Chart: Workload B: Update (Update 5%, Read 95%).
    16. Chart: Workload B: Read (Update 5%, Read 95%).
    17. Chart: Workload C: 100% Read.
    18. Chart: Workload D: Insert (Insert 5%, Read 95%).
    19. Chart: Workload D: Read (Insert 5%, Read 95%).
    20. Chart: Workload E: Insert (Insert 5%, Scan 95%); cassandra and hbase only.
    21. Chart: Workload F: Read (Read-Modify-Write 50%, Read 50%).
    22. Chart: Workload F: Update (Read-Modify-Write 50%, Read 50%).
    23. Chart: Workload F: Read-Modify-Write (Read-Modify-Write 50%, Read 50%).
    24. Chart: Workload G: Insert (Insert 90%, Read 10%).
    25. Chart: Workload G: Read (Insert 90%, Read 10%).
    26. (bullet text not captured in the transcript)
