Sergey Sverchkov
Project Manager
sergey.sverchkov@altoros.com

© ALTOROS Systems | CONFIDENTIAL
Order (ORDER)
  ID: 1001
  Order Date: 15.9.2012

Customer (CUSTOMER)
  First Name: Peter
  Last Name: Sample

Billing Address (ADDRESS)
  Street: Somestreet 10
  City: Somewhere
  Postal Code: 55901

Line Items (ORDER_LINES)
  Name          Quantity  Price
  Ipod Touch    1         220.95
  Monster Beat  2         190.00
  Apple Mouse   1         69.90
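One way to read this slide is that a single order spans four relational tables (ORDER, CUSTOMER, ADDRESS, ORDER_LINES), whereas a document-style store could hold it as one nested record. The sketch below is only an illustration of that idea: the field names and the `order_total` helper are assumptions, not something taken from the slides.

```python
from decimal import Decimal

# Hypothetical document form of the sample order; field names are illustrative.
order = {
    "id": 1001,
    "order_date": "15.9.2012",
    "customer": {"first_name": "Peter", "last_name": "Sample"},
    "billing_address": {
        "street": "Somestreet 10",
        "city": "Somewhere",
        "postal_code": "55901",
    },
    "line_items": [
        {"name": "Ipod Touch", "quantity": 1, "price": Decimal("220.95")},
        {"name": "Monster Beat", "quantity": 2, "price": Decimal("190.00")},
        {"name": "Apple Mouse", "quantity": 1, "price": Decimal("69.90")},
    ],
}


def order_total(doc):
    """Sum quantity * price over the embedded line items."""
    return sum(
        (item["quantity"] * item["price"] for item in doc["line_items"]),
        Decimal("0"),
    )


total = order_total(order)
print(total)
```

With the whole order in one record, the total can be computed from a single read instead of a join across the four tables.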






• Workload is defined by different distributions
• Operations of the following types:





• Single availability zone: eu-west-1b, Ireland region
• Single security group with all required ports open
• 4 m1.xlarge 64-bit instances for cluster nodes: 16 GB RAM, 4 vCPU, 8 ECU, high-performance network
• 1 c1.xlarge 64-bit instance for the YCSB client: 7 GB RAM, 8 vCPU, 20 ECU, high-performance network
• 2 additional c1.medium 64-bit instances for mongo routers: 1.7 GB RAM, 2 vCPU, 5 ECU, moderate network

• 4 EBS volumes of 25 GB each in RAID0
• EBS-optimized volumes, no Provisioned IOPS
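As a rough sanity check on this setup, the arithmetic below compares the raw size of the load used in this benchmark (100,000,000 records of 1 KB each, as stated on the load-phase slide) with the memory and striped disk capacity listed above. It is a back-of-the-envelope sketch under stated assumptions: 1 KB is taken as 1024 bytes, "16 GB" is read as 16 GiB, data is assumed to spread evenly over the 4 cluster nodes, and storage overhead and compression are ignored.

```python
GIB = 2**30

records = 100_000_000
record_bytes = 1024               # assumption: 1 KB = 1024 bytes
cluster_nodes = 4
replication_factor = 1            # as configured for every database in the deck
ram_per_node_bytes = 16 * GIB     # assumption: "16 GB" read as 16 GiB

# Raw payload only; real on-disk size depends on each database's overhead.
dataset_bytes = records * record_bytes * replication_factor
per_node_bytes = dataset_bytes // cluster_nodes  # assumes even distribution

# RAID0 stripes the volumes, so usable capacity is the sum of the members.
raid0_capacity_gb = 4 * 25

print(f"dataset:  {dataset_bytes / GIB:.1f} GiB")
print(f"per node: {per_node_bytes / GIB:.1f} GiB vs {ram_per_node_bytes / GIB:.0f} GiB RAM")
print(f"RAID0:    {raid0_capacity_gb} GB")
```

Under these assumptions each node holds more raw data than it has RAM, so some reads would be expected to reach disk rather than being served from memory.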
Cassandra:
• partitioner: org.apache.cassandra.dht.Murmur3Partitioner
• key_cache_size_in_mb: 1024
• row_cache_size_in_mb: 6096
• JVM heap size: 6 GB
• Snappy compressor
• Replication factor: 1

MongoDB:
• 2 c1.medium nodes running the mongo router process (mongos)
• Replication factor: 1
• Sharding by the internal key "_id"

• Replication factor: 1
• Memory + disk mode

• JVM heap size: 12 GB
• Replication factor: 1
• Snappy compressor

Performance of the systems was evaluated under different workloads:







[Chart: Load phase, 100,000,000 records × 1 KB, INSERT. Y-axis: average latency, ms. X-axis: throughput, ops/sec. Series: hbase, cassandra, couchbase, mongodb.]
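The load-phase chart plots average latency against throughput. As a minimal sketch of how these two numbers relate for a single client thread, the hypothetical `measure` helper below times a batch of operations and reports both. It is not how YCSB itself is implemented; the injectable `clock` parameter is an assumption added so the function can be tested deterministically.

```python
import time


def measure(operation, count, clock=time.perf_counter):
    """Run `operation` `count` times.

    Returns (average latency in ms, throughput in ops/sec).
    """
    latencies = []
    started = clock()
    for _ in range(count):
        t0 = clock()
        operation()
        latencies.append(clock() - t0)
    elapsed = clock() - started

    avg_latency_ms = 1000.0 * sum(latencies) / count
    throughput = count / elapsed if elapsed > 0 else float("inf")
    return avg_latency_ms, throughput


if __name__ == "__main__":
    store = {}
    keys = iter(range(10_000))
    # Stand-in for a database insert: write a 1 KB value into a dict.
    avg_ms, ops_per_sec = measure(
        lambda: store.__setitem__(next(keys), b"x" * 1024), 10_000
    )
    print(f"avg latency: {avg_ms:.4f} ms, throughput: {ops_per_sec:.0f} ops/sec")
```

Each point on a latency-versus-throughput curve would correspond to one such measurement taken at a different offered load.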
[Chart: Workload A: Update (Update 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload A: Read (Update 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload B: Update (Update 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload B: Read (Update 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload C: 100% Read. Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload D: Insert (Insert 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload D: Read (Insert 5%, Read 95%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload E: Insert (Insert 5%, Scan 95%). Series: cassandra, hbase.]
[Chart: Workload F: Read (Read-Modify-Write 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload F: Update (Read-Modify-Write 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload F: Read-Modify-Write (Read-Modify-Write 50%, Read 50%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload G: Insert (Insert 90%, Read 10%). Series: cassandra, couchbase, hbase, mongodb.]
[Chart: Workload G: Read (Insert 90%, Read 10%). Series: cassandra, couchbase, hbase, mongodb.]
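The operation mixes named in the chart titles for workloads A to G can be collected in one table. The sketch below encodes them as percentages and draws a random operation stream for a chosen workload. The `WORKLOADS` layout and the `next_operations` helper are illustrative assumptions, not the YCSB implementation; only the percentages come from the slides.

```python
import random

# Operation mix per workload, in percent, as given in the chart titles.
WORKLOADS = {
    "A": {"update": 50, "read": 50},
    "B": {"update": 5, "read": 95},
    "C": {"read": 100},
    "D": {"insert": 5, "read": 95},
    "E": {"insert": 5, "scan": 95},
    "F": {"read_modify_write": 50, "read": 50},
    "G": {"insert": 90, "read": 10},
}


def next_operations(workload, count, rng=None):
    """Draw `count` operation names according to the workload's mix."""
    if rng is None:
        rng = random.Random()
    mix = WORKLOADS[workload]
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


# A seeded generator keeps the sample reproducible between runs.
ops = next_operations("B", 1000, random.Random(42))
print({name: ops.count(name) for name in WORKLOADS["B"]})
```

A benchmark client would pair each drawn operation with a key chosen from the workload's key distribution before sending it to the database.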
Evaluating NoSQL Solutions: Which Database Suits Your System

  • 2. Order example. ORDER: ID: 1001, Order Date: 15.9.2012. CUSTOMER: First Name: Peter, Last Name: Sample. ADDRESS (Billing Address): Street: Somestreet 10, City: Somewhere, Postal Code: 55901. ORDER_LINES (Line Items; Name, Quantity, Price): Ipod Touch, 1, 220.95; Monster Beat, 2, 190.00; Apple Mouse, 1, 69.90.
  • 5. Workload is defined by different distributions. Operations of the following types:
  • 9. Single availability zone eu-west-1b, Ireland region; single security group with all required ports open; 4 m1.xlarge 64-bit instances for cluster nodes: 16 GB RAM, 4 vCPU, 8 ECU, high-performance network; 1 c1.xlarge 64-bit instance for the YCSB client: 7 GB RAM, 8 vCPU, 20 ECU, high-performance network; 2 additional c1.medium 64-bit instances for mongo routers: 1.7 GB RAM, 2 vCPU, 5 ECU, moderate network. 4 EBS volumes of 25 GB each in RAID0; EBS-optimized volumes, no Provisioned IOPS.
  • 10. Cassandra: partitioner: org.apache.cassandra.dht.Murmur3Partitioner; key_cache_size_in_mb: 1024; row_cache_size_in_mb: 6096; JVM heap size: 6 GB; Snappy compressor; replication factor 1. MongoDB: 2 c1.medium nodes with the mongo router process (mongos); replication factor 1; sharding by internal key “_id”.
  • 11. Replication factor 1; memory + disk mode. JVM heap size 12 GB; replication factor 1; Snappy compressor.
  • 12. Performance of the systems was evaluated under different workloads:
  • 13. Load phase, 100,000,000 records * 1 KB, [INSERT]. Chart: average latency (ms) vs. throughput (ops/sec); series: hbase, cassandra, couchbase, mongodb.
  • 14. Workload A: Update (Update 50%, Read 50%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 15. Workload A: Read (Update 50%, Read 50%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 16. Workload B: Update (Update 5%, Read 95%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 17. Workload B: Read (Update 5%, Read 95%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 18. Workload C: 100% Read. Chart series: cassandra, couchbase, hbase, mongodb.
  • 19. Workload D: Insert (Insert 5%, Read 95%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 20. Workload D: Read (Insert 5%, Read 95%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 21. Workload E: Insert (Insert 5%, Scan 95%). Chart series: cassandra, hbase.
  • 22. Workload F: Read (Read-Modify-Write 50%, Read 50%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 23. Workload F: Update (Read-Modify-Write 50%, Read 50%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 24. Workload F: Read-Modify-Write (Read-Modify-Write 50%, Read 50%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 25. Workload G: Insert (Insert 90%, Read 10%). Chart series: cassandra, couchbase, hbase, mongodb.
  • 26. Workload G: Read (Insert 90%, Read 10%). Chart series: cassandra, couchbase, hbase, mongodb.

Editor's Notes

  1. Often referred to as NoSQL, non-relational databases feature elasticity and scalability. In addition, they can store big data and work with cloud computing systems. All of these factors make them extremely popular.
  2. Why did NoSQL data stores appear? Mostly because relational databases (RDBMS) have a number of disadvantages when you work with large datasets. For example, RDBMS are hard to scale, and their architecture is designed to work on a single machine.
- Scaling write operations is either hard, expensive, or impossible.
- Vertical scaling (upgrading equipment) is either limited or very expensive. Unfortunately, it is often the only possible way to scale.
- Horizontal scaling (adding new nodes to the cluster) is either unavailable or can only be implemented partially. Some solutions from Oracle and Microsoft make it possible to run computing instances on several servers. Still, the database itself remains in shared storage.
In addition to poor scalability, RDBMS have strict schemas. The schema is created together with the database, and changing this structure takes a lot of time and effort. In most cases it is an extremely complex task. Apart from that, RDBMS have difficulties with semistructured data.
There is another peculiarity of RDBMS. To keep the relational model normalized, a real-life object is usually split up and stored as several items. This is called the object-relational impedance mismatch.
(ORIGINAL SLIDE COMMENTS: Relational databases provide many advantages, but they are by no means perfect. Even from their early days, there have been lots of frustrations with them. For application developers, the biggest frustration has been what’s commonly called the impedance mismatch: the difference between the relational model and the in-memory data structures. The relational data model organizes data into a structure of tables and rows, or more properly, relations and tuples. In the relational model, a tuple is a set of name-value pairs and a relation is a set of tuples. (The relational definition of a tuple is slightly different from that in mathematics and many programming languages with a tuple data type, where a tuple is a sequence of values.) All operations in SQL consume and return relations, which leads to the mathematically elegant relational algebra. This foundation on relations provides a certain elegance and simplicity, but it also introduces limitations. In particular, the values in a relational tuple have to be simple: they cannot contain any structure, such as a nested record or a list. This limitation isn’t true for in-memory data structures, which can take on much richer structures than relations. As a result, if you want to use a richer in-memory data structure, you have to translate it to a relational representation to store it on disk. Hence the impedance mismatch: two different representations that require translation. Impedance mismatch has been made much easier to deal with by the wide availability of object-relational mapping frameworks, such as Hibernate and iBATIS, that implement well-known mapping patterns, but the mapping problem is still an issue. Object-relational mapping frameworks remove a lot of grunt work, but can become a problem of their own when people try too hard to ignore the database and query performance suffers.)
  3. NoSQL solutions address many of these problems.
POINT 1: In 2013, the number of NoSQL products passed 150, and the figure is still growing. That variety makes it difficult to select the best tool for a particular case.
POINT 2: They come in many types: key-value, columnar, document-oriented, and graph.
POINT 3: All NoSQL databases have one thing in common: they don't use the relational data model. This means they do not use the SQL query language.
POINT 4: NoSQL data management systems are inherently schema-free (with a flexible data model and no excessive complexity) and eventually consistent (complying with BASE rather than ACID).
POINT 5: They provide APIs to perform various operations. Some NoSQL data stores, for example Cassandra and HBase, support query language operations. However, there is no standard. This is another difference between NoSQL databases and traditional RDBMS.
POINT 6: RDBMS usually have strong data consistency, which is implemented with different mechanisms. In contrast, NoSQL data stores operate with eventual consistency: when you add data to the system, it becomes consistent after some time. This means there is a certain risk that an operation will not be completed and the data will remain inconsistent.
POINT 7: NoSQL architectures can work as clusters that consist of several nodes. This makes it possible to scale them horizontally by increasing the number of nodes. In addition, NoSQL data stores serve huge amounts of data and provide high throughput.
  4. POINT 1: NoSQL databases differ from RDBMS in their data models. These systems can be divided into four groups:
A. Key-value stores. Key-value stores are similar to maps or dictionaries, where data is addressed by a unique key.
B. Document stores. Document stores encapsulate key-value pairs in JSON or JSON-like documents. Within a document, keys have to be unique. In contrast to key-value stores, values are not opaque to the system and can be queried as well. Therefore, complex data structures like nested objects can be handled more conveniently. Storing data in interpretable JSON documents has the additional advantage of supporting data types, which makes document stores very developer-friendly.
C. Column family stores. Column family stores are also known as column-oriented stores, extensible record stores, and wide columnar stores.
D. Graph databases. Key-value stores, document stores, and column family stores have a common feature: they store denormalized data in order to gain advantages in distribution. In contrast to relational databases and these key-oriented NoSQL databases, graph databases specialize in efficient management of heavily linked data.
POINT 2: NoSQL databases differ strongly in the query functionality they offer. Besides considering the supported data model and how it influences queries on specific attributes, it is necessary to look closely at the offered interfaces in order to find a suitable database for a specific use case. If a simple, language-independent API is required, REST interfaces can be a suitable solution, especially for web applications, whereas performance-critical queries should go through language-specific APIs, which are available for nearly every common programming language, such as Java. Query languages offer a higher level of abstraction in order to reduce complexity. Therefore, they are very helpful when more complicated queries have to be handled.
If calculation-intensive queries over large datasets are required, MapReduce frameworks should be used.
POINT 3: Multiversion concurrency control (MVCC) relaxes strict consistency in favor of performance. Concurrent access is managed not with locks but by keeping many immutable, chronologically ordered versions. In order to support transactions without reserving multiple datasets for exclusive access, many stores provide optimistic locking: before changed data is committed, each transaction checks whether another transaction has made any conflicting modifications to the same datasets.
POINT 4: NoSQL databases differ in the way they distribute data across multiple machines. Since the data models of key-value stores, document stores, and column family stores are key-oriented, the two common partitioning strategies are based on keys, too.
The first strategy distributes datasets by the range of their keys. A routing server splits the whole key set into blocks and allocates these blocks to different nodes. Afterwards, each node is responsible for storage and request handling of its specific key ranges. In order to find a certain key, clients have to contact the routing server to get the partition table.
Higher availability and a much simpler cluster architecture can be achieved with the second distribution strategy, called consistent hashing. In this shared-nothing architecture, there is no single point of failure. In contrast to range-based partitioning, keys are distributed by using hash functions. Since every server is responsible for a certain hash region, the address of a given key within the cluster can be calculated very fast. Good hash functions distribute keys evenly, so an additional load balancer is not required.
In addition to better read performance through load balancing, replication also brings better availability and durability, because failing nodes can be replaced by other servers.
Since distributed databases should be able to cope with temporary node and network failures, only full availability or full consistency can be guaranteed at one time in distributed systems. If all replicas of a master server were updated synchronously, the system would not be available until all slaves had committed a write operation. If messages got lost due to network problems, the system would not be available for a longer period of time. This solution is not suitable for platforms that rely on high availability, because even a few milliseconds of latency can have a big influence on user behavior.
POINT 5 (PERFORMANCE: TYPICAL WORKLOADS): Obviously, performance is a very important factor. The performance of data storage solutions can be evaluated using typical scenarios. These scenarios, also known as typical workloads, simulate the most common operations performed by applications that use the data store. The tests that we performed to compare the performance of several NoSQL data stores also used typical workloads.
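The consistent hashing strategy described under POINT 4 can be sketched roughly as follows. This is only an illustration, not how any of the tested databases implements it; the class name `HashRing`, the use of MD5, and the number of virtual points per node are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring; every node owns several virtual points."""

    def __init__(self, nodes, vnodes=64):
        self._vnodes = vnodes   # hypothetical default, chosen for illustration
        self._points = []       # sorted hash positions on the ring
        self._owner = {}        # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        # The first point clockwise from the key's hash owns the key;
        # the modulo wraps around to the start of the ring.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node1", "node2", "node3", "node4"])
owner = ring.node_for("user234123")  # any client can compute this locally
```

Because each client can compute the owner of a key locally, no routing server is needed, and removing a node only reassigns the keys that node owned.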
  5. Database vendors usually measure the performance of their products with custom hardware and software settings designed to demonstrate the advantages of their solutions. In our tests, we tried to see how NoSQL data stores perform under the same conditions.
POINT 1: For benchmarking, we used the Yahoo Cloud Serving Benchmark (YCSB). The core of YCSB is a framework with a workload generator that creates the test workload, plus a set of workload scenarios.
POINT 2: Developers need to describe the workload scenario by operation type: which operations are performed on which types of records.
POINT 3: Supported operations include insert (add a new record), update (change one of the fields), read (one random field or all the fields of one record), and scan (read records in key order, starting from the selected record).
We can define the workload by the data that will be loaded into the database during the loading phase and the operations that will be executed against the dataset during the transaction phase.
Typically, a workload is a combination of:
- a workload Java class (a subclass of com.yahoo.ycsb.Workload)
- a parameter file (in the Java Properties format)
Because the properties of the dataset must be known during the loading phase (so that the proper kind of record can be constructed and inserted) and during the transaction phase (so that the correct record IDs and fields can be referred to), a single set of properties is shared between both phases. Thus the parameter file is used in both phases. The workload Java class uses those properties to either insert records (the loading phase) or execute transactions against those records (the transaction phase).
We measured database performance under several types of workloads.
Each workload was defined by different distributions assigned to the two main choices:
- which operation to perform
- which record to read or write
Operations against a data store were randomly selected and could be of the following types:
- Insert: inserts a new record.
- Update: updates a record by replacing the value of one field.
- Read: reads a record, either one randomly selected field or all fields.
- Scan: scans records in order, starting at a randomly selected record key. The number of records to scan is also selected randomly from the range between 1 and 100.
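The first of the two choices, which operation to perform, can be illustrated with a minimal sketch. It is not YCSB code: the function name `make_operation_chooser` and the `seed` parameter are hypothetical, and the second choice (which record to touch, for example with a Zipfian distribution) is left out.

```python
import random


def make_operation_chooser(proportions, seed=None):
    """Return a function that picks the next operation according to `proportions`."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("operation proportions must sum to 1")
    rng = random.Random(seed)
    operations = list(proportions)
    weights = [proportions[op] for op in operations]

    def next_operation():
        op = rng.choices(operations, weights=weights, k=1)[0]
        # Only scans carry a length: a uniformly random count from 1 to 100.
        scan_length = rng.randint(1, 100) if op == "scan" else None
        return op, scan_length

    return next_operation


# Example: a 95/5 scan/insert mix.
choose = make_operation_chooser({"scan": 0.95, "insert": 0.05}, seed=42)
sample = [choose() for _ in range(5)]
```

Each call returns a pair such as `("scan", n)` with `n` between 1 and 100, or `("insert", None)`.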
  6. Each workload was targeted at a table of 100,000,000 records; each record was 1,000 bytes in size and contained 10 fields. Each record was identified by a primary key, which was a string such as “user234123”. The fields were named field0, field1, and so on. The value of each field was a random string of ASCII characters, 100 bytes long.
Database performance was defined by the speed at which a database performed basic operations. A basic operation is an action performed by the workload executor, which drives multiple client threads. Each thread executes a sequential series of operations by making calls to the database interface layer, both to load the database (the load phase) and to execute the workload (the transaction phase). The threads throttle the rate at which they generate requests, so that we can directly control the offered load against the database. In addition, the threads measure the latency and achieved throughput of their operations and report these measurements to the statistics module.
The tests: We defined the following values for the workload executor:
- the number of threads
- the types of operations in the workload
- the desired number of operations per second (target throughput)
Then we measured the time it took to perform these transactions (latency). The performance of a database was calculated as the time it took the client application to perform the transactions (client to database and back to client). Each client thread performs the same transactions, and the threads work in parallel. The resulting values reflect how latencies changed as we increased the workload.
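To make the record layout concrete, here is a minimal sketch of a generator that produces records of the shape described above. It is an illustration under stated assumptions, not YCSB's own generator: the function name `make_record`, the alphanumeric alphabet, and the way the key number is passed in are all hypothetical.

```python
import random
import string

FIELD_COUNT = 10     # fields per record, as in the test setup
FIELD_LENGTH = 100   # bytes per field, so 10 * 100 = 1,000 bytes per record


def make_record(record_number, rng=None):
    """Build one (key, fields) pair shaped like the benchmark records."""
    rng = rng or random.Random()
    key = f"user{record_number}"
    # Assumption: letters and digits stand in for "random ASCII characters".
    alphabet = string.ascii_letters + string.digits
    fields = {
        f"field{i}": "".join(rng.choices(alphabet, k=FIELD_LENGTH))
        for i in range(FIELD_COUNT)
    }
    return key, fields


key, fields = make_record(234123)
payload_bytes = sum(len(value.encode("ascii")) for value in fields.values())
```

Here `payload_bytes` is the size of the field values only; keys and field names add storage overhead on top of that.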
  7. This is a component diagram of the YCSB framework. It consists of several modules. The workload executor applies the workload to the data store. For each session in which the client accesses the database, a client thread is started. Each thread performs a set of operations from the workload. The results, in the form of statistics, are then sent to the statistics module, which prints the test output to the console where the benchmark was started. These tests are then repeated for all the selected solutions.
The YCSB framework has connectors for a wide range of databases. For each database tested with YCSB, a developer needs to specify the type of database for the connector, the target throughput, the number of concurrent threads on the client side, and the number of operations to perform. This is necessary to create and start a test.
  8. (See the comments on the previous slide.)
  9. Now let's take a look at the NoSQL data stores that we tested.
Cassandra 2.0: This is a column family data store. We ran it on a virtual machine with Java 1.7.40 installed. The transactions were performed with a non-default configuration. In particular, we used a random partitioner to distribute data across nodes. The key cache size was 1 GB, the row cache size was 6 GB, and the JVM heap size was 6 GB. Data was not replicated (there were no copies). This was intentional: we wanted to test performance, not the fault tolerance of the cluster.
MongoDB: This is a document-oriented database. Here, we didn't do much additional configuration or tuning. As I have mentioned before, for MongoDB we added two VMs that served as routers, because according to the official documentation, the mongo router process should run on a separate machine. If you need to simplify the setup, the mongo router can run on the same machine as the YCSB client. However, in one of our earlier tests we discovered that it uses a lot of CPU power, which is why it should be placed on a separate machine. Data sharding for MongoDB was based on the document key.
  10. We used the following workloads:
Workload A: an update-heavy scenario that simulates how a database works when recording the typical actions of an e-commerce user. Settings for the workload: read/update ratio of 50/50, Zipfian request distribution.
Workload B: a read-mostly workload with a 95/5 read/update ratio. It simulates content tagging, where adding a tag is an update but most operations read tags.
Workload C: a read-only workload that simulates a data caching layer, for example a user profile cache.
Workload D: a workload with a 95/5 read/insert ratio. It simulates access to the latest data, such as user status updates or working with the newest inbox messages first.
Workload E: a short-range scan workload with a scan/insert proportion of 95/5. It corresponds to threaded conversations that are clustered by thread ID. Each scan retrieves the posts of a given thread.
Workload F: a workload with read-modify-write and read operations in a 50/50 proportion. It simulates access to a user database, where user records are read and modified by the user. User activity is also recorded to this database.
Workload G: a workload with a 10/90 read/insert ratio. It simulates a data migration process or highly intensive data creation.
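The operation mixes above can be written down as data and rendered into a Java Properties style parameter file. The sketch below is an illustration rather than the configuration used in the tests. The property names (`recordcount`, `readproportion`, and so on) are the ones YCSB's core workload uses as far as I know, so check them against your YCSB version; the helper `render_properties` is hypothetical, and request distributions are left out because the notes only state one for workload A.

```python
# Operation mixes for workloads A-G, taken from the descriptions above.
WORKLOAD_MIXES = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "C": {"readproportion": 1.00},
    "D": {"readproportion": 0.95, "insertproportion": 0.05},
    "E": {"scanproportion": 0.95, "insertproportion": 0.05},
    "F": {"readproportion": 0.50, "readmodifywriteproportion": 0.50},
    "G": {"readproportion": 0.10, "insertproportion": 0.90},
}

# Every proportion is written out, zeros included, so that nothing
# depends on the benchmark tool's built-in defaults.
PROPORTION_KEYS = [
    "readproportion",
    "updateproportion",
    "insertproportion",
    "scanproportion",
    "readmodifywriteproportion",
]


def render_properties(name, record_count=100_000_000):
    """Render one workload mix as 'key=value' lines."""
    mix = WORKLOAD_MIXES[name]
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError(f"workload {name}: proportions must sum to 1")
    lines = [f"recordcount={record_count}"]
    lines += [f"{key}={mix.get(key, 0.0)}" for key in PROPORTION_KEYS]
    return "\n".join(lines)


text = render_properties("E")
```

Keeping the mixes in one table makes it easy to check that every workload sums to 100% before a long benchmark run starts.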
  11. The first test was the load phase. We loaded the selected dataset of 100 million records of 1 KB each into the data store. Here we measured average throughput in ops per second and latency, the time required to perform a single operation.
HBase had the lowest performance because we turned on auto-flush mode. This mode guarantees that each record-creation operation is sent from the client to a server and persisted to the database. HBase also supports an alternative mode in which writes are buffered in a cache on the client side. When the client-side cache runs out of space, the client sends it to the server. This makes it possible to persist data to disk in batches.
As expected, Couchbase and Cassandra had good results. Cassandra updates data in memory and writes it to the transaction journal on disk simultaneously. Couchbase writes data to memory and then asynchronously persists it to disk. The result of the transaction is returned after everything has been saved to memory.
In this test, data was loaded in a single iteration. In contrast, insert, update, and read operations were performed in several iterations. We measured the number of ops per second under this workload. The workload was generated based on a target throughput, and what we measured was the actual throughput of the database. We also measured the latency, or how long it took for each operation to be performed.
On many of the diagrams that you will see later, database performance is limited and starts to decline at certain throughput levels. It is important to take into account that the results might have been influenced by the fact that we used AWS and network storage. So these values might differ if you use physical hardware.
  12. The last workload, G, consisted mostly of insert operations. It simulates data migration or a situation in which a lot of data is created. As on the previous graph, insert operations were performed best by HBase and Cassandra. These databases had low latencies and high throughput. For MongoDB, performance was capped at about 4,000 ops per second, with an average latency up to five times greater than in the other databases.
  13. The last diagram shows the results of the read operations that make up 10% of workload G. The distribution of latencies here is uneven for all the solutions. Possibly this is because the data is in network storage in the cloud. At the same time, HBase and Cassandra show a maximum throughput of up to 6,000 operations per second.
  14. What you choose depends on your needs. Before choosing, you must do the following:
- determine what your datasets and your data model will be like (the data model will depend on the datasets and the typical operations that your app will perform);
- determine your requirements for transaction support, that is, decide whether you need transactions;
- decide whether you need replication and what your data consistency requirements are;
- determine your performance requirements (how fast your database should be).
Next, if your project is based on an existing solution, you should check whether data migration is possible; this may influence your choice as well.
Then, taking these factors into account, evaluate different solutions and test their performance (that’s what this presentation was all about). It is very useful to build a prototype and perform a proof of concept. Based on this prototype, you can select the solution for your system. Prototyping makes it possible to evaluate how the solution or approach will work in a real-life project. If it doesn’t work well enough, you need to review the architecture and components and build a new prototype.
This means there are no perfect solutions, and there are no bad NoSQL or RDBMS data stores. The solution and its implementation depend on the particular situation. The tests we performed show that in different use cases, different solutions have very different results. Your final choice might be a compromise. The main determinant will be what you want to achieve and which properties you need most, for instance maximum performance or consistency.