0
NewSQL Overview
Ivan Glushkov
ivan.glushkov@gmail.com
@gliush
About myself
• MIPT
• MCST, Elbrus compiler project
• Echo, real-time social platform
(PaaS)
History of SQL
• Relational Model in 1970:

disk-oriented, row, sql
• “One size fits all” doesn’t
work:
– Column-oriented ...
Startups lifecycle
• Start: no money, no users,
open source
• Middle: more users, storage
optimization
• Final: plenty of ...
Sharding in application
• Additional logic in application
• Difficulties with cross-sharding
transactions
• More servers t...
New requirements
• Huge and growing data sets

9M msg per hour in Facebook

50M msg per day in Twitter
• Generated by devi...
Architecture change
Client Side
Server Side
Cloud Storage
Client Side
Server Side
Database
Architecture change
• ‘P’ in CAP is not discrete
• ACID vs BASE
• Managing partitions: detection,
limitations in operation...
NoSQL
• CAP: first ‘A’, then ‘C’
• Horizontal scaling
• Custom API
• Schemaless
• Types: Key-Value, Document,
Graph, …
NewSQL: definition
“A DBMS that delivers the scalability !
and flexibility promised by NoSQL!
while retaining the support f...
NewSQL: definition
❖ SQL as the primary interface!
❖ ACID support for transactions!
❖ Non-locking concurrency control!
❖ H...
Shared nothing
• Each node is independent and
self-sufficient
• No shared memory or disk
• Scale infinitely
• Data partiti...
Column-oriented DBMS
• Store content by column
• Efficient in hard disk access
• Good for sparse data
• Higher data compre...
• Useful work is
only 12%
• By Stonebraker
and research
group
Traditional DBMS
12%
10%
11%
18%
20%
29%
In-memory storage
• High throughput
• Low latency
• No Buffer Management
• If serialized, no Locking or
Latching
In-memory storage
• Amazon price reduction
!
!
!
!
• Current price for 1TB: $14K
NewSQL: categories
• New approaches: VoltDB,
Clustrix, NuoDB
• New storage engines: TokuDB,
ScaleDB
• Transparent clusteri...
NuoDB
• Multi-tier architecture:
– Administrative: managing,
stats, cli, web-ui
– Transactional: ACID except
‘D’, cache
– ...
NuoDB
• Everything is an ‘Atom’
• P2P, encrypted sessions
• MVCC + Append-only storage
YCSB
• Yahoo Cloud Serving
Benchmark
• Key-value: 

insert/read/update/scan
• Measures
– Performance
– Scaling
NuoDB: YCSB
Throughput, tps/nodes
0
275 000
550 000
825 000
1 100 000
1 2 4 8 16 24
Update latency, s
0
25
50
1 2 4 8 16 2...
VoltDB
• In-memory storage
• Stored procedure interface
• Serializing all data access
• Horizontal partitioning
• Replicat...
VoltDB
• Open-source
• Partitioning and Replication
control
• Java + C++
VoltDB: kv bench
90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors
VoltDB: sql bench
26 SQL statements "
per transaction
• Shared data
• No single point

of failure
ScaleDB
Application
MySQL MySQL MySQL…
Mirrored"
Storage
! Mirrored"
Storage
!...
ClustrixDB
• “Query fragment”:
• read/write/execute funs
• perform synchronisation
• send rows to query
fragments
• Partit...
ClustrixDB
• “Move query to the data”
• Dynamic and transparent data
layout
• Linear scale
TPC-C
• OLTP benchmark
• 9 types of tables
• “New-order transaction”
ClustrixDB
• ~ 400GB
• linear
scale
ClustrixDB: examples
• 30M users, 10M logins per day
• 4.4B transactions per day
• 1.08/4.69 Petabytes per month
writes/re...
Conclusions
• NewSQL is an established trend
with a number of options
• Hard to pick one because
they're not on a common s...
Questions?
Ivan Glushkov
ivan.glushkov@gmail.com
@gliush
Upcoming SlideShare
Loading in...5
×

Иван Глушков (Echo)

1,519

Published on

Published in: Internet, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,519
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
21
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Transcript of "Иван Глушков (Echo)"

  1. 1. NewSQL Overview Ivan Glushkov ivan.glushkov@gmail.com @gliush
  2. 2. About myself • MIPT • MCST, Elbrus compiler project • Echo, real-time social platform (PaaS)
  3. 3. History of SQL • Relational Model in 1970:
 disk-oriented, row, sql • “One size fits all” doesn’t work: – Column-oriented for OLAP. – Key-Value storages, Document storages
  4. 4. Startups lifecycle • Start: no money, no users, open source • Middle: more users, storage optimization • Final: plenty of users, storage failure
  5. 5. Sharding in application • Additional logic in application • Difficulties with cross-sharding transactions • More servers to maintain • More components — higher prob for breakdown
  6. 6. New requirements • Huge and growing data sets
 9M msg per hour in Facebook
 50M msg per day in Twitter • Generated by devices • High concurrency • Data model with some relations • Transactional integrity
  7. 7. Architecture change Client Side Server Side Cloud Storage Client Side Server Side Database
  8. 8. Architecture change • ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection, limitations in operations, recovery
  9. 9. NoSQL • CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document, Graph, …
  10. 10. NewSQL: definition “A DBMS that delivers the scalability ! and flexibility promised by NoSQL! while retaining the support for SQL ! queries and/or ACID, or to improve ! performance for appropriate workloads.”! 451 Group
  11. 11. NewSQL: definition ❖ SQL as the primary interface! ❖ ACID support for transactions! ❖ Non-locking concurrency control! ❖ High per-node performance! ❖ Parallel, shared nothing architecture! Michael Stonebraker
  12. 12. Shared nothing • Each node is independent and self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests
  13. 13. Column-oriented DBMS • Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large records with a lot of fields
  14. 14. • Useful work is only 12% • By Stonebraker and research group Traditional DBMS 12% 10% 11% 18% 20% 29%
  15. 15. In-memory storage • High throughput • Low latency • No Buffer Management • If serialized, no Locking or Latching
  16. 16. In-memory storage • Amazon price reduction ! ! ! ! • Current price for 1TB: $14K
  17. 17. NewSQL: categories • New approaches: VoltDB, Clustrix, NuoDB • New storage engines: TokuDB, ScaleDB • Transparent clustering: ScaleBase, dbShards
  18. 18. NuoDB • Multi-tier architecture: – Administrative: managing, stats, cli, web-ui – Transactional: ACID except ‘D’, cache – Storage: key-value store
  19. 19. NuoDB • Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage
  20. 20. YCSB • Yahoo Cloud Serving Benchmark • Key-value: 
 insert/read/update/scan • Measures – Performance – Scaling
  21. 21. NuoDB: YCSB Throughput, tps/nodes 0 275 000 550 000 825 000 1 100 000 1 2 4 8 16 24 Update latency, s 0 25 50 1 2 4 8 16 24 Read latency, s 0 1.5 3 1 2 4 8 16 24 Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads
  22. 22. VoltDB • In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging
  23. 23. VoltDB • Open-source • Partitioning and Replication control • Java + C++
  24. 24. VoltDB: kv bench 90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors
  25. 25. VoltDB: sql bench 26 SQL statements " per transaction
  26. 26. • Shared data • No single point
 of failure ScaleDB Application MySQL MySQL MySQL… Mirrored" Storage ! Mirrored" Storage ! … Mirrored" Storage !
  27. 27. ClustrixDB • “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query fragments • Partition, replication: “slices”
  28. 28. ClustrixDB • “Move query to the data” • Dynamic and transparent data layout • Linear scale
  29. 29. TPC-C • OLTP benchmark • 9 types of tables • “New-order transaction”
  30. 30. ClustrixDB • ~ 400GB • linear scale
  31. 31. ClustrixDB: examples • 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month writes/reads • 42 nodes, 336 cores, 2TB memory, 46TB SSD
  32. 32. Conclusions • NewSQL is an established trend with a number of options • Hard to pick one because they're not on a common scale • No silver bullet • Need more efficient ways to store and process it
  33. 33. Questions? Ivan Glushkov ivan.glushkov@gmail.com @gliush
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×