Иван Глушков (Echo)

2,096 views
1,928 views

Published on

Published in: Internet, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,096
On SlideShare
0
From Embeds
0
Number of Embeds
907
Actions
Shares
0
Downloads
27
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Иван Глушков (Echo)

  1. 1. NewSQL Overview Ivan Glushkov ivan.glushkov@gmail.com @gliush
  2. 2. About myself • MIPT • MCST, Elbrus compiler project • Echo, real-time social platform (PaaS)
  3. 3. History of SQL • Relational Model in 1970:
 disk-oriented, row, sql • “One size fits all” doesn’t work: – Column-oriented for OLAP. – Key-Value storages, Document storages
  4. 4. Startups lifecycle • Start: no money, no users, open source • Middle: more users, storage optimization • Final: plenty of users, storage failure
  5. 5. Sharding in application • Additional logic in application • Difficulties with cross-sharding transactions • More servers to maintain • More components — higher prob for breakdown
  6. 6. New requirements • Huge and growing data sets
 9M msg per hour in Facebook
 50M msg per day in Twitter • Generated by devices • High concurrency • Data model with some relations • Transactional integrity
  7. 7. Architecture change Client Side Server Side Cloud Storage Client Side Server Side Database
  8. 8. Architecture change • ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection, limitations in operations, recovery
  9. 9. NoSQL • CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document, Graph, …
  10. 10. NewSQL: definition “A DBMS that delivers the scalability ! and flexibility promised by NoSQL! while retaining the support for SQL ! queries and/or ACID, or to improve ! performance for appropriate workloads.”! 451 Group
  11. 11. NewSQL: definition ❖ SQL as the primary interface! ❖ ACID support for transactions! ❖ Non-locking concurrency control! ❖ High per-node performance! ❖ Parallel, shared nothing architecture! Michael Stonebraker
  12. 12. Shared nothing • Each node is independent and self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests
  13. 13. Column-oriented DBMS • Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large records with a lot of fields
  14. 14. • Useful work is only 12% • By Stonebraker and research group Traditional DBMS 12% 10% 11% 18% 20% 29%
  15. 15. In-memory storage • High throughput • Low latency • No Buffer Management • If serialized, no Locking or Latching
  16. 16. In-memory storage • Amazon price reduction ! ! ! ! • Current price for 1TB: $14K
  17. 17. NewSQL: categories • New approaches: VoltDB, Clustrix, NuoDB • New storage engines: TokuDB, ScaleDB • Transparent clustering: ScaleBase, dbShards
  18. 18. NuoDB • Multi-tier architecture: – Administrative: managing, stats, cli, web-ui – Transactional: ACID except ‘D’, cache – Storage: key-value store
  19. 19. NuoDB • Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage
  20. 20. YCSB • Yahoo Cloud Serving Benchmark • Key-value: 
 insert/read/update/scan • Measures – Performance – Scaling
  21. 21. NuoDB: YCSB Throughput, tps/nodes 0 275 000 550 000 825 000 1 100 000 1 2 4 8 16 24 Update latency, s 0 25 50 1 2 4 8 16 24 Read latency, s 0 1.5 3 1 2 4 8 16 24 Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads
  22. 22. VoltDB • In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging
  23. 23. VoltDB • Open-source • Partitioning and Replication control • Java + C++
  24. 24. VoltDB: kv bench 90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors
  25. 25. VoltDB: sql bench 26 SQL statements " per transaction
  26. 26. • Shared data • No single point
 of failure ScaleDB Application MySQL MySQL MySQL… Mirrored" Storage ! Mirrored" Storage ! … Mirrored" Storage !
  27. 27. ClustrixDB • “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query fragments • Partition, replication: “slices”
  28. 28. ClustrixDB • “Move query to the data” • Dynamic and transparent data layout • Linear scale
  29. 29. TPC-C • OLTP benchmark • 9 types of tables • “New-order transaction”
  30. 30. ClustrixDB • ~ 400GB • linear scale
  31. 31. ClustrixDB: examples • 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month writes/reads • 42 nodes, 336 cores, 2TB memory, 46TB SSD
  32. 32. Conclusions • NewSQL is an established trend with a number of options • Hard to pick one because they're not on a common scale • No silver bullet • Need more efficient ways to store and process it
  33. 33. Questions? Ivan Glushkov ivan.glushkov@gmail.com @gliush

×