Иван Глушков (Echo)
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Иван Глушков (Echo)

on

  • 1,043 views

 

Statistics

Views

Total Views
1,043
Views on SlideShare
279
Embed Views
764

Actions

Likes
1
Downloads
6
Comments
0

4 Embeds 764

http://www.xakep.ru 710
http://xakep.ru 26
http://ritconf.ru 25
http://www.highload.ru 3

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Иван Глушков (Echo) Presentation Transcript

  • 1. NewSQL Overview Ivan Glushkov ivan.glushkov@gmail.com @gliush
  • 2. About myself • MIPT • MCST, Elbrus compiler project • Echo, real-time social platform (PaaS)
  • 3. History of SQL • Relational Model in 1970:
 disk-oriented, row, sql • “One size fits all” doesn’t work: – Column-oriented for OLAP. – Key-Value storages, Document storages
  • 4. Startups lifecycle • Start: no money, no users, open source • Middle: more users, storage optimization • Final: plenty of users, storage failure
  • 5. Sharding in application • Additional logic in application • Difficulties with cross-sharding transactions • More servers to maintain • More components — higher prob for breakdown
  • 6. New requirements • Huge and growing data sets
 9M msg per hour in Facebook
 50M msg per day in Twitter • Generated by devices • High concurrency • Data model with some relations • Transactional integrity
  • 7. Architecture change Client Side Server Side Cloud Storage Client Side Server Side Database
  • 8. Architecture change • ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection, limitations in operations, recovery
  • 9. NoSQL • CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document, Graph, …
  • 10. NewSQL: definition “A DBMS that delivers the scalability ! and flexibility promised by NoSQL! while retaining the support for SQL ! queries and/or ACID, or to improve ! performance for appropriate workloads.”! 451 Group
  • 11. NewSQL: definition ❖ SQL as the primary interface! ❖ ACID support for transactions! ❖ Non-locking concurrency control! ❖ High per-node performance! ❖ Parallel, shared nothing architecture! Michael Stonebraker
  • 12. Shared nothing • Each node is independent and self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests
  • 13. Column-oriented DBMS • Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large records with a lot of fields
  • 14. • Useful work is only 12% • By Stonebraker and research group Traditional DBMS 12% 10% 11% 18% 20% 29%
  • 15. In-memory storage • High throughput • Low latency • No Buffer Management • If serialized, no Locking or Latching
  • 16. In-memory storage • Amazon price reduction ! ! ! ! • Current price for 1TB: $14K
  • 17. NewSQL: categories • New approaches: VoltDB, Clustrix, NuoDB • New storage engines: TokuDB, ScaleDB • Transparent clustering: ScaleBase, dbShards
  • 18. NuoDB • Multi-tier architecture: – Administrative: managing, stats, cli, web-ui – Transactional: ACID except ‘D’, cache – Storage: key-value store
  • 19. NuoDB • Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage
  • 20. YCSB • Yahoo Cloud Serving Benchmark • Key-value: 
 insert/read/update/scan • Measures – Performance – Scaling
  • 21. NuoDB: YCSB Throughput, tps/nodes 0 275 000 550 000 825 000 1 100 000 1 2 4 8 16 24 Update latency, s 0 25 50 1 2 4 8 16 24 Read latency, s 0 1.5 3 1 2 4 8 16 24 Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads
  • 22. VoltDB • In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging
  • 23. VoltDB • Open-source • Partitioning and Replication control • Java + C++
  • 24. VoltDB: kv bench 90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors
  • 25. VoltDB: sql bench 26 SQL statements " per transaction
  • 26. • Shared data • No single point
 of failure ScaleDB Application MySQL MySQL MySQL… Mirrored" Storage ! Mirrored" Storage ! … Mirrored" Storage !
  • 27. ClustrixDB • “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query fragments • Partition, replication: “slices”
  • 28. ClustrixDB • “Move query to the data” • Dynamic and transparent data layout • Linear scale
  • 29. TPC-C • OLTP benchmark • 9 types of tables • “New-order transaction”
  • 30. ClustrixDB • ~ 400GB • linear scale
  • 31. ClustrixDB: examples • 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month writes/reads • 42 nodes, 336 cores, 2TB memory, 46TB SSD
  • 32. Conclusions • NewSQL is an established trend with a number of options • Hard to pick one because they're not on a common scale • No silver bullet • Need more efficient ways to store and process it
  • 33. Questions? Ivan Glushkov ivan.glushkov@gmail.com @gliush