Иван Глушков (Echo)
Upcoming SlideShare
Loading in...5
×
 

Иван Глушков (Echo)

on

  • 983 views

 

Statistics

Views

Total Views
983
Views on SlideShare
226
Embed Views
757

Actions

Likes
0
Downloads
6
Comments
0

3 Embeds 757

http://www.xakep.ru 710
http://xakep.ru 25
http://ritconf.ru 22

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Иван Глушков (Echo) Иван Глушков (Echo) Presentation Transcript

  • NewSQL Overview Ivan Glushkov ivan.glushkov@gmail.com @gliush
  • About myself • MIPT • MCST, Elbrus compiler project • Echo, real-time social platform (PaaS)
  • History of SQL • Relational Model in 1970:
 disk-oriented, row, sql • “One size fits all” doesn’t work: – Column-oriented for OLAP. – Key-Value storages, Document storages
  • Startups lifecycle • Start: no money, no users, open source • Middle: more users, storage optimization • Final: plenty of users, storage failure
  • Sharding in application • Additional logic in application • Difficulties with cross-sharding transactions • More servers to maintain • More components — higher prob for breakdown
  • New requirements • Huge and growing data sets
 9M msg per hour in Facebook
 50M msg per day in Twitter • Generated by devices • High concurrency • Data model with some relations • Transactional integrity
  • Architecture change Client Side Server Side Cloud Storage Client Side Server Side Database
  • Architecture change • ‘P’ in CAP is not discrete • ACID vs BASE • Managing partitions: detection, limitations in operations, recovery
  • NoSQL • CAP: first ‘A’, then ‘C’ • Horizontal scaling • Custom API • Schemaless • Types: Key-Value, Document, Graph, …
  • NewSQL: definition “A DBMS that delivers the scalability ! and flexibility promised by NoSQL! while retaining the support for SQL ! queries and/or ACID, or to improve ! performance for appropriate workloads.”! 451 Group
  • NewSQL: definition ❖ SQL as the primary interface! ❖ ACID support for transactions! ❖ Non-locking concurrency control! ❖ High per-node performance! ❖ Parallel, shared nothing architecture! Michael Stonebraker
  • Shared nothing • Each node is independent and self-sufficient • No shared memory or disk • Scale infinitely • Data partitioning • Slow multi-shards requests
  • Column-oriented DBMS • Store content by column • Efficient in hard disk access • Good for sparse data • Higher data compression • More reads/writes for large records with a lot of fields
  • • Useful work is only 12% • By Stonebraker and research group Traditional DBMS 12% 10% 11% 18% 20% 29%
  • In-memory storage • High throughput • Low latency • No Buffer Management • If serialized, no Locking or Latching
  • In-memory storage • Amazon price reduction ! ! ! ! • Current price for 1TB: $14K
  • NewSQL: categories • New approaches: VoltDB, Clustrix, NuoDB • New storage engines: TokuDB, ScaleDB • Transparent clustering: ScaleBase, dbShards
  • NuoDB • Multi-tier architecture: – Administrative: managing, stats, cli, web-ui – Transactional: ACID except ‘D’, cache – Storage: key-value store
  • NuoDB • Everything is an ‘Atom’ • P2P, encrypted sessions • MVCC + Append-only storage
  • YCSB • Yahoo Cloud Serving Benchmark • Key-value: 
 insert/read/update/scan • Measures – Performance – Scaling
  • NuoDB: YCSB Throughput, tps/nodes 0 275 000 550 000 825 000 1 100 000 1 2 4 8 16 24 Update latency, s 0 25 50 1 2 4 8 16 24 Read latency, s 0 1.5 3 1 2 4 8 16 24 Hosts: 32GB, Xeon 8 cores, 1TB HDD, 1Gb LAN5% updates, 95% reads
  • VoltDB • In-memory storage • Stored procedure interface • Serializing all data access • Horizontal partitioning • Replication (“K-safety”) • Snapshots + Command Logging
  • VoltDB • Open-source • Partitioning and Replication control • Java + C++
  • VoltDB: kv bench 90%reads, 10%writes 3 nodes: 64GB, dual 2.93GHz intel 6 core processors
  • VoltDB: sql bench 26 SQL statements " per transaction
  • • Shared data • No single point
 of failure ScaleDB Application MySQL MySQL MySQL… Mirrored" Storage ! Mirrored" Storage ! … Mirrored" Storage !
  • ClustrixDB • “Query fragment”: • read/write/execute funs • perform synchronisation • send rows to query fragments • Partition, replication: “slices”
  • ClustrixDB • “Move query to the data” • Dynamic and transparent data layout • Linear scale
  • TPC-C • OLTP benchmark • 9 types of tables • “New-order transaction”
  • ClustrixDB • ~ 400GB • linear scale
  • ClustrixDB: examples • 30M users, 10M logins per day • 4.4B transactions per day • 1.08/4.69 Petabytes per month writes/reads • 42 nodes, 336 cores, 2TB memory, 46TB SSD
  • Conclusions • NewSQL is an established trend with a number of options • Hard to pick one because they're not on a common scale • No silver bullet • Need more efficient ways to store and process it
  • Questions? Ivan Glushkov ivan.glushkov@gmail.com @gliush