Real World NoSQL (by Chris Yuen)
The Hong Kong Big Data community had a guest speaker at our Tuesday, 18 February meeting. Chris Yuen from Demyst Data discussed his experience with three NoSQL solutions: Cassandra, MongoDB, and HBase. For more information see http://www.infoincog.com/hong-kong-big-data-meeting-tuesday-18-february/.

Published in: Technology

Transcript

  • 1. Real World NoSQL x Big Data
  • 2. Overview
      • Introduction
      • Motivation for NoSQL
      • The NoSQL landscape
      • Experience sharing
      • HBase
      • MongoDB
      • Cassandra
      • Tying it up – how does it really matter
  • 3. Motivation
      • Too much data – the need to “scale out”
      • CAP theorem
  • 4. Motivation
      • Too much data – the need to “scale out”
      • CAP theorem
      • Performance
      • RDBMS joining is slow
      • Denormalization
      • Key value data store
      • Alternative data representation
      • Schemaless “No SQL”
  • 5. Motivation
      • Too much data – the need to “scale out”
      • CAP theorem
      • Performance
      • RDBMS joining is slow
      • Denormalization
      • Key value data store
      • Alternative data representation
      • Schemaless “No SQL”
      • Document data store
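The denormalization and document-store points on the Motivation slides can be illustrated with a minimal sketch. The data and the names `users`, `orders`, and `documents` are hypothetical and not from the talk; plain Python dicts stand in for both a pair of relational tables and a document store.

```python
# Hypothetical data, for illustration only.

# Normalized (RDBMS-style): two "tables" linked by user_id.
users = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
orders = [
    {"order_id": 10, "user_id": 1, "item": "disk"},
    {"order_id": 11, "user_id": 1, "item": "cable"},
    {"order_id": 12, "user_id": 2, "item": "fan"},
]


def orders_for_user_normalized(user_id):
    """Join at read time: look up the user, then scan the orders table."""
    user = users[user_id]
    items = [o["item"] for o in orders if o["user_id"] == user_id]
    return {"name": user["name"], "items": items}


def denormalize():
    """Build one self-contained document per user, keyed by user_id."""
    docs = {}
    for user_id, user in users.items():
        docs[user_id] = {
            "name": user["name"],
            "items": [o["item"] for o in orders if o["user_id"] == user_id],
        }
    return docs


documents = denormalize()


def orders_for_user_document(user_id):
    """Single key lookup, no join: the document already embeds the orders."""
    return documents[user_id]
```

Both read paths return the same result. The document version pays for the join once at write time, which is the trade-off the denormalization bullet points at.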
  • 6. HBase
      • Builds on top of HDFS
      • Consistent “big-data” database
      • Automatically scales out
  • 7. HBase
      • … but we didn’t use it in the end
  • 8. HBase
      • A nightmare to set up and maintain
      • Depends on Hadoop, HDFS, ZooKeeper
  • 9. HBase
      • A nightmare to set up and maintain
      • Depends on Hadoop, HDFS, ZooKeeper
      • No secondary index
      • “Table” alteration requires downtime
      • Not spectacular latency for OLTP usage
  • 10. MongoDB
      • De facto “big-data” “NoSQL” database
      • Document based data representation
  • 11. MongoDB
      • De facto “big-data” “NoSQL” database
      • Document based data representation
  • 12. MongoDB
      • A good balance of “traditional” usage and “NoSQL” usage
      • Supports secondary index
      • Range query
      • Can do table scan
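Slide 12 lists secondary indexes, range queries, and table scans. The sketch below shows how the three relate, under stated assumptions: the documents, the `age` field, and the function names are hypothetical, and a sorted Python list stands in for an index. It is not how MongoDB implements indexes internally.

```python
import bisect

# Hypothetical documents keyed by primary key.
docs = {
    "a": {"name": "Ada", "age": 36},
    "b": {"name": "Bob", "age": 25},
    "c": {"name": "Cy", "age": 41},
    "d": {"name": "Dee", "age": 30},
}


def table_scan(lo, hi):
    """Range query without an index: inspect every document."""
    return sorted(k for k, d in docs.items() if lo <= d["age"] <= hi)


# Secondary index on "age": a sorted list of (age, primary_key) pairs.
age_index = sorted((d["age"], k) for k, d in docs.items())


def indexed_range(lo, hi):
    """Range query with the index: binary-search both ends, read the slice."""
    start = bisect.bisect_left(age_index, (lo, ""))
    end = bisect.bisect_right(age_index, (hi, "\uffff"))
    return sorted(k for _, k in age_index[start:end])
```

Both functions return the same keys for the same range. The indexed version touches only the matching slice, while the scan reads every document.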
  • 13. MongoDB
      • “Big-data” features: sharding, replica set
  • 14. MongoDB
      • … but it got ugly pretty fast
      • Devil’s in the details
      • Replica set management fiasco
      • Sharding is difficult to set up and poorly implemented
      • https://github.com/kizzx2/mongolab
  • 15. MongoDB
  • 16. MongoDB
      • Reality – it doesn’t scale beyond one machine
      • Replica set
  • 17. Cassandra
      • Column Family data store
  • 18. Cassandra
      • Column Family data store
  • 19. Cassandra
      • Column Family data store
      • More “NoSQL” than MongoDB; fewer features
      • Column data store – strictly key/value query
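The “column family” and “strictly key/value query” bullets can be pictured as a map from row key to a map of columns. The toy class below is a sketch of that data model only; `ColumnFamily` and its methods are hypothetical names, not Cassandra's API.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name -> value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def insert(self, row_key, columns):
        # Rows need not share the same set of columns (no fixed schema).
        self._rows[row_key].update(columns)

    def get(self, row_key, column=None):
        # The only access path is the row key; there is no lookup by value.
        row = self._rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)
```

A usage example, with made-up keys: `cf.insert("user:1", {"name": "Ada"})` followed by `cf.get("user:1", "name")`. Any query that does not start from a row key would have to be modeled as another column family keyed by that value, which is part of the paradigm shift the talk mentions later.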
  • 20. Cassandra
      • Auto-sharding just works
      • Replica set requires 0 configuration
      • Append only, LSM-tree based storage format
      • Good for SSD
      • High insert throughput
      • For storing analytic data
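The “append only, LSM-tree based storage format” bullet is easier to follow with a toy model. The sketch below is an assumption-laden simplification, not Cassandra's storage engine: `TinyLSM`, `memtable_limit`, and the in-memory “segments” are hypothetical, and real systems also use a commit log, on-disk files, and tombstones for deletes.

```python
class TinyLSM:
    """Toy LSM-tree: an in-memory memtable plus immutable sorted segments."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []  # oldest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch the memtable; existing segments are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Append a new immutable, sorted segment and start a fresh memtable.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then segments newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all segments into one, keeping only the latest value per key.
        merged = {}
        for segment in self.segments:  # oldest to newest, so later writes win
            merged.update(segment)
        self.segments = [sorted(merged.items())] if merged else []
```

Because `put` never rewrites existing segments, writes reduce to memory updates plus sequential appends, which is one way to read the “high insert throughput” bullet. Reads may have to consult several segments until `compact` merges them.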
  • 21. Cassandra
      • Has rudimentary support for secondary index
      • Difficult to do table scan or range scan
      • Requires a substantial application / paradigm shift
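One plausible reading of why auto-sharding and range scans pull in opposite directions is hash-based placement: keys are spread across nodes by hash, so neighbouring keys rarely live together. The sketch below assumes a hypothetical three-node cluster and simple modulo hashing. Cassandra's actual partitioner and token ring are more involved, and the talk does not spell out this mechanism.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster


def node_for(key):
    """Pick a node from a hash of the key, so placement ignores key order."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def nodes_for_point_read(key):
    # A single-key read goes to exactly one node.
    return {node_for(key)}


def nodes_for_range_scan(keys_in_range):
    # Adjacent keys hash to unrelated nodes, so a range may touch many nodes.
    return {node_for(k) for k in keys_in_range}
```

A point read is routed to one node, whereas a scan over `user:0001` to `user:0100` would generally need to contact most or all nodes, since sort order and placement are unrelated in this scheme.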
  • 22. Real World Implications
      • Why does NoSQL matter to Big Data?
      • Schemaless storage model
      • Performance
      • Scalability
      • Rapidly incorporate unstructured new data sources without extensive planning
  • 23. How to Choose
      • Maintenance / Scalability
      • Supported operations
      • OLAP vs. OLTP
  • 24. Thank You
      • Chris Yuen
      • http://cfc.kizzx2.com
      • http://github.com/kizzx2
      • @kizzx2
      • chris@kizzx2.com