Real World NoSQL Big Data Overview and Experience with HBase MongoDB Cassandra

The Hong Kong Big Data community had a guest speaker at our Tuesday, 18 February meeting. Chris Yuen from Demyst Data discussed his experience with three NoSQL solutions: Cassandra, MongoDB, and HBase. For more information see http://www.infoincog.com/hong-kong-big-data-meeting-tuesday-18-february/.

Overview
 Introduction
 Motivation for NoSQL
 The NoSQL landscape

 Experience sharing
 HBase
 MongoDB
 Cassandra

 Tying it up – why it really matters

Motivation
 Too much data – the need to “scale out”
 CAP theorem

 Performance
 RDBMS joins are slow
 Denormalization
 Key value data store

 Alternative data representation
 Schemaless “No SQL”
 Document data store
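
To make the denormalization and document-store bullets concrete, here is a minimal sketch in plain Python. The `users` and `orders` data and both function names are made up for illustration and are not from the talk. It contrasts a normalized layout, where every read has to join two collections, with a denormalized one, where a single key lookup returns a self-contained document.

```python
# Normalized (relational-style) layout: two "tables" linked by user_id.
# All names and data here are hypothetical, chosen only for illustration.
users = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
orders = [
    {"order_id": 100, "user_id": 1, "item": "keyboard"},
    {"order_id": 101, "user_id": 1, "item": "mouse"},
    {"order_id": 102, "user_id": 2, "item": "monitor"},
]


def orders_with_join(user_id):
    """Read path for the normalized layout: scan orders and join to users."""
    name = users[user_id]["name"]
    return [(name, o["item"]) for o in orders if o["user_id"] == user_id]


# Denormalized, document-style layout: one self-contained record per key.
# The join is paid once at write time instead of on every read.
user_docs = {
    user_id: {
        "name": user["name"],
        "orders": [o["item"] for o in orders if o["user_id"] == user_id],
    }
    for user_id, user in users.items()
}

# Schemaless: a document may carry fields that other documents lack.
user_docs[2]["loyalty_tier"] = "gold"


def orders_from_document(user_id):
    """Read path for the document layout: a single key lookup, no join."""
    doc = user_docs[user_id]
    return [(doc["name"], item) for item in doc["orders"]]
```

Both read paths return the same rows. The trade-off is that the denormalized copy has to be kept in sync whenever the underlying data changes.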

HBase
 Builds on top of HDFS

 Consistent “big-data” database
 Automatically scales out

HBase
 … but we didn’t use it in the end

HBase
 A nightmare to set up and maintain
 Depends on Hadoop, HDFS, Zookeeper

 No secondary index
 “Table” alteration requires downtime

 Latency is not spectacular for OLTP usage
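
One common workaround for the missing secondary index is for the application to maintain a second table whose row key starts with the indexed value. The sketch below illustrates that idea with a toy sorted table in plain Python. It is not the HBase API: `SortedTable`, `put_user` and `find_by_city` are hypothetical names, and updates and deletes are deliberately ignored.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table kept sorted by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every row key that starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


users = SortedTable()          # primary table, keyed by user id
users_by_city = SortedTable()  # hand-maintained index, keyed by city + user id


def put_user(user_id, name, city):
    """Write the row and its index entry; the application must do both."""
    users.put(user_id, {"name": name, "city": city})
    users_by_city.put(f"{city}\x00{user_id}", user_id)


def find_by_city(city):
    """Answer a query on a non-key column with a prefix scan of the index."""
    return [users.get(uid)["name"]
            for _, uid in users_by_city.scan_prefix(f"{city}\x00")]


put_user("u1", "Alice", "Hong Kong")
put_user("u2", "Bob", "London")
put_user("u3", "Carol", "Hong Kong")
```

With this data, `find_by_city("Hong Kong")` returns Alice and Carol without scanning the primary table. Keeping the two tables consistent is entirely the application's job, which is part of the maintenance burden.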

MongoDB
 De facto “big-data” “NoSQL” database

 Document based data representation

MongoDB
 A good balance of “traditional” usage and “NoSQL” usage
 Supports secondary index
 Range query

 Can do table scan
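
The difference between a table scan and a range query served by a secondary index can be sketched in plain Python. This models the idea only and is not MongoDB's implementation or driver API; the `docs` data and function names are made up for illustration.

```python
import bisect

# Hypothetical documents; "_id" plays the role of the primary key.
docs = [
    {"_id": 1, "name": "Alice", "age": 34},
    {"_id": 2, "name": "Bob", "age": 27},
    {"_id": 3, "name": "Carol", "age": 41},
    {"_id": 4, "name": "Dave", "age": 30},
]


def table_scan(lo, hi):
    """Examine every document: works for any field, but costs O(n)."""
    return sorted(d["name"] for d in docs if lo <= d["age"] < hi)


# Secondary index on "age": (age, position) pairs kept in sorted order.
age_index = sorted((d["age"], i) for i, d in enumerate(docs))
ages = [age for age, _ in age_index]


def index_range(lo, hi):
    """Binary-search the index, then fetch only the matching documents."""
    start = bisect.bisect_left(ages, lo)
    end = bisect.bisect_left(ages, hi)
    return sorted(docs[i]["name"] for _, i in age_index[start:end])
```

Both functions return the same answer for the half-open range `lo <= age < hi`. The indexed version touches only the matching entries, while the scan reads every document.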

MongoDB
 “Big-data” features: sharding, replica set

MongoDB
 … but it got ugly pretty fast

 Devil’s in the details
 Replica set management fiasco
 Sharding is difficult to set up and poorly implemented
 https://github.com/kizzx2/mongolab

MongoDB
 Reality – it doesn’t scale beyond one machine
 Replica set

Cassandra
 Column Family data store

 More “NoSQL” than MongoDB, with fewer features
 Column data store – strictly key/value query

Cassandra
 Auto-sharding just works

 Replica set requires 0 configuration
 Append only, LSM-tree based storage format
 Good for SSD
 High insert throughput
 For storing analytic data
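
The append-only, LSM-tree bullet can be illustrated with a toy log-structured merge store. This is a teaching sketch under simplifying assumptions (everything lives in memory, there is no commit log and no tombstones) and not Cassandra's actual storage engine; `TinyLSM` and its parameters are hypothetical.

```python
import bisect


class TinyLSM:
    """Toy LSM-tree: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        """Writes only touch the memtable, so inserts stay cheap."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a new sorted run; old runs are never edited."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check the memtable first, then the runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            # (key,) sorts just before any (key, value) pair with that key.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value for each key."""
        merged = {}
        for run in self.sstables:  # oldest first, so newer values overwrite
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []
```

Because an overwrite is just another append, write throughput is high. The cost moves to reads, which may have to consult several runs until compaction merges them.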

Cassandra
 Has rudimentary support for secondary indexes

 Difficult to do table scans or range scans
 Requires a substantial application / paradigm shift
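
One way to see why automatic sharding and range scans pull against each other is a hash ring: each key is placed by the hash of its name, so keys that are adjacent in sort order are scattered pseudo-randomly across nodes. The sketch below assumes MD5 purely as an illustrative hash and uses hypothetical node and key names; it is not Cassandra's partitioner.

```python
import bisect
import hashlib


def token(key):
    """Hash a key onto the ring (MD5 is only an illustrative choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    """Minimal consistent-hash ring with one token per node."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, key):
        """The first node clockwise from the key's token owns the key."""
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


nodes = ["node-a", "node-b", "node-c"]
ring = Ring(nodes)

# Keys that sort next to each other are hashed independently of one another.
keys = [f"user:{i:04d}" for i in range(8)]
placement = {k: ring.node_for(k) for k in keys}


def nodes_touched(key_range):
    """Nodes a range scan over these keys would have to contact."""
    return {ring.node_for(k) for k in key_range}
```

A lookup by exact key goes to exactly one node, which is why key/value access is cheap. A scan over a key range generally has to fan out to many or all nodes, because sort order is not preserved by the hash.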

Real World Implications
 Why does NoSQL matter to Big Data?
 Schemaless storage model
 Performance
 Scalability

 Rapidly incorporate unstructured new data sources without extensive planning

How to Choose
 Maintenance / Scalability

 Supported operations
 OLAP vs. OLTP

Thank You
Chris Yuen
http://cfc.kizzx2.com
http://github.com/kizzx2
@kizzx2

chris@kizzx2.com
