This document compares NoSQL solutions like Redis, Couchbase, MongoDB, and Membase. It discusses their data models, features, and how they differ from relational databases. Key-value, column-oriented, and document-oriented databases are covered. Specific products like Membase, Redis, MongoDB, and CouchDB are also summarized, including their data models, replication methods, and typical uses in applications.
10. Document-oriented
FirstName="Jonathan",
Address="15 Wanamassa Point Road",
Children=[
  {Name:"Michael",Age:10},
  {Name:"Jennifer", Age:8},
  {Name:"Samantha", Age:5},
  {Name:"Elena", Age:2}
]
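The record above can be sketched as a single nested document. This is a minimal illustration using Python's standard json module; the variable names are assumptions for the example and do not come from the slides.

```python
import json

# The record from the slide as one self-contained document:
# scalar fields plus an embedded list of child documents.
person = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# A document store keeps the whole structure under one key, so the
# nested values come back together without any join.
serialized = json.dumps(person)
restored = json.loads(serialized)
youngest = min(restored["Children"], key=lambda child: child["Age"])
```

The children travel with the parent record, which is the main contrast with a relational layout that would place them in a separate table.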
12. Membase
• Based on Memcached
• Written in C++ (Memcached) and Erlang (Membase)
• Distributed, in-memory key-value database management system
• Optimized for storing data behind web applications
13. Membase (cont’)
• Persistence
– Asynchronously writes data to disk after acknowledging the write to the client
– Guarantees data consistency
• Replication and failover (server failures recoverable in under 100ms)
• Scalability and performance
– Distributed object store
– Dynamic cluster resizing and rebalancing
– Guaranteed data consistency
– High sustained throughput
– Low, predictable latency
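The persistence bullet above (acknowledge first, write to disk later) can be sketched as a toy write-behind store. This is not Membase code: the class `WriteBehindStore`, its `flush` method, and the dictionary standing in for the disk are all assumptions made for illustration.

```python
from collections import deque


class WriteBehindStore:
    """Toy store: acknowledge writes from memory, persist them later."""

    def __init__(self):
        self.memory = {}        # serves all reads
        self.disk = {}          # stand-in for the on-disk copy
        self.pending = deque()  # keys written but not yet persisted

    def set(self, key, value):
        self.memory[key] = value
        self.pending.append(key)
        return True             # acknowledged before any disk write

    def get(self, key):
        return self.memory.get(key)

    def flush(self):
        """Drain the queue; a real server would do this in the background."""
        while self.pending:
            key = self.pending.popleft()
            self.disk[key] = self.memory[key]


store = WriteBehindStore()
acked = store.set("user:1", "alice")
persisted_before_flush = "user:1" in store.disk
store.flush()
persisted_after_flush = "user:1" in store.disk
```

The gap between the acknowledgement and the flush is where the low write latency comes from, and it is also the window in which an unreplicated write could be lost if the node failed.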
15. Redis (cont’)
• Various Data Models
– List, Set, Sorted Set, Hash
– Supports atomic operations on these data types
• Persistence
– Data is held in memory but written to disk asynchronously
• Replication
– Master-Slave replication
• Performance
– Non-blocking I/O. Single threaded
• Publish/Subscribe
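As a rough analogy for the four value types listed above, the sketch below uses plain Python structures. It is not Redis client code and does not talk to a server; every name in it is an assumption chosen for the example.

```python
from collections import deque

# Plain-Python stand-ins for the semantics of four Redis value types.

queue = deque()                 # List: push and pop at either end
queue.append("job-1")
queue.append("job-2")
first_job = queue.popleft()

tags = {"nosql", "cache"}       # Set: unique members
tags.add("nosql")               # adding a duplicate changes nothing

scores = {"alice": 30, "bob": 10, "carol": 20}  # Sorted Set: member -> score
ranking = sorted(scores, key=scores.get)         # members ordered by score

user = {"name": "alice", "visits": 0}  # Hash: field -> value under one key
user["visits"] += 1                    # a counter update on a single field
```

In Redis each of these operations runs as one atomic command on the server, which is what the "atomic operations" bullet refers to.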
16. Membase vs. Redis
| Membase | Redis |
|---|---|
| String | Set, List, Sorted Set, Hash, … |
| Master-Master | Master-Slave |
| Storing, inc/dec API | Various operations, including pop, push, extract, … |
| Web management UI | Console management tool |
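17. How to use…
• Normally use …
// Derive the cache key from the SQL text.
$key = md5('SELECT * FROM rest_of_sql_statement_goes_here');
// Fetch once; get() returns false on a cache miss.
$cached = $memcache->get($key);
if ($cached !== false) {
    return $cached;
}
// Cache miss: run the query, then cache the result for 24 hours (86400 s).
$result = $query_results_mangled_into_most_likely_an_array;
$memcache->set($key, $result, TRUE, 86400);
return $result;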
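The same query-result caching pattern can be sketched in a self-contained form. The version below is a Python illustration, not the memcached API: `TTLCache` is a hypothetical in-memory stand-in for a cache client, and `cached_query` and `fake_query` are names invented for the example.

```python
import hashlib
import time


class TTLCache:
    """In-memory stand-in for a memcached client (hypothetical helper)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]    # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def cached_query(cache, sql, run_query, ttl_seconds=86400):
    """Return the cached result for sql, running the query only on a miss."""
    key = hashlib.md5(sql.encode("utf-8")).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = run_query(sql)
    cache.set(key, result, ttl_seconds)
    return result


calls = []


def fake_query(sql):
    """Pretend database call that records how often it is invoked."""
    calls.append(sql)
    return [("row", 1)]


cache = TTLCache()
first = cached_query(cache, "SELECT * FROM users", fake_query)
second = cached_query(cache, "SELECT * FROM users", fake_query)
```

The second call is served from the cache, so the query function runs only once. Because `None` marks a miss in this sketch, a query that legitimately returns `None` would never be cached.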
18. How to use … (cont’)
• Structured Data (array, struct…)
– Serialize
KEY VALUE
user:$user_id name:문병원|call:하겐다즈|…
– Normalization
KEY VALUE
user:$user_id:name 문병원
user:$user_id:call 하겐다즈
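The two key layouts above can be compared with a short sketch. The helper names `serialize_user` and `normalize_user` and the user id 42 are assumptions for illustration; the field values are the sample values from the slide.

```python
def serialize_user(user_id, fields):
    """One key per user; all fields packed into a single delimited value."""
    value = "|".join(f"{name}:{val}" for name, val in fields.items())
    return {f"user:{user_id}": value}


def normalize_user(user_id, fields):
    """One key per field; each value can be read or updated on its own."""
    return {f"user:{user_id}:{name}": val for name, val in fields.items()}


fields = {"name": "문병원", "call": "하겐다즈"}
packed = serialize_user(42, fields)
split = normalize_user(42, fields)
```

Serializing needs one round trip per user but rewrites the whole value on any change, while the normalized layout touches only the field that changed at the cost of more keys.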
19. Application Design using Membase
• Cache results other than SQL data!
• Use a cache hierarchy
• Update Membase as your data changes
• Watch for race conditions and stale data
• Pre-warm your cache
• Storing lists with keys
• Batch your requests with get_multi
(From the memcached FAQ)
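The batching advice above can be illustrated with a toy cache that counts round trips. `CountingCache` is a hypothetical in-memory stand-in, not a real client; only the method name `get_multi` is taken from the slide.

```python
class CountingCache:
    """In-memory stand-in that counts round trips (hypothetical helper)."""

    def __init__(self, items):
        self._items = dict(items)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1       # one network round trip per key
        return self._items.get(key)

    def get_multi(self, keys):
        self.round_trips += 1       # one round trip for the whole batch
        return {k: self._items[k] for k in keys if k in self._items}


cache = CountingCache(
    {"user:1:name": "a", "user:2:name": "b", "user:3:name": "c"}
)
keys = ["user:1:name", "user:2:name", "user:3:name", "user:4:name"]

one_by_one = {k: cache.get(k) for k in keys}
trips_single = cache.round_trips

cache.round_trips = 0
batched = cache.get_multi(keys)
trips_batched = cache.round_trips
```

Missing keys are simply absent from the batched result, so the caller can tell which items still need to be loaded from the database.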
21. MongoDB
• Document-oriented Storage
• High Write Performance
• Full index support
• Master/Slave Replication
• Support Map/Reduce
• Auto-Sharding
• Querying
• GridFS
• Written in C++
22. CouchDB
• Document-oriented Storage
• High Read Performance
• ACID Semantics
• Map/Reduce View and Indexes
• Distributed Architecture with Replication
• REST API
• Eventual Consistency
• Written in Erlang
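The "Map/Reduce View and Indexes" bullet can be sketched as follows. CouchDB views are normally written as JavaScript functions; this Python version only illustrates the idea, and the sample documents and function names are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical sample documents.
docs = [
    {"_id": "1", "type": "post", "author": "ann"},
    {"_id": "2", "type": "post", "author": "bob"},
    {"_id": "3", "type": "post", "author": "ann"},
    {"_id": "4", "type": "comment", "author": "ann"},
]


def map_doc(doc):
    """Emit (key, value) pairs for the documents the view cares about."""
    if doc["type"] == "post":
        yield doc["author"], 1


def reduce_values(values):
    """Combine all values emitted for one key."""
    return sum(values)


def build_view(documents):
    """Group emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(vals) for key, vals in sorted(grouped.items())}


posts_per_author = build_view(docs)
```

The view result acts as an index keyed by author, so later lookups read the precomputed counts instead of scanning every document.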
28.
| | CouchDB | MongoDB | MySQL |
|---|---|---|---|
| Data Model | Document-Oriented (JSON) | Document-Oriented (BSON) | Relational |
| Data Types | string, number, boolean, array, object | string, int, double, boolean, date, byte array, object, array, others | Various types |
| Large Objects (Files) | Yes (attachments) | Yes (GridFS) | Blobs |
| Horizontal partitioning scheme | CouchDB Lounge | Auto-sharding | Partitioning |
| Replication | Master-master (with developer supplied conflict resolution) | Master-slave and replica sets | Master-slave, multi-master, and circular replication |
| Object (row) Storage | One large repository | Collection-based | Table-based |
29.
| | CouchDB | MongoDB | MySQL |
|---|---|---|---|
| Query Method | Map/reduce of javascript functions to lazily build an index per query | Dynamic; object-based query language | Dynamic; SQL |
| Secondary Indexes | Yes | Yes | Yes |
| Atomicity | Single document | Single document | Yes - advanced |
| Interface | REST | Native drivers; REST add-on | Native drivers |
| Server-side batch data manipulation | Map/Reduce | Map/Reduce, server-side javascript | Yes (SQL) |
| Written in | Erlang | C++ | C++ |
| Distributed Consistency Model | Eventually consistent (master-master replication with versioning and version reconciliation) | Strong consistency. Eventually consistent reads from secondaries are available. | Strong consistency. Eventually consistent reads from secondaries are available. |
30. References
• NoSQL solutions: Membase, Redis, CouchDB and MongoDB: http://blog.fedecarg.com/2011/01/25/nosql-solutions-membase-redis-couchdb-and-mongodb/
• Visual Guide to NoSQL Systems: http://blog.nahurst.com/visual-guide-to-nosql-systems
• MongoDB, CouchDB, MySQL Compare Grid: http://www.mongodb.org/display/DOCS/MongoDB,+CouchDB,+MySQL+Compare+Grid
• SQL to Mongo Mapping Chart: http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
• Memcached FAQ: http://code.google.com/p/memcached/wiki/FAQ#Simple_query_result_caching
• Couchbase 2.0 Manual: http://docs.couchbase.org/couchbase-manual-2.0.pdf
• Building Timeline (Facebook): http://www.facebook.com/notes/facebook-engineering/building-timeline-scaling-up-to-hold-your-life-story/10150468255628920