Agenda
• Background
  – What's NoSQL
  – Why NoSQL
• How to make a selection of NoSQL
  – Data type
  – Data model
  – Architecture
  – Key technologies
• Summary

What is NoSQL
Definition
NoSQL, sometimes expanded to "Not Only SQL", is a broad class of database management systems that differ from classic relational database management systems (RDBMSs). These data stores may NOT require FIXED table schemas, usually avoid join operations, and typically scale horizontally. Academia typically refers to these databases as structured storage, a term that would include classic relational databases as a subset.
Refer to the Wikipedia page: http://en.wikipedia.org/wiki/NoSQL

SQL vs. NoSQL

                     SQL                         NoSQL
Transactional        ACID semantics              Restricted ACID
Query model          Complex & functionality     Simple & app oriented
Data model           Relational & row storage    Key-value, column oriented, document oriented & graph
Schemas              Fixed schema                Free / schema-less
Data storage         Limited & costly            Horizontal scalability & massive
Failure tolerance    Failure recovery slow       Native & fast recovery
Hardware             Reliable & expensive        Commodity & inexpensive

Where NoSQL Comes From: Requirements
Fast increase & development
• Increasing number of servers
  – Scale out
  – Inexpensive & unreliable servers
• Increasing data volume
  – Big Data
  – Scalability
• Increasing number of users
  – High throughput
  – High workload
All about INCREASING

Where NoSQL Comes From: Requirements
Different applications & ecosystem
• Rapid change
  – Always beta
  – Flexible data schema
• Abundant web applications
  – Complex data
  – Larger record size
  – Typically read more and write less
  – Low transaction and consistency requirements
• Online services
  – Failure tolerance
  – Fast recovery
  – High availability

Data type: Classification
What kind of data should be stored?
Unstructured data (e.g. documents, videos, audios, images)
• Does not have a pre-defined data model
• And/or does not fit well into relational tables
• Example systems: Dynamo, Voldemort, Berkeley DB, MemcacheDB, Tokyo Cabinet, Redis
Structured data (e.g. CRM, ERP)
• Entities that belong to the same class should have the same attributes in the same order
• The data structure should be predefined and cannot be changed
• Example systems: MySQL, Oracle
Semi-structured data (e.g. logs, mails, web pages, blogs)
• Is a form of structured data
• Contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data
• Entities that belong to the same class may have different attributes even though they are grouped together, and the attribute order is not important
• Is also known as schema-less or self-describing structure
• Example systems: BigTable, HBase, Cassandra, Hypertable, MongoDB, CouchDB

Summary
[Figure: comparison of unstructured, structured, and semi-structured data stores along five axes: flexibility, record size, efficiency, scalability, and transactional support.]

Key-Value pair based
• Simple read and write: each data item is uniquely identified by a key
• Key-value stores allow the application to store its data in a schema-less way. The data can be stored as a data type of a programming language or as an object.
• A key identifies exactly one value
• Anything can be stored in a value: an image, a document, even a complex data structure (array, list, …)
• Many cloud-based databases can be classified as key-value stores, including most column-oriented databases
Advantages
• Efficiency
• Easy to use
• Flexible data storage
Disadvantages
• Simple query model

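The read/write model described on this slide can be sketched in a few lines. This is only a toy in-memory illustration; the class name `KeyValueStore` and the keys used are made up for the example and do not correspond to any particular product's API.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy in-memory key-value store: one key maps to exactly one opaque value."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store does not inspect the value, so no schema is required.
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Lookup by key is the only query the model offers.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42", {"name": "Ann", "tags": ["a", "b"]})  # nested structure
store.put("img:1", b"\x89PNG")                             # raw bytes
```

Finding all users with a given tag would require scanning every value in application code, which is the "simple query model" drawback listed above.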
Column Oriented store
A simple example: column store vs. row store
[Figure: a sample table of graph databases (columns: Name, Language, Notes; rows: Neo4j, OrientDB, FlockDB, Sones GraphDB) laid out once as a row store and once as a column store, with two example queries (Query 1, Query 2). In the row store, empty cells are stored; in the column store, null is free.]

Column Oriented store
BigTable data model
• ( Row, Column, Timestamp ) → Cell contents
• Sorted row key: stores pages from the same domain near each other
• Column families: Content, Anchor
• Versioned: each cell carries a timestamp
[Figure: example rows “com.cnn.www” and “com.cnn.www/index.htm”. The Content: column holds versions of “<HTML>…”; the columns Anchor: cnnsi.com and Anchor: my.look.ca hold “CNN” and “CNN.COM”. Each cell is tagged with a timestamp (t3 to t9).]

Column Oriented store
BigTable-like data model
• Stores content by column rather than by row
• A key identifies a row, which contains data stored in one or more Column Families (CF)
• Within a CF, each row can contain multiple columns
• Columns can be added dynamically
• Distributed multi-dimensional sparse map: (row, column, timestamp) → cell contents
•Advantages
– Versioned
– Query oriented
– Good for OLAP applications
– Null is free
– Compression efficient
– Dynamic columns
•Disadvantages
– Reading an entire row is not efficient
– Contains tags or other markers to separate semantic elements
– Not well-suited for OLTP-like workloads
– Simple query model

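The "(row, column, timestamp) → cell contents" mapping can be illustrated with a small in-memory sketch. It is not BigTable's or HBase's API; the class `SparseTable`, its method names, and the timestamps and values below are assumptions chosen for the example.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse map: (row, column, timestamp) -> cell contents."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first.
        # Columns that were never written occupy no space ("null is free").
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(column, [])
        for ts, value in versions:
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Row keys in sorted order within [start_row, end_row)."""
        return [r for r in sorted(self._rows) if start_row <= r < end_row]


table = SparseTable()
table.put("com.cnn.www", "contents:", 3, "html-v1")        # illustrative timestamps
table.put("com.cnn.www", "contents:", 5, "html-v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("com.example.www", "contents:", 4, "html-ex")    # hypothetical second row

latest = table.get("com.cnn.www", "contents:")
older = table.get("com.cnn.www", "contents:", timestamp=4)
```

Because row keys are kept sorted, a scan over a key prefix such as `"com.cnn"` touches only neighbouring rows, which is why reversed domain names are used as row keys.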
Document Oriented store
• The idea is to replace the concept of a “row” with a more flexible model: the “document.”
• By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record.
• Documents share some fields and differ in others
• Documents are usually stored in a JSON or JSON-like format
•Advantages
– Rich RDBMS-like functions
– Freedom in modeling documents
•Disadvantages
– Query logic is complex
– Documents are limited in size

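As a rough illustration of the document model, the sketch below stores two JSON-like documents with different fields in one collection and queries them with a predicate. The field names, the sample data, and the helper `find` are invented for this example and are not a real database driver API.

```python
import json

# Two documents in the same (hypothetical) collection. They share some
# fields and differ in others; the first embeds a sub-document and an array.
posts = [
    {
        "_id": 1,
        "title": "Why NoSQL",
        "author": {"name": "Ann", "email": "ann@example.com"},
        "tags": ["nosql", "intro"],
        "comments": [{"by": "Bob", "text": "Nice"}],
    },
    {"_id": 2, "title": "CAP notes", "tags": ["cap"]},
]


def find(collection, predicate):
    """Return every document for which predicate(document) is true."""
    return [doc for doc in collection if predicate(doc)]


tagged = find(posts, lambda d: "nosql" in d.get("tags", []))
with_author = find(posts, lambda d: "author" in d)

# The whole hierarchy travels as one record in JSON form.
serialized = json.dumps(posts[0], sort_keys=True)
```

The post, its author, and its comments live in a single record, so no join is needed to read them back together.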
Summary

                Key-Value             Column oriented                 Document oriented                                 Graph
Schema          Schema-less           Dynamic columns                 Complex and hierarchical, JSON-like format        Graph data model
Query model     Key-value pair        Key-value                       Rich and complex
Data type       Unstructured          Semi-structured                 Semi-structured
Advantage       Efficiency, easy      Query oriented, null is free    Functionality and freedom in modeling
Disadvantage    Simple                Simple query model              Complex
Systems

Master-Slave architecture
An example: HBase architecture
• One master and many slaves
• The master manages metadata, is in charge of all slaves, dispatches tasks, does load balancing, and so on
• Slaves report their status to the master and take over the real data management
• Data flow and control flow are usually detached
• Typically uses a global storage system (e.g. a DFS) for data durability and fast recovery
• Some systems add a distributed coordination mechanism for master election, configuration maintenance, failure detection and synchronization
[Figure: HBase architecture with Zookeeper, HMaster, several Region Servers, and HDFS; control flow and data flow are drawn separately.]

Master-Slave architecture
• A model of communication where one device or process has unidirectional control over one or more other devices
• In some systems a master is elected from a group of eligible devices, with the other devices acting in the role of slaves
•Advantages
– Clear architecture
– Easy to provide strong consistency
– Easy to manage
– Easy to scale
•Disadvantages
– Single point of failure risk
– Hotspot problems

P2P Architecture
An example: Cassandra
• Peers are equally privileged
• Data is replicated across nodes according to a replication factor
• Gossip protocol for failure detection and for maintaining the cluster (nodes joining and leaving)
• Every member acts as a proxy for one-hop routing
[Figure: a client connected to a ring of eight nodes numbered 1 to 8.]

P2P architecture
• P2P computing or networking is a distributed application architecture
• Peers are equally privileged, equipotent participants in the application
• Peers make a portion of their resources, such as processing power, disk storage or network bandwidth, directly available to other network participants, without the need for central coordination by servers or stable hosts
• Usually used in conjunction with consistent hashing
•Advantages
– High availability
– Efficient for random read/write
– Natural data distribution
– Usually one-hop lookup
– Minimal administration
•Disadvantages
– Weak view of global status
– More network communication to maintain the cluster (log(n))

Hierarchy architecture
An example: MongoDB architecture
• Clients send queries to mongos servers
• Mongos instances act as routing servers; queries are automatically routed to the appropriate shard
• Each shard consists of multiple replicated servers to ensure availability and automated failover. The set of servers within a shard comprises a replica set.
• The config servers store the cluster's metadata. Each config server has a complete copy of all metadata, and if the metadata changes, it is sent to the mongos instances so they can update their routing information.
[Figure: clients connect to several mongos routers; three config servers (Configserver1 to Configserver3) hold the metadata; shard1, shard2 and shard3 are each a replica set of mongod processes (primary, secondary, arbiter).]

Hierarchy architecture
An example: MongoDB architecture (2)
• Distinct hierarchy dependency: clients → routing servers → metadata storage and data storage
• The data storage layer is grouped into replica sets, which act not only as data servers but also as the data and service availability mechanism
• Metadata storage is scalable and is not a single point of failure; two-phase commit is used, and the responsibilities of the metadata servers decrease
• Routing servers store nothing; they can be deployed up toward the client/application or down toward the data storage
[Figure: clients, routing servers, metadata storage, and data storage (mongod primary, secondary, arbiter) drawn as layers.]

Hierarchy architecture
• Distinct hierarchy dependency
• Especially with a routing layer
• Less responsibility for the client
• No clear separation of data flow and control flow
•Advantages
– High availability
– No single point of failure
– Each layer can scale on its own
– Flexible routing layer
•Disadvantages
– Lower efficiency
– Complex administration

What about the performance of the system?
What about the key features of the system?

CAP Classification
• Consistency: all nodes see the same data at the same time
• Availability: a guarantee that every request receives a response about whether it succeeded or failed
• Partition tolerance: the system continues to operate despite arbitrary message loss

All about Redundancy
Where do the problems come from?
• Redundancy is everywhere in distributed systems, especially with commodity hardware
• Resulting concerns: consistency, availability, partitioning, reliability, concurrency, throughput
[Figure: three parallel stacks, each with a request going to a service backed by its own data storage.]

Consistency mechanism
• Two-phase commit: strong consistency
• Master-slave: eventual consistency or strong consistency
• Quorum: eventual consistency or strong consistency
• Paxos: strong consistency
Notes
• Consistency trades off against performance and availability
• Master-slave architecture systems (such as HBase, BigTable) accept lower availability in exchange for strong consistency
• Hierarchy & P2P systems that choose strong consistency do so at the expense of read performance

Two-phase commit
An example: GFS lease implementation
• The commit-request phase: the client pushes all data to the replicas (step 3) and sends a commit request to the primary replica (step 4)
• The commit phase: the primary replica asks replica A and replica B to commit the data (step 5); replica A and replica B respond “yes” (step 6); the commit is successful (step 7)

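The two phases can be sketched generically as follows. This is a minimal, single-process illustration of two-phase commit, not GFS's actual lease protocol; the names `Participant` and `two_phase_commit` are hypothetical, and failures are simulated with a `healthy` flag.

```python
class Participant:
    """A replica that can stage data (phase 1) and then commit or abort (phase 2)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.staged = None
        self.committed = []

    def prepare(self, data):
        # Vote "no" if the replica cannot take part; otherwise stage the data.
        if not self.healthy:
            return False
        self.staged = data
        return True

    def commit(self):
        self.committed.append(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(participants, data):
    # Phase 1 (commit-request): every participant stages the data and votes.
    votes = [p.prepare(data) for p in participants]
    # Phase 2 (commit): commit only if all voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


replicas = [Participant("primary"), Participant("A"), Participant("B")]
first = two_phase_commit(replicas, "chunk-1")   # all healthy
replicas[2].healthy = False
second = two_phase_commit(replicas, "chunk-2")  # replica B votes no
```

A single "no" vote aborts the write on every replica, which gives strong consistency but makes the write unavailable while any participant is down.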
Master-slave
An example: MongoDB replica sets
Configuration 1: eventual consistency, but higher performance and availability
• The master can be read and written
• Replicas/slaves are read only and are synced from the master
Configuration 2: strong consistency
• Only the master can be read and written
• Replicas/slaves are synced from the master and are used only for backup

Quorum
• Configurable consistency
  – N: number of replicas
  – R: minimum number of successful reads
  – W: minimum number of successful writes
• Usually combined with anti-entropy using Merkle trees for replica synchronization, and with read repair to keep replicas consistent
• (N, R, W): tradeoff between consistency and performance
  – Typical configuration: R(2) + W(2) > N(3)
  – R + W > N yields a quorum-like system and ensures an application can always read the newest data

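The reason R + W > N guarantees reading the newest data is that every read set must then share at least one replica with every write set. The sketch below checks this by brute force for small N; the function name `quorums_overlap` is made up for the illustration.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if every set of r replicas intersects every set of w replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


typical = quorums_overlap(3, 2, 2)   # R + W = 4 > N = 3
weak = quorums_overlap(3, 1, 1)      # R + W = 2 <= N = 3
```

With N = 3, R = 1, W = 1 a read can land on a replica that the last write never touched, so only eventual consistency is possible; raising R or W buys consistency at the cost of latency.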
Quorum
An example: Cassandra read repair
[Figure: the client sends a query to the Cassandra cluster. The closest replica (Replica A) returns the result, while Replica B and Replica C receive a digest query and return digest responses. If the digests differ, read repair is performed.]

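The flow in the figure (full read from the closest replica, digest comparison with the others, repair if digests differ) can be approximated as below. This is a simplified sketch, not Cassandra's implementation: replicas are plain dictionaries mapping a key to a `(timestamp, value)` pair, the first replica in the list is assumed to be the closest, and conflicts are resolved by newest timestamp.

```python
import hashlib


def digest(entry):
    """Short fingerprint of a (timestamp, value) entry, or of a missing entry."""
    return hashlib.sha256(repr(entry).encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    closest, others = replicas[0], replicas[1:]
    result = closest.get(key)            # full data from the closest replica
    # The other replicas only need to send a digest of their copy.
    if all(digest(r.get(key)) == digest(result) for r in others):
        return result[1] if result else None
    # Digests differ: fetch all copies, keep the newest, and repair stale replicas.
    candidates = [r[key] for r in replicas if key in r]
    newest = max(candidates, key=lambda entry: entry[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest
    return newest[1]


replica_a = {"k": (1, "old")}
replica_b = {"k": (2, "new")}
replica_c = {}
value = read_with_repair([replica_a, replica_b, replica_c], "k")
```

After the call all three replicas hold the newest entry, so the next read returns it without a repair.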
Availability mechanism
• Routing mechanism
  – Typically used in hierarchy architecture
  – See the MongoDB mongos implementation, which hides changes in the back-end servers
• Failure detection
  – Distributed coordination: usually used in master-slave architecture, such as Zookeeper in HBase and Chubby in BigTable
  – Gossip protocol: usually used in P2P architecture, e.g. Dynamo & Cassandra
• Master election
• Hinted handoff

Availability mechanism
Master election: used for failover
An example: MongoDB replica set
• A cluster consists of a group of n nodes, one of which acts as the master/primary node. If that node fails, the cluster elects a new master/primary node.
• Each data node can be primary or secondary
• Arbiter nodes can only act as arbiters
[Figure: a replica set of three mongod processes (secondary, primary, arbiter). When the primary goes down, the remaining members negotiate a new master; the secondary becomes primary and the old primary rejoins in the recovering state.]

HBase Master election
• Zookeeper acts as an arbiter and keeps a “token” for the HBase master. The node that gets the “token” acts as master.
• If the HMaster fails, the “token” that it took from Zookeeper is released, and the secondary HMaster takes over as HMaster
• Zookeeper then sends the change to every node in the cluster
[Figure: Zookeeper, HMaster, secondary HMaster, and Region Servers.]

Hinted Handoff
For temporary failure
• Writes are performed on the first N healthy nodes found by the coordinator
• If a node is down, the data is sent to the next node in the ring
• That node keeps track of the intended recipient and sends the data later
• Replicas are stored at multiple data centers to handle the failure of a whole data center
• This is the so-called "always writeable" property in Cassandra
[Figure: a ring of nodes A to G, with Hash(k) selecting the position of key k.]

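The steps above can be sketched as follows. This is a simplified model, not Cassandra's or Dynamo's code: the ring is a plain list, node health is a boolean flag, and the names `Node`, `write` and `deliver_hints` are invented for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []   # (intended recipient, key, value) held for a down node


def write(ring, start, n, key, value):
    """Write to the n preferred nodes clockwise from `start`, using stand-ins for down nodes."""
    order = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    preferred = order[:n]
    fallbacks = [node for node in order[n:] if node.up]
    written = []
    for node in preferred:
        if node.up:
            node.data[key] = value
            written.append(node.name)
        elif fallbacks:
            # Next healthy node takes the write and remembers who it was meant for.
            stand_in = fallbacks.pop(0)
            stand_in.data[key] = value
            stand_in.hints.append((node, key, value))
            written.append(stand_in.name)
    return written


def deliver_hints(node):
    """Hand hinted writes back to recipients that have recovered."""
    remaining = []
    for target, key, value in node.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    node.hints = remaining


ring = [Node(c) for c in "ABCDEFG"]
ring[1].up = False                        # B is temporarily down
written = write(ring, 0, 3, "k", "v")     # preferred nodes are A, B, C
hints_held = len(ring[3].hints)           # D holds a hint for B
ring[1].up = True                         # B recovers
deliver_hints(ring[3])
```

The write still reaches three nodes while B is down, which is what keeps the system writeable during temporary failures. For simplicity the stand-in keeps its own copy after delivering the hint.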
Data partitioning & Scalability mechanism
Hierarchical structure
• Multi-level hierarchy organization
  – 3 levels in BigTable, HBase and Hypertable (root → meta → user)
  – 2 levels in MongoDB (meta → user)
• Key range split/auto sharding for data partitioning
•Advantages
– Automatic balancing for changes in data distribution
– High performance in range queries
– Nearly unlimited data storage
•Disadvantages
– Sequential writes are not efficient

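Key-range partitioning boils down to a sorted list of split keys that maps any key to the region (or shard) owning its range. The sketch below shows one level of such an index with lookup, range query, and split; the class `RangeIndex` and the region names are hypothetical and not taken from HBase or MongoDB.

```python
import bisect


class RangeIndex:
    """One level of a key-range index: sorted split keys map a key to its region."""

    def __init__(self, split_keys, targets):
        # split_keys[i] is the first key served by targets[i + 1].
        assert len(targets) == len(split_keys) + 1
        self.split_keys = list(split_keys)
        self.targets = list(targets)

    def lookup(self, key):
        return self.targets[bisect.bisect_right(self.split_keys, key)]

    def range_targets(self, start, end):
        """Regions that may hold keys in [start, end]; they are contiguous."""
        lo = bisect.bisect_right(self.split_keys, start)
        hi = bisect.bisect_right(self.split_keys, end)
        return self.targets[lo:hi + 1]

    def split(self, target, split_key, new_target):
        """Split `target` at split_key (assumed to lie inside its range)."""
        i = self.targets.index(target)
        self.split_keys.insert(i, split_key)
        self.targets.insert(i + 1, new_target)


index = RangeIndex(["g", "p"], ["region1", "region2", "region3"])
before = index.lookup("m")                  # falls in ["g", "p")
index.split("region2", "k", "region2b")     # region2 grew too large
after = index.lookup("m")
touched = index.range_targets("h", "l")
```

A range query only visits adjacent regions, which is the range-query advantage listed above. Keys that always increase all land in the last region, which is why sequential writes are listed as a weakness.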
Scalability mechanism
Consistent hash
•Advantages
– Natural balancing for data partitioning & distribution
– High performance in random operations
•Disadvantages
– Non-uniform data/load distribution
– Disregards the heterogeneity of node performance
– Data must be moved when nodes join or leave
– Not good for sequential operations and range queries
[Figure: a hash ring from 0 to 1 with nodes A to F; keys are placed at h(key1) and h(key2) and replicated to the next N=3 nodes on the ring.]

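A minimal consistent-hash ring might look like the sketch below. It assumes one ring position per node (no virtual nodes) and uses SHA-256 reduced to 32 bits as the ring hash; the names `ring_hash` and `Ring` are made up for the illustration.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string to a position on a ring of size 2**32."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 2**32


class Ring:
    def __init__(self, nodes=()):
        self._points = []            # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        bisect.insort(self._points, (ring_hash(node), node))

    def remove(self, node):
        self._points.remove((ring_hash(node), node))

    def replicas(self, key, n=3):
        """The first n nodes clockwise from the key's position."""
        if not self._points:
            return []
        positions = [pos for pos, _ in self._points]
        start = bisect.bisect_left(positions, ring_hash(key))
        count = min(n, len(self._points))
        return [self._points[(start + i) % len(self._points)][1] for i in range(count)]


ring = Ring("ABCDEF")
keys = [f"key{i}" for i in range(1000)]
owner_before = {k: ring.replicas(k, 1)[0] for k in keys}
ring.remove("C")
owner_after = {k: ring.replicas(k, 1)[0] for k in keys}
moved = [k for k in keys if owner_before[k] != owner_after[k]]
```

Only keys that were owned by the removed node change owner; every other key stays where it was. The arcs between nodes are generally unequal, which is the non-uniform distribution drawback noted on the slide.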
Data Durability mechanism
• Write-ahead log (WAL)
  – A family of techniques for providing atomicity and durability (two of the ACID properties) in database systems
  – In a system using WAL, all modifications are written to a log before they are applied
  – Usually both redo and undo information is stored in the log
• Data replica
  – DFS (HBase, Hypertable, BigTable)
  – Embedded redundancy (Cassandra, MongoDB)

Data Durability mechanism
An example: HBase WAL
• Log flushing: data streams are written to a file system
• Log rolling: checks the logs against what the database has already persisted, then removes all logs written before the last database persistence operation
• Log replaying: replaying a log is done by reading the log, adding its entries to the database, and then flushing the data to disk. It can be used for fault recovery.

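The log-then-apply discipline and log replaying can be sketched with a small file-backed store. This is a redo-only toy with one JSON record per line, not HBase's WAL format; the class `WalStore` and the file name are assumptions for the example, and log rolling is left out.

```python
import json
import os
import tempfile


class WalStore:
    """Toy store that appends every change to a log before applying it in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self.replay()                      # recover state from an existing log

    def replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "put":
            self.data[record["key"]] = record["value"]
        elif record["op"] == "delete":
            self.data.pop(record["key"], None)

    def _log(self, record):
        with open(self.log_path, "a") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())         # force the record to disk first

    def put(self, key, value):
        record = {"op": "put", "key": key, "value": value}
        self._log(record)
        self._apply(record)

    def delete(self, key):
        record = {"op": "delete", "key": key}
        self._log(record)
        self._apply(record)


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = WalStore(log_path)
store.put("a", 1)
store.put("b", 2)
store.delete("a")

# Simulate a crash: the in-memory state is lost, only the log survives.
recovered = WalStore(log_path)
```

Because each record reaches disk before the in-memory state changes, replaying the log after a crash rebuilds exactly the acknowledged state.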
Summary
• Consistency: two-phase commit, master-slave, quorum
• Availability: routing mechanism, failure detection, master election, hinted handoff
• Data partitioning: table split/auto sharding, consistent hash
• Data durability: DFS, data redundancy (replica set/group, replica factor)
• Scalability: hierarchical structure, consistent hash
• Failover: reassign, master election process, multi-routing, hinted handoff