NoSQL, Big Data and PostgreSQL

Contents
● Typical RDBMS and Scaling
● Big Data
– Big Data VS Traditional Data
– Big Data Characteristics
– Big Data Technologies
– NoSQL & Hadoop
● NoSQL
● Benefits of NoSQL
● What does NoSQL not Provide
● NoSQL Database Usage
● BASE VS ACID

Contents ...
● NoSQL Challenges
● Breeds of NoSQL Solutions
– Key-Value Stores
– Column Family Store
– Document Database Store
● NoSQL with Relational DBMS (EDB)
● Postgres: Key-Value Store
● Postgres: Document Store
– JSON and SQL
– Bridging Between SQL and JSON
– JSON Data Types

Typical RDBMS
● Fixed table schemas
● Small but frequent reads/writes
● Large batch transactions
● Focus on ACID
– Atomicity
– Consistency
– Isolation
– Durability
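
Atomicity, the first of the ACID properties above, can be sketched with Python's built-in sqlite3 module standing in for a relational database. The accounts table, the transfer function and the balances helper are hypothetical names chosen for illustration. Either both updates are committed or neither is.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                # Raising here undoes BOTH updates above.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A failed transfer leaves both balances exactly as they were before it started, which is the guarantee many NoSQL systems relax.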

How We Scale RDBMS
Implementation
● 1st step: Build a relational database (one database)
● 2nd step: Table partitioning (one database)
● 3rd step: Database partitioning (one cloud instance per customer)
– Cloud instance 1: Customer #1 browser, web tier, business logic tier
– Cloud instance 2: Customer #2 browser, web tier, business logic tier
– Cloud instance 3: Customer #3 browser, web tier, business logic tier
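
The slide does not say how rows or customers are assigned to partitions. As one possible illustration, the sketch below uses hash-based assignment, which is an assumption and not necessarily the scheme in the diagram. The names shard_for and ShardedStore are hypothetical, and each shard is a plain dictionary standing in for a separate database instance.

```python
import hashlib

def shard_for(customer_id: str, shard_count: int) -> int:
    """Map a customer id to a shard number in a stable, repeatable way."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

class ShardedStore:
    """Routes every read and write to the shard that owns the customer."""

    def __init__(self, shard_count: int):
        # One dict per shard; in a real deployment each would be a database.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard(self, customer_id: str) -> dict:
        return self.shards[shard_for(customer_id, len(self.shards))]

    def put(self, customer_id: str, row: dict) -> None:
        self._shard(customer_id)[customer_id] = row

    def get(self, customer_id: str):
        return self._shard(customer_id).get(customer_id)

store = ShardedStore(shard_count=3)
for n in range(1, 7):
    store.put(f"customer-{n}", {"orders": n})
```

Because the application must route every request to the right partition, this step moves part of the scaling burden out of the database and into application code.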

Big Data
● Large amounts of structured and semi-structured data are collected and warehoused, and petabytes (PB) of transactions are performed every day, for example on:
– Web data
– Social networking data
– Users' personal identity data
– Users' transactions
● Because the volume of data grows day by day, traditional database management solutions fail to provide the performance and elastic scalability needed for a wider audience, e.g.:
– Google processes 20+ PB a day (2008)
– Facebook has 2.5 PB of user data (2009)
– eBay has 6.5 PB of user data (2009)

Big Data VS Traditional Data
Big Data
● Photographs
● Audio & video
● 3D models
● Simulations
● Location data
● ..
Traditional Data
● Documents
● Finances
● Inventory records
● Personal files
● ..

Big Data Characteristics
● Volume (high volume of data)
● Velocity (data arrives and changes rapidly)
● Variety (many new and different data types)

NoSQL & Hadoop
NoSQL
● Real-time read/write system
● Interactive
● Fast reads/writes
Hadoop
● Batch data used for analysis
● Large-scale processing
● Massive compute power
Both support
● Big volume of data
● Incremental, horizontal scaling
● Varying / changing data formats
Example workloads shown in the diagram: user transactions, sensor data, customer profiles, predictive analytics, fraud detection, recommendations

NoSQL
● Stands for No-SQL or Not only SQL
● Class of non-relational data storage systems
● Usually do not require a fixed table schema nor do
they use the concept of joins
● NoSQL systems are generally not ACID compliant.

Benefits of NoSQL
● Elastic scaling: RDBMS might not scale out easily on
commodity clusters, but the new breed of NoSQL
databases are designed to expand transparently to take
advantage of new nodes.
● Flexible data model: lets you work with new data types such as mobile interactions, machine data, social connections, etc.
● Enables new ways of working based on incremental development and continuous release.
● Cheap, easy to implement (open source)

Benefits of NoSQL
● Data is replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned.
● When data is written, the latest version is on at least one node and is then replicated to the other nodes.
● No single point of failure
● Easy to distribute
● Don't require a schema

What does NoSQL Not Provide
● Joins
● Group by
● ACID transactions
● SQL
– Integration with applications that are based on SQL

NoSQL Database Usage
● NoSQL data storage systems make sense for applications that need to deal with very large amounts of semi-structured data
– Log Analysis
– Social Networking Feeds
● Scalable replication and distribution
– Potentially thousands of machines
– Potentially distributed around the world
● Queries need to be answered quickly
● Mostly data retrieval with few updates
● Schema-less, with no relations
● ACID transaction properties not needed
● Open Source development

NoSQL Real-World Application
● Emergency Management System
– High variability among data sources requires high schema flexibility
● Massive Open Online Course
– Massive read scalability, content integration, low latency
● Patient Data and Prescription Records
– Efficient write scalability
● Social Marketing Analytics
– MapReduce analytical approaches
Source: Gartner , A Tour of NoSQL in 8 Use Cases

Where NoSQL Is Used
● Google (BigTable, LevelDB)
● LinkedIn (Voldemort)
● Facebook (Cassandra)
● Twitter (Hadoop/HBase, FlockDB, Cassandra)
● Netflix (SimpleDB, Hadoop/HBase, Cassandra)
● CERN (CouchDB)

BASE Transactions
SQL (ACID)
● Atomicity
● Consistency
● Isolation
● Durability
NoSQL (BASE)
● Basically Available: highly available but not always consistent
● Soft State: background cleanup mechanism
● Eventually Consistent: copies become consistent at some later time if there are no more updates to that data item
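
Eventual consistency can be sketched with a few in-memory replicas. This is a simplified model under stated assumptions: every write carries an explicit version number, the highest version wins, and a background pass (here the hypothetical anti_entropy function) copies entries between replicas. Real systems use more elaborate conflict resolution.

```python
class Replica:
    """One copy of the data; accepts a write only if it is newer."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(replicas):
    """Background sync: every replica pushes its entries to all replicas."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)

a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("profile", "v1", version=1)   # accepted by one node only
b.write("profile", "v2", version=2)   # a later update lands on another node
stale_read = c.read("profile")        # replica c has not seen anything yet
anti_entropy([a, b, c])               # once updates stop, the copies converge
```

Between the write and the sync, different replicas can return different answers. That window is what "basically available but not always consistent" refers to.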

No-SQL Challenges
● Lack of maturity -- numerous solutions still in their
beta stages
● Lack of commercial support for enterprise users
● Lack of support for data analysis
● Maintenance efforts and skills are required. Experts
are hard to find

Breeds of No-SQL Solutions
● Key-Value Stores
● Column Family Stores
● Document Databases
● Graph Databases

Key-Value Stores
● Dynamo, Voldemort, Rhino DHT …
● A key-value store is based on a hash table in which a unique key points to a particular item of data.
● Mappings are usually accompanied by cache mechanisms to maximize performance.
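
A minimal sketch of the idea above: a hash table (a Python dict standing in for the persistent store) fronted by a small least-recently-used cache. The class name, the cache size and the hit counter are hypothetical and exist only to make the behaviour visible.

```python
from collections import OrderedDict

class CachedKeyValueStore:
    def __init__(self, cache_size=2):
        self._table = {}             # stand-in for the persistent hash table
        self._cache = OrderedDict()  # small LRU cache in front of it
        self._cache_size = cache_size
        self.cache_hits = 0

    def put(self, key, value):
        self._table[key] = value
        self._cache.pop(key, None)   # drop any stale cached copy

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)     # mark as most recently used
            self.cache_hits += 1
            return self._cache[key]
        value = self._table[key]             # raises KeyError for unknown keys
        self._cache[key] = value
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used
        return value

store = CachedKeyValueStore(cache_size=2)
store.put("user:1", "Bart")
store.put("user:2", "Tadd")
first = store.get("user:1")    # miss: read from the table, then cached
second = store.get("user:1")   # hit: served from the cache
```

Lookups are by exact key only. There is no way to ask for "all users named Bart" without scanning, which is the trade-off this store type makes for speed.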

Column Family Store
● BigTable, Cassandra, HBase, Hadoop etc.
● Store and process very large amounts of data
distributed over many machines. "Petabytes of data
across thousands of servers"
● Keys point to multiple columns.

Document Database Stores
● CouchDB, MongoDB, Lotus Notes, Redis …
● Documents are addressed in the database via a
unique key that represents that document.
● Semi-structured documents can be XML or JSON
formatted, for instance.
● In addition to the key, documents can be retrieved
with queries.

Document Database Stores
{
  FirstName: "Bart",
  LastName: "Loews",
  Children: [
    { FirstName: "Tadd", Age: 4 },
    { FirstName: "Todd", Age: 4 }
  ],
  Age: 35,
  Address: {
    number: 1234,
    street: "Fake road",
    City: "Fake City",
    state: "VA",
    Country: "USA"
  }
}
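
The two access paths of a document store, lookup by unique key and retrieval by query, can be imitated in plain Python. This is only a sketch. The collection dictionary, the keys "person:1" and "person:2", the second document, and the helpers get_path and find are all invented for illustration and do not correspond to any particular product's API.

```python
# Hypothetical in-memory "collection": unique key -> document.
collection = {
    "person:1": {
        "FirstName": "Bart", "LastName": "Loews", "Age": 35,
        "Children": [{"FirstName": "Tadd", "Age": 4},
                     {"FirstName": "Todd", "Age": 4}],
        "Address": {"number": 1234, "street": "Fake road",
                    "City": "Fake City", "state": "VA", "Country": "USA"},
    },
    # Invented second document with a different shape (no LastName, no Children).
    "person:2": {
        "FirstName": "Ann", "Age": 29,
        "Address": {"City": "Other City", "state": "NY"},
    },
}

def get_path(document, path):
    """Follow a dotted path such as 'Address.state'; None if a step is missing."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(docs, path, expected):
    """Return the keys of all documents whose value at `path` equals `expected`."""
    return sorted(key for key, doc in docs.items()
                  if get_path(doc, path) == expected)

by_key = collection["person:1"]["Children"][0]["FirstName"]  # lookup by key
by_query = find(collection, "Address.state", "VA")           # lookup by query
```

The two documents do not share a schema, yet both can be stored and queried side by side.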

Relational VS Graph DS
Relational Database Store
Graph Database Store

Relational VS NoSQL DBMS Compare
(Functionality, Scalability, Performance)

NoSQL with Relational DBMS (EDB)
In EDB, NoSQL is implemented through different data types
● HSTORE
– Key-value pair
– Simple, fast and easy
– Ideal for flat data structures
● JSON
– Hierarchical document model
– Introduced in PPAS 9.2/9.3
● JSONB
– Binary version of JSON
– Faster, more operators and even more robust
– Introduced in PPAS 9.4

Postgres: Key-Value Store
● The HStore contrib module enables storing key/value pairs within a single column.
● Allows you to create a schema-less, ACID-compliant data store within Postgres.
● Create a single HStore column and include, for each row, only those keys that pertain to the record.
● Add attributes to a table and query them without advance planning.
● Combine flexibility with ACID compliance

HStore - Example
● Create a table with an HStore field (the extension must be enabled first)
– CREATE EXTENSION IF NOT EXISTS hstore;
– CREATE TABLE hstore_data (my_data HSTORE);
● Insert a record into hstore_data
– INSERT INTO hstore_data (my_data) VALUES ('
"cost"=>"60000",
"product"=>"iphone",
"provider"=>"Apple" ');
● Select my_data from hstore_data
– SELECT my_data FROM hstore_data;
=========================
"cost"=>"60000", "product"=>"iphone", "provider"=>"Apple"
(1 row)

Postgres: Document Store
● JSON is the most popular data-interchange format on
the web.
● Derived from ECMAScript Programming language
standard.
● Supported by virtually every programming language.
● JSON datatype implemented in PPAS 9.2/9.3
● JSONB datatype implemented in PPAS 9.4.

JSONB - Example
● Create a table with a JSONB field
– CREATE TABLE jsonb_data (data JSONB);
● Insert a record into jsonb_data
– INSERT INTO jsonb_data (data) VALUES
(' { "name": "Apple Phone",
"type": "phone",
"product": "iphone",
"available": true,
"warranty_years": 1
} ');

A Query That Returns JSON Data
● SELECT data FROM jsonb_data;
data
============================
{"name": "Apple Phone", "type": "phone", "product": "iphone", "available": true, "warranty_years": 1}
Note: A JSON column returns the text in its original format; a JSONB column, as used here, returns a normalized form of the same data.

JSON and SQL
● JSON is naturally integrated with SQL in Postgres.
● JSON and SQL queries use the same language, the same planner, and the same ACID-compliant transaction framework.
● JSON and HSTORE are elegant and easy-to-use extensions of the underlying object-relational model.

JSON and SQL Example
No need for programming logic to combine SQL and
NoSQL in the application – Postgres does it all

Bridging Between SQL and JSON
● Simple SQL table definition
– CREATE TABLE products (id integer, product_name text);
● Select query returning a data set
– SELECT * FROM products;
● Select query returning the same result as a JSON data set
– SELECT ROW_TO_JSON(products) FROM products;

JSON Data Types
● Number
– Signed decimal number that may contain a fractional part.
– No distinction between integer and floating point.
● String
– A sequence of zero or more Unicode characters.
– Strings are delimited with double quotation marks.
– Supports backslash as an escape character.
● Boolean
– Either true or false.
● Array
– An ordered list of zero or more values.
– Each value may be of any type.

JSON Data Types
● Array
– Arrays use square bracket notation, with elements separated by commas.
● Objects
– An unordered associative array (name/value pairs).
– Objects are delimited with curly brackets { }.
– A comma separates each pair.
– Within each pair, the colon ':' character separates the key or name from its value.
– All keys must be strings and should be distinct from each other within the object.
● Null
– An empty value, using the word null

JSON, JSONB or HSTORE ?
● JSON/JSONB is more versatile than HSTORE.
● HSTORE provides more structure, but it only deals with text and you cannot nest objects.
● JSON or JSONB ?
– If you need any of the following then use JSON
● Storage of validated JSON, without processing or
indexing it.
● Preservation of whitespace in JSON text.
● Preservation of object key order
● Preservation of duplicate object keys.
● Maximum input/output speed.
– For any other case use JSONB.
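
The difference between keeping the original text and keeping a parsed form can be imitated in Python. This is an analogy only and does not touch Postgres. Storing the raw string plays the role of JSON, and parsing then re-serializing plays the role of JSONB. The use of sort_keys is an assumption made to show that input key order is not kept; JSONB uses its own internal key ordering.

```python
import json

# Input with extra whitespace, a non-alphabetical key order and a duplicate key.
raw = '{"b": 1,   "a": 2, "a": 3}'

# JSON-like behaviour: keep an exact copy of the input text.
stored_as_text = raw

# JSONB-like behaviour: parse once, keep only the resulting structure.
parsed = json.loads(raw)                        # the last duplicate key wins
normalized = json.dumps(parsed, sort_keys=True)  # whitespace and order are lost
```

The text copy still contains both "a" entries and the extra spaces. The parsed copy keeps only the last value for "a" and can be searched without re-parsing.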

Structured or Unstructured ?
“No SQL Only” or “Not Only SQL” ?
● Structures and standards emerge.
● Data has references.
● When the database has duplicate data entries, the application has to manage updates in multiple places. What happens when there is no ACID transactional model?

Say yes to “Not Only SQL”
● Postgres overcomes many of the standard objections of the form “It can't be done with a conventional database system.”
● Postgres
– Combines both structured and unstructured data.
– Is faster (for many workloads) than the leading NoSQL-only solutions.
– Integrates easily with web 2.0 application development environments.
– Can be deployed on client premises or in the cloud (public/private).
● Do more with Postgres – The enterprise NoSQL Solution.
