Big data, Cloud Computing and No SQL

SELA DEVELOPER PRACTICE
December 15-19, 2013

Manu Cohen-Yashar

The Cloud, Big Data and
NoSQL

© Copyright SELA software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com

Agenda
What is the cloud
Data boom
No SQL
Big Data
Cloud Distributions
What’s next

Make sense of: Cloud, Big Data and NoSQL
How they fit together

Make money !!!

What is the cloud
Cloud Computing is an Idea …
Infrastructure is provisioned by a cloud
provider.
Automatic Scale.
Elasticity. Pay as you use.
Availability.
Simple, Automatic, Economic.

Types of Clouds
IaaS
PaaS
SaaS
and more…
Identity As A Service
Connectivity As A Service

Storage As A Service

Lots of Data
Data doubles every 18 months
Pictures
Web sites
Emails
Sensors
Geo Information
Financial Information
Science
Art
. . . (Infinite list)

No Limits
With the cloud it is now possible to provision a
cluster of any size and run any computation at
any scale.
The one who will make sense of all available
data will rule the world.

The conclusion:
Use the cloud to analyze data at large scale.

Let's talk about data
When we think of data we think of …

Data has many forms
Yet data comes in many forms and shapes
Graphs
Time Series
Documents
Blobs
Geo
Sensors
Structured
Unstructured
Web

Non-Relational
Not all types of data fit well into the relational
world.
Not all data use cases fit well into the ACID
model.
The relational model does not scale very well
Difficult to distribute
Difficult to replicate

The CAP Theorem
During a network partition, a distributed system must choose
either Consistency or Availability.

[Diagram: Sharded NoSQL, RDBMS, Replicated NoSQL]
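
The choice between Consistency and Availability during a partition can be illustrated with a toy model. This is a minimal sketch, not how any specific database works: the `Cluster` class, its two replicas "a" and "b", and the `mode` flag are all hypothetical names made up for this example.

```python
class Cluster:
    """Toy two-replica store (hypothetical, for illustration only)."""

    def __init__(self, mode):
        self.mode = mode                    # "CP" or "AP"
        self.replicas = {"a": 0, "b": 0}    # each replica holds one value
        self.partitioned = False            # True while a and b cannot talk

    def write(self, node, value):
        """Try to write `value` through replica `node`; return True if accepted."""
        if not self.partitioned:
            # Healthy network: update every replica, no trade-off needed.
            for name in self.replicas:
                self.replicas[name] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write rather than let replicas diverge.
            return False
        # Availability first: accept locally, replicas diverge until the
        # partition heals and the values are reconciled.
        self.replicas[node] = value
        return True


cp = Cluster("CP")
cp.partitioned = True
cp_accepted = cp.write("a", 1)   # rejected, both replicas stay equal

ap = Cluster("AP")
ap.partitioned = True
ap_accepted = ap.write("a", 1)   # accepted, replica "b" is now stale
```

The CP cluster stays consistent but turns the client away. The AP cluster keeps serving but has to reconcile the two copies later.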

NO SQL
Large family of databases
No Schema
No relations enforced
Designed for high scale and distribution
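
"Designed for distribution" usually means the key space is partitioned (sharded) across nodes. A minimal sketch of hash-based sharding, assuming three in-memory dictionaries stand in for three servers; `shard_for`, `put` and `get` are hypothetical helper names, not any product's API.

```python
import hashlib

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})
```

Because no joins or cross-key relations are enforced, each request touches exactly one shard. Note that plain modulo hashing moves most keys when the shard count changes, which is why many real systems use consistent hashing instead.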

Types of NO SQL DB
Key Value
Wide Columns
Documents
Graph
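
To make the four families concrete, here is the same made-up user modeled in each style with plain Python structures. This is only a sketch of the data shapes; the field names and values are invented, and real products add indexing, storage and query layers on top.

```python
import json

# Key-Value: an opaque value looked up by key; the store does not inspect it.
key_value = {"user:42": json.dumps({"name": "Dana", "city": "Haifa"})}

# Document: a self-describing, nested record; fields can differ per document.
document = {"_id": 42, "name": "Dana",
            "address": {"city": "Haifa"}, "tags": ["admin"]}

# Wide Column: row key -> column family -> column -> value.
# Rows may have different columns, and columns can be added on the fly.
wide_column = {"42": {"profile": {"name": "Dana", "city": "Haifa"},
                      "activity": {"2013-12-15": "login"}}}

# Graph: nodes plus labeled edges; relationships are first-class data.
graph = {"nodes": {42: {"name": "Dana"}, 7: {"name": "Avi"}},
         "edges": [(42, "FOLLOWS", 7)]}


def followers_of(g, node_id):
    """Return ids of nodes that have a FOLLOWS edge pointing at node_id."""
    return [src for (src, label, dst) in g["edges"]
            if label == "FOLLOWS" and dst == node_id]
```

The right choice depends on the access pattern: lookup by key, query inside a record, scan of sparse columns, or traversal of relationships.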

Motivation for NO SQL
Large Scale and Distribution
Simplicity
Low cost
Good fit with the data model
Volume, Velocity and Variety

Important

There is no one NO SQL solution for all
use cases
There are more than 150 possible offerings…

The Cloud and NO SQL
All Cloud Providers have NO SQL solutions
Azure Tables
Google Bigtable
Amazon DynamoDB

NO SQL Databases are deployed on a cluster
There is a large number of cloud hosting offerings for
NoSQL clusters
MongoHQ (MongoDB)
Cassandra on Google Compute Engine
Many more

Big Data
What is Big?
“Big” cannot fit on a single machine.

Conclusion:
Big data has to be distributed.

Types of Big Data Processing
Query
General Analysis
Classification
Recommendation
Clustering
Auditing and monitoring
More…

Challenges
Develop a parallel algorithm
Reduce network traffic: bring the compute to the
data
Monitor and manage a large number of parallel
tasks
Survive failures
Performance
Linear scale

Batch Processing vs. Operational Intelligence
Batch Processing
Work on existing data
Provide results within minutes

Operational Intelligence
Work on a stream of data
Provide real-time results
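
The difference between the two styles can be sketched with a simple average. The function names and the sample readings are made up for illustration.

```python
def batch_average(readings):
    """Batch: all the data already exists; compute once over the full set."""
    return sum(readings) / len(readings)


def streaming_average(stream):
    """Operational: update the answer incrementally as each event arrives."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count   # an up-to-date result after every event


readings = [10, 20, 30]
final = batch_average(readings)                      # one answer at the end
running = list(streaming_average(iter(readings)))    # one answer per event
```

The batch version needs the whole data set before it can answer. The streaming version keeps only a small running state and never needs to see the data twice.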

Distributed File System
No one server can store Big Data files
Distribute files across cluster
Failure is part of the game
Similar API to traditional File Systems
Examples:
HDFS
GFS
Cassandra File System (CFS)
GridFS (MongoDB)
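
The core ideas (split files into blocks, spread the blocks across the cluster, keep more than one copy so a failure is survivable) can be sketched as follows. This is a toy in-memory model, not the HDFS or GFS design; `store_file`, `read_file`, the node names and the tiny block size are assumptions made for this example.

```python
def store_file(data, nodes, block_size=4, replication=2):
    """Split `data` into blocks and place each block on `replication` nodes.

    Returns (storage, block_count), where storage maps
    node name -> {block index -> block bytes}.
    Assumes replication <= len(nodes) so the copies land on distinct nodes.
    """
    storage = {node: {} for node in nodes}
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(index + r) % len(nodes)]   # simple round-robin placement
            storage[node][index] = block
    return storage, len(blocks)


def read_file(storage, block_count, failed=()):
    """Reassemble the file from any live replica of each block."""
    parts = []
    for index in range(block_count):
        live = [blocks[index] for node, blocks in storage.items()
                if node not in failed and index in blocks]
        if not live:
            raise IOError(f"block {index} has no live replica")
        parts.append(live[0])
    return b"".join(parts)


storage, block_count = store_file(b"hello big data!", ["n1", "n2", "n3"])
recovered = read_file(storage, block_count, failed={"n2"})   # one node is down
```

With two copies of every block, losing any single node still leaves the file readable. Losing two nodes in this three-node example makes at least one block unreadable.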

Hadoop
Big Data Analysis Platform
Batch Processing
Brings Compute tasks to data nodes
Parallel Processing using Map-Reduce
Open Source
Huge ecosystem
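
The Map-Reduce model can be sketched in a few lines with the classic word count. This runs in a single process and only mimics the three phases; it does not use the Hadoop API, and the function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big cloud"])
```

Each call to `map_phase` depends on one line only, and each call to `reduce_phase` on one key only. That independence is what lets a cluster run the map tasks on the nodes that already hold the data and run the reducers in parallel.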

Hadoop Ecosystem
Writing a valuable Map-Reduce job for Hadoop
is not simple
Many open source projects provide
abstractions
Pig
Hive
HBase
Sqoop
Mahout
ZooKeeper
More

Hadoop on the Cloud
Hadoop runs on a cluster
You can use a cluster as a service on major
cloud offerings

Storm
Real-Time big data analytics
Process streams of data
Can be used with any programming language
Wide integration with data sources
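
Storm describes a computation as a topology of spouts (stream sources) and bolts (processing steps). The sketch below imitates that shape with Python generators in a single process. It is not the Storm API, and the function names and sample sentences are invented for illustration.

```python
def sentence_spout(sentences):
    """Spout: the source of the stream; emits one tuple at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt: turns each incoming sentence into a stream of words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Bolt: keeps running counts and emits the updated count per word."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]


topology = count_bolt(split_bolt(sentence_spout(["a b", "a"])))
updates = list(topology)
```

Unlike a batch job, the pipeline emits a fresh result after every incoming event instead of waiting for the input to end.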

Check your schema
Be open to using NoSQL data stores
Identify your use-case and find the right
database for you
Create a simple POC

Look for Big Data
Ask yourself: What can I gain from big data?
How can the new data or analysis scope enhance
your existing set of capabilities?
What additional opportunities for intervention or
process optimization does it present?

Identify your use case and find the right product
and data model.
Look for web distributions and create a simple
POC
