CLOUD STORAGE
Quick Survey
• Does anyone have a lot of files on their
computer but doesn’t have the space to
store them?
• Do you worry about losing files?
• Do you want your files to be secure and
easily accessible?
WHAT IS CLOUD STORAGE?
CLOUD STORAGE
• Online file storage centers
or cloud storage providers
allow you to safely upload
your files to the Internet.
Cloud Providers
• There are various providers of cloud storage
• Examples:
– Apple iCloud
– Dropbox
– Google Drive
– Amazon Cloud Drive
– Microsoft SkyDrive
Apple iCloud
• Gives 5GB of free storage (if you want
more, you have to pay for it yearly)
• Allows you to sync all your iDevices to iCloud
• Anything you have stored in iCloud can
be accessed from any iDevice
– For example, a document typed on an iPad and
saved to the iCloud can be accessed from an
iPhone.
Dropbox
• Gives you 2GB of free storage
• Allows you to get more free space by
completing certain tasks
• Available in the App Store and Android
Market
• For example, a file uploaded to Dropbox
from a MacBook can be accessed from a
Samsung phone.
Google Drive
• Gives you 15GB of free storage
• Requires a Gmail account to be accessed
• Individuals can buy more storage for themselves
(Google Apps for Business Administrators can buy
storage licenses)
• Allows you to type documents, make
spreadsheets/presentations, etc., and then save
them to your drive.
Amazon Cloud Drive
• Gives you 5GB of free storage (additional
storage is available for $10 a year)
• Available for download on Kindle Fire(HD)
• A Save to CloudDrive app is available in the
App Store for $1.99, and an Amazon Cloud
Pictures app is in the Android Market for free.
Microsoft SkyDrive
• Gives you 7GB of free storage (additional
storage can be bought)
• Requires a Windows Live account to be
accessed
• Can be saved to a Windows computer as a
mapped drive.
Which One Should I Pick?
• The cloud provider you pick should fulfill all
your needs as the consumer
• If you need a lot of space, choose a
provider that allows you to get the space
you need
• If you don’t need a lot of space, choose a
provider that you can use for storing files
that don’t take up a lot of space.
The Cloud Universe
• There are tons of different
cloud storage providers in the
world today!!!
Google File System
The Need
• Component failures normal
– Due to clustered computing
• Files are huge
– By traditional standards (many TB)
• Most mutations are appends
– Not random-access overwrites
• Co-Designing apps & file system
• Typical: 1000 nodes & 300 TB
Desiderata
• Must monitor & recover from comp failures
• Modest number of large files
• Workload
– Large streaming reads + small random reads
– Many large sequential writes
• Random access overwrites don’t need to be efficient
• Need semantics for concurrent appends
• High sustained bandwidth
– More important than low latency
Interface
• Familiar
– Create, delete, open, close, read, write
• Novel (special operations)
– Snapshot
• Create a copy of a file or directory tree at low cost
– Record append
• Allows clients to add information to an existing file
without overwriting previously written data.
Architecture
[Figure: many Clients send requests to a single Master (metadata) and to many Chunk Servers, which hold data only.]
Architecture
• Store all files
– In fixed-size chunks
• 64 MB
• 64 bit unique handle
• Triple redundancy
Architecture
Master
• Stores all metadata
– Namespace
– Access-control information
– Chunk locations
– ‘Lease’ management
• Heartbeats ensure that chunk servers
alive
• Having one master → global knowledge
– Allows better placement / replication
– Simplifies design
Architecture
• GFS code implements API
• Cache only metadata
1. Using the fixed chunk size, the client translates the filename &
byte offset into a chunk index and sends the request to the master
(the index computation is sketched below).
2. The master replies with the chunk handle & locations of the chunkserver
replicas (including which one is the ‘primary’).
3. The client caches this info, using the filename & chunk index as the key.
4. The client requests the data from the nearest chunkserver,
sending the chunk handle & a byte range within the chunk.
5. No need to talk to the master again about this 64 MB chunk
until the cached info expires or the file is reopened.
6. Often the initial request asks about a sequence of chunks.
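A minimal sketch of steps 1 and 3, assuming a hypothetical client-side helper (the class and method names, and the string-encoded master reply, are illustrative rather than GFS's real API):

import java.util.HashMap;
import java.util.Map;

class GfsReadSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;      // fixed 64 MB chunks

    // Cached master replies, keyed by "filename:chunkIndex"
    private final Map<String, String> chunkCache = new HashMap<>();

    String locateChunk(String filename, long byteOffset) {
        long chunkIndex = byteOffset / CHUNK_SIZE;          // step 1: byte offset -> chunk index
        String key = filename + ":" + chunkIndex;
        // Step 3: reuse the cached (handle, replica locations) reply when we have one
        return chunkCache.computeIfAbsent(key, k -> askMaster(filename, chunkIndex));
    }

    // Stand-in for the master RPC of step 2 (would return chunk handle + replica locations)
    private String askMaster(String filename, long chunkIndex) {
        return "handle-for-" + filename + "/" + chunkIndex + " @ nearest-chunkserver";
    }
}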
Metadata
• Master stores three types
– File & chunk namespaces
– Mapping from files → chunks
– Location of chunk replicas
• Stored in memory
• Kept persistent thru logging
Consistency Model
Consistent = all clients see same data
Consistency Model
Defined = consistent + clients see full effect
of mutation
Key: all replicas must process chunk-mutation
requests in same order
Consistency Model
Inconsistent: different clients may see different data
(e.g., after a failed mutation)
Implications
• Apps must rely on appends, not overwrites
• Must write records that
– Self-validate
– Self-identify
• Typical uses
– Single writer writes file from beginning to end,
then renames file (or checkpoints along way)
– Many writers concurrently append
• At-least-once semantics ok
• Readers deal with padding & duplicates
Leases & Mutation Order
• Objective
– Ensure data consistent & defined
– Minimize load on master
• Master grants ‘lease’ to one replica
– Called ‘primary’ chunkserver
• Primary serializes all mutation requests
– Communicates order to replicas
Write Control & Dataflow
Atomic Appends
• As in last slide, but…
• Primary also checks to see if append spills
over into new chunk
– If so, pads old chunk to full extent
– Tells secondary chunk-servers to do the same
– Tells client to try append again on next chunk
• Usually works because
– max(append-size) < ¼ chunk-size [API rule]
– (meanwhile other clients may be appending)
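A conceptual sketch of the primary's decision described above (this is not GFS code; the class, field and constant names are invented for illustration):

class RecordAppendSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;
    static final long MAX_APPEND = CHUNK_SIZE / 4;       // API rule: appends are at most 1/4 chunk

    long chunkUsed;                                       // bytes already written in the current chunk

    /** Returns the offset assigned to the record, or -1 meaning "retry on the next chunk". */
    long tryAppend(long recordSize) {
        if (recordSize > MAX_APPEND) {
            throw new IllegalArgumentException("record too large for record append");
        }
        if (chunkUsed + recordSize > CHUNK_SIZE) {
            chunkUsed = CHUNK_SIZE;                       // pad the remainder of the chunk
            return -1;                                    // secondaries pad too; client retries on next chunk
        }
        long offset = chunkUsed;                          // primary serializes: it picks the offset
        chunkUsed += recordSize;
        return offset;                                    // the same offset is sent to all replicas
    }
}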
Other Issues
• Fast snapshot
• Master operation
– Namespace management & locking
– Replica placement & rebalancing
– Garbage collection (deleted / stale files)
– Detecting stale replicas
Master Replication
• Master log & checkpoints replicated
• Outside monitor watches master liveness
– Starts new master process as needed
• Shadow masters
– Provide read-access when primary is down
– Lag state of true master
Hadoop File System
B. Ramamurthy
Reference
• The Hadoop Distributed File System:
Architecture and Design, Apache Software
Foundation.
Basic Features: HDFS
• Highly fault-tolerant
• High throughput
• Suitable for applications with large data
sets
• Streaming access to file system data
• Can be built out of commodity hardware
Fault tolerance
• Failure is the norm rather than exception
• An HDFS instance may consist of
thousands of server machines, each
storing part of the file system’s data.
• Because there is a huge number of
components and each component has a
non-trivial probability of failure, some
component is effectively always
non-functional.
• Detection of faults and quick, automatic
recovery from them is therefore a core
design goal of HDFS.
Data Characteristics
• Streaming data access
• Applications need streaming access to data
• Batch processing rather than interactive user access
• Large data sets and files: gigabytes to terabytes in size
• High aggregate data bandwidth
• Scale to hundreds of nodes in a cluster
• Tens of millions of files in a single instance
• Write-once-read-many: a file once created, written
and closed need not be changed – this assumption
simplifies coherency
• A map-reduce application or web-crawler application
fits perfectly with this model.
MapReduce
[Figure: word-count pipeline – a terabyte-scale input of words (cat, bat, dog, other) is divided into splits, each split is processed by a map task, map outputs are locally combined, and the reduce tasks produce the output partitions part0, part1 and part2.]
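For concreteness, a minimal word-count mapper and reducer for the pipeline in the figure, written against the standard org.apache.hadoop.mapreduce API (job-setup boilerplate omitted; the class names are our own):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, 1) for every word in the input split
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text line, Context ctx)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                ctx.write(word, ONE);
            }
        }
    }
}

// Reduce (also usable as the combiner): sum the counts for each word
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        ctx.write(word, new IntWritable(sum));
    }
}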
ARCHITECTURE
Namenode and Datanodes
• Master/slave architecture
• An HDFS cluster consists of a single Namenode, a master server
that manages the file system namespace and regulates
access to files by clients.
• There are a number of DataNodes, usually one per node in the
cluster.
• The DataNodes manage storage attached to the nodes that
they run on.
• HDFS exposes a file system namespace and allows user data
to be stored in files.
• A file is split into one or more blocks, and the set of blocks is
stored in DataNodes.
• DataNodes serve read and write requests and perform block
creation, deletion, and replication upon instruction from the
Namenode.
HDFS Architecture
[Figure: clients issue metadata ops to the Namenode, which holds the metadata (name, replicas, e.g. /home/foo/data and its replica count); clients write and read block data directly to/from Datanodes spread across racks (Rack 1, Rack 2), while the Namenode issues block ops such as replication to the Datanodes.]
File system Namespace
• Hierarchical file system with directories
and files
• Create, remove, move, rename etc.
• Namenode maintains the file system
• Any metadata changes to the file
system are recorded by the Namenode.
• An application can specify the number of
replicas of the file needed: replication
factor of the file. This information is
stored in the Namenode.
Data Replication
• HDFS is designed to store very large files across
machines in a large cluster.
• Each file is a sequence of blocks.
• All blocks in the file except the last are of the
same size.
• Blocks are replicated for fault tolerance.
• Block size and replicas are configurable per file.
• The Namenode receives a Heartbeat and a
BlockReport from each DataNode in the cluster.
• BlockReport contains all the blocks on a Datanode.
Replica Placement
• The placement of the replicas is critical to HDFS reliability and performance.
• Optimizing replica placement distinguishes HDFS from other distributed file systems.
• Rack-aware replica placement:
– Goal: improve reliability, availability and network bandwidth utilization
– Still a research topic
• Many racks; communication between racks goes through switches.
• Network bandwidth between machines on the same rack is greater than
between machines on different racks.
• The Namenode determines the rack id of each DataNode.
• Replicas are typically placed on unique racks
– Simple but non-optimal
– Writes are expensive
• Default replication factor is 3
– Another research topic?
• Replicas are placed: one on a node in the local rack, one on a different node in the local rack,
and one on a node in a different rack (sketched below).
• One third of the replicas are on one node, two thirds are on one rack, and the remaining
third is distributed evenly across the other racks.
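A rough sketch of the 3-replica placement rule above (purely illustrative; HDFS's real block placement policy is considerably more involved and also considers load and free space):

import java.util.ArrayList;
import java.util.List;

class ReplicaPlacementSketch {
    record Node(String name, String rack) {}

    /** Pick 3 targets: the writer's node, another node on the local rack, a node on a remote rack. */
    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);                                          // replica 1: local node
        cluster.stream()
               .filter(n -> n.rack().equals(writer.rack()) && !n.equals(writer))
               .findFirst().ifPresent(targets::add);                  // replica 2: same rack, different node
        cluster.stream()
               .filter(n -> !n.rack().equals(writer.rack()))
               .findFirst().ifPresent(targets::add);                  // replica 3: different rack
        return targets;
    }
}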
Replica Selection
• Replica selection for READ operation:
HDFS tries to minimize the bandwidth
consumption and latency.
• If there is a replica on the Reader node
then that is preferred.
• An HDFS cluster may span multiple data
centers: replica in the local data center is
preferred over the remote one.
Safemode Startup
• On startup the Namenode enters Safemode.
• Replication of data blocks does not occur in Safemode.
• Each DataNode checks in with a Heartbeat and a
BlockReport.
• The Namenode verifies that each block has an acceptable
number of replicas.
• After a configurable percentage of safely replicated
blocks check in with the Namenode, the Namenode exits
Safemode.
• It then makes the list of blocks that need to be
replicated.
• The Namenode then proceeds to replicate these blocks to
other Datanodes.
Filesystem Metadata
• The HDFS namespace is stored by the
Namenode.
• The Namenode uses a transaction log called
the EditLog to record every change that
occurs to the filesystem metadata.
– For example, creating a new file.
– Changing the replication factor of a file.
– The EditLog is stored in the Namenode’s local
filesystem.
• The entire filesystem namespace, including
the mapping of blocks to files, is kept in a
file called the FsImage.
Namenode
• Keeps an image of the entire file system namespace and the file
Blockmap in memory.
• 4 GB of local RAM is sufficient to support these
data structures, which represent the huge number of
files and directories.
• When the Namenode starts up it reads the FsImage
and EditLog from its local file system, applies the EditLog
transactions to the FsImage, and then stores a copy of
the FsImage on the filesystem as a checkpoint.
• Periodic checkpointing is done so that the system
can recover back to the last checkpointed state in
case of a crash.
Datanode
• A Datanode stores data in files in its local file
system.
• The Datanode has no knowledge of the HDFS filesystem.
• It stores each block of HDFS data in a separate file.
• The Datanode does not create all files in the same
directory.
• It uses heuristics to determine the optimal number of
files per directory and creates directories
appropriately:
a research issue?
• When a Datanode starts up it generates a list of
all its HDFS blocks and sends this report to the Namenode:
the Blockreport.
PROTOCOL
The Communication Protocol
• All HDFS communication protocols are layered on
top of the TCP/IP protocol
• A client establishes a connection to a
configurable TCP port on the Namenode machine.
It talks ClientProtocol with the Namenode.
• The Datanodes talk to the Namenode using
Datanode protocol.
• RPC abstraction wraps both ClientProtocol and
Datanode protocol.
• Namenode is simply a server and never initiates a
request; it only responds to RPC requests issued
by DataNodes or clients.
ROBUSTNESS
Objectives
• Primary objective of HDFS is to store
data reliably in the presence of failures.
• Three common failures are: Namenode
failure, Datanode failure and network
partition.
DataNode failure and
heartbeat
• A network partition can cause a subset of
Datanodes to lose connectivity with the
Namenode.
• Namenode detects this condition by the
absence of a Heartbeat message.
• The Namenode marks Datanodes without recent
Heartbeats as dead and does not send any IO
requests to them.
• Any data registered to a failed
Datanode is no longer available to HDFS.
Re-replication
• The necessity for re-replication may
arise due to:
– A Datanode may become unavailable,
– A replica may become corrupted,
– A hard disk on a Datanode may fail, or
– The replication factor on the block may be
increased.
Cluster Rebalancing
• HDFS architecture is compatible with
data rebalancing schemes.
• A scheme might move data from one
Datanode to another if the free space on
a Datanode falls below a certain
threshold.
• In the event of a sudden high demand for
a particular file, a scheme might
dynamically create additional replicas and
rebalance other data in the cluster.
Data Integrity
• Consider a situation: a block of data
fetched from Datanode arrives
corrupted.
• This corruption may occur because of
faults in a storage device, network faults,
or buggy software.
• An HDFS client computes a checksum of
every block of its file and stores these checksums in
hidden files in the HDFS namespace.
• When a client retrieves file contents, it verifies the
data against the stored checksum; if the check fails,
it can fetch the block from another Datanode that
holds a replica (a sketch follows).
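A small illustration of the per-block checksum idea using java.util.zip.CRC32 (the helper names are made up; HDFS's actual checksum format and granularity differ):

import java.util.zip.CRC32;

class BlockChecksumSketch {
    /** Compute the checksum stored alongside a block when it is written. */
    static long checksumOf(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        return crc.getValue();
    }

    /** On read, verify the data against the stored checksum before using it. */
    static boolean verify(byte[] blockReadFromDatanode, long storedChecksum) {
        // If this returns false, the client fetches the block from another replica.
        return checksumOf(blockReadFromDatanode) == storedChecksum;
    }
}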
Metadata Disk Failure
• FsImage and EditLog are central data structures of
HDFS.
• A corruption of these files can cause an HDFS instance
to be non-functional.
• For this reason, a Namenode can be configured to
maintain multiple copies of the FsImage and EditLog.
• Multiple copies of the FsImage and EditLog files are
updated synchronously.
• Meta-data is not data-intensive.
• The Namenode could be a single point of failure: automatic
failover is NOT supported! Another research topic.
DATA ORGANIZATION
Data Blocks
• HDFS supports write-once-read-many files with
reads at streaming speeds.
• A typical block size is 64 MB (or even 128
MB).
• A file is chopped into 64 MB chunks and
stored.
Staging
• A client request to create a file does not
reach Namenode immediately.
• The HDFS client caches the data into a
temporary file. When the data reaches an
HDFS block size, the client contacts the
Namenode.
• The Namenode inserts the filename into its
hierarchy and allocates a data block for
it.
• The Namenode responds to the client with the
identity of the Datanode(s) and the destination
data block.
Staging (contd.)
• The client sends a message that the file
is closed.
• Namenode proceeds to commit the file
for creation operation into the persistent
store.
• If the Namenode dies before the file is
closed, the file is lost.
• This client-side caching is required to
avoid network congestion; it also has
precedent in AFS (the Andrew File System).
Replication Pipelining
• When the client receives response from
Namenode, it flushes its block in small
pieces (4 KB) to the first replica, which in
turn copies them to the next replica, and so
on.
• Thus data is pipelined from one Datanode to
the next.
API (ACCESSIBILITY)
Application Programming
Interface
• HDFS provides a Java API for applications
to use.
• Python access is also used in many
applications.
• A C language wrapper for the Java API is also
available.
• An HTTP browser can be used to browse
the files of an HDFS instance.
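A minimal example of the Java API mentioned above, using org.apache.hadoop.fs.FileSystem (the file path is a placeholder and the cluster address is taken from the local Hadoop configuration):

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsApiExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/foodir/hello.txt");       // placeholder path

        // Write a small file (one block, replicated per the configured factor)
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Stream it back
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[16];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        fs.close();
    }
}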
FS Shell, Admin and Browser
Interface
• HDFS organizes its data in files and
directories.
• It provides a command line interface
called the FS shell that lets the user
interact with data in the HDFS.
• The syntax of the commands is similar to
bash and csh.
• Example: to create a directory /foodir:
/bin/hadoop dfs -mkdir /foodir
• There is also a DFSAdmin interface for
administration.
Space Reclamation
• When a file is deleted by a client, HDFS
renames the file and moves it into the /trash
directory for a configurable amount of
time.
• A client can request an undelete within
this allowed time.
• After the specified time the file is
deleted and the space is reclaimed.
• When the replication factor is reduced,
the Namenode selects excess replicas
that can be deleted, and the Datanodes
remove the corresponding blocks.
Summary
• We discussed the features of the
Hadoop File System, a peta-scale file
system to handle big-data sets.
• What we discussed: Architecture, Protocol,
API, etc.
• Missing element: Implementation
– The Hadoop file system (internals)
– An implementation of an instance of the
HDFS (for use by applications such as web
crawlers).
BigTable: A Distributed
Storage System for
Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson
C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar
Chandra, Andrew Fikes, Robert E. Gruber @ Google
Presented by Richard Venutolo
Introduction
• BigTable is a distributed storage system for
managing structured data.
• Designed to scale to a very large size
– Petabytes of data across thousands of servers
• Used for many Google projects
– Web indexing, Personalized Search, Google Earth,
Google Analytics, Google Finance, …
• Flexible, high-performance solution for all of
Google’s products
Motivation
• Lots of (semi-)structured data at Google
– URLs:
• Contents, crawl metadata, links, anchors, pagerank, …
– Per-user data:
• User preference settings, recent queries/search results, …
– Geographic locations:
• Physical entities (shops, restaurants, etc.), roads, satellite
image data, user annotations, …
• Scale is large
– Billions of URLs, many versions/page (~20K/version)
– Hundreds of millions of users, thousands of queries/sec
– 100TB+ of satellite image data
Why not just use commercial DB?
• Scale is too large for most commercial databases
• Even if it weren’t, cost would be very high
– Building internally means system can be applied across
many projects for low incremental cost
• Low-level storage optimizations help performance
significantly
– Much harder to do when running on top of a database
layer
Goals
• Want asynchronous processes to be continuously
updating different pieces of data
– Want access to most current data at any time
• Need to support:
– Very high read/write rates (millions of ops per second)
– Efficient scans over all or interesting subsets of data
– Efficient joins of large one-to-one and one-to-many
datasets
• Often want to examine data changes over time
– E.g. Contents of a web page over multiple crawls
BigTable
• Distributed multi-level map
• Fault-tolerant, persistent
• Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabyte of disk-based data
– Millions of reads/writes per second, efficient scans
• Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance
Building Blocks
• Building blocks:
– Google File System (GFS): Raw storage
– Scheduler: schedules jobs onto machines
– Lock service: distributed lock manager
– MapReduce: simplified large-scale data processing
• BigTable uses of building blocks:
– GFS: stores persistent data (SSTable file format for
storage of data)
– Scheduler: schedules jobs involved in BigTable serving
– Lock service: master election, location bootstrapping
– Map Reduce: often used to read/write BigTable data
Basic Data Model
• A BigTable is a sparse, distributed persistent
multi-dimensional sorted map
(row, column, timestamp) -> cell contents
• Good match for most Google applications
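To make the map signature concrete, here is a toy in-memory sketch of the model (nothing like the real implementation): a lexicographically sorted map of rows, each holding columns whose versions are ordered newest first.

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class BigtableModelSketch {
    // row (sorted lexicographically) -> column -> timestamp (newest first) -> cell contents
    private final TreeMap<String, Map<String, TreeMap<Long, byte[]>>> table = new TreeMap<>();

    void put(String row, String column, long timestamp, byte[] value) {
        table.computeIfAbsent(row, r -> new HashMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>(Comparator.reverseOrder()))
             .put(timestamp, value);
    }

    /** Most recent version of (row, column), or null if the cell does not exist. */
    byte[] getLatest(String row, String column) {
        Map<String, TreeMap<Long, byte[]>> cols = table.get(row);
        if (cols == null) return null;
        TreeMap<Long, byte[]> versions = cols.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }
}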
WebTable Example
• Want to keep copy of a large collection of web pages and
related information
• Use URLs as row keys
• Various aspects of web page as column names
• Store contents of web pages in the contents: column
under the timestamps when they were fetched.
Rows
• Name is an arbitrary string
– Access to data in a row is atomic
– Row creation is implicit upon storing data
• Rows ordered lexicographically
– Rows close together lexicographically usually on
one or a small number of machines
Rows (cont.)
Reads of short row ranges are efficient and
typically require communication with a small
number of machines.
• Can exploit this property by selecting row
keys so they get good locality for data
access.
• Example:
math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu
VS
edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys
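A tiny illustration of the reversed-domain trick (hypothetical helper): reversing the host components makes pages from the same institution adjacent in row-key order.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RowKeyTrick {
    /** "math.gatech.edu" -> "edu.gatech.math" */
    static String reverseDomain(String host) {
        List<String> parts = Arrays.asList(host.split("\\."));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        // Sorted as row keys, the gatech pages now sit next to each other
        System.out.println(reverseDomain("math.gatech.edu"));  // edu.gatech.math
        System.out.println(reverseDomain("phys.gatech.edu"));  // edu.gatech.phys
    }
}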
Columns
• Columns have two-level name structure:
• family:optional_qualifier
• Column family
– Unit of access control
– Has associated type information
• Qualifier gives unbounded columns
– Additional levels of indexing, if desired
Timestamps
• Used to store different versions of data in a cell
– New writes default to current time, but timestamps for writes can also be
set explicitly by clients
• Lookup options:
– “Return most recent K values”
– “Return all values in timestamp range (or all values)”
• Column families can be marked w/ attributes:
– “Only retain most recent K values in a cell”
– “Keep values until they are older than K seconds”
Implementation – Three Major
Components
• Library linked into every client
• One master server
– Responsible for:
• Assigning tablets to tablet servers
• Detecting addition and expiration of tablet servers
• Balancing tablet-server load
• Garbage collection
• Many tablet servers
– Tablet servers handle read and write requests to the
tablets they serve
– Splits tablets that have grown too large
Implementation (cont.)
• Client data doesn’t move through master
server. Clients communicate directly with
tablet servers for reads and writes.
• Most clients never communicate with the
master server, leaving it lightly loaded in
practice.
Tablets
• Large tables broken into tablets at row boundaries
– Tablet holds contiguous range of rows
• Clients can often choose row keys to achieve locality
– Aim for ~100MB to 200MB of data per tablet
• Serving machine responsible for ~100 tablets
– Fast recovery:
• 100 machines each pick up 1 tablet for failed machine
– Fine-grained load balancing:
• Migrate tablets away from overloaded machine
• Master makes load-balancing decisions
Tablet Location
• Since tablets move around from server to
server, given a row, how do clients find the
right machine?
– Need to find tablet whose row range covers the
target row
Tablet Assignment
• Each tablet is assigned to one tablet
server at a time.
• Master server keeps track of the set of
live tablet servers and current
assignments of tablets to servers. Also
keeps track of unassigned tablets.
• When a tablet is unassigned, master
assigns the tablet to a tablet server
with sufficient room.
API
• Metadata operations
– Create/delete tables, column families, change metadata
• Writes (atomic)
– Set(): write cells in a row
– DeleteCells(): delete cells in a row
– DeleteRow(): delete all cells in a row
• Reads
– Scanner: read arbitrary cells in a bigtable
• Each row read is atomic
• Can restrict returned rows to a particular range
• Can ask for just data from 1 row, all rows, etc.
• Can ask for all columns, just certain column families, or specific columns
Refinements: Locality Groups
• Can group multiple column families into a
locality group
– Separate SSTable is created for each locality
group in each tablet.
• Segregating column families that are not
typically accessed together enables more
efficient reads.
– In WebTable, page metadata can be in one
group and contents of the page in another
group.
Refinements: Compression
• Many opportunities for compression
– Similar values in the same row/column at different
timestamps
– Similar values in different columns
– Similar values across adjacent rows
• Two-pass custom compression scheme
– First pass: compress long common strings across a large
window
– Second pass: look for repetitions in a small window
• Speed emphasized, but good space reduction (10-to-1)
Refinements: Bloom Filters
• Read operation has to read from disk when
desired SSTable isn’t in memory
• Reduce number of accesses by specifying a Bloom
filter.
– Allows us to ask whether an SSTable might contain data for a
specified row/column pair.
– Small amount of memory for Bloom filters drastically
reduces the number of disk seeks for read operations
– Use implies that most lookups for non-existent rows or
columns do not need to touch disk
End
http://www.xkcd.com/327/
CS525: Special Topics in DBs
Large-Scale Data
Management
HBase
Spring 2013
WPI, Mohamed Eltabakh
HBase: Overview
• HBase is a distributed column-oriented
data store built on top of HDFS
• HBase is an Apache open source project
whose goal is to provide storage for the
Hadoop distributed computing environment
• Data is logically organized into tables,
rows and columns
HBase: Part of Hadoop’s
Ecosystem
HBase is built on top of HDFS; HBase files are
internally stored in HDFS.
HBase vs. HDFS
• Both are distributed systems that scale
to hundreds or thousands of nodes
• HDFS is good for batch processing (scans
over big files)
– Not good for record lookup
– Not good for incremental addition of small
batches
– Not good for updates
HBase vs. HDFS (Cont’d)
• HBase is designed to efficiently
address the above points
– Fast record lookup
– Support for record-level insertion
– Support for updates (not in place)
• HBase updates are done by creating
new versions of values
HBase vs. HDFS (Cont’d)
If an application has neither random reads nor random writes → stick to HDFS
HBase Data Model
HBase Data Model
• HBase is based on Google’s
Bigtable model
– Key-Value pairs
[Figure: a cell is addressed by row key, column family and timestamp, and holds a value.]
HBase Logical View
HBase: Keys and Column Families
Each row has a Key;
each record is divided into Column Families;
each column family consists of one or more Columns.
• Key
– Byte array
– Serves as the primary
key for the table
– Indexed for fast lookup
• Column Family
– Has a name (string)
– Contains one or more
related columns
• Column
– Belongs to one column
family
– Included inside the row
• familyName:columnName
Example (WebTable):
Row key          | Timestamp | Column "contents:" | Column "anchor:"
"com.apache.www" | t12       | "<html>…"          |
                 | t11       | "<html>…"          |
                 | t10       |                    | "anchor:apache.com" = "APACHE"
"com.cnn.www"    | t15       |                    | "anchor:cnnsi.com" = "CNN"
                 | t13       |                    | "anchor:my.look.ca" = "CNN.com"
                 | t6        | "<html>…"          |
                 | t5        | "<html>…"          |
                 | t3        | "<html>…"          |
("contents:" and "anchor:" are column families; "apache.com" is a column inside the "anchor:" family.)
• Version Number
– Unique within each
key
– By default
System’s
timestamp
– Data type is Long
• Value (Cell)
– Byte array
(Same example table as above: the timestamps t3…t15 are the version numbers within each row, and each cell holds a value as a byte array.)
Notes on Data Model
• HBase schema consists of several Tables
• Each table consists of a set of Column Families
– Columns are not part of the schema
• HBase has Dynamic Columns
– Because column names are encoded inside the cells
– Different cells can have different columns
(Example: a "Roles" column family may have
different columns in different cells.)
Notes on Data Model (Cont’d)
• The version number can be user-supplied
– It does not even have to be inserted in increasing order
– Version numbers are unique within each key
• Table can be very sparse
– Many cells are empty
• Keys are indexed as the primary key
(In the example above, the "anchor:" family of row com.cnn.www
has two columns: cnnsi.com & my.look.ca.)
HBase Physical Model
HBase Physical Model
• Each column family is stored in a separate file (called HTables)
• Key & Version numbers are replicated with each column family
• Empty cells are not stored
• HBase maintains a multi-level index on values:
<key, column family, column name, timestamp>
Example
Column Families
HBase Regions
• Each HTable (column family) is
partitioned horizontally into regions
– Regions are counterpart to HDFS blocks
(In the example, each horizontal partition of the table
becomes one region.)
HBase Architecture
Three Major Components
• The HBaseMaster
– One master
• The HRegionServer
– Many region
servers
• The HBase client
HBase Components
• Region
– A subset of a table’s rows, like horizontal
range partitioning
– Automatically done
• RegionServer (many slaves)
– Manages data regions
– Serves data for reads and writes (using a log)
• Master
– Responsible for coordinating the slaves
– Assigns regions, detects failures
– Admin functions
Big Picture
ZooKeeper
• HBase depends on
ZooKeeper
• By default HBase manages
the ZooKeeper instance
– E.g., starts and stops
ZooKeeper
• HMaster and HRegionServers
register themselves with
ZooKeeper
Creating a Table
// Connect to the cluster using its configuration
HBaseAdmin admin = new HBaseAdmin(config);

// Describe two column families
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");

// Describe the table and attach the column families
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);

// Create the table
admin.createTable(desc);
Operations On Regions: Get()
• Given a key → return the corresponding record
• For each value, return the highest version
• Can control the number of versions you want
Operations On Regions: Scan()
Get() – analogous to:
Select value from table where
key=‘com.apache.www’ AND
label=‘anchor:apache.com’
Scan() – analogous to:
Select value from table
where anchor=‘cnnsi.com’
Example table used by both queries (see the API sketch below):
Row key          | Timestamp | Column "anchor:"
"com.apache.www" | t12       |
                 | t11       |
                 | t10       | "anchor:apache.com" = "APACHE"
"com.cnn.www"    | t9        | "anchor:cnnsi.com" = "CNN"
                 | t8        | "anchor:my.look.ca" = "CNN.com"
                 | t6        |
                 | t5        |
                 | t3        |
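The two queries above written against the older HBase client API used elsewhere in this deck (HTable/Get/Scan; config and the table name "WebTable" are assumptions carried over from the earlier examples):

// Get(): value where key = 'com.apache.www' AND column = 'anchor:apache.com'
HTable table = new HTable(config, "WebTable");
Get get = new Get(Bytes.toBytes("com.apache.www"));
get.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("apache.com"));
Result row = table.get(get);
byte[] apache = row.getValue(Bytes.toBytes("anchor"), Bytes.toBytes("apache.com"));   // "APACHE"

// Scan(): values of column 'anchor:cnnsi.com' across all rows
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"));
ResultScanner results = table.getScanner(scan);
for (Result r : results) {
    byte[] cnn = r.getValue(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"));      // "CNN"
}
results.close();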
Operations On Regions: Put()
• Insert a new record (with a new key), Or
• Insert a record for an existing key
• The version number can be implicit (the current
timestamp) or explicit (supplied by the client),
as in the sketch below
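Correspondingly, a Put against the same example table, showing both an implicit and an explicit version number (same older client API; table is the HTable handle opened in the Get/Scan sketch above):

Put put = new Put(Bytes.toBytes("com.cnn.www"));
// Implicit version: the region server stamps the cell with the current time
put.add(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"), Bytes.toBytes("CNN"));
// Explicit version: the client supplies its own timestamp (here 9, i.e. t9)
put.add(Bytes.toBytes("contents"), Bytes.toBytes(""), 9L, Bytes.toBytes("<html>..."));
table.put(put);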
Operations On Regions: Delete()
• Marking table cells as deleted
• Multiple levels
– Can mark an entire column family as deleted
– Can mark all column families of a given row as deleted
• All operations are logged by the
RegionServers
• The log is flushed periodically
HBase: Joins
• HBase does not support joins
• Can be done in the application layer
– Using scan() and get() operations
Altering a Table
Disable the table before changing the
schema
Logging Operations
HBase Deployment
[Figure: deployment – the master node runs the HMaster; the slave nodes run the HRegionServers.]
[Slides: HBase vs. HDFS, HBase vs. RDBMS, When to use HBase – comparison tables not reproduced.]
Cloud Storage – A Look at
Amazon’s Dynamo
A presentation that looks at Amazon’s Dynamo service (based
on a research paper published by Amazon.com), as well as
related cloud storage implementations
The Traditional
• Cloud Data Services are traditionally oriented around
Relational Database systems
– Oracle, Microsoft SQL Server and even MySQL have traditionally powered enterprise and online
data clouds
– Clustered - Traditional Enterprise RDBMS provide the ability to cluster and replicate data
over multiple servers – providing reliability
– Highly Available – Provide Synchronization (“Always Consistent”), Load-Balancing and High-
Availability features to provide nearly 100% Service Uptime
– Structured Querying – Allow for complex data models and structured querying – It is possible
to off-load much of data processing and manipulation to the back-end database
The Traditional
• However, Traditional RDBMS clouds are:
EXPENSIVE!
To maintain, license and store large amounts of data
• The service guarantees of traditional enterprise relational databases like Oracle, put
high overheads on the cloud
• Complex data models make the cloud more expensive to maintain, update and keep
synchronized
• Load distribution often requires expensive networking equipment
• To maintain the “elasticity” of the cloud, often requires expensive upgrades to the
network
The Solution
• Downgrade some of the service guarantees of traditional
RDBMS
– Replace the highly complex data models Oracle and SQL Server offer with a simpler one – This
means classifying service data models based on the complexity of the data model they may
require
– Replace the “Always Consistent” guarantee synchronization model with an “Eventually Consistent”
model – This means classifying services based on how “updated” its data set must be
Redesign or distinguish between services that require a simpler data model and
lower expectations on consistency.
We could then offer something different from traditional RDBMS!
The Solution
• Amazon’s Dynamo – Used by Amazon’s EC2 Cloud Hosting Service. Powers their Simple
Storage Service (S3) as well as their e-commerce platform.
Offers a simple primary-key based data model. Stores vast amounts of information on distributed,
low-cost virtualized nodes.
• Google’s BigTable – Google’s principal data cloud for their services – Uses a more complex
column-family data model compared to Dynamo, yet much simpler than a traditional RDBMS.
Google’s underlying file system provides the distributed architecture on low-cost nodes.
• Facebook’s Cassandra – Facebook’s principal data cloud for their services.
This project was recently open-sourced. Provides a data model similar to Google’s BigTable, but the
distributed characteristics of Amazon’s Dynamo.
Dynamo - Motivation
• Build a distributed storage system:
– Scale
– Simple: key-value
– Highly available
– Guarantee Service Level Agreements (SLA)
System Assumptions and Requirements
• Query Model: simple read and write operations to a data item
that is uniquely identified by a key.
• ACID Properties: Atomicity, Consistency, Isolation,
Durability.
• Efficiency: latency requirements which are in general measured
at the 99.9th percentile of the distribution.
• Other Assumptions: operation environment is assumed to
be non-hostile and there are no security related requirements such as
authentication and authorization.
Service Level Agreements (SLA)
• Application can deliver its
functionality in a bounded
time: every dependency in the
platform needs to deliver its
functionality with even tighter
bounds.
• Example: service guaranteeing
that it will provide a response
within 300ms for 99.9% of its
requests for a peak client load of
500 requests per second.
Service-oriented architecture of
Amazon’s platform
Design Consideration
• Sacrifice strong consistency for availability
• Conflict resolution is executed during read
instead of write, i.e. “always writeable”.
• Other principles:
– Incremental scalability.
– Symmetry.
– Decentralization.
– Heterogeneity.
Summary of techniques used in
Dynamo and their advantages
Problem                            | Technique                                              | Advantage
Partitioning                       | Consistent hashing                                     | Incremental scalability
High availability for writes       | Vector clocks with reconciliation during reads         | Version size is decoupled from update rates
Handling temporary failures        | Sloppy quorum and hinted handoff                       | Provides high availability and durability guarantee when some of the replicas are not available
Recovering from permanent failures | Anti-entropy using Merkle trees                        | Synchronizes divergent replicas in the background
Membership and failure detection   | Gossip-based membership protocol and failure detection | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
Partition Algorithm
• Consistent hashing: the output
range of a hash function is treated as a
fixed circular space or “ring”.
• ”Virtual Nodes”: Each node can
be responsible for more than one
virtual node.
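A compact sketch of a consistent-hash ring with virtual nodes (the hash function, token width and token count are arbitrary choices made for illustration):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();   // token -> physical node
    private final int virtualNodesPerNode;

    ConsistentHashRing(int virtualNodesPerNode) { this.virtualNodesPerNode = virtualNodesPerNode; }

    void addNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node);                  // place each virtual node on the ring
        }
    }

    /** The node responsible for a key is the first token clockwise from the key's hash. */
    String nodeFor(String key) {
        long h = hash(key);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff);   // first 8 digest bytes as a token
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}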
Advantages of using virtual
nodes
• If a node becomes unavailable the
load handled by this node is
evenly dispersed across the
remaining available nodes.
• When a node becomes available
again, the newly available node
accepts a roughly equivalent
amount of load from each of the
other available nodes.
• The number of virtual nodes that
a node is responsible for can be decided
based on its capacity, accounting
for heterogeneity in the physical
infrastructure.
Replication
• Each data item is
replicated at N hosts.
• “preference list”: The
list of nodes that is
responsible for storing a
particular key.
Data Versioning
• A put() call may return to its caller before
the update has been applied at all the
replicas
• A get() call may return many versions of the
same object.
• Challenge: an object having distinct version sub-histories, which
the system will need to reconcile in the future.
• Solution: uses vector clocks in order to capture causality
between different versions of the same object.
Vector Clock
• A vector clock is a list of (node, counter)
pairs.
• Every version of every object is associated
with one vector clock.
• If the counters on the first object’s clock
are less-than-or-equal to all of the nodes in
the second clock, then the first is an
ancestor of the second and can be
forgotten.
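A small sketch of the ancestor test just described (the Map-of-counters representation and the method names are illustrative):

import java.util.HashMap;
import java.util.Map;

class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();   // node -> counter

    void increment(String node) {
        counters.merge(node, 1L, Long::sum);
    }

    /** True if 'other' is an ancestor of this clock (and can therefore be forgotten). */
    boolean descendsFrom(VectorClock other) {
        for (Map.Entry<String, Long> e : other.counters.entrySet()) {
            if (counters.getOrDefault(e.getKey(), 0L) < e.getValue()) return false;
        }
        return true;
    }

    /** Neither clock descends from the other: the versions conflict and need reconciliation. */
    static boolean conflict(VectorClock a, VectorClock b) {
        return !a.descendsFrom(b) && !b.descendsFrom(a);
    }
}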
Vector clock example
Execution of get () and put ()
operations
1. Route its request through a generic load
balancer that will select a node based on
load information.
2. Use a partition-aware client library that
routes requests directly to the appropriate
coordinator nodes.
Sloppy Quorum
• R/W is the minimum number of nodes that
must participate in a successful read/write
operation.
• Setting R + W > N yields a quorum-like
system. For example, with N = 3 a common
configuration is R = 2, W = 2, so read and
write quorums always overlap.
• In this model, the latency of a get (or put)
operation is dictated by the slowest of the
R (or W) replicas. For this reason, R and W
are usually configured to be less than N, to
provide better latency.
Hinted handoff
• Assume N = 3. When A
is temporarily down or
unreachable during a
write, send replica to D.
• D is hinted that the
replica belongs to A,
and it will deliver it back to A
when A has recovered.
• Again: “always
writeable”
Other techniques
• Replica synchronization:
– Merkle hash tree.
• Membership and Failure Detection:
– Gossip
Implementation
• Java
• Local persistence component allows for
different storage engines to be plugged in:
– Berkeley Database (BDB) Transactional Data
Store: objects of tens of kilobytes
– MySQL: objects larger than tens of kilobytes
– BDB Java Edition, etc.
Cloud Storage Security
Cloud Computing
Advantages:
• Backup and Recovery
• Almost Unlimited Storage
• Increasing Efficiency and Performance
Disadvantages:
• Security
• Privacy
Cloud Computing Storage
Security Issues:
1. Traditional security problems
2. Legal issues
3. Third-party issues
Related Research for Storage
Security:
1. Authorization: multi-level authorization; the Face Fuzzy Vault
2. Encryption: RSA encryption; AES encryption; hybrid encryption
3. Segmentation: metadata-based storage model; multi-cloud architecture model
[Figure: Segmentation Backup System – the data is split into parts, each part is copied, and the copies are stored on different servers.]
[Figure: Multi-cloud Architecture Model]
[Table: Metadata Based Storage Model levels]

Cloud computing UNIT 2.1 presentation in

  • 1.
  • 2.
    Quick Survey • Doesanyone have a lot of files on their computer but dosen’t have the space to store them? • Do you worry about losing files? • Do you want your files to be secure and easily accessible?
  • 3.
    WHAT IS CLOUDSTORAGE?
  • 4.
    CLOUD STORAGE • Onlinefile storage centers or cloud storage providers allow you to safely upload your files to the Internet.
  • 5.
    Cloud Providers • Thereare various providers of cloud storage • Examples: – Apple iCloud – Dropbox – Google Drive – Amazon Cloud Drive – Microsoft SkyDrive
  • 6.
    Apple iCloud • Gives5GB of free storage (if you want more, you have to pay for it yearly) • Allows you to sync all your idevices to iCloud • Anything you have stored in the iCloud can be accessed from any idevice – For example, a document typed on an iPad and saved to the iCloud can be accessed from an iPhone.
  • 7.
    Dropbox • Gives you2GB of free storage • Allows you to get more free space by completing certain tasks • Available in the App Store and Android Market • For example, a file uploaded to Dropbox from a MacBook can be accessed from a Samsung phone.
  • 8.
    Google Drive • Givesyou 15GB of free storage • Requires a G-mail account to be accessed • Individuals can buy more storage for themselves (Google Apps for Business Administrators can buy storage licenses) • Allows you to type documents, make spreadsheets/presentations, etc., and then save them to your drive.
  • 9.
    Amazon Cloud Drive •Gives you 5GB of free storage (additional storage is available for $10 a year) • Available for download on Kindle Fire(HD) • A Save to CloudDrive app is available in the App Store for $1.99 and an Amazon Cloud Pictures is in the Android Market for free.
  • 10.
    Microsoft SkyDrive • Givesyou 7GB of free storage (additional storage can be bought) • Requires a Windows Live account to be accessed • Can be saved to a Windows computer as a mapped drive.
  • 11.
    Which One ShouldI Pick? • The cloud provider you pick should fulfill all your needs as the consumer • If you need a lot of space, choose a provider that allows you to get the space you need • If you don’t need a lot of space, choose a provider that you can use for storing files that don’t take up a lot of space.
  • 12.
    The Cloud Universe •There are tons of different cloud storage providers in the world today!!!
  • 13.
  • 14.
    The Need • Componentfailures normal – Due to clustered computing • Files are huge – By traditional standards (many TB) • Most mutations are mutations – Not random access overwrite • Co-Designing apps & file system • Typical: 1000 nodes & 300 TB
  • 15.
    Desiderata • Must monitor& recover from comp failures • Modest number of large files • Workload – Large streaming reads + small random reads – Many large sequential writes • Random access overwrites don’t need to be efficient • Need semantics for concurrent appends • High sustained bandwidth – More important than low latency
  • 16.
    Interface • Familiar – Create,delete, open, close, read, write • Novel(Special operations) – Snapshot • Create a coy of file or directory tree at Low cost – Record append • Allows clients to add information to an existing file without overwriting previously written data.
  • 17.
  • 18.
    Architecture • Store allfiles – In fixed-size chucks • 64 MB • 64 bit unique handle • Triple redundancy Chunk Server Chunk Server Chunk Server
  • 19.
    Architecture Master • Stores allmetadata – Namespace – Access-control information – Chunk locations – ‘Lease’ management • Heartbeats ensure that chunk servers alive • Having one master  global knowledge – Allows better placement / replication – Simplifies design
  • 20.
    Architecture Client Client Client Client • GFS codeimplements API • Cache only metadata
  • 21.
    Using fixed chunksize, translate filename & byte offset to chunk index. Send request to master
  • 22.
    Replies with chunkhandle & location of chunkserver replicas (including which is ‘primary’)
  • 23.
    Cache info using filename& chunk index as key
  • 24.
    Request data fromnearest chunkserver “chunkhandle & index into chunk”
  • 25.
    No need totalk more About this 64MB chunk Until cached info expires or file reopened
  • 26.
    Often initial requestasks about Sequence of chunks
  • 27.
    Metadata • Master storesthree types – File & chunk namespaces – Mapping from files  chunks – Location of chunk replicas • Stored in memory • Kept persistent thru logging
  • 28.
    Consistency Model Consistent =all clients see same data
  • 29.
    Consistency Model Defined =consistent + clients see full effect of mutation Key: all replicas must process chunk-mutation requests in same order
  • 30.
  • 31.
    Implications • Apps mustrely on appends, not overwrites • Must write records that – Self-validate – Self-identify • Typical uses – Single writer writes file from beginning to end, then renames file (or checkpoints along way) – Many writers concurrently append • At-least-once semantics ok • Reader deal with padding & duplicates
  • 32.
    Leases & MutationOrder • Objective – Ensure data consistent & defined – Minimize load on master • Master grants ‘lease’ to one replica – Called ‘primary’ chunkserver • Primary serializes all mutation requests – Communicates order to replicas
  • 33.
  • 34.
    Atomic Appends • Asin last slide, but… • Primary also checks to see if append spills over into new chunk – If so, pads old chunk to full extent – Tells secondary chunk-servers to do the same – Tells client to try append again on next chunk • Usually works because – max(append-size) < ¼ chunk-size [API rule] – (meanwhile other clients may be appending)
  • 35.
    Other Issues • Fastsnapshot • Master operation – Namespace management & locking – Replica placement & rebalancing – Garbage collection (deleted / stale files) – Detecting stale replicas
  • 36.
    Master Replication • Masterlog & checkpoints replicated • Outside monitor watches master livelihood – Starts new master process as needed • Shadow masters – Provide read-access when primary is down – Lag state of true master
  • 37.
    B. Ramamurthy Hadoop FileSystem 4/17/2024 37
  • 38.
    Reference • The HadoopDistributed File System: Architecture and Design by Apache Foundation Inc. 4/17/2024 38
  • 39.
    Basic Features: HDFS •Highly fault-tolerant • High throughput • Suitable for applications with large data sets • Streaming access to file system data • Can be built out of commodity hardware 4/17/2024 39
  • 40.
    Fault tolerance • Failureis the norm rather than exception • A HDFS instance may consist of thousands of server machines, each storing part of the file system’s data. • Since we have huge number of components and that each component has non-trivial probability of failure means that there is always some component that is non-functional. • Detection of faults and quick, automatic 4/17/2024 40
  • 41.
    Data Characteristics  Streamingdata access  Applications need streaming access to data  Batch processing rather than interactive user access.  Large data sets and files: gigabytes to terabytes size  High aggregate data bandwidth  Scale to hundreds of nodes in a cluster  Tens of millions of files in a single instance  Write-once-read-many: a file once created, written and closed need not be changed – this assumption simplifies coherency  A map-reduce application or web-crawler application fits perfectly with this model. 4/17/2024 41
  • 42.
  • 43.
  • 44.
    Namenode and Datanodes Master/slave architecture  HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients.  There are a number of DataNodes usually one per node in a cluster.  The DataNodes manage storage attached to the nodes that they run on.  HDFS exposes a file system namespace and allows user data to be stored in files.  A file is split into one or more blocks and set of blocks are stored in DataNodes.  DataNodes: serves read, write requests, performs block creation, deletion, and replication upon instruction from Namenode. 4/17/2024 44
  • 45.
    HDFS Architecture 4/17/2024 45 Namenode B replication Rack1Rack2 Client Blocks Datanodes Datanodes Client Write Read Metadata ops Metadata(Name, replicas..) (/home/foo/data,6. .. Block ops
  • 46.
    File system Namespace 4/17/202446 • Hierarchical file system with directories and files • Create, remove, move, rename etc. • Namenode maintains the file system • Any meta information changes to the file system recorded by the Namenode. • An application can specify the number of replicas of the file needed: replication factor of the file. This information is stored in the Namenode.
  • 47.
    Data Replication 4/17/2024 47 HDFS is designed to store very large files across machines in a large cluster.  Each file is a sequence of blocks.  All blocks in the file except the last are of the same size.  Blocks are replicated for fault tolerance.  Block size and replicas are configurable per file.  The Namenode receives a Heartbeat and a BlockReport from each DataNode in the cluster.  BlockReport contains all the blocks on a Datanode.
  • 48.
    Replica Placement 4/17/2024 48 The placement of the replicas is critical to HDFS reliability and performance.  Optimizing replica placement distinguishes HDFS from other distributed file systems.  Rack-aware replica placement:  Goal: improve reliability, availability and network bandwidth utilization  Research topic  Many racks, communication between racks are through switches.  Network bandwidth between machines on the same rack is greater than those in different racks.  Namenode determines the rack id for each DataNode.  Replicas are typically placed on unique racks  Simple but non-optimal  Writes are expensive  Replication factor is 3  Another research topic?  Replicas are placed: one on a node in a local rack, one on a different node in the local rack and one on a node in a different rack.  1/3 of the replica on a node, 2/3 on a rack and 1/3 distributed evenly across remaining racks.
  • 49.
    Replica Selection 4/17/2024 49 •Replica selection for READ operation: HDFS tries to minimize the bandwidth consumption and latency. • If there is a replica on the Reader node then that is preferred. • HDFS cluster may span multiple data centers: replica in the local data center is preferred over the remote one.
  • 50.
    Safemode Startup 4/17/2024 50 On startup Namenode enters Safemode.  Replication of data blocks do not occur in Safemode.  Each DataNode checks in with Heartbeat and BlockReport.  Namenode verifies that each block has acceptable number of replicas  After a configurable percentage of safely replicated blocks check in with the Namenode, Namenode exits Safemode.  It then makes the list of blocks that need to be replicated.  Namenode then proceeds to replicate these blocks to other Datanodes.
  • 51.
    Filesystem Metadata 4/17/2024 51 •The HDFS namespace is stored by Namenode. • Namenode uses a transaction log called the EditLog to record every change that occurs to the filesystem meta data. – For example, creating a new file. – Change replication factor of a file – EditLog is stored in the Namenode’s local filesystem • Entire filesystem namespace including
  • 52.
    Namenode 4/17/2024 52  Keepsimage of entire file system namespace and file Blockmap in memory.  4GB of local RAM is sufficient to support the above data structures that represent the huge number of files and directories.  When the Namenode starts up it gets the FsImage and Editlog from its local file system, update FsImage with EditLog information and then stores a copy of the FsImage on the filesytstem as a checkpoint.  Periodic checkpointing is done. So that the system can recover back to the last checkpointed state in case of a crash.
  • 53.
    Datanode 4/17/2024 53  ADatanode stores data in files in its local file system.  Datanode has no knowledge about HDFS filesystem  It stores each block of HDFS data in a separate file.  Datanode does not create all files in the same directory.  It uses heuristics to determine optimal number of files per directory and creates directories appropriately: Research issue?  When the filesystem starts up it generates a list of all HDFS blocks and send this report to Namenode: Blockreport.
  • 54.
  • 55.
    The Communication Protocol 4/17/202455  All HDFS communication protocols are layered on top of the TCP/IP protocol  A client establishes a connection to a configurable TCP port on the Namenode machine. It talks ClientProtocol with the Namenode.  The Datanodes talk to the Namenode using Datanode protocol.  RPC abstraction wraps both ClientProtocol and Datanode protocol.  Namenode is simply a server and never initiates a request; it only responds to RPC requests issued by DataNodes or clients.
  • 56.
  • 57.
    Objectives • Primary objectiveof HDFS is to store data reliably in the presence of failures. • Three common failures are: Namenode failure, Datanode failure and network partition. 4/17/2024 57
  • 58.
    DataNode failure and heartbeat •A network partition can cause a subset of Datanodes to lose connectivity with the Namenode. • Namenode detects this condition by the absence of a Heartbeat message. • Namenode marks Datanodes without Hearbeat and does not send any IO requests to them. • Any data registered to the failed Datanode is not available to the HDFS. 4/17/2024 58
  • 59.
    Re-replication • The necessityfor re-replication may arise due to: – A Datanode may become unavailable, – A replica may become corrupted, – A hard disk on a Datanode may fail, or – The replication factor on the block may be increased. 4/17/2024 59
  • 60.
    Cluster Rebalancing • HDFSarchitecture is compatible with data rebalancing schemes. • A scheme might move data from one Datanode to another if the free space on a Datanode falls below a certain threshold. • In the event of a sudden high demand for a particular file, a scheme might dynamically create additional replicas and rebalance other data in the cluster. 4/17/2024 60
  • 61.
    Data Integrity • Considera situation: a block of data fetched from Datanode arrives corrupted. • This corruption may occur because of faults in a storage device, network faults, or buggy software. • A HDFS client creates the checksum of every block of its file and stores it in hidden files in the HDFS namespace. • When a clients retrieves the contents of 4/17/2024 61
  • 62.
    Metadata Disk Failure •FsImage and EditLog are central data structures of HDFS. • A corruption of these files can cause a HDFS instance to be non-functional. • For this reason, a Namenode can be configured to maintain multiple copies of the FsImage and EditLog. • Multiple copies of the FsImage and EditLog files are updated synchronously. • Meta-data is not data-intensive. • The Namenode could be single point failure: automatic failover is NOT supported! Another research topic. 4/17/2024 62
  • 63.
  • 64.
    Data Blocks • HDFSsupport write-once-read-many with reads at streaming speeds. • A typical block size is 64MB (or even 128 MB). • A file is chopped into 64MB chunks and stored. 4/17/2024 64
  • 65.
    Staging • A clientrequest to create a file does not reach Namenode immediately. • HDFS client caches the data into a temporary file. When the data reached a HDFS block size the client contacts the Namenode. • Namenode inserts the filename into its hierarchy and allocates a data block for it. • The Namenode responds to the client 4/17/2024 65
  • 66.
    Staging (contd.) • Theclient sends a message that the file is closed. • Namenode proceeds to commit the file for creation operation into the persistent store. • If the Namenode dies before file is closed, the file is lost. • This client side caching is required to avoid network congestion; also it has precedence is AFS (Andrew file system). 4/17/2024 66
  • 67.
    Replication Pipelining • Whenthe client receives response from Namenode, it flushes its block in small pieces (4K) to the first replica, that in turn copies it to the next replica and so on. • Thus data is pipelined from Datanode to the next. 4/17/2024 67
  • 68.
  • 69.
    Application Programming Interface • HDFSprovides Java API for application to use. • Python access is also used in many applications. • A C language wrapper for Java API is also available. • A HTTP browser can be used to browse the files of a HDFS instance. 4/17/2024 69
  • 70.
    FS Shell, Adminand Browser Interface • HDFS organizes its data in files and directories. • It provides a command line interface called the FS shell that lets the user interact with data in the HDFS. • The syntax of the commands is similar to bash and csh. • Example: to create a directory /foodir /bin/hadoop dfs –mkdir /foodir • There is also DFSAdmin interface 4/17/2024 70
  • 71.
    Space Reclamation • Whena file is deleted by a client, HDFS renames file to a file in be the /trash directory for a configurable amount of time. • A client can request for an undelete in this allowed time. • After the specified time the file is deleted and the space is reclaimed. • When the replication factor is reduced, the Namenode selects excess replicas 4/17/2024 71
  • 72.
    Summary • We discussedthe features of the Hadoop File System, a peta-scale file system to handle big-data sets. • What discussed: Architecture, Protocol, API, etc. • Missing element: Implementation – The Hadoop file system (internals) – An implementation of an instance of the HDFS (for use by applications such as web crawlers). 4/17/2024 72
BigTable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber @ Google
Presented by Richard Venutolo
Introduction
• BigTable is a distributed storage system for managing structured data.
• Designed to scale to a very large size
– Petabytes of data across thousands of servers
• Used for many Google projects
– Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
• Flexible, high-performance solution for all of Google’s products
Motivation
• Lots of (semi-)structured data at Google
– URLs: contents, crawl metadata, links, anchors, PageRank, …
– Per-user data: user preference settings, recent queries/search results, …
– Geographic locations: physical entities (shops, restaurants, etc.), roads, satellite image data, user annotations, …
• Scale is large
– Billions of URLs, many versions per page (~20 KB per version)
– Hundreds of millions of users, thousands of queries/sec
– 100 TB+ of satellite image data
Why not just use a commercial DB?
• Scale is too large for most commercial databases
• Even if it weren’t, cost would be very high
– Building internally means the system can be applied across many projects for low incremental cost
• Low-level storage optimizations help performance significantly
– Much harder to do when running on top of a database layer
Goals
• Want asynchronous processes to be continuously updating different pieces of data
– Want access to the most current data at any time
• Need to support:
– Very high read/write rates (millions of ops per second)
– Efficient scans over all or interesting subsets of data
– Efficient joins of large one-to-one and one-to-many datasets
• Often want to examine data changes over time
– E.g., contents of a web page over multiple crawls
BigTable
• Distributed multi-level map
• Fault-tolerant, persistent
• Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabyte of disk-based data
– Millions of reads/writes per second, efficient scans
• Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance
Building Blocks
• Building blocks:
– Google File System (GFS): raw storage
– Scheduler: schedules jobs onto machines
– Lock service: distributed lock manager
– MapReduce: simplified large-scale data processing
• BigTable’s use of the building blocks:
– GFS: stores persistent data (SSTable file format)
– Scheduler: schedules jobs involved in BigTable serving
– Lock service: master election, location bootstrapping
– MapReduce: often used to read/write BigTable data
Basic Data Model
• A BigTable is a sparse, distributed, persistent multi-dimensional sorted map: (row, column, timestamp) -> cell contents
• Good match for most Google applications
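A conceptual sketch of that map, not BigTable's implementation: nested sorted maps keyed by row, then column, then timestamp, with timestamps sorted newest-first. The class name and the example row/column are made up for illustration.

import java.util.Comparator;
import java.util.TreeMap;

public class SortedMapModel {
    public static void main(String[] args) {
        // row key -> (column name -> (timestamp -> cell contents))
        TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> table = new TreeMap<>();

        // store one cell: row "com.cnn.www", column "contents:", current timestamp
        table.computeIfAbsent("com.cnn.www", r -> new TreeMap<>())
             .computeIfAbsent("contents:", c -> new TreeMap<Long, byte[]>(Comparator.reverseOrder()))
             .put(System.currentTimeMillis(), "<html>...</html>".getBytes());

        // read back the most recent version of that cell
        byte[] latest = table.get("com.cnn.www").get("contents:").firstEntry().getValue();
        System.out.println(new String(latest));
    }
}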
WebTable Example
• Want to keep a copy of a large collection of web pages and related information
• Use URLs as row keys
• Various aspects of a web page as column names
• Store the contents of web pages in the contents: column under the timestamps when they were fetched.
Rows
• Row name is an arbitrary string
– Access to data in a row is atomic
– Row creation is implicit upon storing data
• Rows ordered lexicographically
– Rows close together lexicographically usually sit on one or a small number of machines
Rows (cont.)
• Reads of short row ranges are efficient and typically require communication with only a small number of machines.
• Can exploit this property by selecting row keys that give good locality for data access.
• Example: math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu vs. edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys
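A small illustrative helper (not from the paper) showing the key trick in the example above: reversing the hostname so that pages from the same domain sort next to each other in the row space.

public class RowKeys {
    // e.g. "math.gatech.edu" -> "edu.gatech.math"
    static String reverseDomain(String host) {
        String[] parts = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) sb.append('.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseDomain("math.gatech.edu")); // edu.gatech.math
    }
}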
Columns
• Columns have a two-level name structure: family:optional_qualifier
• Column family
– Unit of access control
– Has associated type information
• Qualifier gives unbounded columns
– Additional levels of indexing, if desired
Timestamps
• Used to store different versions of data in a cell
– New writes default to the current time, but timestamps for writes can also be set explicitly by clients
• Lookup options:
– “Return most recent K values”
– “Return all values in timestamp range (or all values)”
• Column families can be marked with attributes:
– “Only retain most recent K values in a cell”
– “Keep values until they are older than K seconds”
Implementation – Three Major Components
• Library linked into every client
• One master server, responsible for:
– Assigning tablets to tablet servers
– Detecting addition and expiration of tablet servers
– Balancing tablet-server load
– Garbage collection
• Many tablet servers
– Each tablet server handles read and write requests to its tablets
– Splits tablets that have grown too large
Implementation (cont.)
• Client data does not move through the master server; clients communicate directly with tablet servers for reads and writes.
• Most clients never communicate with the master server, leaving it lightly loaded in practice.
Tablets
• Large tables are broken into tablets at row boundaries
– A tablet holds a contiguous range of rows
– Clients can often choose row keys to achieve locality
– Aim for ~100–200 MB of data per tablet
• Each serving machine is responsible for ~100 tablets
– Fast recovery: 100 machines each pick up 1 tablet from a failed machine
– Fine-grained load balancing: migrate tablets away from an overloaded machine; the master makes load-balancing decisions
Tablet Location
• Since tablets move around from server to server, given a row, how do clients find the right machine?
– Need to find the tablet whose row range covers the target row
Tablet Assignment
• Each tablet is assigned to one tablet server at a time.
• The master server keeps track of the set of live tablet servers and the current assignment of tablets to servers, including unassigned tablets.
• When a tablet is unassigned, the master assigns it to a tablet server with sufficient room.
API
• Metadata operations
– Create/delete tables and column families, change metadata
• Writes (atomic)
– Set(): write cells in a row
– DeleteCells(): delete cells in a row
– DeleteRow(): delete all cells in a row
• Reads
– Scanner: read arbitrary cells in a BigTable
– Each row read is atomic
– Can restrict returned rows to a particular range
– Can ask for just data from 1 row, all rows, etc.
– Can ask for all columns, just certain column families, or specific columns
Refinements: Locality Groups
• Can group multiple column families into a locality group
– A separate SSTable is created for each locality group in each tablet.
• Segregating column families that are not typically accessed together enables more efficient reads.
– In WebTable, page metadata can be in one group and the contents of the page in another group.
Refinements: Compression
• Many opportunities for compression
– Similar values in the same row/column at different timestamps
– Similar values in different columns
– Similar values across adjacent rows
• Two-pass custom compression scheme
– First pass: compress long common strings across a large window
– Second pass: look for repetitions in a small window
• Speed is emphasized, but space reduction is still good (roughly 10-to-1)
Refinements: Bloom Filters
• A read operation has to go to disk when the desired SSTable isn’t in memory
• Reduce the number of disk accesses by specifying a Bloom filter.
– Allows us to ask whether an SSTable might contain data for a specified row/column pair.
– A small amount of memory for Bloom filters drastically reduces the number of disk seeks for read operations
– In practice, most lookups for non-existent rows or columns never touch disk
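A toy Bloom filter to show the mechanism (BigTable's real filters are built per SSTable over row/column pairs; the hashing scheme and class here are assumptions for the sketch): it answers either "definitely not present" (skip the disk seek) or "maybe present" (go to the SSTable).

import java.util.BitSet;

class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // derive k hash functions from two base hashes (a standard trick)
    private int hash(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(hash(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(hash(key, i))) return false; // definitely absent: no disk seek needed
        return true;                                   // maybe present: read the SSTable
    }
}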
CS525: Special Topics in DBs
Large-Scale Data Management
HBase
Spring 2013, WPI, Mohamed Eltabakh
HBase: Overview
• HBase is a distributed, column-oriented data store built on top of HDFS
• HBase is an Apache open-source project whose goal is to provide storage for Hadoop’s distributed computing environment
• Data is logically organized into tables, rows, and columns
HBase: Part of Hadoop’s Ecosystem
• HBase is built on top of HDFS
• HBase files are internally stored in HDFS
HBase vs. HDFS
• Both are distributed systems that scale to hundreds or thousands of nodes
• HDFS is good for batch processing (scans over big files)
– Not good for record lookup
– Not good for incremental addition of small batches
– Not good for updates
HBase vs. HDFS (cont’d)
• HBase is designed to efficiently address the above points
– Fast record lookup
– Support for record-level insertion
– Support for updates (not in place)
• HBase updates are done by creating new versions of values
HBase vs. HDFS (cont’d)
• If your application has neither random reads nor random writes, stick with HDFS.
HBase Data Model
• HBase is based on Google’s BigTable model
– Key-value pairs: (row key, column family, column, timestamp) -> value
HBase: Keys and Column Families
• Each row has a key
• Each record is divided into column families
• Each column family consists of one or more columns
• Key
– Byte array
– Serves as the primary key for the table
– Indexed for fast lookup
• Column Family
– Has a name (string)
– Contains one or more related columns
• Column
– Belongs to one column family
– Included inside the row as familyName:columnName
Example (WebTable-style) with column families “contents:” and “anchor:”:
Row key           Timestamp  Column “contents:”  Column “anchor:”
“com.apache.www”  t12        “<html>…”
                  t11        “<html>…”
                  t10                            “anchor:apache.com” = “APACHE”
“com.cnn.www”     t15                            “anchor:cnnsi.com” = “CNN”
                  t13                            “anchor:my.look.ca” = “CNN.com”
                  t6         “<html>…”
                  t5         “<html>…”
                  t3         “<html>…”
• Version Number
– Unique within each key
– By default, the system’s timestamp
– Data type is Long
• Value (Cell)
– Byte array
• In the example table above, the timestamps (t3 … t15) are the version numbers attached to each row value.
Notes on Data Model
• An HBase schema consists of several tables
• Each table consists of a set of column families
– Columns are not part of the schema
• HBase has dynamic columns
– Because column names are encoded inside the cells
– Different cells can have different columns (e.g., a “Roles” column family may have different columns in different cells)
Notes on Data Model (cont’d)
• The version number can be user-supplied
– It does not even have to be inserted in increasing order
– Version numbers are unique within each key
• A table can be very sparse
– Many cells are empty
• Keys are indexed as the primary key
HBase Physical Model
• Each column family is stored in a separate file (called an HTable)
• The key and version numbers are replicated with each column family
• Empty cells are not stored
• HBase maintains a multi-level index on values: <key, column family, column name, timestamp>
HBase Regions
• Each HTable (column family) is partitioned horizontally into regions
– Regions are the counterpart of HDFS blocks
Three Major Components
• The HBaseMaster
– One master
• The HRegionServer
– Many region servers
• The HBase client
HBase Components
• Region
– A subset of a table’s rows, like horizontal range partitioning
– Created automatically
• RegionServer (many slaves)
– Manages data regions
– Serves data for reads and writes (using a log)
• Master
– Responsible for coordinating the slaves
– Assigns regions, detects failures
– Admin functions
ZooKeeper
• HBase depends on ZooKeeper
• By default HBase manages the ZooKeeper instance
– E.g., starts and stops ZooKeeper
• The HMaster and HRegionServers register themselves with ZooKeeper
Creating a Table

HBaseAdmin admin = new HBaseAdmin(config);

// define two column families
HColumnDescriptor[] column = new HColumnDescriptor[2];
column[0] = new HColumnDescriptor("columnFamily1:");
column[1] = new HColumnDescriptor("columnFamily2:");

// describe the table and attach the families
HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
desc.addFamily(column[0]);
desc.addFamily(column[1]);

// create the table
admin.createTable(desc);
Operations On Regions: Get()
• Given a key, return the corresponding record
• For each value, return the highest (most recent) version
• Can control the number of versions you want
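A hedged sketch of a Get through the classic HTable/Get client API (the table name MyTable and the WebTable-style row and column are illustrative, carried over from the earlier example):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "MyTable");

        Get get = new Get(Bytes.toBytes("com.apache.www"));               // row key
        get.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("apache.com"));
        get.setMaxVersions(3);                                            // control how many versions come back

        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("anchor"), Bytes.toBytes("apache.com"));
        System.out.println(Bytes.toString(value));                        // "APACHE" in the running example
        table.close();
    }
}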
Get()
• Example query: Select value from table where key = ‘com.apache.www’ AND label = ‘anchor:apache.com’
• Against the example table, this looks up row “com.apache.www” and returns the most recent value of the anchor:apache.com column (“APACHE”).
Scan()
• Example query: Select value from table where anchor = ‘cnnsi.com’
• Against the example table, this scans all rows and returns values stored under the anchor:cnnsi.com column (e.g., “CNN” for row “com.cnn.www”).
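The same query as a hedged sketch with the Scan API (again assuming the illustrative MyTable and anchor family from the running example):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "MyTable");

        Scan scan = new Scan();                                           // full-table scan by default
        scan.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"));

        ResultScanner scanner = table.getScanner(scan);
        for (Result row : scanner) {
            byte[] v = row.getValue(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"));
            if (v != null) {
                System.out.println(Bytes.toString(row.getRow()) + " -> " + Bytes.toString(v));
            }
        }
        scanner.close();
        table.close();
    }
}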
Operations On Regions: Put()
• Insert a new record (with a new key), or
• Insert a record for an existing key
• The version number (timestamp) can be implicit (assigned by the system) or explicit (supplied by the client)
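A hedged Put sketch showing both versioning styles; the row key, column names, and the literal timestamp 42 are made up for the example:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "MyTable");

        Put put = new Put(Bytes.toBytes("com.example.www"));              // new or existing row key

        // implicit version: the server assigns the current timestamp
        put.add(Bytes.toBytes("columnFamily1"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));

        // explicit version: the caller supplies the timestamp
        put.add(Bytes.toBytes("columnFamily1"), Bytes.toBytes("col2"), 42L, Bytes.toBytes("value2"));

        table.put(put);
        table.close();
    }
}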
Operations On Regions: Delete()
• Marks table cells as deleted
• Multiple levels
– Can mark an entire column family as deleted
– Can mark all column families of a given row as deleted
• All operations are logged by the RegionServers
• The log is flushed periodically
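A hedged Delete sketch at the column-family and row levels (row key and family name are illustrative):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteExample {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "MyTable");

        Delete del = new Delete(Bytes.toBytes("com.example.www"));        // row key
        del.deleteFamily(Bytes.toBytes("columnFamily1"));                 // mark one family as deleted
        // a Delete with no family/column specified marks the whole row as deleted

        table.delete(del);
        table.close();
    }
}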
HBase: Joins
• HBase does not support joins
• Joins can be done in the application layer
– Using scan() and get() operations
Altering a Table
• Disable the table before changing the schema
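A sketch of the disable/alter/enable pattern with HBaseAdmin, under the assumption that the classic admin API (disableTable, addColumn, enableTable) is being used; columnFamily3 is a hypothetical new family:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AlterExample {
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

        admin.disableTable("MyTable");                                    // table must be offline
        admin.addColumn("MyTable", new HColumnDescriptor("columnFamily3")); // schema change
        admin.enableTable("MyTable");                                     // bring it back online
    }
}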
When to use HBase
Cloud Storage – A Look at Amazon’s Dynamo
A presentation that looks at Amazon’s Dynamo service (based on a research paper published by Amazon.com) as well as related cloud storage implementations
The Traditional
• Cloud data services are traditionally oriented around relational database systems
– Oracle, Microsoft SQL Server, and even MySQL have traditionally powered enterprise and online data clouds
– Clustered: traditional enterprise RDBMSs provide the ability to cluster and replicate data over multiple servers, providing reliability
– Highly available: they provide synchronization (“always consistent”), load-balancing, and high-availability features for nearly 100% service uptime
– Structured querying: they allow complex data models and structured querying, so much of the data processing and manipulation can be off-loaded to the back-end database
The Traditional
• However, traditional RDBMS clouds are EXPENSIVE to maintain, license, and use for storing large amounts of data
– The service guarantees of traditional enterprise relational databases like Oracle put high overheads on the cloud
– Complex data models make the cloud more expensive to maintain, update, and keep synchronized
– Load distribution often requires expensive networking equipment
– Maintaining the “elasticity” of the cloud often requires expensive upgrades to the network
The Solution
• Downgrade some of the service guarantees of traditional RDBMSs
– Replace the highly complex data models Oracle and SQL Server offer with a simpler one; this means classifying services based on the complexity of the data model they require
– Replace the “always consistent” synchronization model with an “eventually consistent” model; this means classifying services based on how up-to-date their data set must be
• Redesign or distinguish between services that require a simpler data model and lower expectations on consistency. We can then offer something different from a traditional RDBMS.
The Solution
• Amazon’s Dynamo
– Used within Amazon’s cloud infrastructure; powers parts of its Simple Storage Service (S3) as well as its e-commerce platform. Offers a simple primary-key-based data model and stores vast amounts of information on distributed, low-cost virtualized nodes.
• Google’s BigTable
– Google’s principal data cloud for its services. Uses a more complex column-family data model than Dynamo, yet much simpler than a traditional RDBMS. Google’s underlying file system provides the distributed architecture on low-cost nodes.
• Facebook’s Cassandra
– Facebook’s principal data cloud for its services; recently open-sourced. Provides a data model similar to Google’s BigTable, with the distributed characteristics of Amazon’s Dynamo.
Dynamo - Motivation
• Build a distributed storage system that:
– Scales
– Is simple: key-value
– Is highly available
– Guarantees Service Level Agreements (SLAs)
System Assumptions and Requirements
• Query model: simple read and write operations to a data item that is uniquely identified by a key.
• ACID properties (Atomicity, Consistency, Isolation, Durability): Dynamo targets applications that accept weaker consistency in exchange for availability.
• Efficiency: latency requirements are in general measured at the 99.9th percentile of the distribution.
• Other assumptions: the operating environment is assumed to be non-hostile, and there are no security-related requirements such as authentication and authorization.
Service Level Agreements (SLA)
• An application can deliver its functionality in a bounded time only if every dependency in the platform delivers its functionality with even tighter bounds.
• Example: a service guarantees that it will provide a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second.
(Figure: service-oriented architecture of Amazon’s platform)
Design Considerations
• Sacrifice strong consistency for availability
• Conflict resolution is executed during reads instead of writes, i.e., the store is “always writeable”
• Other principles:
– Incremental scalability
– Symmetry
– Decentralization
– Heterogeneity
Summary of techniques used in Dynamo and their advantages (Problem: Technique -> Advantage)
• Partitioning: consistent hashing -> incremental scalability
• High availability for writes: vector clocks with reconciliation during reads -> version size is decoupled from update rates
• Handling temporary failures: sloppy quorum and hinted handoff -> high availability and durability guarantee when some of the replicas are not available
• Recovering from permanent failures: anti-entropy using Merkle trees -> synchronizes divergent replicas in the background
• Membership and failure detection: gossip-based membership protocol and failure detection -> preserves symmetry and avoids a centralized registry for storing membership and node liveness information
Partition Algorithm
• Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring”.
• “Virtual nodes”: each physical node can be responsible for more than one virtual node (position on the ring).
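A toy consistent-hashing ring with virtual nodes, to show the mechanism only (not Dynamo's actual code; the MD5-based position function and the "node#i" naming of virtual nodes are assumptions for the sketch). Each key is served by the first node encountered walking clockwise from the key's position.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

public class Ring {
    private final TreeMap<Long, String> ring = new TreeMap<>();  // ring position -> physical node

    void addNode(String node, int virtualNodes) throws Exception {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);                 // each virtual node maps back to its physical node
        }
    }

    String nodeFor(String key) throws Exception {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));   // first position clockwise from the key
        return tail.isEmpty() ? ring.firstEntry().getValue()      // wrap around the ring
                              : tail.get(tail.firstKey());
    }

    private static long hash(String s) throws Exception {
        byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        long h = 0;
        for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff); // use the first 8 bytes as a long
        return h;
    }
}

With this layout, removing a node only reassigns the keys that mapped to its virtual nodes, and giving a bigger machine more virtual nodes gives it a proportionally larger share of the keys.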
Advantages of Using Virtual Nodes
• If a node becomes unavailable, the load handled by this node is evenly dispersed across the remaining available nodes.
• When a node becomes available again, it accepts a roughly equivalent amount of load from each of the other available nodes.
• The number of virtual nodes that a node is responsible for can be decided based on its capacity, accounting for heterogeneity in the physical infrastructure.
Replication
• Each data item is replicated at N hosts.
• “Preference list”: the list of nodes that are responsible for storing a particular key.
Data Versioning
• A put() call may return to its caller before the update has been applied at all the replicas.
• A get() call may return many versions of the same object.
• Challenge: an object may have distinct version sub-histories, which the system will need to reconcile in the future.
• Solution: use vector clocks in order to capture causality between different versions of the same object.
Vector Clock
• A vector clock is a list of (node, counter) pairs.
• Every version of every object is associated with one vector clock.
• If the counters on the first object’s clock are less than or equal to all of the corresponding counters in the second clock, then the first is an ancestor of the second and can be forgotten.
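A minimal sketch of that ancestor test (illustrative only; Dynamo's clocks also carry timestamps and truncation logic not shown here):

import java.util.HashMap;
import java.util.Map;

public class VectorClock {
    final Map<String, Long> counters = new HashMap<>();  // node name -> counter

    void increment(String node) {
        counters.merge(node, 1L, Long::sum);             // bump this node's counter on each update it coordinates
    }

    // True if this clock is an ancestor of (or equal to) 'other':
    // every counter here is <= the corresponding counter in 'other'.
    boolean isAncestorOf(VectorClock other) {
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            long theirs = other.counters.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > theirs) return false;
        }
        return true;
    }
    // If neither clock is an ancestor of the other, the two versions conflict
    // and must be reconciled by the client at read time.
}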
Execution of get() and put() Operations
Two routing strategies:
1. Route the request through a generic load balancer that will select a node based on load information.
2. Use a partition-aware client library that routes requests directly to the appropriate coordinator nodes.
Sloppy Quorum
• R/W is the minimum number of nodes that must participate in a successful read/write operation.
• Setting R + W > N yields a quorum-like system (e.g., N = 3 with R = 2 and W = 2).
• In this model, the latency of a get (or put) operation is dictated by the slowest of the R (or W) replicas. For this reason, R and W are usually configured to be less than N, to provide better latency.
Hinted Handoff
• Assume N = 3. When node A is temporarily down or unreachable during a write, the replica is sent to node D instead.
• D is hinted that the replica belongs to A, and it will deliver the replica back to A once A recovers.
• Again: “always writeable”
Other Techniques
• Replica synchronization:
– Merkle hash trees
• Membership and failure detection:
– Gossip protocol
Implementation
• Written in Java
• A local persistence component allows different storage engines to be plugged in:
– Berkeley DB (BDB) Transactional Data Store: for objects of tens of kilobytes
– MySQL: for objects larger than tens of kilobytes
– BDB Java Edition, etc.
Cloud Computing
Advantages:
• Backup and recovery
• Almost unlimited storage
• Increased efficiency and performance
Disadvantages:
• Security
• Privacy
Cloud Computing Storage Security Issues:
1. Traditional security problems
2. Legal issues
3. Third-party issues
Related Research for Storage Security:
1. Authorization:
• Multi-level authorization
• The Face Fuzzy Vault
2. Encryption:
• RSA encryption
• AES encryption
• Hybrid encryption
3. Segmentation:
• Metadata-based storage model
• Multi-cloud architecture model
(Figure: Segmentation Backup System – data is split into parts and copies are stored on separate servers)
(Figure: Multi-cloud Architecture Model)
(Table: Metadata Based Storage Model levels)