What is Object Storage?

TABLE OF CONTENTS
• STORAGE TYPES
• BLOCK STORAGE
• FILE STORAGE
• OBJECT STORAGE
• WHAT IS AN OBJECT?
• WHAT IS METADATA?
• PROTECTING STORED DATA
• ERASURE CODING
• USE CASES

STORAGE TYPES
• Transaction units
  - Block (SAN): Blocks
  - File (NAS): Files
  - Object: Objects, that is, files with custom metadata
• Supported type of update
  - Block (SAN): Supports in-place updates
  - File (NAS): Supports in-place updates
  - Object: No in-place update support; updates create new object versions
• Protocols
  - Block (SAN): SCSI, Fibre Channel, SATA
  - File (NAS): CIFS and NFS
  - Object: REST and SOAP over HTTP
• Metadata support
  - Block (SAN): Fixed system attributes
  - File (NAS): Fixed file-system attributes
  - Object: Support for custom metadata
• Best suited for
  - Block (SAN): Transactional data and frequently changing data
  - File (NAS): Shared file data
  - Object: Relatively static file data, and as cloud storage
• Biggest strength
  - Block (SAN): High performance
  - File (NAS): Simplified access and management of shared files
  - Object: Scalability and distributed access
• Limitations
  - Block (SAN): Difficult to extend beyond the data center
  - File (NAS): Difficult to extend beyond the data center
  - Object: Not suited for frequently changing transactional data; does not provide a sharing protocol with a locking mechanism

STORAGE TYPES
• Block (SAN)
  - The oldest, most basic form of storage
  - Stores data as blocks, typically 512 bytes
  - Has no knowledge of the information it is storing; all context lives in the application layer
  - Best for IOPS-intensive workloads, because each application I/O maps directly onto the storage block size
• File (NAS)
  - Builds on top of block storage
  - Stores data as files, typically in 4 KB blocks
  - Has a hierarchical map of files to blocks (paths) and system metadata, but no other knowledge
  - Middle of the road; serves many different workloads
• Object
  - Abstracts file and block
  - Stores data as objects, typically in 1 MB blocks
  - Has a flat namespace of objects, managed by a relational or key/value database, and can have rich knowledge of objects
  - Best for bandwidth-intensive workloads and large capacities

BLOCK STORAGE
• Block storage is an unformatted, POSIX-compliant storage device presented to the host operating system.
• The most common examples of block storage are SAN, iSCSI, and local disks (whether JBOD or RAID).
• A block storage volume is attached directly to an operating system, and interactions generally happen through a filesystem, although a block device can also be accessed directly as a raw device, without any filesystem.
• Block storage is appropriate as the primary storage for file systems, databases, or any application that requires fine-grained updates.

FILE STORAGE
• The most common example of file storage is a NAS (generally accessed over CIFS or NFS).
• File storage uses a network file system that acts as an abstraction layer between the OS and the underlying filesystem on the NAS device. The OS still sees the storage as a local filesystem, but it does not interact directly with the filesystem on which the data resides. Instead, its commands are interpreted by the network file system and translated into commands for the underlying filesystem.
• This is convenient because it lets different operating systems, which may or may not support the underlying filesystem, interact with it in a uniform manner. That is very valuable when multiple machines need to access the same content on a remote server. Along the same lines, features such as file locking (to prevent inconsistent states when multiple servers write to the same file) and access control are almost universal in the file storage world.
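File locking can be illustrated with Python's standard `fcntl` module. This minimal sketch assumes a POSIX system and a local file; `append_line_locked` is a hypothetical helper, and whether such advisory locks are honoured across machines depends on the network file system and its configuration (for example, the NFS version and lock service in use). Each writer takes an exclusive lock before appending, so writers that follow the same protocol cannot interleave partial lines. Because the locks are advisory, a process that skips the lock is not stopped.

```python
import fcntl
import os
import tempfile


def append_line_locked(path: str, line: str) -> None:
    """Append one line while holding an exclusive advisory lock on the file."""
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # wait until no one else holds it
        try:
            f.write(line + "\n")
            f.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


path = os.path.join(tempfile.mkdtemp(), "shared.log")
append_line_locked(path, "writer A")
append_line_locked(path, "writer B")
with open(path, encoding="utf-8") as f:
    print(f.read().splitlines())  # ['writer A', 'writer B']
```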

OBJECT STORAGE
• Object storage (also known as object-based storage) is a storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks.
• Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures: interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data management functions such as data replication and data distribution at object-level granularity.
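The three parts of an object described above can be modelled as a small Python data type. This is a sketch, not any particular product's format: the class name `StoredObject` and its field names are hypothetical, and a random UUID stands in for the globally unique identifier.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class StoredObject:
    """An object: the data itself, variable metadata, and a globally unique ID."""

    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)  # free-form key/value pairs
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


photo = StoredObject(
    data=b"<jpeg bytes>",  # placeholder payload
    metadata={"Subject": "VCF Buildings", "Place taken": "Hong Kong"},
)
other = StoredObject(data=b"<other bytes>")
print(photo.object_id != other.object_id)  # True: each object gets its own ID
print(photo.metadata["Place taken"])  # Hong Kong
```

`frozen=True` only prevents the fields from being reassigned; it mirrors, loosely, the idea that an object is replaced as a whole rather than edited.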

OBJECT STORAGE
Key characteristics:
• API-level access vs. filesystem-level access
• Flat structure vs. hierarchical structure
• Scalable metadata
• Scalable platform
• Durable data storage
• Low-cost data storage
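API-level access, the flat namespace, and the "updates create new object versions" behaviour from the storage-type comparison can be sketched as a toy in-memory store. `FlatObjectStore` and its `put`/`get`/`list_keys` methods are hypothetical names, not a real service's API. Keys are plain strings in one flat namespace (a `/` inside a key is just a character), a "folder" listing is only a prefix match, and writing to an existing key appends a new version instead of modifying data in place.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[bytes, Dict[str, str]]  # (data, metadata)


class FlatObjectStore:
    """Toy object store: one flat namespace of keys, each with a version history."""

    def __init__(self) -> None:
        self._objects: Dict[str, List[Version]] = {}

    def put(self, key: str, data: bytes, metadata: Optional[Dict[str, str]] = None) -> int:
        """Store a whole object; an existing key gets a new version, never an in-place edit."""
        versions = self._objects.setdefault(key, [])
        versions.append((bytes(data), dict(metadata or {})))
        return len(versions)  # 1-based version number

    def get(self, key: str, version: Optional[int] = None) -> Version:
        """Return the latest version, or a specific 1-based version."""
        versions = self._objects[key]  # KeyError if the object does not exist
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{key!r} has no version {version}")
        return versions[version - 1]

    def list_keys(self, prefix: str = "") -> List[str]:
        """A 'directory' listing is just a prefix match over flat key strings."""
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
key = "photos/2012/pix-construction16.JPG"
store.put(key, b"original", {"Category": "Works"})
print(store.put(key, b"edited"))  # 2: the edit became a new version
print(store.get(key)[0], store.get(key, version=1)[0])  # b'edited' b'original'
print(store.list_keys("photos/"))  # ['photos/2012/pix-construction16.JPG']
```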

OBJECT STORAGE
Benefits:
• Scalable capacity (many PB easily)
• Scalable performance (environment-level performance scales linearly)
• Durable
• Low cost
• Simplified management
• Single access point
• No volumes to manage, resize, etc.
Drawbacks:
• No random access to files
• POSIX utilities do not work directly with object storage (it is not a filesystem)
• Integration may require modifying application and workflow logic
• Typically lower performance on a per-object basis than block storage

WHAT IS AN OBJECT?
Object = File + System Metadata + Custom Metadata (File Class = image)
System Metadata
• Filename: pix-construction16.JPG
• Created: February 9, 2012
• Last modified: December 22, 2013
Custom Metadata
• Subject: VCF Buildings
• Place taken: Hong Kong
• Category: Works
• Allow sharing: No

WHAT IS METADATA?
• Describes the object
• Helps you find the right one
• Tells you what it is
• Its specifications
• Where and when it is used
• Access permissions
• Any and all objects
• Different attributes per object
• Add attributes later
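The points that metadata helps you find the right object, that attributes can differ per object, and that attributes can be added later can be sketched with plain Python dictionaries. The catalogue reuses the example photo's custom metadata; the other two object names and their attributes are made up for illustration, and `find` is a hypothetical helper that returns the keys whose metadata matches every given attribute.

```python
from typing import Dict, List

Metadata = Dict[str, str]

# Hypothetical catalogue: object key -> custom metadata (attributes vary per object).
catalogue: Dict[str, Metadata] = {
    "pix-construction16.JPG": {
        "Subject": "VCF Buildings",
        "Place taken": "Hong Kong",
        "Category": "Works",
        "Allow sharing": "No",
    },
    "team-offsite.JPG": {"Place taken": "Hong Kong", "Category": "Events"},
    "q3-report.pdf": {"Category": "Works"},
}


def find(catalogue: Dict[str, Metadata], criteria: Metadata) -> List[str]:
    """Return the keys whose metadata contains every attribute/value in criteria."""
    return sorted(
        key
        for key, meta in catalogue.items()
        if all(meta.get(attr) == value for attr, value in criteria.items())
    )


# Attributes can be added later, with no schema change.
catalogue["q3-report.pdf"]["Allow sharing"] = "Yes"

print(find(catalogue, {"Place taken": "Hong Kong", "Category": "Works"}))
# ['pix-construction16.JPG']
print(find(catalogue, {"Category": "Works"}))
# ['pix-construction16.JPG', 'q3-report.pdf']
print(find(catalogue, {"Allow sharing": "Yes"}))  # ['q3-report.pdf']
```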

WHAT IS METADATA?
• Metadata lives with the object
Another difference between object storage and the other storage types is that object metadata lives directly in the object, rather than, e.g., in a separate inode.
For example, imagine you wanted to store all of the books in the Library of Congress in a single storage platform. In addition to the contents of the books, you want to store metadata including the author(s), date of publication, publisher, subject, ISBN, OCR date and method, copyrights, and so on. This data could range from a few KB to several MB per object. Traditionally, all of this data would have to be stored in a relational database, with an application built to relate it to a specific object. Doing this for 35 million (and growing) objects is a major challenge with traditional storage platforms. In an object storage system, this scalability issue goes away: the metadata lives directly with the object and can be retrieved with a single API call, without the overhead of a relational database.

PROTECTING STORED DATA
• RAID
  - Redundant Array of Independent Disks
  - Divides or replicates data across multiple drives to deliver performance and fault tolerance
  - Commonly used: RAID 0, 1, 5, 6
  - Pros
    - Trusted protection solution in the traditional array world
    - Known performance delivery
  - Cons
    - High-capacity drive rebuilds can take days or even weeks
    - RAID controllers add complexity to deliver the required performance
• Erasure coding
  - A parity-based protection technique
  - Data is broken into fragments and encoded
  - Fragments are stored across different locations, with a configurable number of redundant pieces
  - Pros
    - Consumes less storage than replication; good for cheap-and-deep (low-cost, high-capacity) storage
    - Can tolerate the failure of two or more elements of the storage system
  - Cons
    - Parity calculation is CPU-intensive
    - Increased latency can slow production writes and rebuilds
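Parity-based protection is easiest to see in its simplest form, the single XOR parity used by RAID 5: the parity chunk is the byte-wise XOR of the data chunks, so any one lost chunk equals the XOR of the parity with the surviving chunks. Erasure coding generalises this idea to m redundant pieces. The helpers `make_parity` and `rebuild` below are a hypothetical, minimal Python sketch that assumes equal-length chunks.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: List[bytes]) -> bytes:
    """Single parity chunk: XOR of all data chunks (RAID 5 style)."""
    return reduce(xor_bytes, chunks)


def rebuild(chunks: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Recover at most one missing chunk (None) from the parity and the survivors."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    if missing:
        survivors = [c for c in chunks if c is not None]
        chunks = list(chunks)
        # parity XOR (all surviving chunks) leaves exactly the lost chunk
        chunks[missing[0]] = reduce(xor_bytes, survivors, parity)
    return chunks


data = [b"ABCD", b"EFGH", b"IJKL"]
parity = make_parity(data)
print(rebuild([b"ABCD", None, b"IJKL"], parity))  # [b'ABCD', b'EFGH', b'IJKL']
```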

ERASURE CODING
• Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media.
• The goal of erasure coding is to allow data that becomes corrupted at some point in the disk storage process to be reconstructed using information about the data that is stored elsewhere in the array. Erasure codes are often used instead of traditional RAID because they can reduce the time and overhead required to reconstruct data. The drawback of erasure coding is that it can be more CPU-intensive, which can translate into increased latency.

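ERASURE CODING
Split, then encode: a file A is split into data chunks X1 and X2, which are encoded into four fragments A1, A2, A3, and A4.
• Split a file into k data chunks and encode them into m parity blocks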

ERASURE CODING
• Tolerate m erasures (failures)
• In a distributed system, chunks are spread across nodes
• In this example, 2 nodes can fail and the data can still be rebuilt
  - Node 1: A1 = X1
  - Node 2: A2 = X2
  - Node 3: A3 = X1 + X2
  - Node 4: A4 = X1 + 2·X2
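The four-node example can be checked in code. This minimal Python sketch encodes two data symbols X1 and X2 into A1 = X1, A2 = X2, A3 = X1 + X2, and A4 = X1 + 2·X2, then rebuilds both data symbols from any two surviving nodes by solving a 2×2 linear system. It uses plain integers for readability; production erasure codes such as Reed-Solomon do this kind of arithmetic in a finite field (commonly GF(2^8)) so every fragment stays the same size as a data chunk. `encode`, `decode`, and `NODE_COEFFS` are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, Tuple

# Node -> (c1, c2) such that the node stores c1*X1 + c2*X2.
NODE_COEFFS: Dict[int, Tuple[int, int]] = {1: (1, 0), 2: (0, 1), 3: (1, 1), 4: (1, 2)}


def encode(x1: int, x2: int) -> Dict[int, int]:
    """Turn k=2 data symbols into n=4 coded symbols, one per node."""
    return {node: c1 * x1 + c2 * x2 for node, (c1, c2) in NODE_COEFFS.items()}


def decode(survivors: Dict[int, int]) -> Tuple[int, int]:
    """Rebuild (X1, X2) from any two surviving nodes (Cramer's rule on a 2x2 system)."""
    if len(survivors) < 2:
        raise ValueError("need at least k=2 surviving nodes")
    (na, va), (nb, vb) = list(survivors.items())[:2]
    (a1, a2), (b1, b2) = NODE_COEFFS[na], NODE_COEFFS[nb]
    det = a1 * b2 - a2 * b1  # non-zero for every pair of these four nodes
    x1 = Fraction(va * b2 - a2 * vb, det)
    x2 = Fraction(a1 * vb - va * b1, det)
    return int(x1), int(x2)


stored = encode(7, 5)
print(stored)  # {1: 7, 2: 5, 3: 12, 4: 17}
# Every choice of two failed nodes still leaves enough to rebuild the data.
for failed in combinations(NODE_COEFFS, 2):
    survivors = {n: v for n, v in stored.items() if n not in failed}
    assert decode(survivors) == (7, 5)
print("any 2 of 4 nodes can fail")
```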

ERASURE CODING
• In mathematical terms, the protection offered by erasure coding can be represented in simple form by the equation n = k + m. The variable k is the original amount of data or symbols. The variable m stands for the extra or redundant symbols added to provide protection from failures. The variable n is the total number of symbols created by the erasure coding process.
• For instance, in a 10 of 16 configuration, or EC 10/16, six extra symbols (m) are added to the 10 base symbols (k). The 16 data fragments (n) are spread across 16 drives, nodes, or geographic locations. The original file can be reconstructed from any 10 verified fragments.
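The n = k + m arithmetic and the EC 10/16 example can be worked through in a few lines of Python. This sketch assumes an MDS code (such as Reed-Solomon), in which any k of the n fragments suffice, so up to m fragments can be lost; `ErasureCode` is a hypothetical name. Expressing three-way replication as k = 1, m = 2 shows why erasure coding consumes less storage than replication: EC 10/16 stores 1.6 units of raw capacity per unit of data and survives six losses, while three full copies store 3 units and survive two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureCode:
    k: int  # original data symbols
    m: int  # redundant symbols added for protection

    @property
    def n(self) -> int:
        return self.k + self.m  # total symbols written: n = k + m

    @property
    def storage_overhead(self) -> float:
        return self.n / self.k  # raw capacity used per unit of user data

    @property
    def tolerated_failures(self) -> int:
        return self.m  # assumes any k of the n fragments can rebuild the data


ec_10_16 = ErasureCode(k=10, m=6)
replication_3x = ErasureCode(k=1, m=2)  # three full copies
for name, code in [("EC 10/16", ec_10_16), ("3x replication", replication_3x)]:
    print(name, code.n, code.storage_overhead, code.tolerated_failures)
# EC 10/16 16 1.6 6
# 3x replication 3 3.0 2
```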

USE CASES
What use cases is object storage good for?
Currently, the datasets best suited for object storage are the following:
• Unstructured data
  - Media (images, music, video)
  - Web content
  - Documents
  - Backups/archives
• Archival and storage of structured and semi-structured data
  - Databases
  - Sensor data
  - Log files
What use cases is object storage not suited for?
• Relational databases
• Data requiring random access or updates within objects
