The Foundations of Cloud Data Storage
The growing mountain of available data is matched by an equally high desire to
access it. The rise of cloud networks makes managing this overwhelming
amount of information not only possible, but highly beneficial.
https://cloud.google.com/pricing
Here you’ll find an overview of some specific solutions, including market
leaders like Google Cloud Platform, Amazon Web Services and Microsoft Azure,
as well as competitive services from companies like RackSpace and HP Helion.
Requirements
From a tools perspective, you'll need a code editor and several browsers for
testing. You can use whichever code editor you prefer.
To explore a set of online services, like cloud data storage, you'll need an internet
connection. Most of the demonstrations and testing can be done with recent
versions of standards-based browsers like Google Chrome.
Although many of the cloud data storage platforms offer trial periods, in most
cases you'll need to enable billing, which requires either a credit card or a bank
account. That's it for tools and real-world requirements.
Cloud Storage Fundamentals 1
https://cloud.google.com/storage/docs/apis
From a knowledge perspective, you should have a general understanding of how
server-side code modules like APIs work.
This is a full exploration of cloud data storage platforms and implementations,
and you'll get the most benefit if you keep your mind open for other ways that
you can apply the same lessons to your own needs. The absolute best thing you
can bring to this training is your imagination.
Disclaimer
The information contained in this manual is for general information purposes
only and is provided as-is. While I try to keep the information up-to-date and
correct, I make no representations or warranties of any kind, express or implied,
about the completeness, accuracy, reliability, suitability or availability with
respect to the website or the information, products, services, or related graphics
contained on the website for any purpose. Any reliance you place on such
information is therefore strictly at your own risk.
In no event will I be liable for any loss or damage including, without limitation,
indirect or consequential loss or damage, or any loss or damage whatsoever
arising from loss of data or profits arising out of, or in connection with, the use of this manual.
Throughout this manual you’ll find links to websites which are not under my control. I
have no control over the nature, content and availability of those sites. The
inclusion of any links does not necessarily imply a recommendation or endorse
the views expressed within them.
Every effort is made to keep this manual’s content up-to-date. However, I take
no responsibility for, and will not be liable for, the website where this document
is shared being temporarily unavailable due to technical issues beyond my
control.
Jan-Erik Finlander - 2017
Cloud Solutions Architect
Contents
Requirements
Contents
1. Introduction to Cloud Data Storage
1.1. Understanding cloud data storage
Cloud Data Storage Benefits
Cloud Data Storage Risks
Cloud Data Storage Services
1.2. Calculating Costs
Cloud Data Storage Costs
Cloud Pricing Calculators
Google Cloud Platform’s Pricing Calculator
AWS’ Pricing Calculator
Microsoft Azure’s Pricing Calculator
1.3. Cloud Storage Solutions
2. Cloud Storage Options
2.1. Working with Object Storage
2.2 Managing Database Content
Cloud Relational Database Features
Cloud Relational Database Access
Cloud Non-Relational Databases
Non-Relational Database Management
Cloud Database Security
2.3. Targeting Storage Availability
2.4. Assessing API interconnectivity
3. Data Storage Issues
3.1. Understanding Data Storage Issues
Service Level Agreement (SLA)
3.2. Establishing and Maintaining Secure Storage
Cloud Data Storage Security
3.3. Handling Latency
Data Cloud Storage Latency
3.4. Managing Scalability and Replication
Why would you use replication?
4. Data Storage Vendors
4.1. Google Cloud Platform
4.2. Amazon Web Services (AWS)
Amazon Cloud Databases
4.3. The Microsoft Cloud
Azure Blob Command Tools
4.4. HP Helion Cloud
HP Cloud Object Storage Access
Sources
1. Introduction to Cloud Data Storage
1.1. Understanding cloud data storage
Like data itself, cloud data storage is a sprawling and continuously evolving
topic. Cloud data storage refers to a repository for digital information on one or
more servers, in one or more locations.
Let’s focus on corporate and enterprise level solutions, not personal file hosting,
although there is some overlap.
Cloud data storage is a concept whose time has come, now that cloud-based
network infrastructure is widely available on the market. Amazon opened the
floodgates in 2006 with the introduction of Amazon Web Services S3. Today,
there is an ever-growing array of companies that offer cloud data storage
services, including Amazon, Box, Google, HP Helion, Azure, Oracle, RackSpace,
and Zetta.
Cloud data services have taken off largely because they fit a variety of use
cases. They're great for application data regardless of where the user is.
Cloud Data Storage Benefits
● Application data
● Big data
● Archiving and backups
● Long-term storage
● Disaster recovery
They also suit big data, both in terms of file size and quantity of records;
routine archives and backups; long-term storage of all types of records; and, in
case of emergency, disaster recovery.
There are numerous impactful benefits to going the cloud data storage route.
Among them is access: your data is available from pretty much anywhere on the
planet that has an internet connection. Next is scalability: cloud storage is, for all
intents and purposes, infinite, and can grow with your needs. Then there's
security: not only can access to your data be restricted to authorised users, but
since cloud storage offers both zonal and geographic redundancy, the possibility
of total data loss is severely limited. And one of the biggest gorillas in the room is
cost. Hosting your data in the cloud means a significant reduction in
self-maintained servers, which cuts not only the actual physical footprint but also
the man-hours required to maintain those servers.
Cloud Data Storage Risks
● Security
● Privacy
There are risks to be considered. Perhaps paramount in the age of the cyber
hacker is security. Cloud storage providers must implement strong and
continually updated strategies to keep your data from being
compromised. Privacy goes together with security.
Since we're talking about data stored in one or more off-site facilities, you must
ensure it's encrypted and accessible only by authorised users. You should also be
aware of the privacy laws governing the data centre locations.
Network issues should also be considered. While downtime leading to data
inaccessibility is perhaps the ultimate worry, backup and restoration speeds are
also affected by available bandwidth and demand.
Cloud Data Storage Services
● Online management
● API Access
● Optimisation
The various cloud data storage hosts offer a wide spectrum of services, but
almost all provide online management of storage, including import, export, and
backup operations.
They also offer API access for automated data storage control, and methods for
optimising operations, whether it's establishing access control lists for authorised
users or setting up parallel transfers of multiple data objects for greater efficiency.
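As a rough illustration of parallel transfers, here is a minimal Python sketch built on the standard library's `concurrent.futures`. The `upload_object` function is a hypothetical placeholder for whatever upload call your provider's SDK offers; it only simulates network time, so the example runs without any account.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def upload_object(name: str, data: bytes) -> tuple[str, int]:
    """Hypothetical stand-in for a provider SDK upload call.

    It sleeps briefly to simulate network time and returns the object
    name together with the number of bytes "sent".
    """
    time.sleep(0.01)
    return name, len(data)


def upload_all(objects: dict[str, bytes], max_workers: int = 8) -> dict[str, int]:
    """Upload many objects concurrently and collect the bytes sent per object."""
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each item is a (name, data) pair; map() yields results in input order.
        for name, size in pool.map(lambda item: upload_object(*item), objects.items()):
            results[name] = size
    return results


if __name__ == "__main__":
    files = {f"backups/file-{i}.bin": b"x" * (i + 1) for i in range(20)}
    sent = upload_all(files)
    print(len(sent), sum(sent.values()))  # 20 210
```

Threads suit this job because uploads spend most of their time waiting on the network; in practice you would also size `max_workers` to respect your provider's request-rate guidance.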
So, that's a cloud's eye view of cloud data storage. Next, we'll take a closer look
at one of the key factors, cost.
1.2. Calculating Costs
Calculating costs for cloud data storage can be a daunting task at best. Whether you
are trying to make a basic decision as to its cost-effectiveness versus in-house
storage or forecasting expenses for multiyear budgets, there are a good number of
factors to consider.
Let’s look at the most pertinent of those, as well as some useful tools. Although
prices vary, as you would expect, there are a few guiding principles that seem to
hold across the board.
Cloud Data Storage Costs
● Pay per resource used vs. flat fee
● Combination of charges
○ Storage
○ API operations
○ Network transfers
● Higher volume, cheaper rate
First, the clear majority of cloud data storage companies set their pricing on a
pay-per-resource-used basis rather than a flat monthly or annual fee. The pay-per-use
philosophy is applied to most, if not all, aspects of the service.
Second, storage pricing is often a combination of charges. You can expect your bill to
include a charge for storage, for API operations such as listing and downloading, and
for network transfers. Although this might give you pause, the rates for
each of these areas are generally very low.
Finally, the higher the volume, the cheaper the rate. While this approach doesn't
apply across the board, many companies offer lower prices for greater use of both
storage and network transfer.
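The "higher the volume, the cheaper the rate" principle usually takes the form of price tiers, where each slice of your usage is billed at its own rate. The Python sketch below uses made-up tiers purely for illustration; substitute the figures from your provider's price list.

```python
# Hypothetical price tiers: (size of the tier in GB, price per GB-month).
# None means "everything above the previous tiers". Not any vendor's real prices.
TIERS = [(1_000, 0.030), (9_000, 0.025), (None, 0.020)]


def monthly_storage_cost(stored_gb: float, tiers=TIERS) -> float:
    """Charge each slice of the stored volume at the rate of the tier it falls in."""
    remaining = stored_gb
    cost = 0.0
    for tier_size, price in tiers:
        if remaining <= 0:
            break
        billable = remaining if tier_size is None else min(remaining, tier_size)
        cost += billable * price
        remaining -= billable
    return round(cost, 2)


print(monthly_storage_cost(500))     # 15.0
print(monthly_storage_cost(20_000))  # 455.0  (30 + 225 + 200)
```

At 20,000 GB the effective rate works out to $0.02275 per GB (455 / 20,000), below the first tier's $0.03.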
The first cost comparison you might want to run pits keeping your data on your own
servers against moving it to the cloud. On the privately owned side, you have the very
real upfront capital expenditures, such as hardware purchases, installation and
configuration.
Own Storage
● Upfront CapEx
● Ongoing OpEx
Cloud Storage
● Ongoing OpEx
● Basically, a rental
Then there are the ongoing operating expenses of maintenance and replacement. With
the cloud, there are no such capital expenses; it is all operating costs. And while it's
true that those operating costs are perpetual, cloud data storage is, at its heart, a
rental after all.
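Because cloud costs are perpetual while owned storage front-loads its spending, a simple cumulative comparison shows how the two trade off over time. The Python sketch below uses made-up figures (an upfront purchase of 40,000, 5,000 a year to run it, and 12,000 a year for cloud storage); a real comparison would also account for hardware refresh cycles, staff time and changing prices.

```python
def cumulative_costs(years, own_upfront, own_yearly, cloud_yearly):
    """Return (year, own_total, cloud_total) for each year of the comparison."""
    rows = []
    for year in range(1, years + 1):
        own_total = own_upfront + own_yearly * year  # CapEx once, then OpEx
        cloud_total = cloud_yearly * year            # OpEx only
        rows.append((year, own_total, cloud_total))
    return rows


def first_year_cloud_costs_more(rows):
    """First year in which cumulative cloud spend exceeds owning, or None."""
    for year, own_total, cloud_total in rows:
        if cloud_total > own_total:
            return year
    return None


rows = cumulative_costs(8, own_upfront=40_000, own_yearly=5_000, cloud_yearly=12_000)
for year, own_total, cloud_total in rows:
    print(f"year {year}: own {own_total:>7,}  cloud {cloud_total:>7,}")
print(first_year_cloud_costs_more(rows))  # 6
```

With these hypothetical numbers the cloud is cheaper for the first five years; the crossover point moves as soon as you add a hardware refresh to the owned side.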
Many IT managers have opted to go the hybrid route, using their own existing
servers together with those from a cloud storage host. The cost calculations for such
an arrangement take on another level of complexity, but it might be the right fit for
your organisation.
Cloud Pricing Calculators
To give you a concrete idea of how pricing for cloud data storage works, let's look at
some of the handy tools made available by vendors.
Google Cloud Platform’s Pricing Calculator
https://cloud.google.com/products/calculator
Not all data is equal. Data that does not need to be accessed as frequently or as
quickly can be stored at a lower rate. Backup data, which doesn't need to be as
responsive as application data, can be kept for less in Durable Reduced Availability
storage.
| Resource | Free limit per day | Price above free limit | Price unit |
|---|---|---|---|
| Stored data | 1 GB | $0.18 | per GB/month |
| Entity reads | 50,000 | $0.06 | per 100,000 entities |
| Entity writes | 20,000 | $0.18 | per 100,000 entities |
| Entity deletes | 20,000 | $0.02 | per 100,000 entities |
| Small operations | Unlimited | Free | - |
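To see how the figures in this table combine into a bill, here is a minimal Python sketch. It charges operations only above each daily free limit and, as an assumption about how the "1 GB" free limit applies, treats the first 1 GB of stored data as free each month. The prices are the ones shown above and are likely out of date, so use the calculator for current figures.

```python
from decimal import Decimal

# (free units per day, price above the free limit, units covered by that price),
# copied from the table above.
OPERATION_PRICES = {
    "reads": (50_000, Decimal("0.06"), 100_000),
    "writes": (20_000, Decimal("0.18"), 100_000),
    "deletes": (20_000, Decimal("0.02"), 100_000),
}
STORAGE_FREE_GB = Decimal("1")                # assumption: first GB is free
STORAGE_PRICE_PER_GB_MONTH = Decimal("0.18")


def daily_operations_cost(counts: dict[str, int]) -> Decimal:
    """Charge only the operations above each daily free limit."""
    total = Decimal("0")
    for op, count in counts.items():
        free, price, per_units = OPERATION_PRICES[op]
        billable = max(0, count - free)
        total += Decimal(billable) / per_units * price
    return total.quantize(Decimal("0.0001"))


def monthly_stored_data_cost(stored_gb: float) -> Decimal:
    """Monthly charge for stored data above the free gigabyte."""
    billable = max(Decimal("0"), Decimal(str(stored_gb)) - STORAGE_FREE_GB)
    return (billable * STORAGE_PRICE_PER_GB_MONTH).quantize(Decimal("0.01"))


print(daily_operations_cost({"reads": 150_000, "writes": 70_000, "deletes": 20_000}))  # 0.1500
print(monthly_stored_data_cost(11))  # 1.80
```

`Decimal` is used instead of floats so that small per-operation prices add up without rounding drift.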
For data that you access even less frequently, like disaster recovery data, consider
using Cloud Storage Nearline to get an even lower rate.
AWS’ Pricing Calculator
https://calculator.s3.amazonaws.com/index.html
Make sure that you click on the Amazon S3 link on the left. That's the Simple
Storage Service.
This is their most accessible tier. You might also want to look at the pricing for
Amazon Glacier, their lower-cost, lower-availability tier for archival data.
Microsoft Azure’s Pricing Calculator
Obviously, figuring the cost of going with cloud data storage is only part of making
the business case for the move. But now you should have a better understanding of
the various facets you'll need to examine.
1.3. Cloud Storage Solutions
If you've considered the cloud data storage market at all, you know that it's a rapidly
growing, competitive one with many players across the spectrum. In this lesson,
we'll take an overview of five of the top contenders.
First up is Amazon Web Services, also known as AWS. AWS is a full cloud platform with
services in computing, databases, analytics, applications, and deployment, as well as
storage.
The primary object storage system is the Amazon Simple Storage Service, referred to
by another acronym, S3. S3 is a very straightforward but extremely robust object
storage service.
There is no limit to the number of objects, and individual objects can be as large as
five terabytes. S3 features a high degree of replication across multiple regional data
centres. Lower-cost object storage is available through Amazon's Reduced Redundancy
Storage option or its Amazon Glacier service.
https://aws.amazon.com/products/storage
If you’re working with Amazon's compute service, EC2, you can also use their Elastic
Block Store, or EBS, feature. EBS volumes work more like a traditional disk, on which
you can build a file system, while remaining highly scalable. Amazon also offers
several database options for structured data storage: Amazon RDS for relational SQL
databases and Amazon DynamoDB for non-relational NoSQL.
The Google Cloud Platform provides an ever-growing set of services that leverage
Google's global infrastructure. Cloud Storage is Google's primary object storage service
and offers limitless storage, automatically replicated across many data centres located
around the world. Cloud Storage objects can be stored in different types of buckets
with varying degrees of accessibility and price points.
Google's relational database service, Cloud SQL, is MySQL-based and allows you to
spin up database instances as needed. For data suited to a non-relational
platform, you can turn to the schemaless Cloud Datastore. Google recently brought
online a new entry in the non-relational space called Cloud Bigtable, which is targeted
at massive data sets.
In Microsoft Azure's storage realm, objects like documents and media files are
handled by Azure Blob Storage via a REST interface and client libraries for .NET, C++,
Java, Node.js, and Android, among others.
https://azure.microsoft.com/en-us/services/storage/tables
The Azure Table Storage service manages non-relational data in a NoSQL fashion,
complete with automatic load balancing. A separate service, SQL Database, takes care
of relational data.
https://www.rackspace.com/cloud/files
RackSpace puts its cloud data storage service under the infrastructure umbrella
with three targeted offerings. Cloud Files is for data objects and boasts triple
replication with a simplified pricing structure.
The RackSpace Cloud Backup service automatically employs block-level
compression and 256-bit key encryption to keep your data compact and secure.
HP Helion's storage offerings include HP Cloud Object Storage and HP Cloud Block
Storage. Their Cloud Object Storage service supports the OpenStack standard, with
both Java and Ruby APIs.
HP Cloud Block Storage pairs with HP Cloud compute instances, but the storage
volumes persist until they are explicitly deleted.
2. Cloud Storage Options
2.1. Working with Object Storage
Let’s see how objects are typically handled by the various services, the benefits you
should expect and the methodology to look for.
What do we mean by an Object? Objects are discrete digital entities, which could
mean documents, files (both uncompressed and compressed), images, video,
audio: all media.
Because of the nature of cloud storage, neither the quantity of objects nor their file
size is a problem: Blobs (Binary Large Objects) are easily accommodated. Many services
can handle a single upload (put) of any file up to 5 gigabytes in size.
Larger files should be split into multiple parts (also known as segmenting) and
uploaded as parts of an overall Object, typically identified with a unique ID.
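The splitting step itself is simple. The Python sketch below cuts a payload into numbered parts and records a checksum for each; the helper names are hypothetical, and the tiny 4,096-byte part size is only for demonstration. Real services set their own minimum and maximum part sizes and part counts, so check your provider's limits.

```python
import hashlib


def split_into_parts(data: bytes, part_size: int) -> list[bytes]:
    """Cut a byte string into consecutive parts of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def plan_upload(data: bytes, part_size: int) -> list[tuple[int, int, str]]:
    """Describe each part as (part number, size in bytes, MD5 hex digest).

    Parts are numbered from 1, and the per-part digests let you check
    each part after it has been transferred.
    """
    return [
        (number, len(part), hashlib.md5(part).hexdigest())
        for number, part in enumerate(split_into_parts(data, part_size), start=1)
    ]


payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
plan = plan_upload(payload, part_size=4_096)
print([(number, size) for number, size, _ in plan])  # [(1, 4096), (2, 4096), (3, 2048)]
```

Joining the parts in order with `b"".join(...)` restores the original bytes, which is essentially what the service does when it assembles the overall Object.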
Structurally, object storage is organised on two levels. The top layer is the
Container, also called an Asset Group or, more commonly, a Bucket. You can have
as many Containers as you need; each Container has a unique ID so that it can
be accessed globally.
On all the services, Containers are project-specific and cannot be nested.
However, you can create folders within the Containers to build a hierarchy. For
services with multiple data centres around the world, you can specify the region
to host the Container.
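In many services, including Amazon S3 and Google Cloud Storage, the namespace inside a bucket is actually flat, and "folders" are simulated with "/"-delimited object names that are listed by prefix and delimiter. The Python sketch below mimics that listing style over a plain list of names; `list_level` is a hypothetical helper, not any vendor's API.

```python
def list_level(names: list[str], prefix: str = "", delimiter: str = "/"):
    """Emulate a one-level "folder" listing over flat object names.

    Returns (objects directly under prefix, sub-folder prefixes).
    """
    objects, folders = [], set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Anything below the next delimiter is rolled up into a folder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(name)
    return sorted(objects), sorted(folders)


names = ["readme.txt", "images/logo.png", "images/2017/jan.png", "video/intro.mp4"]
print(list_level(names))             # (['readme.txt'], ['images/', 'video/'])
print(list_level(names, "images/"))  # (['images/logo.png'], ['images/2017/'])
```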
Choosing a Container's region is a great way to reduce latency to your targeted
markets. Each individual Object is stored within a specific Container; Objects cannot be
shared across Buckets, although they can be duplicated. Frequently, you'll find
the ability to attach identifying metadata to your Objects via key-value pairs.
When an Object is uploaded to a Container, it becomes available once its integrity
is validated. Similarly, once you delete it, it's no longer accessible. There are no
undos, so backups are essential.
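Integrity validation usually comes down to comparing checksums. For example, Amazon S3 accepts a `Content-MD5` request header and Google Cloud Storage reports an object's `md5Hash`, both as base64-encoded MD5 digests, though where and how a service reports the checksum varies. The Python sketch below computes that form locally; `verify_upload` is a hypothetical helper.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest of the data (the Content-MD5 style encoding)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def verify_upload(data: bytes, reported_md5: str) -> bool:
    """Compare the checksum computed locally with the one the service reports."""
    return content_md5(data) == reported_md5


body = b"quarterly-report contents"
checksum = content_md5(body)
print(verify_upload(body, checksum))         # True
print(verify_upload(body + b"!", checksum))  # False
```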
Many services offer server-side encryption for security, although you're also free
to use client-side encryption prior to transferring the Object. As mentioned in an
earlier section, many cloud platforms provide several storage classes, with
lower costs for data that you don't access as frequently or that needs less
redundancy. Some services, like Google Cloud Storage, apply them at the
Container, or Bucket, level.
Others, such as Amazon S3, allow you to specify the storage class for
individual Objects. Once you've set up your Containers and begun to populate
them with Objects, you should ensure that only the people you want to access
them can.
Object Storage is typically private by default, and initially accessible only by the
owner or primary administrator. Permissions can, however, be broadened all the
way up to publicly available. Most services have mechanisms in place for
establishing authenticated users, often via ACLs, or Access Control Lists.
All services provide some API libraries for Object management, and many
platforms offer a full spectrum of languages from which to choose, anything
from Java, .NET and PHP to Python. With them, you can get a full list of your
Containers, find out what is in each one, and then store, retrieve, copy and delete
those Objects.
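The operations listed above can be modelled with a toy, in-memory store. The Python class below is hypothetical and not any provider's SDK; it simply mirrors the semantics described in this section: Containers hold Objects, a copy into another Container is an independent duplicate, and a delete cannot be undone.

```python
from typing import Optional


class InMemoryObjectStore:
    """Toy model of Container/Object operations; hypothetical, not a real SDK."""

    def __init__(self) -> None:
        # container name -> {object key -> object data}
        self._containers: dict[str, dict[str, bytes]] = {}

    def create_container(self, name: str) -> None:
        if name in self._containers:
            raise ValueError(f"container {name!r} already exists")
        self._containers[name] = {}

    def list_containers(self) -> list[str]:
        return sorted(self._containers)

    def list_objects(self, container: str) -> list[str]:
        return sorted(self._containers[container])

    def put(self, container: str, key: str, data: bytes) -> None:
        self._containers[container][key] = data

    def get(self, container: str, key: str) -> bytes:
        return self._containers[container][key]

    def copy(self, src: str, key: str, dst: str, new_key: Optional[str] = None) -> None:
        # Objects are not shared between Containers: the copy is a separate
        # entry in the destination that survives changes to the source.
        self._containers[dst][new_key or key] = self._containers[src][key]

    def delete(self, container: str, key: str) -> None:
        # No undo: once removed, the object is gone unless a backup exists.
        del self._containers[container][key]


store = InMemoryObjectStore()
store.create_container("reports")
store.create_container("archive")
store.put("reports", "2017/q1.csv", b"a,b\n1,2\n")
store.copy("reports", "2017/q1.csv", "archive")
store.delete("reports", "2017/q1.csv")
print(store.list_objects("reports"), store.list_objects("archive"))  # [] ['2017/q1.csv']
```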
More advanced operations, such as targeting specific versions of an Object, are
available on specific platforms. Mastering Containers and Objects is essential to
productive cloud data storage. Data record storage, in databases and other
systems, is another major aspect of the cloud data storage world.
2.2 Managing Database Content
Cloud data storage can handle structured data as well as unstructured
blobs. Two major strains of databases are supported: Relational Databases
and Non-Relational Databases. Relational Databases are typically SQL
databases, and the Non-Relational ones use NoSQL, which is short for "not only SQL".
Some cloud data storage platforms, like Google and HP, focus on MySQL, while
others support a range of MySQL variations.
RackSpace supports MySQL, Percona Server and MariaDB. Some services, such as
AWS, work with other relational databases, including Oracle, SQL Server and
PostgreSQL, as well as MySQL.
A few cloud data storage services have opted for the proprietary route, like
Microsoft Azure with its SQL Database offering.
| Data type | AWS | Google | Microsoft |
|---|---|---|---|
| RDBMS | RDS (all major) | MySQL | SQL Azure |
| NoSQL buckets | S3 or Glacier | Cloud Storage | Azure Blobs |
| NoSQL key-value | DynamoDB | Cloud Datastore | Azure Tables |
| Streaming ML or Apache Mahout | Custom EC2 | Prospective Search & Prediction API | StreamInsight |
| NoSQL document or graph | MongoDB on EC2 | Freebase | MongoDB on Azure |
| NoSQL column (Hadoop HBase) | Elastic MapReduce + S3 & EC2 | Cloud Dataproc | HDInsight |
| Dremel / warehousing | Redshift | BigQuery | SQL Data Warehouse |
Cloud Relational Database Features
● Replication across data centres
○ Increase data durability
○ Decrease latency
● Scale up / down data instances
● Backups created automatically
Many cloud data storage services take SQL servers to the next level by
replicating the databases across data centres, increasing data durability and
decreasing latency.
The cloud database servers scale very efficiently, spinning new database
instances up or down as needed. Backups to multiple locations are often created
automatically, allowing point-in-time recovery.
Access to cloud-based databases is broad overall, but the specific APIs available
for database management vary on a service-by-service basis.
Cloud Relational Database Access
● HTTP requests supported across the board
● Specific APIs vary by service provider
● New APIs routinely introduced
All hosts support standard HTTP requests for accessing data, but you'll have to
check each service to verify that an API for the language of your choice is
available. And keep in mind that the situation is by no means static.
Cloud Non-Relational Databases
● Proprietary NoSQL frameworks for each vendor
○ Amazon DynamoDB
○ Rackspace ObjectRocket
○ Google Cloud Datastore
● Appropriate for massive datasets
○ AWS Redshift
○ Google Cloud Bigtable
Many services add new APIs on a continuing basis. If your applications lend
themselves to non-relational data with relatively straightforward queries, the
most responsive database system capable of scaling to massive size is NoSQL.
All the services that provide NoSQL alternatives provide their own
framework, like Amazon DynamoDB, RackSpace ObjectRocket, or Google Cloud
Datastore.
Both key/value and document-based NoSQL systems are available. Amazon
DynamoDB works with either, while Microsoft DocumentDB is document-focused
and Google Cloud Datastore is key/value-oriented.
NoSQL's relatively simple structure opens the door to efficient processing of big
data. Several cloud data hosts also offer services built for massive datasets, like the
AWS Redshift data warehouse or the recently introduced Google Cloud Bigtable.
Non-Relational Database Management
● Platform specific but robust
○ Create, update, and delete tables
○ Create, update, and delete content (items or entities)
○ Create, update, and delete content attributes
Management access to the NoSQL services is platform-specific but tends to be
very robust. With most of them, you'll be able to programmatically create,
update and delete tables, as well as perform similar operations on table contents,
which may be called items or entities, and on their attributes.
Cloud Database Security
● Replication
○ Automatically initiated
○ Geographically separated facilities
○ API controlled
○ Selectable read consistency and write verification
With both SQL and NoSQL solutions, data security is enhanced by automatic
replication, often across geographically separated data centres. Replication can also
be implemented via API calls. Numerous services allow you to choose the degree
of read consistency and write verification that your data requires.
Cloud database storage is just as robust and vital as its sibling, Object
Storage. However, there is more diversity in the feature sets found on the
various providers. You'll need to research carefully to find the right fit for your
organisational database needs.
2.3. Targeting Storage Availability
By default, all containers and objects are initially private and only accessible by the
project administrator, often referred to as the owner. Only the owner can grant
permissions to others to read or interact with a container and its objects.
Grantees can include individual people, identified by ID number or email address;
groups, such as an email group; or domains, often expressed as a subnet range or an IP address.
Collectively, these permissions are called Access Control Lists, or ACLs. Various
services support ACLs written in a variety of formats, but the most common are
XML and JSON.
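To make the grantee types concrete, here is a small JSON ACL and a check against it, in Python. The entry format is loosely modelled on the entity/role style of Google Cloud Storage's JSON ACLs (`user-…`, `group-…`, `domain-…` with `READER`, `WRITER`, `OWNER`), but it is simplified, and the `allowed` helper and all addresses are hypothetical. Note how an empty ACL grants nothing, matching the private-by-default behaviour described above.

```python
import json

ACL_JSON = """
[
  {"entity": "user-alice@example.com", "role": "OWNER"},
  {"entity": "group-analysts@example.com", "role": "READER"},
  {"entity": "domain-example.org", "role": "READER"}
]
"""

# Higher rank means more access; an owner can also read and write.
ROLE_RANK = {"READER": 1, "WRITER": 2, "OWNER": 3}


def allowed(acl: list[dict], email: str, groups: set[str], needed: str) -> bool:
    """True if any ACL entry matching the caller grants at least the needed role."""
    domain = email.rsplit("@", 1)[-1]
    identities = {f"user-{email}", f"domain-{domain}"} | {f"group-{g}" for g in groups}
    best = max((ROLE_RANK[e["role"]] for e in acl if e["entity"] in identities), default=0)
    return best >= ROLE_RANK[needed]


acl = json.loads(ACL_JSON)
print(allowed(acl, "bob@example.org", set(), "READER"))  # True (domain grant)
print(allowed(acl, "bob@example.org", set(), "WRITER"))  # False
print(allowed(acl, "carol@example.com", {"analysts@example.com"}, "READER"))  # True
```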
It's quite common for the Cloud Data services to make APIs available in multiple
languages, including Java, C++, PHP, Python, Node.js, Ruby, and others. We'll take a
closer look at APIs in the next lesson.
Once you've established authorised, authenticated user access, cloud data storage
gives you full, direct management capabilities, just as you would have over in-house
storage.
2.4. Assessing API interconnectivity
A good API, application programming interface, is truly worth its weight in gold
considering the time and effort it saves you in coding, testing, and debugging.
The cloud data storage vendors, and for that matter the community, fully embrace
APIs across the spectrum, which can create a bit of a problem, because there are so
many options to choose from.
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html
Because most of the cloud data storage solutions are part of a larger platform, most
of the associated APIs are contained within a series of overall SDKs, each written for
a separate language.
The AWS SDKs are available from this URL for Java, .NET, PHP, Ruby and, for
Python, the AWS interface called Boto.
The Google Cloud Platform has a similar SDK for overall functionality, although
theirs is not broken out by language and requires Python 2.7.
https://cloud.google.com/sdk
The APIs available for Google Cloud Storage include support for .NET, Java,
JavaScript, Objective-C, PHP, and Python.
https://cloud.google.com/storage/docs/json_api/
Leveraging the available APIs, such as those found in AWS and Google Cloud Storage,
is a crucial strategy for efficient storage and IT management.
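Under the client libraries, the Google Cloud Storage JSON API is plain HTTP. Based on its documented URL pattern, an object's metadata lives at `.../storage/v1/b/BUCKET/o/OBJECT` and its contents are fetched by adding `alt=media`, with the object name percent-encoded as a single path segment. The Python sketch below only builds such URLs; authentication and the request itself are omitted, the bucket and object names are hypothetical, and the JSON API reference linked above remains the authority.

```python
from urllib.parse import quote

JSON_API_BASE = "https://storage.googleapis.com/storage/v1"


def object_url(bucket: str, object_name: str, download: bool = False) -> str:
    """Build a JSON API URL for an object's metadata, or for its contents.

    The object name is encoded as one path segment, so a "/" inside the
    name becomes %2F and a space becomes %20.
    """
    url = f"{JSON_API_BASE}/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"
    return url + "?alt=media" if download else url


print(object_url("example-bucket", "reports/2017 q1.csv", download=True))
# https://storage.googleapis.com/storage/v1/b/example-bucket/o/reports%2F2017%20q1.csv?alt=media
```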
3. Data Storage Issues
3.1. Understanding Data Storage Issues
The benefits of cloud data storage are undeniable: global access, no upfront capital
expense, and virtually unlimited capacity, to mention a few. But cloud data storage is
not without its dark side. The number one issue must be security. The flip side of
being able to access your data from anywhere is that people from anywhere can
potentially access your data.
First, there's the vulnerability of transferring your data to and from the cloud.
Industrial-strength encryption is typically the answer, but you must have a robust
encryption key management system in place to maintain long-term accessibility.
Next, you need to be confident in the cloud service provider's own security when
your data is at rest, to guard against data breaches. This protection needs to be
up-to-date and evolving because the threat is certainly ongoing. To this end, you
want to make sure that access logs are complete and monitored routinely.
Service Level Agreement (SLA)
● Tool for reducing down time
As with any network, there's always the problem of downtime. Reducing network
inaccessibility to an absolute minimum is a key requirement, and one the service
providers work diligently to address. While maintaining the external network
is beyond your control, you do have a key tool for dealing with any such problems:
the Service Level Agreement, or SLA.
A solid SLA details uptime expectations and the consequences, typically service
credits, if things go south. Not only do you want to make sure your data is accessible;
quite often you also want to ensure that it's delivered as quickly and efficiently as
possible, especially if you're working with application data. With global networks,
latency can be a real concern.
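To see what an uptime percentage means in practice, convert it into allowed downtime. The Python sketch below assumes a 30-day month, and its service-credit schedule is hypothetical; read your provider's actual SLA for the real thresholds and credits.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month (43,200 minutes)

# Hypothetical credit schedule: (uptime below this %, credit as % of the monthly bill),
# ordered from the most severe tier to the least.
CREDIT_TIERS = [(99.0, 25), (99.9, 10)]


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per month that an uptime target still permits."""
    return round(MINUTES_PER_MONTH * (100 - uptime_percent) / 100, 2)


def service_credit_percent(measured_uptime: float) -> int:
    """Credit owed for the month under the hypothetical schedule above."""
    for threshold, credit in CREDIT_TIERS:
        if measured_uptime < threshold:
            return credit
    return 0


print(allowed_downtime_minutes(99.9))   # 43.2
print(allowed_downtime_minutes(99.99))  # 4.32
print(service_credit_percent(99.5))     # 10
```

Each extra "nine" cuts the permitted downtime by a factor of ten, which is why the exact figure in an SLA matters more than it first appears.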
If you have a worldwide audience, you probably want to take advantage of a cloud
data storage host with worldwide reach, with multiple data centres located
geographically close to your own markets.
One of the major selling points for cloud data storage is scalability. When your traffic
increases, cloud storage hosts are set up to share the load among multiple servers. If
your traffic lessens, the number of servers in play shrinks as well. This impacts your
bottom line, as cloud data storage is a pay-for-what-you-use service.
While the various providers are designed to scale object storage, there are several
techniques you can apply to optimise the practice. Like any service, cloud data
storage is not without its problems. As always, the first step in addressing them is
to identify the issues.
3.2. Establishing and Maintaining Secure Storage
Securing organisational data is a top-ranked, if not the number one, task for IT
departments. Storing your data in a remote, offsite facility requires a robust
strategy and ongoing participation by both client and host, a fact that cloud data
storage providers are well aware of.
Cloud Data Storage Security
● In transit: Transferring to and from storage host
● At rest: stored at remote facility
You can break hosted security concerns into two main areas: first, in transit, when
the data is being transferred to or from your system; and second, at rest, when the
data is on the remote storage server.
In transit, data can be protected in several ways, none of which are mutually
exclusive. SSL/TLS and HTTPS protocols should always be used to secure the data's
travels. Extremely sensitive data can also be encrypted on the client side prior to
transfer. Naturally, this means that you'll need a solid encryption key
management system in place.
If you choose not to transfer encrypted data, the cloud data storage host can
encrypt it for you, so that it's secure while at rest. Services like Amazon S3 allow
you to establish bucket policies that reject an upload unless its request header
asks for the data to be encrypted server-side.
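Here is a minimal sketch of such a bucket policy, generated from Python so it can be checked before use. It follows the JSON policy pattern AWS documents for this purpose (a Deny on `s3:PutObject` when the `s3:x-amz-server-side-encryption` header is absent), but the bucket name is hypothetical, and you should confirm the details against current AWS documentation, since newer S3 features may change what you need.

```python
import json


def require_sse_policy(bucket: str) -> str:
    """Bucket policy that denies uploads lacking the server-side encryption header."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUploadsWithoutEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The Null condition is true when the header is missing from the request.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(require_sse_policy("example-bucket"))
```

The resulting JSON string is what you would supply as the policy document, for example through the console or the `put_bucket_policy` call in Boto3.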
AWS also supports another layer of protection for server-side encryption: the AWS Key
Management Service, also called KMS.
KMS gives you control over server-side encryption keys, preventing those keys from
ever being exported and providing a full audit trail of their use. There is an additional
charge for using KMS-managed keys, however.
Another way to secure your data is versioning, some variation of which is offered
by several cloud data storage services, including S3. Once versioning is enabled, your
data is protected from accidental deletion or overwrite. Versioning is typically
enabled at the container, or bucket, level. From a security standpoint, it's also a good
idea to enable logging.
Logging is disabled by default on most services. Once it is set up, all requests for
server access are tracked, and each record typically includes requester details,
container name, object name, request time, request action, response status, and the
error code, if any.
Logs are stored in a designated container on the cloud data storage host, and can be
retrieved and examined at any time. Because they are treated like any other storage
object, they will incur a charge, and you should set up a policy for archiving or
deleting them after a set period.
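A retention policy for logs can be expressed as a lifecycle rule. The sketch below builds a dictionary in the shape of an Amazon S3 lifecycle configuration (archive to Glacier, then expire); the `logs/` prefix and the 30- and 365-day periods are illustrative assumptions, and other providers use their own formats.

```python
import json


def log_retention_rule(prefix: str, archive_after_days: int, delete_after_days: int) -> dict:
    """Lifecycle configuration that archives, then deletes, logs stored under a prefix."""
    if not 0 < archive_after_days < delete_after_days:
        raise ValueError("archiving must happen before deletion")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-access-logs",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move older logs to the cheaper archival class first...
                "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
                # ...then delete them once the retention period is over.
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


print(json.dumps(log_retention_rule("logs/", 30, 365), indent=2))
```

With Boto3, a dictionary like this is what the `put_bucket_lifecycle_configuration` call accepts as its `LifecycleConfiguration` argument.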
Although storing your data remotely is undeniably a risk, with heightened
awareness, and by taking full advantage of the available cloud data storage tools, you can
minimize that risk as much as possible.
3.3. Handling Latency
Speed matters, especially the speed at which your data travels from where it is
stored to where it needs to go. Latency is a real factor in cloud data storage, so let's
look at what it is and what options you have for optimising it.
Data Cloud Storage Latency
● Location is important
● Store data closest to user base
● Specify container’s region
○ US, Europe and Asia
Latency can be defined as the amount of time it takes one packet of data to get from
one location to another. In terms of cloud data storage, we're talking about the length of
time from when the request is received by the data hosting server to when the
response is received by the requesting client. Latency is a key defining characteristic
of the various storage classes. To further optimise latency, the most important factor
is location. Whenever possible, it's best to house your data closest to the folks who
want it.
https://cloud.google.com/storage/docs/bucket-locations
Most cloud data storage vendors allow you to specify the region when creating a container for objects. Typically, the available regions are broad in scope, such as the US, Europe, or Asia, and you should place your storage nearest your market.
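On S3, the region is chosen when the bucket is created, through boto3's `create_bucket` call. The sketch below is minimal and only builds those arguments; the bucket name is hypothetical. AWS regions are finer-grained than continents: `eu-west-1`, for instance, is Ireland. S3's default region, `us-east-1`, is the exception, and it is requested by omitting the location constraint altogether.

```python
from typing import Any, Dict


def create_bucket_kwargs(bucket: str, region: str) -> Dict[str, Any]:
    """Keyword arguments for boto3's S3 create_bucket, placing the bucket in `region`."""
    kwargs: Dict[str, Any] = {"Bucket": bucket}
    # us-east-1 is the default and must not be passed as a LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

A typical call is `boto3.client("s3", region_name="eu-west-1").create_bucket(**create_bucket_kwargs("example-eu-bucket", "eu-west-1"))`.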
https://aws.amazon.com/about-aws/global-infrastructure
There is a trend toward breaking up the large regions to allow finer-grained container placement. For example, Google Cloud Platform's regional bucket locations can be used with its Durable Reduced Availability storage class. You can specify that your objects be housed in the eastern US, the western US, the central US, or any other available region.
What else can you do to lessen latency and improve performance? Believe it or not, the naming of an object and/or its container can have a serious impact on response time, because most cloud data storage services index their key names alphabetically.
It's common practice to incorporate a timestamp into that key name. This has the effect of grouping objects that were transferred at about the same time on the same server partition. It's therefore recommended to prefix your object and container names with a random hash string, which spreads them across different partitions.
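The sketch below applies the hashing advice using Python's standard `hashlib`, under a few assumptions. Instead of a truly random string, it derives a short prefix from a hash of the original name. The prefix therefore looks random, which spreads sequential, timestamped names across the alphabetical index, yet it can be recomputed whenever you need to find the object again. The four-character prefix length is an arbitrary choice. Provider guidance on key naming has changed over time, so check your vendor's current documentation before adopting this.

```python
import hashlib


def spread_key(name: str, prefix_len: int = 4) -> str:
    """Prefix `name` with a short hash of itself so sequential names scatter across the index.

    The prefix is derived from the name rather than being random, so the full key can
    always be recomputed from the original name. The hash is used only for distribution,
    not security.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"
```

For example, two uploads named `2016-03-01T12:00:00.jpg` and `2016-03-01T12:00:01.jpg` would sit side by side in an alphabetical index. With hashed prefixes they land in unrelated positions.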
For structured data, as opposed to unstructured blobs, latency is tied to data consistency. Because database entries can be modified at any point, read and write timing matters. The stronger the consistency guarantee, that is, the shorter the window in which a reader might see stale data, the greater the latency.
https://azure.microsoft.com/en-us/blog/azure-documentdb-is-now-available-in-central-us
Microsoft Azure DocumentDB has identified this as a key area for its service, and now offers four distinct levels of consistency: Strong, Bounded Staleness, Session, and Eventual.
The Strong level of consistency results in the highest latency, and the Eventual level in the lowest. Understanding how latency works, and the options associated with it, is a pivotal step in positioning your cloud data storage properly.
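The trade-off between consistency and latency can be illustrated with a toy model covering only the two extremes, Strong and Eventual. Session and Bounded Staleness fall between them and are not modelled here. This is a sketch under simplifying assumptions, not how DocumentDB is implemented. A strong read goes to the authoritative copy, while an eventual read is served by a nearby replica that may lag behind. The round-trip times are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


class ToyReplicatedStore:
    """Deliberately simplified model: a distant primary copy and a nearby lagging replica."""

    def __init__(self, primary_rtt_ms: float = 80.0, replica_rtt_ms: float = 10.0) -> None:
        self.primary_rtt_ms = primary_rtt_ms  # assumed round trip to the authoritative copy
        self.replica_rtt_ms = replica_rtt_ms  # assumed round trip to a nearby replica
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value  # the replica only catches up when replicate() runs

    def replicate(self) -> None:
        self.replica.update(self.primary)

    def read(self, key: str, level: str) -> Tuple[Optional[str], float]:
        """Return (value, simulated latency in ms) for a read at the given consistency level."""
        if level == "strong":
            # Always reflects the latest write, but pays for the longer trip.
            return self.primary.get(key), self.primary_rtt_ms
        if level == "eventual":
            # Fast, but may be stale until replication catches up.
            return self.replica.get(key), self.replica_rtt_ms
        raise ValueError(f"unsupported consistency level: {level}")
```

Suppose the store writes `v1`, replicates, then writes `v2`. A strong read now returns `("v2", 80.0)`. An eventual read returns the stale `("v1", 10.0)` until the next `replicate()`.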
3.4. Managing Scalability and Replication
The raw power of today’s cloud data storage industry is really apparent when you consider two defining characteristics: Scalability and Replication.
Scalability is the ability of a system to efficiently adapt to handle the current workload. The vastness of the networks now available for cloud data storage means that there's virtually no limit to the number of objects or the amount of data that you can store online.
This scalability is, for the most part, effortless for customers of these services,
because the infrastructure is already in place and being maintained by the service
providers.
On most cloud data storage hosts, you can create a large number of containers, and each container is effectively unlimited in size. When you store more objects in a container than a single drive can physically hold, the data is written to other systems while still existing within the same virtual bucket.
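A toy model can show one virtual bucket spanning many physical drives. It is an illustration under simplifying assumptions (fixed-size drives and first-fit placement), not how any provider actually lays out data. The bucket keeps a single key namespace while new drives are added behind it as space runs out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualBucket:
    """Toy model: one bucket namespace backed by as many fixed-size drives as needed."""
    drive_capacity: int
    drives: List[int] = field(default_factory=list)          # bytes used on each drive
    placement: Dict[str, int] = field(default_factory=dict)  # object key -> drive index

    def put(self, key: str, size: int) -> int:
        if size > self.drive_capacity:
            raise ValueError("toy model: object larger than one drive")
        for index, used in enumerate(self.drives):
            if used + size <= self.drive_capacity:
                break  # first drive with enough free space
        else:
            self.drives.append(0)  # no room anywhere: bring another drive online
            index = len(self.drives) - 1
        self.drives[index] += size
        self.placement[key] = index
        return index

    def keys(self) -> List[str]:
        """All objects appear in one listing, whichever drive holds them."""
        return sorted(self.placement)
```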
When you say scalability, the image that most often comes to mind is a service increasing its processes to meet surging tasks, that is, scaling up. The ability to discard unneeded processes, scaling down, is just as important. Cloud data storage runs on a pay-for-what-you-use model, and most providers calculate their storage charge from your average use over the month. If your average goes down, the charge goes down.
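The following worked sketch shows averaged billing. It assumes one usage sample per day and a flat, hypothetical price per GB-month. Real providers meter more finely (for example by byte-hours) and often use tiered prices.

```python
from typing import Sequence


def average_gb_month(daily_gb: Sequence[float]) -> float:
    """Average stored gigabytes over a billing month, from one usage sample per day."""
    if not daily_gb:
        raise ValueError("need at least one daily usage sample")
    return sum(daily_gb) / len(daily_gb)


def monthly_storage_charge(daily_gb: Sequence[float], price_per_gb_month: float) -> float:
    """Charge = average usage for the month x price per GB-month."""
    return average_gb_month(daily_gb) * price_per_gb_month
```

Consider a 30-day month with 100 GB stored for the first 15 days and 50 GB for the rest. The average is 75 GB, so scaling down halfway through the month cuts the bill by a quarter compared with keeping 100 GB all month.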
Replication is the duplication of data in real time over a network. Cloud data storage platforms commonly replicate your objects automatically when they're added to your containers, storing the redundant copies on multiple devices, usually in the same region. When an object is replicated, everything remains the same: the key name, the metadata, the container, everything. The primary goal of replication is data protection, or durability: making sure that your data objects are not lost.
Durability is the probability that an object will still be exactly as you transferred it after one year. The greater the likelihood that your data survives intact, the higher the durability. 100% durability would mean that an object could never be lost. 90% durability means that there's a one-in-ten chance of losing it within a year.
AWS rates its S3 Standard storage class at 99.999999999% durability. This means that if you store, say, 10,000 objects with S3, you can expect on average to lose one object every 10 million years or so. This automatic replication is to other devices within the same region, but you can also replicate your data to a different region.
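The AWS figure can be checked with simple arithmetic. This minimal sketch assumes that each object's annual loss probability is one minus the durability, and that losses are independent. Expected losses per year are then the number of objects times that probability: 10,000 x 10^-11 = 10^-7, or one loss per 10^7 years. Durability is passed as a count of nines, because computing 1 - 0.99999999999 directly in floating point loses precision.

```python
def expected_losses_per_year(num_objects: int, nines: int) -> float:
    """Expected objects lost per year at a durability of `nines` nines (11 -> 99.999999999%)."""
    annual_loss_probability = 10.0 ** -nines  # 1 - durability, computed without cancellation
    return num_objects * annual_loss_probability


def years_per_lost_object(num_objects: int, nines: int) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_losses_per_year(num_objects, nines)
```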
Why would you replicate your data to a different region?
1. You can reduce latency by housing your objects as close as possible to your
markets.
2. Regulatory compliance may mandate that your data be stored redundantly in
remote locations.
3. Your internal infrastructure may have remote offices that require access to
the same data.
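S3's cross-region replication is one concrete example. It is configured with a rule set passed to boto3's `put_bucket_replication`. The sketch below is minimal and only builds that configuration. The bucket names, account ID, and IAM role are hypothetical. S3 requires versioning to be enabled on both the source and destination buckets, and replication applies to objects written after the rule is in place rather than to existing ones.

```python
from typing import Any, Dict


def replication_configuration(role_arn: str, destination_bucket: str) -> Dict[str, Any]:
    """ReplicationConfiguration payload for S3's PutBucketReplication API."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # an empty prefix matches every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }
```

A typical call is `boto3.client("s3").put_bucket_replication(Bucket="example-source-bucket", ReplicationConfiguration=replication_configuration("arn:aws:iam::123456789012:role/example-replication-role", "example-destination-bucket"))`, where the destination bucket lives in the other region.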
4. Data Storage Vendors
4.1. Google Cloud Platform
Google Cloud Platform is one of the most all-encompassing online services, with major entries in the data storage field backed by an extremely robust global infrastructure.
https://cloud.google.com
Google Cloud integrates a full, constantly evolving spectrum of products. Most, if not all, of the product line works smoothly with the other products; applications built with Compute Engine, for example, can easily pull assets from Cloud Storage. You can, of course, use the storage products independently of any other service on the platform.
For object storage on Google Cloud Platform, you'd use Cloud Storage. With unlimited capacity and worldwide data centres, Cloud Storage can house your data objects in any of its storage tiers.
In order of decreasing cost, those tiers are:
● Standard Storage, for objects that require the highest degree of durability and availability.
● Durable Reduced Availability, or DRA, ideal for data backups and other objects that do not require the highest degree of availability.
● Cloud Storage Nearline, intended for backups, archives, disaster recovery, and other data where increased latency is acceptable.
https://cloud.google.com/products
The actual storage in Cloud Storage is based on buckets and objects. You create a
bucket that holds one or more objects. Access to the buckets and objects is handled
in a variety of ways.
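Cloud Storage's JSON API gives every bucket and object a predictable REST address. The minimal sketch below builds those URLs using only the standard library. The bucket and object names are hypothetical. Real requests also need an OAuth 2.0 access token in an `Authorization` header unless the object is publicly readable.

```python
from urllib.parse import quote

JSON_API = "https://storage.googleapis.com"


def object_metadata_url(bucket: str, object_name: str) -> str:
    """GET this URL for an object's metadata."""
    # Object names must be fully percent-encoded, including "/" (as %2F).
    return f"{JSON_API}/storage/v1/b/{quote(bucket, safe='')}/o/{quote(object_name, safe='')}"


def object_download_url(bucket: str, object_name: str) -> str:
    """GET this URL for the object's contents; alt=media asks for data, not metadata."""
    return object_metadata_url(bucket, object_name) + "?alt=media"


def simple_upload_url(bucket: str, object_name: str) -> str:
    """POST the object's bytes to this URL for a simple, single-request upload."""
    return (f"{JSON_API}/upload/storage/v1/b/{quote(bucket, safe='')}/o"
            f"?uploadType=media&name={quote(object_name, safe='')}")
```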
Cloud Storage offers both an XML API and a JSON API. Client library support covers Java and Python for the XML API, and Java, JavaScript, Python, Go, and PHP for the JSON API. Relational data is handled by Google Cloud SQL, which supports MySQL.
With Cloud SQL, you can choose among hosting regions in the US, Europe, or Asia, with 100 gigabytes of storage and up to 16 gigabytes of RAM per database instance. Cloud SQL gives you all the power of MySQL, with automatic replication of your data across multiple data centres.
Point-in-time backup and recovery services provide additional peace of mind. Importing and exporting your existing data is supported through commonly used tools and interfaces such as mysqldump, the MySQL wire protocol, and JDBC.
Much of the power of Cloud SQL stems from the fact that an application can spin up database instances on an as-needed basis. These instances can be accessed in several ways, including the Google Cloud Console. You're also free to use the MySQL client from the command line, or the JSON API.
Non-relational data is addressed by Cloud DataStore, which uses schemaless NoSQL. Cloud DataStore also features built-in redundancy, with automatic replication across data centres. Although it is a NoSQL store, Cloud DataStore supports ACID transactions for reliable processing. Cloud DataStore can be accessed through the Google Cloud Console interface, a command line tool called GCD, and a full-featured JSON API.
Google Cloud's latest addition to the data storage space is Bigtable. Also NoSQL-based, Bigtable is optimised to handle enormous amounts of data, from terabytes to petabytes, with single-digit millisecond latency. The engine that drives Bigtable is the same one that Google uses for its top-of-the-line applications, including Gmail, Google Maps, and Google Analytics. Bigtable is accessible through the open source HBase API, which integrates nicely with Hadoop, and it encrypts data in transit as well as at rest.
4.2. Amazon Web Services (AWS)
Amazon Web Services, commonly known as AWS, was the first major player to enter the cloud data storage field. It continues to be a significant force in the market, with products for every corner of the computing realm, including formidable entries in all types of data storage.
Amazon's network, too, is rightfully world famous: a reliable, secure infrastructure capable of serving everyone from entrepreneurs to enterprises. Object storage on AWS falls to S3, short for Simple Storage Service. S3 is straightforward and easy to use, while remaining extremely flexible and powerful. Boasting automatic redundancy, S3 is highly scalable and secure.
You can choose among three service levels to find the right fit for your data:
● Standard Storage, with the highest degree of durability.
● Reduced Redundancy Storage, which, at a lower cost, is perfect for non-critical data.
● Amazon Glacier, intended for infrequently accessed data such as archives and disaster recovery files.
Amazon Web Services has a wide range of products, all of which are integrated with
each other.
https://aws.amazon.com/solutions
AWS supports both relational and non-relational databases. Its primary SQL solution is Amazon RDS (Relational Database Service), which supports MySQL, Oracle, SQL Server, and PostgreSQL.
The exact feature set of RDS depends on which database engine is used, although automatic backups are enabled by default across the board. Fully scalable, RDS spins up database instances as needed. You can configure your instances with one to 32 virtual CPUs and one to 244 gigabytes of memory.
Amazon Cloud Databases
● Amazon RDS (relational database): a managed relational database in the cloud that you can launch in minutes with just a few clicks.
● Amazon Aurora (relational database): a fully managed, MySQL-compatible relational database with 5X performance and enterprise-level features.
● Amazon DynamoDB (NoSQL database): a managed NoSQL database that offers extremely fast performance, seamless scalability, and reliability.
● Amazon Redshift (data warehouse): a fast, fully managed, petabyte-scale data warehouse at less than a tenth the cost of traditional solutions.
● Amazon ElastiCache (in-memory cache): deploy, operate, and scale an in-memory cache based on memcached or Redis in the cloud.
● AWS Database Migration Service (database migration): help migrating your databases to AWS easily and inexpensively, with zero downtime.
https://aws.amazon.com/products/databases
Data is automatically replicated across three AWS data centres within a region. Optionally, you can take advantage of Amazon's new cross-region replication service to spread your data further around the globe.
4.3. The Microsoft Cloud
Microsoft Azure's full range of services, from computing to analytics to Internet of Things integration, is as robust and compelling as any on the market. The Microsoft brand also brings a distinct familiarity and a collection of compatible services and tools, such as Active Directory and Visual Studio.
Unstructured data is fully supported with Azure Blobs. Boasting over 40 trillion
stored objects and an average of 3.5 million requests per second, Azure Blobs
provides high durability and accessibility.
Azure Blob Command Tools
● AzCopy
● PowerShell
● Azure cross-platform CLI
Azure Blobs supports two abstractions: Page Blobs for disks and Block Blobs for discrete files. You can reach Azure Blobs through REST interfaces, API client libraries, and a set of powerful command tools such as AzCopy, PowerShell, and the Azure cross-platform CLI. Together, these give you a great many options for object management.
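Where other services use the terms bucket and object, Azure Blobs uses container and blob. Each Azure Storage account holds containers, which in turn hold blobs. Containers cannot be nested, although slashes in blob names can simulate a folder hierarchy. An account can also have an optional root container, named $root, whose blobs are addressed directly under the account.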
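Azure addresses every blob through its storage account, container, and blob name. The minimal sketch below builds that URL for the public Azure cloud (`blob.core.windows.net`). The account, container, and blob names are hypothetical. Slashes in a blob name act as virtual folders, and blobs in the optional `$root` container are addressed directly under the account.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Address a blob as https://<account>.blob.core.windows.net/<container>/<blob>."""
    base = f"https://{account}.blob.core.windows.net"
    path = quote(blob_name, safe="/")  # keep "/" so virtual folders stay readable
    if container == "$root":
        # Blobs in the root container can be addressed without naming the container.
        return f"{base}/{path}"
    return f"{base}/{container}/{path}"
```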
https://azure.microsoft.com/en-us/solutions
Azure offers two solutions for non-relational (NoSQL) data: Azure Tables and DocumentDB. Use Azure Tables for key-value data structures and DocumentDB for document data models. DocumentDB is a database as a service with a full-featured, SQL-style query environment that is continually evolving. DocumentDB is also schema-less, which lets your data structures evolve efficiently over time. As you might expect from the developers of SQL Server, Azure's relational database service, SQL Database, is top of the line, with full support for existing SQL Server tools, APIs, and libraries.
SQL Database is cloud-migration friendly and offers three service tiers for a range of workloads: Basic, Standard, and Premium. SQL Database can handle databases of up to 500 gigabytes and provides point-in-time restore, geo-restore, and geo-replication features.
Microsoft Azure currently offers a free one-month trial with a $200 credit, the perfect way to put this highly competitive service through its paces.
4.4. HP Helion Cloud
HP Helion combines a solid set of products for computing and storage applications, including ones that handle both object and database storage. If you're using HP Cloud Compute, you'll want to tie into its block storage module. Because block storage volumes persist even beyond the life of the associated compute instance, your data stays stored for as long as you need it.
Object storage comes under the aegis of HP Cloud Object Storage, naturally. Like
most other similar services, Cloud Object Storage utilises a container and object
structure.
HP Cloud Object Storage Access
● Online console
● Command line interface
● REST API
● Language bindings
○ Java, PHP, .NET, Node.js, or Fog (Ruby Cloud Services Library)
Access is available via an online console, a command line interface, a complete REST API, or one of the many language bindings, including Java, PHP, .NET, Node.js, and Ruby (Fog).
A common use for object storage is as the source for a Content Delivery Network, or CDN. HP Cloud CDN optimises delivery of static files from your cloud object storage with minimal latency, powered by Akamai's global network of edge servers.
Charges are calculated monthly on the amount of storage used, the amount of data
transferred out of the system, and the number of get, put, post, copy, or list
requests made.
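The three charge components combine into a simple monthly estimate. This minimal sketch uses hypothetical unit prices: storage per GB-month, transfer out per GB, and a fee per 10,000 requests. HP's actual rates and request categories are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; substitute your provider's published rates."""
    per_gb_month: float      # storage used
    per_gb_out: float        # data transferred out of the system
    per_10k_requests: float  # get, put, post, copy, or list requests


def monthly_bill(avg_gb_stored: float, gb_transferred_out: float, requests: int, rates: Rates) -> float:
    """Sum of the storage, outbound transfer, and request charges for one month."""
    storage = avg_gb_stored * rates.per_gb_month
    egress = gb_transferred_out * rates.per_gb_out
    request_fees = (requests / 10_000) * rates.per_10k_requests
    return storage + egress + request_fees
```

Take hypothetical rates of 0.10, 0.12, and 0.01. Storing an average of 500 GB, sending 100 GB out, and making 2,000,000 requests comes to 50 + 12 + 2 = 64 for the month.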
At the time of writing, HP offers a no-charge trial period with a substantial credit if you'd like to investigate its services further.
Sources
Google Cloud Platform Products and Services
https://cloud.google.com/products
Google Cloud Platform Pricing Calculator
https://cloud.google.com/products/calculator
Google Cloud Platform APIs & Reference
https://cloud.google.com/storage/docs/apis
Amazon Web Services S3
https://aws.amazon.com/s3
Cloud Storage Nearline
https://cloud.google.com/storage-nearline
AWS’ Pricing Calculator
https://calculator.s3.amazonaws.com/index.html
Cloud Storage with AWS
https://aws.amazon.com/products/storage
Google Cloud SQL
https://cloud.google.com/sql
Google Schemaless Cloud DataStore
https://cloud.google.com/datastore
Google Cloud BigTable
https://cloud.google.com/bigtable
Azure Blob Storage
https://azure.microsoft.com/en-us/services/storage/blobs
Azure Table Storage
https://azure.microsoft.com/en-us/services/storage/tables
Rackspace Scalable Cloud Object Storage
https://www.rackspace.com/cloud/files
Using the AWS SDKs, CLI, and Explorers
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html
AWS SDK for Python (Boto3)
https://aws.amazon.com/sdk-for-python
Google Cloud Platform SDK
https://cloud.google.com/sdk
Google Cloud Storage JSON API Overview
https://cloud.google.com/storage/docs/json_api
AWS Key Management Service (KMS)
https://aws.amazon.com/kms
AWS security-logging
https://aws.amazon.com/answers/logging
Google Cloud Platform Bucket Locations
https://cloud.google.com/storage/docs/bucket-locations
AWS Global Infrastructure
https://aws.amazon.com/about-aws/global-infrastructure
Azure Regions
https://azure.microsoft.com/en-us/regions
Azure DocumentDB
https://azure.microsoft.com/en-us/services/documentdb
Google Cloud Console
https://console.cloud.google.com
Google Cloud DataStore
https://cloud.google.com/datastore
Apache HBase
https://hbase.apache.org
Cloud Databases with AWS
https://aws.amazon.com/products/databases
Azure solutions
https://azure.microsoft.com/en-us/solutions
Create your free Azure account today
https://azure.microsoft.com/en-us/free
Akamai Cloud Networking
https://www.akamai.com/us/en/solutions/products/cloud-networking
Akamai Ion for Free
https://content.akamai.com/PG5155-Online-Trials-Ion-Standard.html
 
02. cobit5 introduction
02. cobit5 introduction02. cobit5 introduction
02. cobit5 introduction
 
Paper seminar akuntansi pemerintah kel 1--sap berbasis akrual
Paper seminar akuntansi pemerintah kel 1--sap berbasis akrualPaper seminar akuntansi pemerintah kel 1--sap berbasis akrual
Paper seminar akuntansi pemerintah kel 1--sap berbasis akrual
 
Cobit 5 for Information Security
Cobit 5 for Information SecurityCobit 5 for Information Security
Cobit 5 for Information Security
 
SNI ISO/IEC 38500 IT Governance - Chandra Yulistia
SNI ISO/IEC 38500 IT Governance - Chandra YulistiaSNI ISO/IEC 38500 IT Governance - Chandra Yulistia
SNI ISO/IEC 38500 IT Governance - Chandra Yulistia
 
Project, Program & Portofolio Management Contribution, an Article from the PM...
Project, Program & Portofolio Management Contribution, an Article from the PM...Project, Program & Portofolio Management Contribution, an Article from the PM...
Project, Program & Portofolio Management Contribution, an Article from the PM...
 
ISACA Indonesia Special Technical Session feat Erik Guldentops Panelist Widha...
ISACA Indonesia Special Technical Session feat Erik Guldentops Panelist Widha...ISACA Indonesia Special Technical Session feat Erik Guldentops Panelist Widha...
ISACA Indonesia Special Technical Session feat Erik Guldentops Panelist Widha...
 
Information security in healthcare - a perspective on EMR Security
Information security in healthcare - a perspective on EMR SecurityInformation security in healthcare - a perspective on EMR Security
Information security in healthcare - a perspective on EMR Security
 

Similar to The Foundations of Cloud Data Storage

2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | QuboleVasu S
 
E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2Anil Vasudeva
 
Security for Effective Data Storage in Multi Clouds
Security for Effective Data Storage in Multi CloudsSecurity for Effective Data Storage in Multi Clouds
Security for Effective Data Storage in Multi CloudsEditor IJCATR
 
What are the pros and cons of using cloud applications.pdf
What are the pros and cons of using cloud applications.pdfWhat are the pros and cons of using cloud applications.pdf
What are the pros and cons of using cloud applications.pdfAnil
 
Building Cloud capability for startups
Building Cloud capability for startupsBuilding Cloud capability for startups
Building Cloud capability for startupsSekhar Mohanty
 
Pros And Cons Of Cloud-Based Security Solutions.pptx
Pros And Cons Of Cloud-Based Security Solutions.pptxPros And Cons Of Cloud-Based Security Solutions.pptx
Pros And Cons Of Cloud-Based Security Solutions.pptxMetaorange
 
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdfThe Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdfAlzenaLimon
 
Secure Computing in Enterprise Cloud Environments
Secure Computing in Enterprise Cloud EnvironmentsSecure Computing in Enterprise Cloud Environments
Secure Computing in Enterprise Cloud EnvironmentsShaun Thomas
 
Performance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storagePerformance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storageNetmagic Solutions Pvt. Ltd.
 
What is cloud computing report
What is cloud computing reportWhat is cloud computing report
What is cloud computing reportProduct Reviews
 
High Cloud Computing Backbone Technology.pptx
High Cloud Computing Backbone Technology.pptxHigh Cloud Computing Backbone Technology.pptx
High Cloud Computing Backbone Technology.pptxAgusto Sipahutar
 
A Comprehensive Look into the World of Cloud Computing.pdf
A Comprehensive Look into the World of Cloud Computing.pdfA Comprehensive Look into the World of Cloud Computing.pdf
A Comprehensive Look into the World of Cloud Computing.pdfAnil
 
Real time service oriented cloud computing
Real time service oriented cloud computingReal time service oriented cloud computing
Real time service oriented cloud computingwww.pixelsolutionbd.com
 
Cloud computing(Basic).pptx
Cloud computing(Basic).pptxCloud computing(Basic).pptx
Cloud computing(Basic).pptxnischal52
 
How to create a secure high performance storage and compute infrastructure
 How to create a secure high performance storage and compute infrastructure How to create a secure high performance storage and compute infrastructure
How to create a secure high performance storage and compute infrastructureAbhishek Sood
 

Similar to The Foundations of Cloud Data Storage (20)

2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
2020 Cloud Data Lake Platforms Buyers Guide - White paper | Qubole
 
E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2E newsletter promise_&_challenges_of_cloud storage-2
E newsletter promise_&_challenges_of_cloud storage-2
 
Security for Effective Data Storage in Multi Clouds
Security for Effective Data Storage in Multi CloudsSecurity for Effective Data Storage in Multi Clouds
Security for Effective Data Storage in Multi Clouds
 
What are the pros and cons of using cloud applications.pdf
What are the pros and cons of using cloud applications.pdfWhat are the pros and cons of using cloud applications.pdf
What are the pros and cons of using cloud applications.pdf
 
Epaper
EpaperEpaper
Epaper
 
Cloud capability for startups
Cloud capability for startupsCloud capability for startups
Cloud capability for startups
 
Building Cloud capability for startups
Building Cloud capability for startupsBuilding Cloud capability for startups
Building Cloud capability for startups
 
Pros And Cons Of Cloud-Based Security Solutions.pptx
Pros And Cons Of Cloud-Based Security Solutions.pptxPros And Cons Of Cloud-Based Security Solutions.pptx
Pros And Cons Of Cloud-Based Security Solutions.pptx
 
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdfThe Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
The Superior Reasons to Go for Cloud App Development _ Complete Guide (1).pdf
 
B042306013
B042306013B042306013
B042306013
 
Secure Computing in Enterprise Cloud Environments
Secure Computing in Enterprise Cloud EnvironmentsSecure Computing in Enterprise Cloud Environments
Secure Computing in Enterprise Cloud Environments
 
Performance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storagePerformance,cost and reliability through hybrid cloud storage
Performance,cost and reliability through hybrid cloud storage
 
What is cloud computing report
What is cloud computing reportWhat is cloud computing report
What is cloud computing report
 
High Cloud Computing Backbone Technology.pptx
High Cloud Computing Backbone Technology.pptxHigh Cloud Computing Backbone Technology.pptx
High Cloud Computing Backbone Technology.pptx
 
A Comprehensive Look into the World of Cloud Computing.pdf
A Comprehensive Look into the World of Cloud Computing.pdfA Comprehensive Look into the World of Cloud Computing.pdf
A Comprehensive Look into the World of Cloud Computing.pdf
 
Final
FinalFinal
Final
 
Real time service oriented cloud computing
Real time service oriented cloud computingReal time service oriented cloud computing
Real time service oriented cloud computing
 
Cloud computing(Basic).pptx
Cloud computing(Basic).pptxCloud computing(Basic).pptx
Cloud computing(Basic).pptx
 
How to create a secure high performance storage and compute infrastructure
 How to create a secure high performance storage and compute infrastructure How to create a secure high performance storage and compute infrastructure
How to create a secure high performance storage and compute infrastructure
 
Cloud computing
Cloud computingCloud computing
Cloud computing
 

The Foundations of Cloud Data Storage

The growing mountain of available data is matched by an equally strong desire to access it. The rise of cloud networks makes managing this overwhelming amount of information not only possible but highly beneficial.

https://cloud.google.com/pricing

Here you'll find an overview of some specific solutions, including market leaders like Google Cloud Platform, Amazon Web Services and Microsoft Azure, as well as competitive services from companies like RackSpace and HP Helion.

Requirements

From a tools perspective, you'll need a code editor and several browsers for testing. You can use whichever code editor you prefer.

To explore a set of online services such as cloud data storage, you'll need an internet connection. Most of the demonstrations and testing can be done with a recent version of a standards-based browser such as Google Chrome.

Although many of the cloud data storage platforms offer trial periods, in most cases you'll need to enable billing, which requires either a credit card or a bank account. That's it for tools and real-world requirements.
https://cloud.google.com/storage/docs/apis

From a knowledge perspective, you should have a general understanding of how APIs work, particularly the server-side interfaces that cloud services expose to your code.

This is a full exploration of cloud data storage platforms and implementations, and you'll get the most benefit if you keep your mind open to other ways you can apply the same lessons to your own needs. The absolute best thing you can bring to this training is your imagination.
Disclaimer

The information contained in this manual is for general information purposes only and is provided as is. While I try to keep the information up to date and correct, I make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services, or related graphics contained on the website for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

In no event will I be liable for any loss or damage including, without limitation, indirect or consequential loss or damage, or any loss or damage whatsoever arising from loss of data or profits arising out of, or in connection with, the use of this manual.

Through this manual you'll find links to websites that are not under my control. I have no control over the nature, content and availability of those sites. The inclusion of any links does not necessarily imply a recommendation or endorse the views expressed within them.

Every effort is made to keep this manual's content up to date. However, I take no responsibility for, and will not be liable for, the website where this document is shared being temporarily unavailable due to technical issues beyond my control.

Jan-Erik Finlander - 2017
Cloud Solutions Architect
Contents

Requirements
Contents
1. Introduction to Cloud Data Storage
  1.1. Understanding cloud data storage
    Cloud Data Storage Benefits
    Cloud Data Storage Risks
    Cloud Data Storage Services
  1.2. Calculating Costs
    Cloud Data Storage Costs
    Cloud Pricing Calculators
    Google Cloud Platform's Pricing Calculator
    AWS' Pricing Calculator
    Microsoft Azure's Pricing Calculator
  1.3. Cloud Storage Solutions
2. Cloud Storage Options
  2.1. Working with Object Storage
  2.2. Managing Database Content
    Cloud Relational Database Features
    Cloud Relational Database Access
    Cloud Non-Relational Databases
    Non-Relational Database Management
    Cloud Database Security
  2.3. Targeting Storage Availability
  2.4. Assessing API Interconnectivity
3. Data Storage Issues
  3.1. Understanding Data Storage Issues
    Service Level Agreement (SLA)
  3.2. Establishing and Maintaining Secure Storage
    Cloud Data Storage Security
  3.3. Handling Latency
    Data Cloud Storage Latency
  3.4. Managing Scalability and Replication
    Why would you use replication?
4. Data Storage Vendors
  4.1. Google Cloud Platform
  4.2. Amazon Web Services (AWS)
    Amazon Cloud Databases
  4.3. The Microsoft Cloud
    Azure Blob Command Tools
  4.4. HP Helion Cloud
    HP Cloud Object Storage Access
Sources
1. Introduction to Cloud Data Storage

1.1. Understanding cloud data storage

Like data itself, cloud data storage is a sprawling and continuously evolving topic. Cloud data storage refers to a repository for digital information on one or more servers, in one or more locations. Let's focus on corporate and enterprise-level solutions rather than personal file hosting, although there is some overlap.

Cloud data storage is a concept whose time has come, now that cloud-based network infrastructure is widely available on the market. Amazon opened the floodgates in 2006 with the introduction of Amazon Web Services S3. Today, there is an ever-growing array of companies that offer cloud data storage services, including Amazon, Box, Google, HP Helion, Azure, Oracle, RackSpace and Zetta.

Cloud data services have taken off largely because they fit a variety of use cases. They're great for application data regardless of where the user is.

Cloud Data Storage Benefits
● Application data
● Big data
● Archiving and backups
● Long-term storage
● Disaster recovery
They also suit big data, both in terms of file size and quantity of records, routine archives and backups, long-term storage of all types of records and, in case of emergency, disaster recovery.

There are numerous impactful benefits to going the cloud data storage route. Among them is access: your data is available from pretty much anywhere on the planet with an internet connection. Then there is scalability: cloud storage is, for all intents and purposes, unlimited and can grow with your needs. As for security, not only can access to your data be restricted to authorised users, but since cloud storage offers both zonal and geographic redundancy, the possibility of total data loss is severely limited. And then there's the 800-pound gorilla in the room: cost. Hosting your data in the cloud means a significant reduction in self-maintained servers, which cuts not only the physical footprint but also the man-hours required to maintain those servers.

Cloud Data Storage Risks
● Security
● Privacy

There are risks to be considered. Perhaps paramount in the age of the cyber hacker is security. Cloud storage providers must implement strong and continually updated strategies to keep your data from being compromised. Privacy goes together with security. Since we're talking about data stored in one or more off-site facilities, you must ensure it's encrypted and accessible only by authorised users. You should also be aware of the privacy laws governing the data centre locations.

Network issues should also be considered. While downtime leading to data inaccessibility is perhaps the ultimate worry, backup and restoration speeds are also affected by available bandwidth and demand.

Cloud Data Storage Services
● Online management
● API access
● Optimisation

The various cloud data storage hosts offer a wide spectrum of services, but almost all provide online management of storage, including import, export and backup operations.
They also provide API access for automated data storage control, and methods for optimising operations, whether that's establishing access control lists for authorised users or setting up the transfer of multiple data objects in parallel for greater efficiency.

So, that's a cloud's-eye view of cloud data storage. Next, we'll take a closer look at one of the key factors: cost.

1.2. Calculating Costs

Calculating costs for cloud data storage can be a daunting task at best. Whether you are trying to make a basic decision about its cost-effectiveness versus in-house storage or forecasting expenses for multiyear budgets, there are a good number of factors to consider. Let's look at the most pertinent of those, as well as some useful tools. Although prices vary, as you would expect, there are a few guiding principles that seem to hold across the board.

Cloud Data Storage Costs
● Pay per resource used vs. flat fee
● Combination of charges
  ○ Storage
  ○ API operations
  ○ Network transfers
● Higher volume, cheaper rate

First, the clear majority of cloud data storage companies set their pricing on a pay-per-resource-used basis rather than a flat monthly or annual fee. The pay-per-use philosophy applies to most, if not all, aspects of the service.

Second, storage pricing is often a combination of charges. You can expect your bill to include charges for storage, for API operations such as listing and downloading, and for network transfers. Although this might give you pause, the rates for each of these areas are generally very low.

Finally, the higher the volume, the cheaper the rate. While this approach doesn't apply across the board, many companies offer lower prices for greater use of both storage and network transfer.

The first cost comparison you might want to run pits keeping your data on your own servers against moving it to the cloud.
On the privately owned side, you have the very real upfront capital expenditures, such as hardware purchases, installation and configuration.
Own Storage:
● Upfront CapEx
● Ongoing OpEx

Cloud Storage:
● Ongoing OpEx
● Basically, a rental

Then there are the ongoing operating expenses of maintenance and replacement. With the cloud, there are no such capital expenses; it's all operating costs. And while it's true that those operating costs are perpetual, cloud data storage is, at its heart, a rental after all.

Many IT managers have opted to go the hybrid route, using their own existing servers alongside those of a cloud storage host. The cost calculations for such an arrangement take on another level of complexity, but it might be the right fit for your organisation.

Cloud Pricing Calculators

To give you a concrete idea of how pricing for cloud data storage works, let's look at some of the handy tools made available by vendors.

Google Cloud Platform's Pricing Calculator
https://cloud.google.com/products/calculator

Not all data is equal. Data that doesn't need to be accessed as frequently or as quickly can be stored at a lower rate. Backup data, which doesn't need to be as responsive as application data, can be kept for less in Durable Reduced Availability storage.

Resource         | Free limit per day | Price above free limit (per unit) | Price unit
Stored data      | 1 GB               | $0.18                             | GB/month
Entity reads     | 50,000             | $0.06                             | per 100,000 entities
Entity writes    | 20,000             | $0.18                             | per 100,000 entities
Entity deletes   | 20,000             | $0.02                             | per 100,000 entities
Small operations | Unlimited          | Free                              | -

For data that you access even less frequently, such as disaster recovery data, consider using Cloud Storage Nearline to get the lowest rate.

AWS' Pricing Calculator
https://calculator.s3.amazonaws.com/index.html

Make sure that you click the Amazon S3 link on the left. That's the Simple Storage Service, their most accessible tier. You might also want to look at the pricing for Amazon Glacier, Amazon's lower-cost, lower-availability storage.
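The combination-of-charges model can be made concrete with a short calculation. The sketch below applies the operation rates and free daily limits from the sample pricing table above to one day's activity, and adds a separate volume-tier function to illustrate "higher volume, cheaper rate". The function names, the tier sizes and the tier prices are hypothetical, the table's rates are dated, and network transfer charges are left out; treat it as a way to structure an estimate, not as a price quote.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Rates from the sample pricing table: resource -> (free units per day,
# price per billing block, units per billing block). Illustrative only.
SAMPLE_OPERATION_RATES = {
    "reads": (50_000, Decimal("0.06"), 100_000),
    "writes": (20_000, Decimal("0.18"), 100_000),
    "deletes": (20_000, Decimal("0.02"), 100_000),
}

# Hypothetical volume tiers: (GB in this tier, or None for everything
# beyond the previous tiers, price per GB-month).
HYPOTHETICAL_STORAGE_TIERS = [
    (1_000, Decimal("0.030")),
    (9_000, Decimal("0.025")),
    (None, Decimal("0.020")),
]


def daily_operations_cost(counts):
    """Bill only the operations that exceed each resource's free daily limit."""
    total = Decimal("0")
    for resource, count in counts.items():
        free, price, block = SAMPLE_OPERATION_RATES[resource]
        billable = max(0, count - free)
        total += price * billable / block
    return total.quantize(CENT, rounding=ROUND_HALF_UP)


def tiered_storage_cost(gb, tiers=HYPOTHETICAL_STORAGE_TIERS):
    """Monthly storage charge where each successive tier is billed at a lower rate."""
    remaining = Decimal(gb)
    total = Decimal("0")
    for size, rate in tiers:
        portion = remaining if size is None else min(remaining, Decimal(size))
        total += portion * rate
        remaining -= portion
        if remaining <= 0:
            break
    return total.quantize(CENT, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(daily_operations_cost({"reads": 250_000, "writes": 120_000, "deletes": 10_000}))
    print(tiered_storage_cost(5_000))
```

Run as a script, it prints 0.30 (200,000 billable reads at $0.06 and 100,000 billable writes at $0.18 per 100,000; the deletes stay within the free limit) and 130.00 (1,000 GB at $0.030 plus 4,000 GB at $0.025). Decimal is used instead of float so that cents don't drift through binary rounding error.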
Microsoft Azure's Pricing Calculator

Obviously, figuring the cost of going with cloud data storage is only part of making the business case for the move. But now you should have a better understanding of the various facets you'll need to examine.

1.3. Cloud Storage Solutions

If you've considered the cloud data storage market at all, you know that it's a fast-growing, competitive one with many players across the spectrum. In this lesson, we'll look at an overview of five of the top contenders.

Amazon Web Services, also known as AWS, is a full cloud platform with services in computing, databases, analytics, applications and deployment as well as storage. The primary object storage system is the Amazon Simple Storage Service, referred to by another acronym, S3. S3 is a very straightforward but extremely robust object store. There is no limit to the number of objects, and individual objects can be as large as five terabytes. S3 features a high degree of replication across multiple data centres within a region. Lower-cost storage is available through Amazon's Reduced Redundancy Storage option or its Amazon Glacier service.
https://aws.amazon.com/products/storage

If you're working with Amazon's compute service, EC2, you can also use their Elastic Block Store, or EBS. EBS provides block-level volumes that behave like a traditional disk drive while remaining highly scalable. Amazon also offers several database options for structured data storage: Amazon RDS for relational SQL databases and Amazon DynamoDB for non-relational NoSQL.

The Google Cloud Platform provides an ever-growing set of services that leverage Google's global infrastructure. Cloud Storage is Google's primary object storage service and offers limitless storage, automatically replicated across multiple data centres. Cloud Storage objects can be stored in different types of buckets with varying degrees of accessibility and price points. Google's relational database service, Cloud SQL, is MySQL-based and allows you to spin up database instances as needed. For data appropriate to a non-relational platform, you can turn to the schemaless Cloud Datastore. Google recently brought online a new entry in the non-relational space called Bigtable, which is targeted at massive data sets.

In the Microsoft Azure storage realm, objects such as documents and media files are handled by Azure Blob Storage via a REST interface and client libraries for .NET, C++, Java, Node.js and Android, among others.
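Because Azure Blob Storage, like Amazon S3 and Google Cloud Storage, is reached through a REST interface, every stored object has an HTTPS address. The sketch below builds one for each of the three vendors using their public endpoint patterns; the function name `object_url` and the sample account, container and object names are hypothetical. Only publicly readable objects can be fetched from such a URL directly; private objects need an authenticated or signed request, which is not shown here.

```python
from typing import Optional
from urllib.parse import quote


def object_url(vendor: str, container: str, key: str, account: Optional[str] = None) -> str:
    """Build the HTTPS URL of an object from its container (bucket) and key."""
    path = quote(key, safe="/")  # percent-encode spaces etc., keep "/" separators
    if vendor == "aws":
        # Virtual-hosted-style S3 URL: the bucket name becomes part of the host.
        return f"https://{container}.s3.amazonaws.com/{path}"
    if vendor == "google":
        # Google Cloud Storage keeps the bucket name in the path.
        return f"https://storage.googleapis.com/{container}/{path}"
    if vendor == "azure":
        # Azure containers live inside a storage account, which names the host.
        if account is None:
            raise ValueError("Azure blob URLs need a storage account name")
        return f"https://{account}.blob.core.windows.net/{container}/{path}"
    raise ValueError(f"unknown vendor: {vendor!r}")


if __name__ == "__main__":
    # Hypothetical account, container and object names.
    print(object_url("azure", "media", "videos/intro clip.mp4", account="examplestore"))
```

The script prints https://examplestore.blob.core.windows.net/media/videos/intro%20clip.mp4. Note how Azure puts the storage account in the host name, S3 in this virtual-hosted style puts the bucket there, and Google Cloud Storage keeps the bucket in the path.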
https://azure.microsoft.com/en-us/services/storage/tables

The Azure Table Storage service manages non-relational data in a NoSQL fashion, complete with automatic load balancing. A separate service, SQL Database, takes care of relational data.

https://www.rackspace.com/cloud/files

RackSpace puts its cloud data storage service under the infrastructure umbrella with three targeted offerings. Cloud Files is for data objects and boasts triple replication with a simplified pricing structure. The RackSpace Cloud Backup service automatically employs block-level compression and 256-bit key encryption to keep your data compact and secure.
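Triple replication, like the zonal and geographic redundancy mentioned earlier, makes total data loss unlikely for a simple reason. As a rough sketch, assume, purely for illustration, that each of the three copies is lost independently with probability p over some period; the data is lost only if all three copies are lost:

```latex
P(\text{data loss}) = p^{3},
\qquad p = 0.01 \;\Rightarrow\; P(\text{data loss}) = \left(10^{-2}\right)^{3} = 10^{-6}
```

The independence assumption is optimistic: a shared facility, network or software fault can affect several copies at once, which is why providers place replicas in separate zones or regions.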
HP Helion offers HP Cloud Object Storage and HP Cloud Block Storage. Its cloud object storage service supports the OpenStack standard with both Java and Ruby APIs. HP Cloud Block Storage pairs with HP Cloud compute instances, but the storage persists until it is explicitly deleted.
2. Cloud Storage Options

2.1. Working with Object Storage

Let's see how objects are typically handled by the various services, the benefits you should expect and the methodology to look for.

What do we mean by an Object? Objects are discrete digital entities: documents, files (both uncompressed and compressed), images, video, audio and all other media. Because of the nature of cloud storage, neither quantity nor object file size is a problem; Blobs (Binary Large Objects) are easily accommodated. Many services can handle a single put of any file up to 5 gigabytes in size. Larger files should be split into multiple parts, also known as segmenting, and uploaded as parts of an overall Object, typically identified with a unique ID.

Structurally, object storage is organised on two levels. The first layer is the Container, also called an Asset Group or, more commonly, a Bucket. You can have as many Containers as you need, and each Container has a unique ID so that it can be accessed globally. Containers are project-specific on all the services and cannot be nested. However, you can create folders within Containers (typically simulated with key-name prefixes such as photos/2017/) to build a hierarchy. For services with multiple data centres around the world, you can specify the region that hosts the Container.
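The splitting step described above can be sketched as a small planner that decides between a single PUT and a multipart upload, then computes each part's byte range. This is an illustration, not any SDK's API: the function name `plan_parts` and the 100 MiB default part size are assumptions, and the constants are Amazon S3's published multipart limits (5 MiB minimum for every part except the last, 5 GiB maximum part, 10,000 parts, 5 TiB per object), so other services may differ. The text's 5 gigabyte single-put ceiling is treated here as 5 GiB.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Amazon S3's multipart limits; other services differ.
SINGLE_PUT_LIMIT = 5 * GIB
MIN_PART_SIZE = 5 * MIB   # applies to every part except the last
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * TIB


def plan_parts(size, part_size=100 * MIB):
    """Return (offset, length) byte ranges; a single range means one plain PUT."""
    if size > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TiB maximum")
    if size <= SINGLE_PUT_LIMIT:
        return [(0, size)]
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    # Enlarge parts if the requested size would need more than MAX_PARTS of them.
    part_size = max(part_size, -(-size // MAX_PARTS))  # ceiling division
    return [(offset, min(part_size, size - offset))
            for offset in range(0, size, part_size)]


if __name__ == "__main__":
    parts = plan_parts(6 * GIB)
    print(len(parts), parts[-1][1] // MIB)
```

For a 6 GiB file the planner returns 62 ranges, 61 parts of 100 MiB and a final 44 MiB part, so the script prints 62 44. Each range would then be uploaded as a numbered part and the service asked to assemble the parts into one Object. In practice, SDKs usually switch to multipart uploads far below 5 GB, because smaller parts can be retried individually and sent in parallel.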
Placing Containers in regions close to your users is a good way to reduce latency in your targeted markets. Each individual Object is stored within a specific Container; Objects cannot be shared across Buckets, although they can be duplicated. Frequently, you'll find the ability to attach identifying metadata to your Objects via key-value pairs.

When an Object is uploaded to a Container, it becomes available once its integrity has been validated. Similarly, once you delete it, it's no longer accessible. There are no undos, so backups are essential. Many services offer server-side encryption for security, although you're also free to use client-side encryption before transferring the Object.

As mentioned earlier, many cloud platforms provide several storage classes, with lower costs for data that you don't access as frequently or that needs less redundancy. Some services, like Google Cloud Storage, apply them at the Container, or Bucket, level.
Others, such as Amazon S3, allow you to specify the storage class for individual Objects.

Once you've set up your Containers and begun to populate them with Objects, you should ensure that only the people you want to access them can. Object storage is typically private by default, initially accessible only by the owner or primary administrator. Permissions can, however, be broadened all the way up to publicly available. Most services have mechanisms in place for establishing authenticated users, often via ACLs, or Access Control Lists.

All services provide some API libraries for Object management, and many platforms offer a full spectrum of languages from which to choose, anything from Java, .NET and PHP to Python. With them, you can get a full list of your Containers, find out what's in each one, and then store, retrieve, copy and delete those Objects. More advanced operations, such as targeting specific versions of an Object, are available on some platforms.

Mastering Containers and Objects is essential to much of productive cloud data storage. Data record storage, in databases and other systems, is another major aspect of the cloud data storage world.

2.2. Managing Database Content

Cloud data storage can handle structured data as well as unstructured blobs. Two major strains of databases are supported: Relational Databases and Non-Relational Databases. Relational Databases are typically SQL databases, while Non-Relational Databases use NoSQL, which is short for "not only SQL". Some cloud data storage platforms, like Google and HP, focus on MySQL, while others support a range of MySQL variations.
RackSpace supports MySQL, Percona Server and MariaDB. Some services, such as AWS, work with other Relational Databases, including Oracle, SQL Server and PostgreSQL, as well as MySQL. A few cloud data storage services have opted for the proprietary route, like Microsoft Azure with its SQL Database offering.

Category                    | AWS                             | Google                              | Microsoft
RDBMS                       | RDS (all major engines)         | MySQL                               | SQL Azure
NoSQL buckets               | S3 or Glacier                   | Cloud Storage                       | Azure Blobs
NoSQL key-value             | DynamoDB                        | Cloud Datastore                     | Azure Tables
Streaming                   | ML or Apache Mahout, custom EC2 | Prospective Search & Prediction API | StreamInsight
NoSQL document or graph     | MongoDB on EC2                  | Freebase                            | MongoDB on Azure
NoSQL column (Hadoop HBase) | Elastic MapReduce + S3 & EC2    | Cloud Dataproc                      | HDInsight
Dremel / warehousing        | Redshift                        | BigQuery                            | SQL Data Warehouse

Cloud Relational Database Features
● Replication across data centres
  ○ Increase data durability
  ○ Decrease latency
● Scale up / down data instances
● Backups created automatically

Many cloud data storage services take SQL servers to the next level by replicating databases across data centres, increasing data durability and decreasing latency. Cloud database servers scale very efficiently, spinning database instances up or down as needed. Backups to multiple locations are often created automatically, allowing point-in-time recovery.

Access to cloud-based databases is broad overall, but the specific APIs available for database management vary from service to service.

Cloud Relational Database Access
● HTTP requests supported across the board
● Specific APIs vary by service provider
● New APIs routinely introduced

All hosts support standard HTTP requests for accessing data, but you'll have to check each service to verify that an API for the language of your choice is available. Keep in mind that this is by no means a static situation: many services add new APIs on a continuing basis.

Cloud Non-Relational Databases
● Proprietary NoSQL frameworks for each vendor
  ○ Amazon DynamoDB
  ○ Rackspace ObjectRocket
  ○ Google Cloud Datastore
● Appropriate for massive datasets
  ○ AWS Redshift
  ○ Google Cloud Bigtable

If your applications lend themselves to non-relational data with relatively straightforward queries, NoSQL is often the most responsive option and can scale to massive size. Each of the services that provide NoSQL alternatives has its own framework, such as Amazon DynamoDB, RackSpace ObjectRocket or Google Cloud Datastore. Both key/value and document-based NoSQL systems are available: Amazon DynamoDB works with either, Microsoft DocumentDB is document-focused, and Google Cloud Datastore is key/value oriented.

NoSQL's relatively simple structure opens the door to efficient processing of big data, and several cloud data hosts offer services built for massive datasets, such as the AWS Redshift data warehouse (which is SQL-based) or the recently introduced Google Cloud Bigtable (a NoSQL wide-column store).

Non-Relational Database Management
● Platform-specific but robust
  ○ Create, update, and delete tables
  ○ Create, update, and delete content (items or entities)
  ○ Create, update, and delete content attributes

Management access to the NoSQL services is platform-specific but tends to be very robust. With most of them, you'll be able to programmatically create, update and delete tables, as well as perform similar operations on table contents, which may be called items or entities, and on their attributes.

Cloud Database Security
● Replication
  ○ Automatically initiated
  ○ Geographically separated facilities
  ○ API controlled
  ○ Selectable read consistency and write verification

With both SQL and NoSQL solutions, data security is enhanced by automatic replication, often across geographically separated data centres. Replication can also be controlled via API calls. Numerous services allow you to choose the degree of read consistency and write verification that your data requires.

Cloud database storage is just as robust and vital as its sibling, object storage. However, there is more diversity in the feature sets found across the various providers, so you'll need to research carefully to find the right fit for your organisation's database needs.

2.3. Targeting Storage Availability

By default, all containers and objects are private and accessible only by the project administrator, often referred to as the owner. Only the owner can grant others permission to read or interact with a container and its objects. Grantees can include individual people, identified by ID number or email address; groups, such as an email group; or domains, often expressed as a subnet range or an IP address.

Collectively, these permissions are called Access Control Lists, or ACLs. Various services support ACLs written in a variety of formats, but the most common are XML and JSON. It's quite common for cloud data services to make APIs available in multiple languages, including Java, C++, PHP, Python, Node.js, Ruby, and others; we'll take a closer look at APIs in the next section. Once you've established authorised, authenticated user access, cloud data storage gives you full, direct management capabilities, just as you would have over in-house storage.

2.4. Assessing API Interconnectivity

A good API (application programming interface) is truly worth its weight in gold, considering the time and effort it saves you in coding, testing, and debugging. Cloud data storage vendors, and for that matter the developer community, fully embrace APIs across the spectrum, which can create a bit of a problem because there are so many options to choose from.

http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html

Because most cloud data storage solutions are part of a larger platform, most of the associated APIs are contained within a series of overall SDKs, each written for a separate language. The AWS SDKs, available from the URL above, cover Java, .NET, PHP, Ruby, and Python, whose interface to AWS is called Boto. The Google Cloud Platform has a similar SDK for overall functionality, although it is not broken out by language and requires Python 2.7.
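To make the NoSQL management operations described earlier concrete, here is a minimal sketch of item-level create, update, and delete calls through the Python SDK (Boto3's low-level DynamoDB client). The table name `Inventory`, its string partition key `sku`, and the helper function names are hypothetical; the functions accept any client object so they can be exercised without AWS credentials.

```python
"""Sketch: DynamoDB item create/update/delete via a Boto3 low-level client.

Assumptions (hypothetical): a table named "Inventory" whose partition key
is the string attribute "sku". Pass in boto3.client("dynamodb").
"""
from typing import Any

TABLE = "Inventory"  # hypothetical table name


def create_item(client: Any, sku: str, quantity: int) -> None:
    # The low-level API tags every value with its type:
    # "S" = string, "N" = number (numbers are sent as strings).
    client.put_item(
        TableName=TABLE,
        Item={"sku": {"S": sku}, "quantity": {"N": str(quantity)}},
    )


def update_quantity(client: Any, sku: str, quantity: int) -> None:
    # "#q" is an attribute-name placeholder, which sidesteps DynamoDB reserved words.
    client.update_item(
        TableName=TABLE,
        Key={"sku": {"S": sku}},
        UpdateExpression="SET #q = :q",
        ExpressionAttributeNames={"#q": "quantity"},
        ExpressionAttributeValues={":q": {"N": str(quantity)}},
    )


def delete_item(client: Any, sku: str) -> None:
    # Items are addressed by their full primary key.
    client.delete_item(TableName=TABLE, Key={"sku": {"S": sku}})
```

With credentials configured, `client = boto3.client("dynamodb")` followed by `create_item(client, "A-100", 5)` would write one item. Creating the table itself (Boto3's `create_table`) is left out of this sketch.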
https://cloud.google.com/sdk

The APIs available for Google Cloud Storage include support for .NET, Java, JavaScript, Objective-C, PHP, and Python.

https://cloud.google.com/storage/docs/json_api/

Leveraging the available APIs, such as those found in AWS and Google Cloud Storage, is a crucial strategy for efficient storage and IT management.
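As an illustration of how the ACLs from section 2.3 meet the Cloud Storage JSON API, the sketch below builds (but does not send) a request that adds one read grant to an object. The endpoint pattern and the `entity`/`role` body fields follow our understanding of the JSON API's object access controls; the bucket, object, and email address are placeholders, and a real call would also need an OAuth 2.0 bearer token.

```python
import json
from urllib.parse import quote

API_ROOT = "https://storage.googleapis.com/storage/v1"

# Entity strings name the grantee, mirroring the grantee types described above:
#   "user-jane@example.com"   an individual, by email
#   "group-team@example.com"  an email group
#   "domain-example.com"      everyone in a domain
#   "allUsers"                the public


def object_acl_insert_request(bucket: str, obj: str, entity: str, role: str = "READER"):
    """Return (method, url, json_body) for adding one ACL entry to an object."""
    # Object names may contain "/", which must be percent-encoded in the URL path.
    url = f"{API_ROOT}/b/{quote(bucket, safe='')}/o/{quote(obj, safe='')}/acl"
    body = json.dumps({"entity": entity, "role": role})
    return "POST", url, body


method, url, body = object_acl_insert_request(
    "my-bucket", "reports/2016.csv", "user-jane@example.com"
)
```

Any HTTP client can then send `body` to `url` with an `Authorization: Bearer <token>` header; the language-specific client libraries listed above wrap the same calls.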
3. Data Storage Issues

3.1. Understanding Data Storage Issues

The benefits of cloud data storage are undeniable: global access, no upfront capital expense, and virtually unlimited capacity, to mention a few. But cloud data storage is not without its dark side.

The number one issue must be security. The flip side of being able to access your data from anywhere is that people from anywhere can potentially access your data. First, there's the vulnerability of transferring your data to and from the cloud. Industrial-strength encryption is typically the answer, but you must have a robust encryption key management system in place to maintain long-term accessibility. Next, you need to be confident in the cloud service provider's own security while your data is at rest, to guard against data breaches. This protection needs to be up to date and evolving, because the threat is certainly ongoing. To this end, make sure that access logs are complete and routinely monitored.

Service Level Agreement (SLA)
● Tool for reducing downtime

As with any network, there's always the problem of downtime. Reducing network inaccessibility to an absolute minimum is a key requirement, and one that service providers work diligently to address. While maintaining the external network is beyond your control, you do have a key tool for dealing with such problems: the Service Level Agreement, or SLA. A solid SLA details uptime expectations and the consequences, typically service credits, if things go south.

Not only do you want to make sure your data is accessible; quite often you also want to ensure that it's delivered as quickly and efficiently as possible, especially if you're working with application data. With global networks, latency can be a real concern. If you have a worldwide audience, you'll probably want a cloud data storage host with worldwide reach, whose multiple data centres let you keep your data geographically close to your own markets.

One of the major selling points for cloud data storage is scalability. When your traffic increases, cloud storage hosts share the load among multiple servers; when your traffic lessens, the number of servers in play shrinks as well. This affects your bottom line, because cloud data storage is a pay-for-what-you-use service.
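The SLA terms mentioned earlier in this section become easier to evaluate once uptime percentages are turned into minutes. The sketch below computes the downtime a given uptime commitment allows over a 30-day month and looks up a service credit. The credit tiers are hypothetical placeholders; real SLAs define their own measurement windows, tiers, and exclusions.

```python
from fractions import Fraction

# Hypothetical credit schedule: (uptime strictly below this %, credit %),
# checked from the worst tier to the mildest.
HYPOTHETICAL_CREDIT_TIERS = [(Fraction("99.0"), 25), (Fraction("99.9"), 10)]


def allowed_downtime_minutes(uptime_percent: str, days: int = 30) -> Fraction:
    """Minutes of downtime an uptime commitment permits over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - Fraction(uptime_percent) / 100)


def service_credit_percent(measured_uptime_percent: str) -> int:
    """Credit owed under the hypothetical tiers for a measured monthly uptime."""
    uptime = Fraction(measured_uptime_percent)
    for threshold, credit in HYPOTHETICAL_CREDIT_TIERS:
        if uptime < threshold:
            return credit
    return 0


print(float(allowed_downtime_minutes("99.9")))  # 43.2
```

Exact fractions are used so that percentages such as 99.9 are not distorted by binary floating point; each extra "nine" cuts the downtime budget by a factor of ten.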
While the various providers are designed to scale object storage, there are several techniques you can apply to optimise the practice. Like any service, cloud data storage is not without its problems. As always, the first step in addressing them is to identify the issues.

3.2. Establishing and Maintaining Secure Storage

Securing organisational data is a top-ranked, if not the number one, task for IT departments. Storing your data in a remote, offsite facility requires a robust strategy and ongoing participation by both client and host, a fact that cloud data storage providers are well aware of.

Cloud Data Storage Security
● In transit: transferring to and from the storage host
● At rest: stored at a remote facility

You can break hosted security concerns into two main areas: first, in transit, when the data is being transferred to or from your system; and second, at rest, when the data is on the remote storage server.

In transit, data can be protected in several ways, none of which are mutually exclusive. SSL and HTTPS protocols should always be used to secure the data's travels. Extremely sensitive data can also be encrypted on the client side prior to transfer; naturally, this means you'll need a solid encryption key management system in place.

If you choose not to transfer encrypted data, the cloud data storage host can encrypt it for you so that it's secure while at rest. Services like Amazon S3 allow you to establish bucket policies that reject an upload unless the request's header asks for the data to be encrypted server-side. AWS also supports another layer of protection for server-side encryption: the AWS Key Management Service, also called KMS.
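The S3 bucket policy described above can be sketched as a JSON document that denies any `PutObject` request lacking the server-side encryption header. The bucket name is a placeholder, and the function only builds the policy text; attaching it is left to the caller.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> str:
    """Bucket policy text that rejects uploads without an SSE request header."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The "Null" condition is true when the header is absent,
                # so any upload that does not request encryption is denied.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(deny_unencrypted_uploads_policy("example-bucket"))
```

With Boto3, the policy could be attached through `s3.put_bucket_policy(Bucket="example-bucket", Policy=...)`. Uploads would then need to pass `ServerSideEncryption="AES256"`, or `"aws:kms"` to use KMS-managed keys, to `put_object`.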
KMS gives you control over server-side encryption keys, preventing those keys from ever being exported and providing a full audit trail of their use. There is, however, an additional charge for using KMS-managed keys.

Another way to secure your data is versioning, some variation of which is offered by several cloud data storage services, including S3. Once versioning is enabled, your data is protected from accidental deletion or overwriting. Versioning is typically enabled at the container, or bucket, level.

From a security standpoint, it's a good idea to enable logging, which is disabled by default on most services. Once it is set up, all requests for server access are tracked; a log entry typically includes the requester details, container name, object name, request time, request action, response status, and the error code, if any.
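Both settings described above are per-bucket switches on S3. The following sketch assumes a Boto3 S3 client is passed in; the bucket names and the `access-logs/` prefix are hypothetical.

```python
from typing import Any


def enable_versioning_and_logging(
    s3: Any, bucket: str, log_bucket: str, prefix: str = "access-logs/"
) -> None:
    """Turn on versioning and server access logging for one bucket (sketch)."""
    # Versioning is configured at the container (bucket) level.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # Access logs are delivered as ordinary objects into log_bucket under prefix.
    s3.put_bucket_logging(
        Bucket=bucket,
        BucketLoggingStatus={
            "LoggingEnabled": {"TargetBucket": log_bucket, "TargetPrefix": prefix}
        },
    )
```

A call such as `enable_versioning_and_logging(boto3.client("s3"), "app-data", "app-logs")` would apply both settings. The sketch assumes the log bucket already grants S3's log delivery permission to write to it; it does not set that up.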
Logs are stored in a designated container on the cloud data storage host and can be retrieved and examined at any time. Because they are treated like any other storage object, they incur a charge, so you should set up a policy for archiving or deleting them after a set period.

Although storing your data remotely is undeniably a risk, with heightened awareness and full use of the available cloud data storage tools, you can minimise that risk as much as possible.

3.3. Handling Latency

Speed matters, especially the speed at which your data travels from where it is stored to where it needs to go. Latency is a real factor in cloud data storage, so it's worth understanding what it is and what options you have for optimising it.

Cloud Data Storage Latency
● Location is important
● Store data closest to user base
● Specify container's region
  ○ US, Europe and Asia

Latency can be defined as the amount of time it takes one packet of data to get from one location to another. In terms of cloud data storage, we're talking about the length of time from when a request is received by the data hosting server to when the response is received by the requesting client. Latency is a key defining characteristic of the various storage classes.

When it comes to optimising latency, the most important factor is location. Whenever possible, it's best to house your data closest to the people who want it.

https://cloud.google.com/storage/docs/bucket-locations

Most cloud data storage vendors allow you to specify the region when creating a container for objects. Typically, the available regions are sizable in scope, like the US, Europe, or Asia, and you should place your storage nearest your market.
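Choosing and specifying a container's region can be sketched in two steps: pick the region with the lowest measured latency to your users, then create the bucket there. The latency figures below are invented, and the helper names are hypothetical; the bucket call assumes a Boto3 S3 client.

```python
from typing import Any


def nearest_region(latency_ms_by_region: dict[str, float]) -> str:
    """Pick the region with the lowest measured round-trip time to your users."""
    return min(latency_ms_by_region, key=latency_ms_by_region.get)


def create_bucket_in_region(s3: Any, bucket: str, region: str) -> None:
    if region == "us-east-1":
        # S3's default region; it does not accept an explicit LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )


# Hypothetical measurements from a mostly European user base.
measured = {"us-east-1": 120.0, "eu-west-1": 25.0, "ap-southeast-1": 210.0}
region = nearest_region(measured)  # "eu-west-1"
```

In practice the measurements would come from real clients in your market rather than from a single test machine.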
https://aws.amazon.com/about-aws/global-infrastructure

There is a trend toward breaking up the large regions and allowing finer container placement. Google Cloud Platform's Bucket Locations feature can be used with its Durable Reduced Availability storage class: you can specify that your objects be housed in the eastern, western, or central US, any combination thereof, or any other available region.

What else can you do to lessen latency and improve performance? Believe it or not, the naming of an object and/or its container can have a serious impact on response time. Most cloud data storage services index their key names alphabetically. It's common practice to incorporate a timestamp as part of that ID, which has the effect of grouping objects that were transferred at about the same time on the same server partition. It's therefore recommended to preface your object and container names with a random hash string, which spreads them out across different partitions.

When it comes to structured data versus unstructured blobs, latency is tied to data consistency. Because database entries can be modified at any point, read and write timing matters: the more emphasis you place on every read reflecting the latest write, and thus on data consistency, the greater the latency.
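A minimal sketch of the naming advice above: prefix each key with a few hex digits of a hash. This version hashes the name itself with SHA-256 rather than drawing a truly random string, a design choice that gives the same scattering while letting you recompute the full key later. The 4-character prefix length is an arbitrary assumption.

```python
import hashlib


def spread_key(name: str, prefix_len: int = 4) -> str:
    """Prefix `name` with the first hex digits of its SHA-256 digest."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{name}"


# Sequential, timestamp-led names now begin with unrelated prefixes,
# so alphabetical indexing no longer clusters them together.
for n in ["2016-03-01T12:00:00.log", "2016-03-01T12:00:01.log"]:
    print(spread_key(n))
```

Because the prefix is derived from the name, a reader who knows the original name can always rebuild the stored key without keeping a lookup table.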
https://azure.microsoft.com/en-us/blog/azure-documentdb-is-now-available-in-central-us

Microsoft Azure DocumentDB has identified this as a key area for its service and now offers four distinct levels of consistency: Strong, Bounded Staleness, Session, and Eventual. The Strong level of consistency results in the highest latency, while the Eventual level results in the lowest.

Understanding how latency works, and the associated options, is a pivotal step in positioning your cloud data storage properly.
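Why does Strong consistency cost more latency than Eventual? The toy model below, which is not how DocumentDB actually implements its levels, treats a strong read as waiting for a majority of replicas and an eventual read as taking the nearest replica's answer. The replica latencies are made-up numbers, and the Session and Bounded Staleness levels, which fall between these extremes, are omitted.

```python
def read_latency_ms(replica_latencies_ms: list[float], level: str) -> float:
    """Toy model of read latency at the two ends of the consistency spectrum."""
    if not replica_latencies_ms:
        raise ValueError("need at least one replica")
    ordered = sorted(replica_latencies_ms)
    if level == "eventual":
        # Any replica may answer, so the nearest one wins; the data may be stale.
        return ordered[0]
    if level == "strong":
        # Wait for a majority quorum; assuming writes are also acknowledged by a
        # majority, the read is guaranteed to overlap the latest committed write.
        majority = len(ordered) // 2 + 1
        return ordered[majority - 1]
    raise ValueError(f"unsupported level: {level!r}")


replicas = [80.0, 5.0, 20.0]  # hypothetical round-trip times in milliseconds
print(read_latency_ms(replicas, "eventual"), read_latency_ms(replicas, "strong"))
```

The gap between the two results grows as replicas spread farther apart geographically, which is exactly when replication helps durability most.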
3.4. Managing Scalability and Replication

The raw power of today's cloud data storage industry is really apparent when you consider two defining characteristics: scalability and replication.

Scalability is the ability of a system to efficiently adapt to handle the current workload. The vastness of the networks now available for cloud data storage means that there's virtually no limit to the number of objects or the amount of data you can store online. This scalability is, for the most part, effortless for customers of these services, because the infrastructure is already in place and maintained by the service providers.

On most cloud data storage hosts, you can create a very large number of containers, and each container is virtually unlimited in size. When you store more objects in a container than a single drive can physically hold, the data is written to other systems while still existing within the same virtual bucket.
Although the image that most frequently comes to mind for scalability is a service increasing its processes to meet surging tasks (scaling up), the ability to discard unneeded processes (scaling down) is just as important. Because cloud data storage runs on a pay-for-what-you-use model, most providers calculate their storage charges based on average monthly usage: if your average goes down, so does the charge.

Replication is the duplication of data in real time over a network. It's common practice among cloud data storage platforms to automatically replicate your objects when they're added to your containers and to store the redundant copies on multiple devices, usually in the same region. When an object is replicated, everything remains the same: the key name, the metadata, the container, everything.

The primary goal of replication is data protection, or durability, making sure that your data objects remain available. Durability is the probability that, after one year, an object will still be the same as when you transferred it. The greater the likelihood that your data will be available, the higher the durability. 100% durability would mean that an object could never be lost; 90% durability means there's a one-in-ten chance of losing it within a year. AWS rates its S3 Standard storage class at 99.999999999% durability. This means that if you store, say, 10,000 objects with them, you might lose one every 10 million years or so.

This automatic replication is to other devices within the same region, but you can also replicate your data to a different region. Why would you use cross-region replication?
1. You can reduce latency by housing your objects as close as possible to your markets.
2. Regulatory compliance may mandate that your data be stored redundantly in remote locations.
3. Your internal infrastructure may have remote offices that require access to the same data.
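The durability arithmetic above can be checked directly. At 99.999999999% durability, each object has a 1-in-10^11 chance of loss per year; with 10,000 objects, the expected number of losses is 10^-7 per year, or one loss every 10^7 years. The sketch uses exact fractions so the many nines are not blurred by floating-point rounding; the function names are our own.

```python
from fractions import Fraction


def annual_loss_probability(durability_percent: str) -> Fraction:
    """Chance that one object is lost within a year, e.g. "90" -> 1/10."""
    return 1 - Fraction(durability_percent) / 100


def years_per_expected_loss(durability_percent: str, num_objects: int) -> Fraction:
    """Average number of years between single-object losses."""
    losses_per_year = num_objects * annual_loss_probability(durability_percent)
    if losses_per_year == 0:
        raise ValueError("100% durability: no loss is ever expected")
    return 1 / losses_per_year


print(years_per_expected_loss("99.999999999", 10_000))  # 10000000
```

This treats losses as independent events at a constant annual rate, which is the simplifying assumption behind the "one every 10 million years" figure.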
4. Data Storage Vendors

4.1. Google Cloud Platform

Google Cloud Platform is one of the most all-encompassing online services, with major entries in the data storage field backed by an extremely robust global infrastructure.

https://cloud.google.com

Google Cloud integrates a full spectrum of products, one that's constantly evolving. Most, if not all, of the product line works smoothly with the other products; for example, applications built with Compute Engine can easily pull assets from Cloud Storage. You can, of course, use the storage products independently of any other service in the platform.

For object storage on the Google Cloud Platform, you'd use Cloud Storage. With unlimited capacity and worldwide data centres, your data objects can be housed in any of the Cloud Storage tiers. In order of decreasing cost, those tiers are:
● Standard Storage, for objects that require the highest degree of availability and access.
● Durable Reduced Availability, or DRA, perfect for data backups and other objects that do not require the highest degree of availability.
● Cloud Storage Nearline, intended for backups, archives, disaster recovery, and other data where increased latency is acceptable.

https://cloud.google.com/products

Storage in Cloud Storage is based on buckets and objects: you create a bucket that holds one or more objects. Access to buckets and objects is handled in a variety of ways. The XML API is accessible from Java or Python, and the JSON API from Java, JavaScript, Python, Go, and PHP.

Relational data is handled by Google Cloud SQL, which supports MySQL. With Cloud SQL, you have a choice of hosting regions (US, Europe, or Asia), with 100 gigabytes of storage and up to 16 gigabytes of RAM per database instance. You get all the power of MySQL with automatic replication of your data across multiple data centres, and additional peace of mind comes from point-in-time backup and recovery services. Importing and exporting existing data is supported by commonly used tools like mysqldump, the MySQL wire protocol, and JDBC.

Much of the power of Cloud SQL stems from the fact that an application can spin up database instances on an as-needed basis. These instances can be accessed in several ways, including the Google Cloud Console; you're also free to use the MySQL client from the command line, or the JSON API.

Non-relational data is addressed by Cloud Datastore, which uses schemaless NoSQL. Cloud Datastore features built-in redundancy, with automatic replication across data centres as well. Despite being NoSQL, Cloud Datastore supports ACID transactions for reliable processing. Access to Cloud Datastore is available through the Google Cloud Console interface, a command-line tool called GCD, and a full-featured JSON API.

Google Cloud's latest offspring in the data storage space is Bigtable. Also NoSQL-based, Bigtable is optimised to handle enormous amounts of data, ranging from terabytes to petabytes, with single-digit millisecond latency. The engine that drives Bigtable is the same one Google uses for its top-of-the-line applications, including Gmail, Google Maps, and Google Analytics. Accessible from the open-source HBase API, which integrates nicely with Hadoop, Bigtable encrypts data in transit as well as at rest.

4.2. Amazon Web Services (AWS)

Amazon Web Services, frequently known as AWS, was the first major player to enter the cloud data storage field and continues to be a significant force in the market, with products for every corner of the computing realm, including formidable entries in all types of data storage. Amazon's network is rightfully world-famous, too, with a reliable, secure infrastructure capable of serving everyone from entrepreneurs to enterprises.

Object storage on AWS falls to S3, short for Simple Storage Service. S3 is straightforward and easy to use while remaining extremely flexible and powerful. Boasting automatic redundancy, S3 is highly scalable and secure. You can choose among three service levels to find the right fit for your data:
● Standard Storage, with the highest degree of durability.
● Reduced Redundancy Storage, which, at a lower cost, is perfect for non-critical data.
● Amazon Glacier, intended for infrequently accessed data, such as archives and disaster recovery files.

Amazon Web Services has a wide range of products, all of which are integrated with each other.

https://aws.amazon.com/solutions

AWS supports both relational and non-relational databases. Its primary SQL solution is Amazon RDS (Relational Database Service), which supports MySQL, Oracle, SQL Server, and PostgreSQL. The exact feature set of RDS depends on which database engine is used, although automatic backups are enabled by default across the board. Fully scalable, RDS spins up database instances as needed. You can configure your instances to use from 1 to 32 virtual CPUs, with 1 to 244 gigabytes of memory.

Amazon Cloud Databases
If You Need | Consider Using | Product Type
A managed relational database in the cloud that you can launch in minutes with just a few clicks | Amazon RDS | Relational Database
A fully managed, MySQL-compatible relational database with 5X performance and enterprise-level features | Amazon Aurora | Relational Database
A managed NoSQL database that offers extremely fast performance, seamless scalability, and reliability | Amazon DynamoDB | NoSQL Database
A fast, fully managed, petabyte-scale data warehouse at less than a tenth the cost of traditional solutions | Amazon Redshift | Data Warehouse
To deploy, operate, and scale an in-memory cache based on Memcached or Redis in the cloud | Amazon ElastiCache | In-Memory Cache
Help migrating your databases to AWS easily and inexpensively with zero downtime | AWS Database Migration Service | Database Migration

https://aws.amazon.com/products/databases

Data is automatically replicated across three regional AWS data centres, and you can optionally take advantage of Amazon's new cross-region replication service to spread your data further around the globe.

4.3. The Microsoft Cloud

The full range of Microsoft Azure services, from computing to analytics to Internet of Things integration, is as robust and compelling as any in the market. The Microsoft brand also brings a distinct familiarity and a collection of compatible services and tools, such as Active Directory and Visual Studio.

Unstructured data is fully supported with Azure Blobs. Boasting over 40 trillion stored objects and an average of 3.5 million requests per second, Azure Blobs provides high durability and accessibility.

Azure Blob Command Tools
● AzCopy
● PowerShell
● Azure cross-platform CLI

Azure Blobs supports two abstractions: Page Blobs for disks and Block Blobs for discrete files. Accessible via REST interfaces, API client libraries, and a set of powerful command tools, like AzCopy, PowerShell, and the Azure cross-platform CLI, Azure Blobs gives you a great many options for object management. Where other services speak of buckets and objects, Azure Blobs uses the terms container and blob. Each Azure Storage account holds containers, which in turn hold blobs; containers cannot be nested inside one another.

https://azure.microsoft.com/en-us/solutions

Azure offers two solutions for NoSQL non-relational data: Azure Tables and DocumentDB. Use Azure Tables for key-value data structures and DocumentDB for document data models. DocumentDB is a database as a service with a very full-featured, SQL-compatible environment that is continually evolving. Additionally, DocumentDB is schemaless, which allows your data structures to evolve efficiently over time as well.

As you might expect from the developers of SQL Server, Azure's relational database service, called SQL Database, is top of the line, with full support for existing SQL Server tools, APIs, and libraries. SQL Database is cloud-migration friendly and offers three service tiers for a range of workloads: Basic, Standard, and Premium. SQL Database can handle databases up to 500 gigabytes and provides point-in-time restore, geo-restore, and geo-replication features.

Microsoft Azure currently offers a free one-month trial with a $200 credit, the perfect way to give this highly competitive service a run for its money.

4.4. HP Helion Cloud

HP Helion combines a solid set of products for computing and storage applications, including ones that handle both object and database storage. If you're using HP Cloud Compute, you'll want to tie into its block storage module: its persistent volumes outlive the associated compute instance, so your data can be kept as long as necessary.

Object storage comes under the aegis of HP Cloud Object Storage, naturally. Like most similar services, Cloud Object Storage uses a container and object structure.

HP Cloud Object Storage Access
● Online console
● Command line interface
● REST API
● Language bindings
  ○ Java, PHP, .NET, Node.js, or Fog (Ruby cloud services library)

Access is available via an online console, a command line interface, a complete REST API, or one of the many language bindings, including Java, PHP, .NET, Node.js, and Ruby's Fog library.

A common use for object storage is to act as a Content Delivery Network, or CDN. HP Cloud CDN optimises your cloud object storage to deliver static files with minimal latency, powered by Akamai's global network of edge servers.

Charges are calculated monthly on the amount of storage used, the amount of data transferred out of the system, and the number of GET, PUT, POST, COPY, or LIST requests made. At the time of writing, HP offers a no-charge trial period with a substantial credit if you'd like to investigate its services further.
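The three charge components listed for HP Cloud Object Storage (storage, outbound transfer, and requests) add up to the monthly bill. The sketch below shows that arithmetic; the `PriceSheet` values are invented placeholders, not HP's or any other provider's actual prices, and real price sheets often use tiered rates.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PriceSheet:
    """Hypothetical unit prices; substitute the provider's published rates."""
    per_gb_month_stored: Decimal
    per_gb_transferred_out: Decimal
    per_1000_requests: Decimal


def monthly_charge(
    stored_gb: Decimal, egress_gb: Decimal, requests: int, prices: PriceSheet
) -> Decimal:
    """Storage + outbound transfer + request fees, rounded to cents."""
    storage = stored_gb * prices.per_gb_month_stored
    transfer = egress_gb * prices.per_gb_transferred_out
    request_fees = Decimal(requests) / 1000 * prices.per_1000_requests
    return (storage + transfer + request_fees).quantize(Decimal("0.01"))


prices = PriceSheet(Decimal("0.09"), Decimal("0.12"), Decimal("0.01"))
print(monthly_charge(Decimal("500"), Decimal("100"), 200_000, prices))  # 59.00
```

`Decimal` is used instead of floats so that currency amounts round predictably.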
Sources
Google Cloud Platform Products and Services: https://cloud.google.com/products
Google Cloud Platform Pricing Calculator: https://cloud.google.com/products/calculator
Google Cloud Platform APIs & Reference: https://cloud.google.com/storage/docs/apis
Amazon Web Services S3: https://aws.amazon.com/s3
Cloud Storage Nearline: https://cloud.google.com/storage-nearline
AWS Pricing Calculator: https://calculator.s3.amazonaws.com/index.html
Cloud Storage with AWS: https://aws.amazon.com/products/storage
Google Cloud SQL: https://cloud.google.com/sql
Google Schemaless Cloud DataStore: https://cloud.google.com/datastore
Google Cloud BigTable: https://cloud.google.com/bigtable
Azure Blob Storage: https://azure.microsoft.com/en-us/services/storage/blobs
Azure Table Storage: https://azure.microsoft.com/en-us/services/storage/tables
Rackspace Scalable Cloud Object Storage: https://www.rackspace.com/cloud/files
Using the AWS SDKs, CLI, and Explorers: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingAWSSDK.html
AWS SDK for Python (Boto3): https://aws.amazon.com/sdk-for-python
Google Cloud Platform SDK: https://cloud.google.com/sdk
Google Cloud Storage JSON API Overview: https://cloud.google.com/storage/docs/json_api
AWS Key Management Service (KMS): https://aws.amazon.com/kms
AWS Security Logging: https://aws.amazon.com/answers/logging
Google Cloud Platform Bucket Locations: https://cloud.google.com/storage/docs/bucket-locations
AWS Global Infrastructure: https://aws.amazon.com/about-aws/global-infrastructure
Azure Regions: https://azure.microsoft.com/en-us/regions
Azure DocumentDB: https://azure.microsoft.com/en-us/services/documentdb
Google Cloud Console: https://console.cloud.google.com
Google Cloud DataStore: https://cloud.google.com/datastore
Apache HBase: https://hbase.apache.org
Cloud Databases with AWS: https://aws.amazon.com/products/databases
Azure Solutions: https://azure.microsoft.com/en-us/solutions
Create your free Azure account today: https://azure.microsoft.com/en-us/free
Akamai Cloud Networking: https://www.akamai.com/us/en/solutions/products/cloud-networking
Akamai Ion for Free: https://content.akamai.com/PG5155-Online-Trials-Ion-Standard.html