Hadoop with Cloudian HyperStore®
Takenori Sato, Cloudian KK.
tsato@cloudian.com
May 2015
Introduction
Customer Benefits
• In-place analytics – no duplicate data in HDFS
• Eliminate the data ingest phase from storage silos
• Improved efficiency and security
• Faster time-to-insight
• Lower OPEX and CAPEX
• Hortonworks-certified solution
The Hortonworks + Cloudian solution enables the enterprise to run powerful data
analysis using Hortonworks Data Platform (HDP) and YARN directly on Cloudian's
fully S3-compatible hybrid storage platform.
What is Cloudian HyperStore?
Cloudian HyperStore is a fully S3 API-compliant, multi-tenant, multi-datacenter,
software-defined hybrid cloud storage platform.
[Diagram: Hortonworks/Hadoop with Cloudian HyperStore]
© Copyright 2010-2015 Cloudian. All rights reserved.
How do Hadoop and HyperStore work together?
Three types of integration
• Simple: Ad hoc access to Cloudian HyperStore via s3x://
• Backup: Replication to Cloudian HyperStore via the S3 API
• Tiering: Archiving to Cloudian HyperStore via the S3 API
Simple - Case Study
Problem:
A customer owns a decent-sized Hadoop cluster and is satisfied with it.
However, the company decided to open the analytics platform to more people.
That means much more data has to be made available, but the customer is
not sure how much analysis will actually be run on it. If the customer
simply imports more data into the Hadoop cluster, they will need more nodes
to meet the capacity requirement. But then they would have excess computing
resources, and the imbalance would get worse and worse if data demand grows
faster than analysis needs (it often does).
Solution:
Put the data in Cloudian HyperStore. This not only makes more data
available for analytics, but also lets the customer manage data growth
separately from compute growth. For data management, Cloudian HyperStore
can offer a better $/GB through erasure coding (compared to replication),
and ease of use through its proven object lifecycle management. Besides,
there is no need to build client tools for new users, because many S3
client tools are already available.
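To make the $/GB argument concrete, here is the raw-capacity arithmetic behind it. The 3-way replication factor matches the HDFS default (dfs.replication = 3); the 4+2 erasure-coding layout (4 data fragments plus 2 parity fragments) is an illustrative assumption, not necessarily a scheme HyperStore offers.

```python
def raw_capacity_replication(usable_tb: float, replicas: int) -> float:
    """Raw disk needed to store `usable_tb` of data with N-way replication."""
    return usable_tb * replicas


def raw_capacity_erasure(usable_tb: float, k: int, m: int) -> float:
    """Raw disk needed with a k+m erasure code (k data + m parity fragments)."""
    return usable_tb * (k + m) / k


if __name__ == "__main__":
    data_tb = 100.0
    print(raw_capacity_replication(data_tb, 3))  # 300.0
    print(raw_capacity_erasure(data_tb, 4, 2))   # 150.0
```

Both layouts survive the loss of any two copies or fragments, but the 4+2 layout needs half the raw capacity for the same 100 TB of data.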
Simple - Solution
[Diagram: Hadoop reaches both HDFS (the defaultFS, via hdfs://) and Cloudian HyperStore (via s3x://) through the Hadoop FileSystem interface.]
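The diagram's key point is that Hadoop chooses a storage backend from a path's URI scheme. The sketch below models that lookup: a path with no scheme falls back to `fs.defaultFS`, and a scheme is mapped to an implementation class through an `fs.<scheme>.impl` key. Those key names follow Hadoop's configuration conventions; the namenode address and the class name `com.example.HypotheticalS3xFileSystem` are placeholders, not Cloudian's actual connector.

```python
from urllib.parse import urlparse

# Hypothetical configuration mirroring Hadoop core-site.xml keys.
# The s3x implementation class name is a placeholder for illustration.
CONF = {
    "fs.defaultFS": "hdfs://namenode:8020",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "fs.s3x.impl": "com.example.HypotheticalS3xFileSystem",
}


def resolve_filesystem(path: str, conf: dict) -> str:
    """Return the FileSystem implementation class that would serve `path`."""
    scheme = urlparse(path).scheme
    if not scheme:
        # Scheme-less paths such as "/user/data" fall back to fs.defaultFS.
        scheme = urlparse(conf["fs.defaultFS"]).scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise ValueError(f"No FileSystem configured for scheme '{scheme}'")
    return conf[key]
```

With this configuration, `/user/data` resolves to the HDFS class while `s3x://bucket/logs` resolves to the placeholder s3x class, so the same Hadoop job can read from either store just by changing the path.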
Backup - Case Study
Problem:
A customer is running a large Hadoop cluster. Since the cluster has become
critical to their business, they decided to back up its data. But Hadoop
itself doesn't provide a way to back up its data for disaster recovery.
Solution:
Replicate the data to Cloudian HyperStore so that it can be restored in
case of a disaster or an unintentional deletion. The S3 interface of
Cloudian HyperStore allows easy, secure access from a remote region.
Note that Cloudian HyperStore can work with any version of Hadoop, so it
is possible to back up multiple Hadoop clusters running different versions.
Backup - Solution
[Diagram: through the Hadoop FileSystem interface, critical data under /source in HDFS (the defaultFS, hdfs://) is replicated to /target in Cloudian HyperStore (s3x://) via the S3 API.]
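The replication step can be pictured as deciding which files under /source still need to be copied to /target. The following is a hypothetical, simplified selection rule (the file is missing on the target, has a different size, or is newer on the source); the `FileInfo` type and the rule itself are illustrative assumptions, and a real deployment would use an actual copy tool rather than this code.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def files_to_replicate(source: Dict[str, FileInfo],
                       target: Dict[str, FileInfo]) -> List[str]:
    """Return relative paths under /source that must be copied to /target.

    A file is copied if it is missing from the target, its size differs,
    or the source copy was modified after the target copy.
    """
    pending = []
    for path, src in sorted(source.items()):
        dst = target.get(path)
        if dst is None or dst.size != src.size or src.mtime > dst.mtime:
            pending.append(path)
    return pending
```

Files deleted from /source are deliberately never removed from /target here, so an accidental delete on the cluster does not propagate to the backup.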
Tiering - Case Study
Problem:
A customer is running a large Hadoop cluster. Their data has grown, and
they need to decide whether to (1) add more nodes or (2) deploy a
framework that continuously moves cold data to lower-TCO storage.
Solution:
Set up tiering of cold data to Cloudian HyperStore. Cloudian HyperStore
has rich object lifecycle management and even allows further tiering to
Amazon S3 or Glacier.
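Further tiering of this kind is typically driven by lifecycle rules keyed on object age. As a rough model only, the sketch maps an object's age to a tier using a sorted list of age thresholds. The tier sequence (HyperStore, then Amazon S3, then Glacier) and the 90/365-day thresholds are illustrative assumptions; the slide only says tiering can go to Amazon S3 or Glacier and gives no thresholds or rule syntax.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical lifecycle policy: (minimum object age in days, storage tier).
# Tier order and day thresholds are illustrative assumptions only.
POLICY: List[Tuple[int, str]] = [
    (0, "Cloudian HyperStore"),
    (90, "Amazon S3"),
    (365, "Amazon Glacier"),
]


def tier_for_age(age_days: int, policy: List[Tuple[int, str]]) -> str:
    """Return the tier an object belongs in, given its age in days.

    The policy must be sorted by threshold; the object belongs to the
    last tier whose threshold it has reached.
    """
    thresholds = [days for days, _ in policy]
    index = bisect_right(thresholds, age_days) - 1
    if index < 0:
        raise ValueError("age is below the first threshold")
    return policy[index][1]
```

Using `bisect_right` means an object exactly at a threshold (for example, day 90) has already moved to the next tier.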
Tiering - Solution
[Diagram: through the Hadoop FileSystem interface, cold data under /source in HDFS (the defaultFS, hdfs://) is archived to /target in Cloudian HyperStore (s3x://) via the S3 API, based on a retention policy.]
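Archiving "based on a retention policy" comes down to choosing which files have gone cold. A minimal sketch, assuming the policy is a single retention window measured against each file's last-access time (an assumption; the slide does not define the policy): files last touched before `now - retention` become archive candidates.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_cold_files(last_access: Dict[str, datetime],
                      now: datetime,
                      retention: timedelta) -> List[str]:
    """Return paths not accessed within `retention`, i.e. archive candidates.

    `last_access` maps an HDFS path to its last-access timestamp
    (a hypothetical input; how it is collected is out of scope here).
    """
    cutoff = now - retention
    return sorted(path for path, ts in last_access.items() if ts < cutoff)
```

A file accessed exactly at the cutoff is kept in HDFS; only strictly older files are selected for archiving.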
For more information about Cloudian
Homepage (JP): http://cloudian.jp/
Homepage (EN): http://www.cloudian.com/
Blog (JP): http://www.cloudian-blog.com/
Blog (EN): http://www.cloudian.com/blog/
Facebook (JP): https://www.facebook.com/cloudian.cloudstorage.S3
Facebook (EN): https://www.facebook.com/cloudian.cloudstorage
Twitter (JP): https://twitter.com/Cloudian_KK
Twitter (EN): https://twitter.com/CloudianStorage
Email: info@cloudian.com
Questions?
Thank you
www.cloudian.com
Cloud Storage for Everyone