Hadoop with CLOUDIAN HyperStore®
March 2015
Takenori Sato, Cloudian KK.
tsato@cloudian.com
Introduction
Hortonworks/Hadoop with CLOUDIAN HyperStore
The Hortonworks + CLOUDIAN solution enables the enterprise to run powerful
data analysis using Hortonworks Data Platform (HDP) and YARN directly on
CLOUDIAN’s fully S3-compatible hybrid storage platform. Additionally, enterprise
users can promote large data sets directly into HDFS from CLOUDIAN HyperStore
for running large-scale big data analytics.
What is CLOUDIAN HyperStore?
CLOUDIAN HyperStore is a fully S3 API compliant, multi-tenant, multi-datacenter
software-defined hybrid cloud storage platform.
Customer Benefits
In-place analytics – no duplicate data in HDFS
Faster time-to-insight
Eliminate data ingest phase from storage silos
Improved efficiency and security
Lower OPEX and CAPEX
Hortonworks certified solution
© Copyright 2010-2015 Cloudian KK. All rights reserved.
How do Hadoop and CLOUDIAN HyperStore work together?
Three types of integration
Simple: Ad hoc access via s3x:// to CLOUDIAN HyperStore
Backup: Replication to CLOUDIAN HyperStore
Tiering: Archive to CLOUDIAN HyperStore
Simple - Case Study
Problem:
A customer owns a reasonably sized Hadoop cluster that runs without
problems. The company has decided to give more people access to the
analytics platform. That means much more data has to become
available, but the customer is not sure how much of it will actually
be analyzed. If the customer simply imports more data into the Hadoop
cluster, they will need more nodes to meet the capacity. They would
then have excess computing resources, a problem that gets worse
if data demand grows faster than analysis needs (it often does).
Solution:
Put data that may or may not be used in CLOUDIAN HyperStore. This lets
the customer keep more data available to analytics and also manage
data growth separately from computation growth. For cold data
management, CLOUDIAN HyperStore can offer better $/GB through
erasure coding compared to replication, and ease of use through the
proven object lifecycle management of S3. In addition, they don’t need
to build client tools for new users because many S3 client tools are
already available.
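The $/GB point can be illustrated with simple capacity arithmetic. The sketch below compares the raw capacity needed for three full copies (the default HDFS replication factor) with an erasure-coded layout. The 4+2 layout and the 100 TB figure are assumptions chosen for illustration, not documented HyperStore settings.

```python
def raw_capacity(logical_tb: float, data_units: int, redundancy_units: int) -> float:
    """Raw capacity consumed when every `data_units` units of data are
    stored together with `redundancy_units` extra units for protection."""
    overhead = (data_units + redundancy_units) / data_units
    return logical_tb * overhead


logical_tb = 100.0  # hypothetical amount of cold data

# Three full copies: 1 unit of data plus 2 extra copies.
replicated_tb = raw_capacity(logical_tb, data_units=1, redundancy_units=2)

# Assumed 4+2 erasure coding: 4 data fragments plus 2 parity fragments.
erasure_coded_tb = raw_capacity(logical_tb, data_units=4, redundancy_units=2)

print(f"replication x3: {replicated_tb} TB raw")
print(f"erasure coding 4+2: {erasure_coded_tb} TB raw")
```

Under these assumptions the same 100 TB needs 300 TB of raw capacity with replication and 150 TB with erasure coding, while both layouts tolerate the loss of two copies or fragments.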
Simple - Solution
[Diagram: the Hadoop FileSystem Interface exposes HDFS (defaultFS) via hdfs:// and CLOUDIAN HyperStore via s3x://]
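The idea behind the diagram is that the URI scheme of a path selects the storage behind the Hadoop FileSystem Interface, and a path without a scheme falls back to the default file system. The following is a conceptual sketch of that rule in Python, not Hadoop's own code. The name `resolve_scheme`, the NameNode address, and the bucket name are hypothetical.

```python
from urllib.parse import urlparse


def resolve_scheme(path: str, default_fs: str = "hdfs://namenode:8020") -> str:
    """Return the URI scheme that decides which file system serves `path`.

    A path with an explicit scheme (hdfs://, s3x://) uses that scheme.
    A bare path such as /data/logs falls back to the default file system.
    """
    scheme = urlparse(path).scheme
    return scheme if scheme else urlparse(default_fs).scheme


# Hypothetical paths: one on the default file system, one on HyperStore.
for p in ["/data/logs", "s3x://analytics-bucket/logs"]:
    print(p, "->", resolve_scheme(p))
```

With this rule, existing jobs keep using bare paths on HDFS, and ad hoc jobs reach HyperStore only by spelling out the s3x:// prefix.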
Backup - Case Study
Problem:
A customer is running a large Hadoop cluster. Since it has become critical
to their business, they decided to back up its data. But Hadoop itself
doesn’t provide a way to back up its data for disaster recovery.
One known approach is to run a second cluster and keep copying data to it,
but that doesn’t sound cost effective.
Solution:
Replicate critical data to CLOUDIAN HyperStore so that it can be
restored in case of disaster or unintentional deletes. The S3 interface
of CLOUDIAN HyperStore allows easy, secure access from a remote region.
Note that CLOUDIAN HyperStore can work with any version of Hadoop, so it
is possible to back up multiple Hadoop clusters of different versions.
Backup - Solution
[Diagram: critical data is replicated from /source on HDFS (defaultFS, hdfs://) to /target on CLOUDIAN HyperStore (s3x://) through the Hadoop FileSystem Interface]
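The slides do not name the tool that performs the replication. One stock Hadoop option is DistCp, and the sketch below only builds the command line for such a copy. It assumes the s3x:// connector is installed and usable by DistCp. The NameNode address and bucket name are hypothetical.

```python
import shlex
from typing import List


def build_distcp_command(source: str, target: str) -> List[str]:
    """Build the argument list for a DistCp copy from `source` to `target`."""
    # -update copies only files that are missing or differ at the target,
    # so repeated runs behave like an incremental backup.
    return ["hadoop", "distcp", "-update", source, target]


cmd = build_distcp_command(
    "hdfs://namenode:8020/source",   # hypothetical HDFS location
    "s3x://backup-bucket/target",    # hypothetical HyperStore bucket
)
print(shlex.join(cmd))
```

Returning a list instead of a string lets a scheduler pass it straight to `subprocess.run` without shell quoting issues.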
Tiering - Case Study
Problem:
A customer is running a large Hadoop cluster. Their data has grown,
and they need to decide whether to add more nodes or deploy a framework
that continually moves cold data to lower-TCO storage.
Solution:
Set up tiering of cold data to CLOUDIAN HyperStore. CLOUDIAN
HyperStore offers the rich object lifecycle management of S3, and allows
even further tiering to Amazon.
Tiering - Solution
[Diagram: cold data is archived, based on retention policy, from /source on HDFS (defaultFS, hdfs://) to /target on CLOUDIAN HyperStore (s3x://) through the Hadoop FileSystem Interface]
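The slides do not define the retention policy. A minimal sketch, assuming the policy is "archive anything not modified within N days", could pick the archive candidates like this. The function name, file paths, dates, and the 90-day window are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def select_cold_paths(last_modified: Dict[str, datetime],
                      now: datetime,
                      retention_days: int) -> List[str]:
    """Return paths whose last modification is older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(path for path, mtime in last_modified.items() if mtime < cutoff)


# Hypothetical listing of files under /source with their modification times.
files = {
    "/source/2014-01.log": datetime(2014, 1, 31),
    "/source/2015-02.log": datetime(2015, 2, 20),
}
cold = select_cold_paths(files, now=datetime(2015, 3, 1), retention_days=90)
print(cold)
```

The selected paths would then be copied to the s3x:// target and removed from HDFS, which frees cluster capacity without adding nodes.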
For more information about CLOUDIAN
Homepage (JP): http://cloudian.jp/
Homepage (EN): http://www.cloudian.com/
Blog (JP): http://www.cloudian-blog.com/
Blog (EN): http://www.cloudian.com/blog/
Facebook (JP): https://www.facebook.com/cloudian.cloudstorage.S3
Facebook (EN): https://www.facebook.com/cloudian.cloudstorage
Twitter (JP): https://twitter.com/Cloudian_KK
Twitter (EN): https://twitter.com/CloudianStorage
Email: info@cloudian.com
Questions?
THANK YOU www.cloudian.jp
Cloud Storage for Everyone