Alluxio Product School
Alluxio and NetApp
Modern Data Architecture Requirements
• Resiliency
• Data Placement
• Scale
Joseph Kandatilparambil
Solutions Architect, StorageGRID Software Group, NetApp
Joseph leads the Big Data and Analytics strategy for StorageGRID and has been with the StorageGRID team for two years.
LinkedIn: https://www.linkedin.com/in/jkandatilparambil/

Michael Waldrop
Mike leads the Global Solutions Engineering team for Alluxio. Mike works with enterprises to solve complex data problems and modernize their Big Data platforms.
LinkedIn: https://www.linkedin.com/in/mikewaldrop
Object Use Cases Are Evolving
• Cold Archive
• Active Archive and Blob Store
• App Storage
• Data Streaming
• Analytics
• HPC
Object Storage Growth is Accelerating
• Massive unstructured data growth continues to drive adoption
• Tipping point for S3 adoption across workloads and applications achieved
• Hybrid cloud is the standard
• Migrations from existing / legacy object installations (not just greenfield)
87%
Common User Journey
• Mount HDFS and object storage into a common namespace to facilitate migration to object store without changing analytics tools
• Take advantage of all the benefits of object storage
• Burst compute to cloud to scale quickly
• Leverage multiple cloud providers to get best-of-breed analytics tools without vendor lock-in
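The first step of the journey, mounting HDFS and object storage into a common namespace, can be pictured with a toy mount table. This is a minimal sketch, not Alluxio's API: the `MountTable` class, the mount points, and the `hdfs://` and `s3://` URIs are all hypothetical names chosen for illustration. It only shows the idea that analytics tools keep using one path scheme while each path resolves to a different backing store.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Toy mount table (hypothetical): namespace prefix -> under-storage URI."""
    mounts: dict = field(default_factory=dict)

    def mount(self, prefix: str, under_uri: str) -> None:
        # Normalize so "/s3/" and "/s3" refer to the same mount point.
        self.mounts[prefix.rstrip("/") or "/"] = under_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Keep only mount points that cover the path on a directory boundary.
        candidates = [
            p for p in self.mounts
            if path == p or path.startswith(p.rstrip("/") + "/")
        ]
        if not candidates:
            raise KeyError(f"no mount covers {path}")
        # The longest matching prefix wins, as in a filesystem mount table.
        best = max(candidates, key=len)
        suffix = path if best == "/" else path[len(best):]
        return self.mounts[best] + suffix


# Assumed example endpoints; neither appears in the slides.
table = MountTable()
table.mount("/hdfs", "hdfs://namenode:8020/data")
table.mount("/s3/", "s3://datalake/warehouse")

print(table.resolve("/hdfs/sales/2020.parquet"))
# hdfs://namenode:8020/data/sales/2020.parquet
print(table.resolve("/s3/events"))
# s3://datalake/warehouse/events
```

Under this model, migrating a dataset from HDFS to the object store changes only where the data sits behind the mount table; the paths used by analytics jobs stay in the same namespace.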
What is Data Orchestration?
Modern Data Analytics Architecture
[Diagram: StorageGRID S3 Data Lake; three locations labeled On-Prem]
Why NetApp StorageGRID for your Data Lake?
• Software-defined object storage
• Policy-based information lifecycle management at scale
• Global namespace and true multi-tenancy
• Durable, available, and scale-out
• Cloud integrations: AWS SNS notification, cloud mirroring, metadata search
[Diagram: Data Fabric. StorageGRID (up to 16 logical/physical sites; Seattle, Denver, New York shown), endpoint S3.company.com, Public Cloud S3; consumers: New Apps, Developer, Data Scientist]
Use case 1: Migration from HDFS to Object Storage
• Reduces capacity overhead costs
• Decouples compute and storage
• Performance and scale
• Policy-based data migration to S3
[Diagram: StorageGRID S3 Data Lake]
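The "capacity overhead" point can be made concrete with a little arithmetic. The sketch below compares the raw capacity needed under HDFS's default 3x replication with an erasure-coded layout. The 6+3 scheme and the 1,000 TB dataset size are assumptions chosen for illustration; the slides do not say which protection scheme or dataset size is in play, and `raw_capacity_tb` is a hypothetical helper.

```python
def raw_capacity_tb(usable_tb: float, data_units: int, redundancy_units: int) -> float:
    """Raw capacity needed to hold `usable_tb` of data.

    Replication with factor r is modeled as 1 data unit plus (r - 1)
    redundant copies; erasure coding k+m as k data units plus m parity units.
    """
    if data_units <= 0 or redundancy_units < 0:
        raise ValueError("data_units must be > 0 and redundancy_units >= 0")
    return usable_tb * (data_units + redundancy_units) / data_units


usable = 1000  # TB of actual data (assumed figure)

hdfs_raw = raw_capacity_tb(usable, data_units=1, redundancy_units=2)  # 3x replication
ec_raw = raw_capacity_tb(usable, data_units=6, redundancy_units=3)    # assumed 6+3 scheme

print(hdfs_raw)  # 3000.0
print(ec_raw)    # 1500.0
```

Under these assumptions the same dataset needs half the raw capacity. The real saving depends on the protection policy actually configured.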
Use case 2: Expose data on-prem to compute in cloud
• Enable storage at a fraction of the cost
• Scale compute on demand
• Data is always protected (dual-layered protection)
• Have control over data locality
[Diagram label: On-Premises]
Use case 3: Enable Multi-Cloud workloads
● Minimize data movement
● Have full control over data placement
● Avoid vendor lock-in
● Adapt to new requirements
[Diagram: Compute; On-Prem; StorageGRID S3 Data Lake]
StorageGRID data lake storage is designed for high performance, fault tolerance, and scale with low-touch operations.
alluxio.io/slack
www.alluxio.io
twitter.com/alluxio
linkedin.com/alluxio
www.storagegrid.com
twitter.com/NetApp
linkedin.com/NetApp
joseph.kandatilparambil@netapp.com
michael.waldrop@alluxio.com

Geo-distributed Analytics with NetApp StorageGRID and Alluxio