Lessons learned processing
70 billion data points a day
© 2018 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---
Shankar Pasupathy, Technical Director, Active IQ Data Science
Pranoop Erasani, Senior Technical Director, ONTAP NFS
DataWorks Summit, San Jose
June 2018
Agenda
o What is Active IQ?
o 5 data management challenges with Hadoop
o Hybrid cloud analytics architecture
o Why NFS for Hadoop and AI?
o Performance and Scale of shared storage
o NetApp’s In-Place Analytics Module
o Summary
What is Active IQ?
Active IQ: Predictive Analytics for NetApp storage systems
AutoSupport (ASUP) telemetry sent to the Active IQ platform:
• Configuration data
• Performance counters
• System logs
Active IQ use cases:
• Predict disk drive failures
• Predict outages and performance problems
• Detect misconfigured storage (ARS)
• Automate problem diagnosis
• Use community wisdom to guide best practices
• Guide future product design
The NetApp Active IQ Ecosystem
Data growth: 2x every 8 months
• 300,000 storage controllers
• 70 billion data points processed daily
• 135 TB of data processed per month
• 3.7 PB data lake
• Large number of users
• 6+ Hadoop clusters
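The headline figures can be cross-checked with simple arithmetic. A sketch in Python, assuming decimal units (1 TB = 10^12 bytes) and a 30-day month, neither of which the slide states:

```python
# Back-of-the-envelope checks on the Active IQ figures.
# Assumptions: decimal units and a 30-day month.

DATA_POINTS_PER_DAY = 70e9
BYTES_PER_MONTH = 135e12
DAYS_PER_MONTH = 30
DATA_LAKE_BYTES = 3.7e15


def bytes_per_data_point() -> float:
    """Average bytes processed per data point implied by the daily and monthly figures."""
    bytes_per_day = BYTES_PER_MONTH / DAYS_PER_MONTH
    return bytes_per_day / DATA_POINTS_PER_DAY


def growth_factor(months: float, doubling_months: float = 8.0) -> float:
    """Growth multiplier after `months` if data doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def projected_lake_size(months: float) -> float:
    """Data lake size in bytes after `months`, if the 2x-every-8-months trend holds."""
    return DATA_LAKE_BYTES * growth_factor(months)


if __name__ == "__main__":
    print(f"~{bytes_per_data_point():.0f} bytes per data point")
    print(f"annual growth factor: {growth_factor(12):.2f}x")
    print(f"lake after 24 months: {projected_lake_size(24) / 1e15:.1f} PB")
```

Under these assumptions the figures imply roughly 64 bytes per data point, about 2.8x growth per year, and a lake of about 29.6 PB after two years if the trend continues.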
5 data management challenges
1. Storage for Hadoop doubling year over year
2. The need to use the cloud in a cost-effective and secure manner
3. Separate storage architectures for AI and Hadoop
4. Multiple sources of data, each with its own access rights
5. The need for data provenance
Traditional on-premises Hadoop architecture
Figure components: IoT data, users, web tier/app server, stream analytics, NoSQL/SQL, AI and ML models, Hadoop data lake.
Challenge 1: Problems caused by storage growth
o Poor utilization of compute
o Disk failures at scale
o Too many copies of the data
o Tiers of storage and QoS
o (HDFS 3.0)
Figure: Separate batch processing, real-time, QA and POC clusters, each with its own CPUs and disks, connected through a switch (annotated "3x data copies").
Our solution: Separation of compute and storage
• Not a new idea
  o LADDIS 2009, Usenix 2012, IEEE Big Data 2013, IDC 2018
  o Hadoop in the cloud
• Rack space and throughput
  o Modern all-flash shared storage: ~25 GB/s in 4U (4 PB effective capacity)
  o Would need 350 traditional DAS servers for 25 GB/s aggregate bandwidth
• Network latency
  o 40 Gbit Ethernet in 2018: 1 to 5 microseconds with iWARP/RDMA
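The rack-space comparison implies a per-server figure, and the 40 Gbit Ethernet point sets a floor on how many links 25 GB/s needs. A sketch assuming decimal units and ignoring protocol overhead:

```python
import math

TARGET_GB_PER_S = 25.0    # ~25 GB/s from one 4U all-flash system (from the slide)
DAS_SERVERS = 350         # traditional DAS servers needed for the same bandwidth
LINK_GBIT_PER_S = 40.0    # 40 Gbit Ethernet


def per_server_mb_per_s(target_gb_s: float, servers: int) -> float:
    """Average bandwidth each DAS server must contribute, in MB/s."""
    return target_gb_s * 1000 / servers


def links_needed(target_gb_s: float, link_gbit_s: float) -> int:
    """Minimum number of links whose raw line rate covers the target."""
    link_gb_s = link_gbit_s / 8  # bits to bytes
    return math.ceil(target_gb_s / link_gb_s)


if __name__ == "__main__":
    print(f"{per_server_mb_per_s(TARGET_GB_PER_S, DAS_SERVERS):.0f} MB/s per DAS server")
    print(f"{links_needed(TARGET_GB_PER_S, LINK_GBIT_PER_S)} x 40GbE links at line rate")
```

The 350-server figure works out to about 71 MB/s per server on average, and 25 GB/s needs at least five 40 GbE links at raw line rate; real deployments would add headroom for overhead.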
Challenge 2: Hadoop on-premises vs. the cloud?
• Cloud
  o Freedom from IT: ease of use
  o Removes operational pain (Hadoop as a service)
  o Provision compute instantly
  o Cost-effective?
• Inhibitors
  o Security and fear: "Data is my most valuable asset"
  o Regulations: GDPR, HIPAA, …
  o Prohibitive storage cost (60 PB of data?)
  o Cloud lock-in and egress costs
Our solution: Cloud connected storage
Figure: Hadoop on premises on NetApp storage, connected by an efficient data copy to NetApp Cloud Volumes in Google Cloud.
• Latency: 1-2 ms
• Bandwidth: number of links × 10 Gbps
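Because bandwidth scales with the number of 10 Gbps links, the time to move a given volume between on-premises storage and the cloud can be estimated directly. A minimal sketch; the `efficiency` factor is a hypothetical knob for overhead, not a figure from the slide:

```python
def transfer_hours(num_bytes: float, links: int, link_gbit_s: float = 10.0,
                   efficiency: float = 1.0) -> float:
    """Hours to copy num_bytes over `links` parallel links of link_gbit_s each.

    efficiency in (0, 1] derates the raw line rate (hypothetical; 1.0 = line rate).
    """
    if links < 1 or not 0 < efficiency <= 1:
        raise ValueError("need at least one link and 0 < efficiency <= 1")
    bytes_per_s = links * link_gbit_s * 1e9 / 8 * efficiency
    return num_bytes / bytes_per_s / 3600


if __name__ == "__main__":
    month = 135e12  # 135 TB processed per month, from the Active IQ figures
    for n in (1, 2, 4):
        print(f"{n} link(s): {transfer_hours(month, n):.1f} h at line rate")
```

At full line rate, a month's 135 TB would take about 30 hours over one 10 Gbps link and about 7.5 hours over four, which is why the "efficient data copy" (moving less data) matters as much as adding links.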
Choosing Hadoop in the cloud vs on-premises
Figure: NetApp Data Fabric
• On-premises: 24x7 real-time processing, high-throughput jobs
• AWS/Azure/GCP (choose your cloud): QA, POCs, AI/ML, bursty workloads
• Edge: IoT data (24x7)
• Unified data lake on cloud-connected storage: secure, efficient
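The placement guidance on this slide (steady 24x7, high-throughput jobs on premises; QA, POCs, AI/ML and bursty work in the cloud) reduces to a simple rule. An illustrative heuristic only; the `Workload` fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    always_on: bool        # runs 24x7
    high_throughput: bool  # sustained, throughput-bound jobs
    bursty: bool           # short-lived or spiky demand (QA, POCs, AI/ML experiments)


def place(w: Workload) -> str:
    """Suggest a location: steady 24x7 high-throughput work on premises,
    bursty work in the cloud, anything else either way."""
    if w.always_on and w.high_throughput and not w.bursty:
        return "on-premises"
    if w.bursty:
        return "cloud"
    return "either"


if __name__ == "__main__":
    for w in (Workload("real-time ingest", True, True, False),
              Workload("QA cluster", False, False, True)):
        print(w.name, "->", place(w))
```

Because both locations share one data lake, the rule only decides where compute runs; the data does not have to move with it.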
Challenge 3: What is the right storage architecture for AI?
• HDFS
  o Sequential I/O
  o Throughput oriented
  o Large files
• AI
  o Needs random I/O
  o IOPS oriented
  o Shared file system for distributed training
Our solution: Build a unified, shared data lake
• Tiered storage (SSD, SATA)
• Storage QoS for different workloads
• Ability to rapidly "clone" data for QA
• Built-in compression
• Triple-parity RAID hides disk failures
  o For >4 TB SATA disks
Active IQ analytics architecture using the hybrid cloud and NFS storage
12x reduction in storage space, 30x improvement in performance, 3x reduction in compute nodes
Figure components: Active IQ telemetry data; unified data lake (NFS); Active IQ cluster on premises, HDInsight, and Databricks/EMR, each with the In-Place Analytics Module; NetApp Cloud Volumes; cloud-connected storage (archive data lake); edge, on-premises and cloud locations linked by the NetApp Data Fabric.
Why NFS for Hadoop and AI?
1. Performance
2. Scale
3. Manageability
NFS Performance: High throughput at Low latency
• ~500 µs latency
• Single system: 1M IOPS, 25 GB/s throughput
• 24-node cluster: 11.4M IOPS, 300 GB/s throughput
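Throughput, latency and concurrency are linked by Little's law: average I/Os in flight = IOPS × mean latency. A sketch, assuming the ~500 µs latency holds at the quoted IOPS (the slide does not say so) and that throughput scales linearly with node count:

```python
def outstanding_ios(iops: float, latency_s: float) -> float:
    """Little's law: average I/Os in flight = completion rate x mean latency."""
    return iops * latency_s


def per_node(total: float, nodes: int) -> float:
    """Even share of an aggregate figure per node (assumes linear scaling)."""
    return total / nodes


if __name__ == "__main__":
    latency = 500e-6  # ~500 microseconds
    print(f"1M IOPS    -> ~{outstanding_ios(1e6, latency):.0f} I/Os in flight")
    print(f"11.4M IOPS -> ~{outstanding_ios(11.4e6, latency):.0f} I/Os in flight")
    print(f"300 GB/s over 24 nodes -> {per_node(300, 24)} GB/s per node")
```

Sustaining 1M IOPS at 500 µs requires about 500 concurrent I/Os, which is one reason the module emphasizes high concurrency. The 12.5 GB/s per node is consistent with the 25 GB/s figure coming from a two-node system.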
NFS Scale: PB-scale data lake with high file count
• Tested: 10 nodes, 20 PB, 400B files
• Supported: 24 nodes, 172 PB, 47T files
NFS Manageability: NetApp In-Place Analytics Module
Figure: The In-Place Analytics Module in the Hadoop stack
• Computation frameworks: Batch (MapReduce), Interactive (Tez), Online (HBase), In-Memory (Spark), Graph (Giraph)
• YARN (cluster resource management)
• FileSystem (interfaces to interact with storage systems): HDFS, Amazon S3, GlusterFS, Azure, NFS (via the In-Place Analytics Module)
• Available as a drop-in JAR file
• Integrated with Hortonworks Ambari
• NFS filesystem implementation
  o Buffered input and output streams
  o 14 of 22 NFSv3 operations
• Simplified configuration
  o Set fs.defaultFS to the NFS path (e.g. IP:/path)
  o Tunables configured via a JSON file
• Integrated with LDAP directory services
• Roadmap
  o Ranger, Kerberos and HCFS
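The configuration step amounts to pointing Hadoop's standard `fs.defaultFS` property at the NFS export and supplying a JSON tunables file. A sketch that renders both; the `nfs://host:2049/path` URI form and every JSON key name are hypothetical placeholders, not the module's documented schema:

```python
import json
import xml.etree.ElementTree as ET


def to_nfs_uri(export: str, port: int = 2049) -> str:
    """Turn 'HOST:/path' mount notation into a URI (hypothetical format; IPv4/hostname only)."""
    host, sep, path = export.partition(":")
    if not host or not sep or not path.startswith("/"):
        raise ValueError("expected HOST:/path")
    return f"nfs://{host}:{port}{path}"


def core_site_xml(nfs_uri: str) -> str:
    """Render a minimal Hadoop core-site.xml that sets fs.defaultFS."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = nfs_uri
    return ET.tostring(conf, encoding="unicode")


def tunables_json(server: str, export: str, read_ahead_kb: int) -> str:
    """Render a tunables file; the key names are hypothetical placeholders."""
    return json.dumps({"server": server, "export": export,
                       "readAheadKB": read_ahead_kb}, indent=2)


if __name__ == "__main__":
    print(core_site_xml(to_nfs_uri("10.0.0.1:/data")))
    print(tunables_json("10.0.0.1", "/data", 1024))
```

Port 2049 is the standard NFS port; check the module's own documentation for the exact URI scheme and tunable names before using anything like this.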
Additional Benefits of NetApp In-Place Analytics Module
1. No changes to Hadoop applications
   • Analytics jobs run unmodified over NFS
2. No copy sprawl
   • The primary data copy is the data lake: 1x copy vs 3x HDFS copies
3. Leverage data management
   • Snapshots, data protection copies and clones for point-in-time analytics
4. Optimized for streaming throughput
   • NFS multipathing, high concurrency, prefetching, data and metadata caching
5. NFS and HDFS can coexist
   • E.g. HDFS as primary and NFS as secondary, or vice versa
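The "1x copy vs 3x HDFS copies" point can be quantified as raw capacity. A sketch comparing HDFS 3x replication with a single copy protected by triple-parity RAID; the 17 data + 3 parity group size is a hypothetical example, and compression, snapshots and spares are ignored:

```python
def hdfs_raw(logical: float, replication: int = 3) -> float:
    """Raw capacity for HDFS-style full replication (default 3 copies)."""
    return logical * replication


def parity_raid_raw(logical: float, data_disks: int, parity_disks: int = 3) -> float:
    """Raw capacity for one copy protected by parity (e.g. triple parity)."""
    if data_disks < 1:
        raise ValueError("data_disks must be positive")
    return logical * (data_disks + parity_disks) / data_disks


if __name__ == "__main__":
    lake_pb = 3.7  # data lake size quoted earlier in the deck
    print(f"HDFS 3x:           {hdfs_raw(lake_pb):.1f} PB raw")
    print(f"1 copy, 17+3 RAID: {parity_raid_raw(lake_pb, 17):.2f} PB raw")
```

For a 3.7 PB lake this gives about 11.1 PB raw for HDFS versus about 4.35 PB for the parity-protected single copy. This is only the replication component; it does not reproduce the 12x storage reduction claimed for the full architecture.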
5 data management challenges
1. Storage for Hadoop doubling year over year
2. The need to use the cloud in a cost-effective and secure manner
3. Separate storage architectures for AI and Hadoop
4. Multiple sources of data, each with its own access rights
5. The need for data provenance
Summary
1. Disaggregate compute from storage for analytics
2. Unified data lake for ease of management and lower TCO
3. Hybrid cloud architecture for access to cloud innovation
Thank You
4. NFS TCO: Ease of data lifecycle management at scale
Figure: Before and after FabricPool. Before, active and inactive data both occupy the on-premises footprint. After, FabricPool keeps active data on the performance tier and moves inactive data to object storage as the capacity tier (the diagram is annotated "80%").
• Automatic tiering
• Zero-touch management
• Preserves file system semantics
• Preserves storage efficiencies
• Data encrypted in flight
• 1 copy vs 3 HDFS copies
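Automatic tiering can be pictured as a cooling-period rule: data not read for some number of days counts as inactive and becomes eligible for the capacity tier. A simplified, file-level sketch; FabricPool itself works on blocks rather than files, and the `Extent` type and 31-day default here are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Extent:
    name: str
    size_bytes: int
    last_access: datetime


def split_tiers(extents, now: datetime, cooling_days: int = 31):
    """Return (performance, capacity) lists: data idle longer than cooling_days is cold."""
    cutoff = now - timedelta(days=cooling_days)
    hot = [e for e in extents if e.last_access >= cutoff]
    cold = [e for e in extents if e.last_access < cutoff]
    return hot, cold


def cold_fraction(extents, now: datetime, cooling_days: int = 31) -> float:
    """Share of bytes that would move to the capacity tier."""
    total = sum(e.size_bytes for e in extents)
    _, cold = split_tiers(extents, now, cooling_days)
    return sum(e.size_bytes for e in cold) / total if total else 0.0


if __name__ == "__main__":
    now = datetime(2018, 6, 1)
    data = [Extent("logs-2018-03", 800, now - timedelta(days=60)),
            Extent("logs-2018-05", 200, now - timedelta(days=1))]
    print(f"{cold_fraction(data, now):.0%} of bytes eligible for the capacity tier")
```

The cold fraction is what shrinks the on-premises footprint: only the hot share has to stay on the performance tier.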
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 

Recently uploaded (20)

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 

Lessons learned processing 70 billion data points a day using the hybrid cloud

  • 1. Lessons learned processing 70 billion data points a day
      Shankar Pasupathy, Technical Director, Active IQ Data Science
      Pranoop Erasani, Senior Technical Director, ONTAP NFS
      DataWorks Summit, San Jose, June 2018
      © 2018 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---
  • 2. Agenda
      o What is Active IQ?
      o 5 data management challenges with Hadoop
      o Hybrid cloud analytics architecture
      o Why NFS for Hadoop and AI?
      o Performance and scale of shared storage
      o NetApp’s In-Place Analytics Module
      o Summary
  • 3. What is Active IQ?
      Active IQ platform
      AutoSupport (ASUP):
      o Configuration data
      o Performance counters
      o System logs
  • 4. Active IQ: Predictive analytics for NetApp storage systems
      Use cases:
      o Predict disk drive failures
      o Predict outages and performance problems
      o Detect misconfigured storage (ARS)
      o Automate problem diagnosis
      o Use community wisdom to guide best practices
      o Guide future product design
  • 5. The NetApp Active IQ ecosystem
      o Data growth: 2x every 8 months
      o 300,000 storage controllers
      o 70 billion data points processed daily
      o 135 TB of data processed per month
      o 3.7 PB data lake
      o Large number of users
      o 6+ Hadoop clusters
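The slide 5 figures can be turned into average rates with simple arithmetic. This sketch is illustrative only: it assumes data points arrive evenly over the day, a 30-day month, decimal units (1 TB = 10^12 bytes), and steady exponential growth at a fixed doubling period. The function names are hypothetical helpers, not part of any Active IQ code.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def points_per_second(points_per_day: float) -> float:
    """Average ingest rate, assuming points arrive evenly over the day."""
    return points_per_day / SECONDS_PER_DAY


def avg_mb_per_second(tb_per_month: float, days_per_month: int = 30) -> float:
    """Average processing throughput in MB/s (decimal units, 30-day month assumed)."""
    return tb_per_month * 1e12 / (days_per_month * SECONDS_PER_DAY) / 1e6


def projected_pb(current_pb: float, months: float, doubling_months: float = 8.0) -> float:
    """Data lake size after `months`, assuming steady exponential growth."""
    return current_pb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    print(f"{points_per_second(70e9):,.0f} data points/s")   # 810,185 data points/s
    print(f"{avg_mb_per_second(135):.1f} MB/s")               # 52.1 MB/s
    print(f"{projected_pb(3.7, 12):.1f} PB after 12 months")  # 10.5 PB after 12 months
```

Passing `doubling_months=12` models the "doubling year over year" phrasing used on slide 6 instead; the two slides state different growth rates, and the sketch does not try to reconcile them.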
  • 6. 5 data management challenges
      1. Storage for Hadoop doubling year over year
      2. The need to use the cloud in a cost-effective and secure manner
      3. Separate storage architectures for AI and Hadoop
      4. Multiple sources of data, each with its own access rights
      5. The need for data provenance
  • 7. Traditional on-premises Hadoop architecture
      [Diagram: IoT data, stream analytics, Hadoop data lake, NoSQL/SQL, AI and ML models, web tier/app server, users]
  • 8. Challenge 1: Problems caused by storage growth
      o Poor utilization of compute
      o Disk failures at scale
      o Too many copies of the data
      o Tiers of storage and QoS
      o (HDFS 3.0)
      [Diagram: batch processing, QA, real-time, and POC clusters, each with its own CPUs and disks, connected by a switch; 3x data copies]
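To see how copies multiply, here is a minimal sketch under a hypothetical assumption the slide does not state: each of the four clusters in the diagram (batch, QA, real-time, POC) keeps a full copy of the same dataset at HDFS's default replication factor of 3. The 3.7 PB figure is reused from slide 5 purely for scale, and the helper names are illustrative.

```python
HDFS_DEFAULT_REPLICATION = 3  # HDFS default value of dfs.replication


def physical_copies(clusters: int, replication: int = HDFS_DEFAULT_REPLICATION) -> int:
    """Copies on disk if every cluster stores its own replicated dataset."""
    return clusters * replication


def raw_capacity_pb(logical_pb: float, clusters: int,
                    replication: int = HDFS_DEFAULT_REPLICATION) -> float:
    """Raw disk needed, ignoring compression and filesystem overhead."""
    return logical_pb * physical_copies(clusters, replication)


if __name__ == "__main__":
    print(physical_copies(4))                   # 12
    print(f"{raw_capacity_pb(3.7, 4):.1f} PB")  # 44.4 PB
```

Under these assumptions the same logical data occupies twelve times its size in raw disk; actual numbers depend on how much data each cluster really holds.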
  • 9. Our solution: Separation of compute and storage
      o Not a new idea: LADDIS 2009, USENIX 2012, IEEE Big Data 2013, IDC 2018
      o Hadoop in the cloud
      o Rack space and throughput: modern all-flash shared storage delivers ~25 GB/s in 4U (4 PB effective space), whereas reaching 25 GB/s of aggregate bandwidth takes about 350 traditional DAS servers
      o Network latency: 40 Gbit Ethernet in 2018 offers 1 to 5 microseconds with iWARP/RDMA
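The rack-space comparison implies a per-server throughput that the slide does not state. Dividing 25 GB/s by 350 servers gives roughly 71 MB/s per DAS server; the helpers below make that arithmetic explicit so other per-server figures can be tried. Decimal units are assumed, and the function names are illustrative.

```python
import math


def implied_per_server_mb_s(aggregate_gb_s: float, servers: int) -> float:
    """Per-server bandwidth implied by an aggregate target and a server count."""
    return aggregate_gb_s * 1000 / servers


def servers_needed(aggregate_gb_s: float, per_server_mb_s: float) -> int:
    """Smallest server count whose combined bandwidth reaches the target."""
    return math.ceil(aggregate_gb_s * 1000 / per_server_mb_s)


if __name__ == "__main__":
    print(f"{implied_per_server_mb_s(25, 350):.1f} MB/s per server")  # 71.4 MB/s per server
    print(servers_needed(25, 100))  # 250
```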
  • 10.  Cloud  Freedom from IT – ease of use  Remove operations pain (Hadoop as a service)  Provision compute instantly  Cost effective ?  Inhibitors  Security and fear: “Data is my most valuable asset”  Regulations – GDPR, HIPAA, …  Prohibitive cost for storage (60 PB of data ?)  Cloud lock in and egress costs Challenge 2: Hadoop on-premises vs the cloud ? © 2018 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---10
  • 11. Our solution: Cloud-connected storage
      [Diagram: on-premises Hadoop on NetApp storage, connected by efficient data copy to NetApp Cloud Volumes (Google). Latency: 1-2 ms. Bandwidth: number of links x 10 Gbps.]
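Two quick calculations help size a link like this. The first estimates bulk copy time for a data volume over a number of 10 Gbps links; the second is the bandwidth-delay product, the amount of data that must be in flight per link to keep it busy. The link count, the efficiency factor, the 135 TB example volume (borrowed from slide 5), and treating the slide's 1-2 ms latency as round-trip time are all assumptions for illustration.

```python
def transfer_hours(terabytes: float, links: int,
                   gbps_per_link: float = 10.0, efficiency: float = 1.0) -> float:
    """Hours to move `terabytes` (decimal) over `links` parallel links."""
    bits = terabytes * 1e12 * 8
    usable_bps = links * gbps_per_link * 1e9 * efficiency
    return bits / usable_bps / 3600


def bdp_bytes(gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill one link."""
    return gbps * 1e9 * rtt_ms / 1000 / 8


if __name__ == "__main__":
    print(transfer_hours(135, links=4))  # 7.5
    print(bdp_bytes(10, rtt_ms=2))       # 2500000.0
```

At a 2 ms round trip, each 10 Gbps link needs about 2.5 MB in flight, so a bulk copy needs large windows or several parallel streams to approach line rate.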
  • 12. Choosing Hadoop in the cloud vs. on-premises
      o On-premises: 24x7 real-time processing, high-throughput jobs
      o AWS/Azure/GCP: QA, POCs, AI/ML, bursty workloads (choose your cloud)
      [Diagram: NetApp Data Fabric linking secure IoT data at the edge, a unified data lake, and the cloud through cloud-connected storage; labels: 24x7, efficient]
  • 13.  HDFS  Sequential I/O  Throughput oriented  Large files  AI  Needs Random I/O  IOPS oriented  Shared file system for distributed training Challenge 3: What is the right storage architecture for AI ? 13 © 2018 NetApp, Inc. All rights reserved. --- NETAPP CONFIDENTIAL ---
  • 14. Our solution: Build a unified, shared data lake
      o Tiered storage (SSD, SATA)
      o Storage QoS for different workloads
      o Ability to rapidly "clone" data for QA
      o Built-in compression
      o Triple-parity RAID hides disk failures for >4 TB SATA disks
      [Diagram: Active IQ unified data lake over NFS]
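Rapid cloning is commonly built on copy-on-write: a clone starts by sharing every block with its parent, and only blocks written afterwards diverge. The toy model below illustrates that general idea with a dictionary as the block map. It is an assumption-laden sketch, not how ONTAP implements clones, and the `Volume` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy volume: a map from logical block number to immutable block contents."""
    blocks: dict = field(default_factory=dict)

    def clone(self) -> "Volume":
        # Copy only the block map; bytes objects are immutable, so the
        # clone shares the parent's block data instead of duplicating it.
        return Volume(dict(self.blocks))

    def write(self, lbn: int, data: bytes) -> None:
        # Redirect this volume's pointer to new data; other volumes are untouched.
        self.blocks[lbn] = bytes(data)

    def read(self, lbn: int) -> bytes:
        return self.blocks[lbn]


if __name__ == "__main__":
    prod = Volume({0: b"header", 1: b"telemetry-v1"})
    qa = prod.clone()
    qa.write(1, b"telemetry-test")
    print(prod.read(1), qa.read(1))        # b'telemetry-v1' b'telemetry-test'
    print(qa.blocks[0] is prod.blocks[0])  # True: block 0 is still shared
```

Copying the map costs time proportional to the number of blocks, not the data size; production systems avoid even that by sharing metadata structures, which this sketch omits.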
  • 15. Active IQ analytics architecture using the hybrid cloud and NFS storage
      Results: 12x reduction in storage space, 30x improvement in performance, 3x reduction in compute nodes
      [Diagram: NetApp Data Fabric spanning edge, on-premises, and cloud. Components: Active IQ telemetry data, cluster, unified data lake (NFS), archive data lake, cloud-connected storage, NetApp Cloud Volumes, HDInsight, Databricks/EMR, and the In-Place Analytics Module at each analytics platform.]
  • 16. Why NFS for Hadoop and AI?
      1. Performance
      2. Scale
      3. Manageability
  • 17. NFS performance: High throughput at low latency
      Figures shown: ≈500 µs latency; 25 GB/s throughput; 11.4M IOPS; 300 GB/s throughput; 1M IOPS; 24-node cluster
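Little's law links the quantities on this slide: the average number of I/Os in flight equals throughput in IOPS times average latency. The sketch assumes, for illustration only, that the ≈500 µs latency applies at each IOPS figure shown; the slide does not say which latency goes with which load.

```python
def ios_in_flight(iops: float, latency_us: float) -> float:
    """Little's law: L = lambda * W, with W converted from microseconds to seconds."""
    return iops * latency_us / 1e6


if __name__ == "__main__":
    for iops in (1e6, 11.4e6):
        print(f"{iops:,.0f} IOPS -> ~{ios_in_flight(iops, 500):,.0f} I/Os in flight")
    # 1,000,000 IOPS -> ~500 I/Os in flight
    # 11,400,000 IOPS -> ~5,700 I/Os in flight
```

Sustaining those rates at that latency means clients must keep hundreds to thousands of requests outstanding, which is why client-side concurrency matters as much as server capability.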
  • 18. NFS scale: PB-scale data lake with high file count
      o Tested: ≈20 PB, 400B files, 10 nodes
      o Supported: 172 PB, 47T files, 24 nodes
  • 19. NFS manageability: NetApp In-Place Analytics Module
      [Diagram: Hadoop stack. Computation frameworks: batch (MapReduce), interactive (Tez), online (HBase), in-memory (Spark), graph (Giraph). Below them, YARN (cluster resource management). Below that, the FileSystem layer (interfaces to storage systems): HDFS, Amazon S3, GlusterFS, Azure, and NFS through the In-Place Analytics Module.]
      o Available as a drop-in JAR file
      o Integrated with Hortonworks Ambari
      o NFS FileSystem implementation
          - Buffered input and output streams
          - 14 of 22 NFSv3 operations
      o Simplified configuration
          - Set fs.defaultFS to the NFS path (e.g. IP:/path)
          - Tunables configured via a JSON file
      o Integrated with LDAP directory services
      o Roadmap: Ranger, Kerberos and HCFS
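The "set fs.defaultFS to an NFS path" step can be sketched as generating a Hadoop `core-site.xml` property from an `IP:/path` export. `fs.defaultFS` is the standard Hadoop property; the `nfs://` URI form, the helper names, and the tunable keys in the example are assumptions for illustration, not the module's documented format. A real deployment also needs whatever connector-specific properties the module's documentation lists, which this sketch omits.

```python
import json
import xml.etree.ElementTree as ET


def nfs_uri(export: str) -> str:
    """Convert an 'IP:/path' export (IPv4 or hostname) into a URI; nfs:// scheme assumed."""
    host, sep, path = export.partition(":")
    if not host or not sep or not path.startswith("/"):
        raise ValueError(f"expected IP:/path, got {export!r}")
    return f"nfs://{host}{path}"


def core_site_xml(default_fs: str) -> str:
    """Render a minimal <configuration> block that sets fs.defaultFS."""
    conf = ET.Element("configuration")
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = default_fs
    return ET.tostring(conf, encoding="unicode")


def tunables_json(tunables: dict) -> str:
    """Serialize tunables to JSON; the real key names are defined by the module."""
    return json.dumps(tunables, indent=2, sort_keys=True)


if __name__ == "__main__":
    uri = nfs_uri("10.0.0.5:/datalake")  # hypothetical export
    print(core_site_xml(uri))
    # Hypothetical key names, for illustration only.
    print(tunables_json({"exampleReadSizeBytes": 1048576, "exampleEndpoints": ["10.0.0.5"]}))
```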
  • 20. Additional benefits of the NetApp In-Place Analytics Module
      1. No changes to Hadoop applications: analytics jobs run seamlessly over NFS
      2. No copy sprawl: the primary data copy is the data lake, and it needs 1x copy vs. 3x HDFS copies
      3. Leverage data management: Snapshots, data protection copies, and clones for point-in-time analytics
      4. Optimized for streaming throughput: NFS multipathing, high concurrency, prefetching, data and metadata caching
      5. NFS and HDFS can coexist: e.g. HDFS as primary and NFS as secondary, or vice versa
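Prefetching with high concurrency, two of the streaming optimizations listed here, can be illustrated with a small generator that keeps several block reads in flight while yielding blocks in file order. This is a generic read-ahead sketch, not the module's implementation; `read_at` stands in for a positional read such as an NFS READ of (offset, count), and the block size and depth defaults are arbitrary.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

ReadAt = Callable[[int, int], bytes]  # (offset, size) -> bytes, like an NFS READ


def prefetching_read(read_at: ReadAt, file_size: int,
                     block_size: int = 1 << 20, depth: int = 4) -> Iterator[bytes]:
    """Yield blocks in file order while keeping up to `depth` reads in flight."""
    offsets = iter(range(0, file_size, block_size))
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        # Prime the read-ahead window.
        for off in offsets:
            pending.append(pool.submit(read_at, off, block_size))
            if len(pending) == depth:
                break
        while pending:
            block = pending.popleft().result()  # wait for the oldest outstanding read
            nxt = next(offsets, None)
            if nxt is not None:  # refill the window before handing the block out
                pending.append(pool.submit(read_at, nxt, block_size))
            yield block
```

For a local or NFS-mounted file on a Unix system, `read_at` could be `lambda off, n: os.pread(fd, n, off)` with `fd` from `os.open`. A deeper window keeps more requests outstanding at the cost of holding more buffered blocks in memory.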
  • 21. 5 data management challenges
      1. Storage for Hadoop doubling year over year
      2. The need to use the cloud in a cost-effective and secure manner
      3. Separate storage architectures for AI and Hadoop
      4. Multiple sources of data, each with its own access rights
      5. The need for data provenance
  • 22. Summary
      1. Disaggregate compute from storage for analytics
      2. Unified data lake for ease of management and lower TCO
      3. Hybrid cloud architecture for access to cloud innovation
  • 23. Thank you
  • 24. 4. NFS TCO: Ease of data lifecycle management at scale
      [Diagram: on-premises footprint before and after FabricPool. Before: active and inactive data on premises. After: active data on the performance tier, inactive data on an object storage capacity tier; annotated "80%".]
      After:
      o Automatic tiering
      o Zero-touch management
      o Preserves file system semantics
      o Preserves storage efficiencies
      o Data encrypted in flight
      o 1 copy vs. 3 HDFS copies
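Automatic tiering amounts to classifying data by recent access and moving the cold part to cheaper storage. The sketch below shows a simple age-based policy of that kind. It is not FabricPool's actual algorithm; the `Block` type, the day-based timestamps, and the `cooling_days` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    last_access_day: int  # day number of the most recent read or write


def split_tiers(blocks, today: int, cooling_days: int = 30):
    """Return (hot, cold): cold blocks have not been accessed for cooling_days or more."""
    hot, cold = [], []
    for b in blocks:
        (cold if today - b.last_access_day >= cooling_days else hot).append(b)
    return hot, cold


def cold_fraction(blocks, today: int, cooling_days: int = 30) -> float:
    """Share of blocks eligible to move to the capacity (object storage) tier."""
    blocks = list(blocks)
    if not blocks:
        return 0.0
    _, cold = split_tiers(blocks, today, cooling_days)
    return len(cold) / len(blocks)


if __name__ == "__main__":
    sample = [Block(i, day) for i, day in enumerate([99, 80, 60, 10, 0])]
    hot, cold = split_tiers(sample, today=100)
    print([b.block_id for b in hot], [b.block_id for b in cold])  # [0, 1] [2, 3, 4]
    print(cold_fraction(sample, today=100))  # 0.6
```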