SlideShare a Scribd company logo
Dipti Borkar | Head of Product, Alluxio
Shailesh Manjrekar | Head of AI/ML Product and Solutions, SwiftStack
NextGen Data Analytics Stack – Alluxio and SwiftStack
Edge to Core to Cloud
Unstoppable Data Growth – Edge to core to cloud
Emphasis on capturing value and show return on investment (ROI) from data
*, IDC Worldwide Storage in Big Data Forecast, 2015-2019, October 2015 and IDC Directions
Value Capture is key
30B
IoT Connected
Devices*
By 2020,
1010101010101
1010101010101
1010101010101
1010101010101
1010101010101
1010101010101
1010101010101
1010101010101
1010101010101
1010101010101
101010101010101
100-250EB
Big Data Storage
Capacity*
10101010101010
1010101010101010
101010101
44ZB
data created*
Status quo - Existing solutions
Business leaders and storage architects struggling to show return on investment
SILO 1
Compute
+
Storage
SILO 3
Compute
+
Storage
SILO 5
Compute
+
Storage
SILO 2 SILO 4
DAS silos
Poor
Utilization
DIY
Fatigue
High Capex
High OpEx
INEFFICIENT AND EXPENSIVE
Data
Gravity
4 big trends driving the need for a new data analytics
stack
Separation of
Compute &
Storage
Hybrid – Multi
cloud environments
Self-service data
across the
enterprise
Rise
of the object
store
Customer Challenges with existing solutions
Lack of Enterprise ready products and continued pressure of every increasing cloud OpEx
Ever increasing operating expenditures on existing (a) poorly utilized DAS solutions or (b)
cloud storage deployments costs at scale
Need for high throughput stack with API compatibility to support batch, interactive and
advanced analytical workloads
Lack of enterprise ready and multi-cloud data lake systems – at scale deployments with
lifecycle management, self healing, geo-replication and faster re-builds
CHALLENGE 1
CHALLENGE 2
CHALLENGE 3
Data Ecosystem - Beta Data Ecosystem 1.0
COMPUTE
STORAGE STORAGE
COMPUTE
Co-located
Big data journey and innovation options for enterprises
Co-located
compute & HDFS
on the same cluster
Disaggregated
compute & HDFS
on the same cluster
MR / Hive
HDFS
Hive
HDFS
Disaggregated
Burst HDFS data in the
cloud,
public or private
Support Presto, Spark
and other computes
without app changes
Enable & accelerate big
data on
object stores
Transition to Object store
HDFS for Hybrid Cloud
Support more frameworks
§ Typically compute-bound
clusters over 100% capacity
§ Compute & I/O need to be
scaled together even when
not needed
§ Compute & I/O can be
scaled independently but I/O
still needed on HDFS which
is expensive
1 2
3
4
5
Multi-cloud storage and
Data Management
Java File API HDFS Interface S3 Interface REST APIFUSE Interface
HDFS Driver Swift Driver S3 Driver
The SwiftStack Data Analytics Solution with Alluxio
Accelerated Compute, Data accessibility and Elasticity
SwiftStack Data Analytics Solution – business use cases
Customer, Security and Fraud Analysis
Precision Medicine and Bio-Informatics
Customer Churn/ Sentiment Analysis
Analytics As a Service, Operational
Analytics
Internet of Things / Everything
Financial Services, FSI
Healthcare and Life Sciences,
Genomics
Cloud Service Providers
Oil and Gas, Industrial Internet and
Manufacturing
Media and Entertainment
SwiftStack Data Analytics solution – Value to be Captured by Enterprises
Data and analytics as a source of competitive advantage
Source: IDC Directions
“Organizations that analyze all relevant data and deliver actionable information will achieve extra $430B in productivity
gains over less analytically oriented peers by 2020”
IDC: Worldwide Bid Data and Analytics 2016 Predictions
Value can be created in the following ways with some industry relevance:
üImprove operational efficiency
üReduce cost
üNew product development
üInsights into new services
üBetter Customer Experience
Alluxio and SwiftStack partnership
Originated as Tachyon project, at the UC
Berkley’s AMP Lab by then Ph.D. student & now
Alluxio CTO, Haoyuan (H.Y.) Li.2014
2015
Open Source project established & company to
commercialize Alluxio founded
Goal: Orchestrate Data at Memory Speed
for the Cloud for data driven apps such as Big
Data Analytics, ML and AI.
2018 20192018
Data Elasticity
with a unified
namespace
Abstract data silos & storage
systems to independently scale
data on-demand with compute
Run Spark, Hive, Presto, ML
workloads on your data
located anywhere
Accelerate big data
workloads with transparent
tiered local data
Data Accessibility
for popular APIs &
API translation
Data Locality
with Intelligent
Multi-tiering
Alluxio – Key innovations
Alluxio
MasterZookeeper /
RAFT
Standby
Master
WAN
Alluxio
Client
Alluxio
Client
Alluxio
Worker
RAM / SSD / HDD
Alluxio
Worker
RAM / SSD / HDD
Alluxio Reference Architecture
…
…
Application
Application
Under Store 1
Under Store 2
“Infrastructure challenges are the primary inhibitor for broader adoption of AI/ML workflows. SwiftStack’s
Multi-Cloud data management solution is first of it’s kind in the industry and effectively handles storage I/O
challenges faced by edge to core to cloud, large scale AI/ML data pipelines”
Amita Potnis,
Research Director at IDC’s Infrastructure
System Platform and Technologies Group
Property of SwiftStack Inc. 15
Multi-Cloud Storage and Data Management
Storage and multi-cloud data management for
data-driven applications and workflows
SwiftStack
SwiftStack Storage SwiftStack 1space
On-premises cloud storage
Highest throughput performance
Easy to deploy, operate, and scale
From tens of terabytes to hundreds of petabytes
Spans multiple geographic regions
Proven platform to realize more value from data!
Multi-cloud data management
Transparent access to a single storage namespace
Public and private infrastructure
Policy-driven data placement
Metadata search across the namespace
Leverage unique services across clouds!
Property of SwiftStack Inc. 16
SwiftStack Object Storage Architecture
Continuous
Auditing
Automatic
Replication
Fault
Tolerant
• Automated storage system
management for standard
servers
Replicas Erasure Codes
Direct-Attached Storage
• Masterless, quorum writes
• Nearest reads
• As-dispersed-as-possible data
placement across
nodes/zones/regions
• Distributed partitions in a
modified consistent hash ring
Replication
Reconstruction
Auditing
Device Inventory Management
Storage System Metrics Collection
Hardware Fault Detection
Standard Servers, Drives & Networking
Site 1 Site 2 Site 3
SwiftStack
Storage
SwiftStack Differentiation
18
Data Analytics Hub – Total Cost Of Ownership (TCO) Analysis
The 5-year TCO for the Hosted Private Cloud solution is 1/4th of the one hosted on a public cloud
Cloud native
applications with
SwiftStack Data
Analytics solution
© 2016 Western Digital Corporation All rights reserved
3 key use cases - SwiftStack Data Analytics Solution
Cloud bursting
with SwiftStack
Data Analytics
solution; compute
in any public
cloud
HDFS off-load to
SwiftStack Data
Analytics Solution
1. SwiftStack Data Analytics solution – On-premise deployment
• Customers starting their on-premise analytic
journeys
• Benefits
• Same performance as HDFS
• No more HDFS: operational simplicity
• Compute can be fully virtualized /
containerized!
• Durability++ (Erasure Coding)
• Scale (billions of objects / racks / Geo)
Alluxio
Presto
Alluxio
Presto
Dramatically speed-up big data
on object stores on premise
Same container
/ machine
Alluxio
Presto
Alluxio
Spark / Presto / Hive
2. Cloud bursting with SwiftStack Data Analytics solution
•Hybrid workflow
• customers hosting data on-premise and
leveraging public cloud for economies
of scale
• Alluxio for data locality
•Benefits
• Data as strategic asset stays on-premise
• Leverage cloud economies of scale for
compute
Hadoop cluster node
Alluxio
“alluxio//”
Compute:
Spark, Presto, Hive,
…
WAN
Private Cloud
Public Cloud
3. HDFS off-load to SwiftStack Data Analytics solution
•HDFS off-load
existing HDFS customers on DAS
looking to move to S3 - needs
migration
leverage DistCp - Distributed copy as
data mover and then the same
workflow
Using Alluxio
•Benefits
• Known and well understood process by
administrators: existing HDFS
workflows + rsync-like backup workflow
Co-located environment
ImpalaHive Spark
Same data
center / region
Presto
Enable big data on object stores
across single or multiple clouds
Spark
Alluxio Alluxio
Enterprises moving towards independent compute & storage
Customers
25
§ Come, talk to us about analytics on SwiftStack
§ Data analytics Solution – Alluxio and SwiftStack deliver a winning
combination of performance and capacity – “deliver on the promise of a
future ready data lake”
§ Multiple use cases across industry verticals show success of highly
scalable and lowest TCO big data solution
Get started with the Data Analytics Solution
SwiftStack Confidential 26
Questions?

More Related Content

What's hot

Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
DataWorks Summit
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
Mark Kerzner
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB
 
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
EMC
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
DataWorks Summit
 
Capgemini Insights and Data
Capgemini Insights and Data Capgemini Insights and Data
Capgemini Insights and Data
DataWorks Summit/Hadoop Summit
 
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
Denodo
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
DataStax
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Hortonworks
 
Beyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AIBeyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AI
DataWorks Summit
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets

Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
Optimizing your Hadoop Infastructure: An Industry Panel Presentation
Optimizing your Hadoop Infastructure: An Industry Panel PresentationOptimizing your Hadoop Infastructure: An Industry Panel Presentation
Optimizing your Hadoop Infastructure: An Industry Panel Presentation
DataWorks Summit
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
Cloudera, Inc.
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
DataWorks Summit/Hadoop Summit
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
Jeffrey T. Pollock
 
Cloudera for Internet of Things
Cloudera for Internet of ThingsCloudera for Internet of Things
Cloudera for Internet of Things
Cloudera, Inc.
 
Denodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data ServicesDenodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data Services
Denodo
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
Hortonworks
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 

What's hot (20)

Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0Open Source Data Management for Industry 4.0
Open Source Data Management for Industry 4.0
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
 
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
Pivotal the new_pivotal_big_data_suite_-_revolutionary_foundation_to_leverage...
 
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address RequirementsGov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
Gov & Private Sector Regulatory Compliance: Using Hadoop to Address Requirements
 
Capgemini Insights and Data
Capgemini Insights and Data Capgemini Insights and Data
Capgemini Insights and Data
 
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
Partner Keynote: How Logical Data Fabric Knits Together Data Visualization wi...
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017 Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
Enterprise Data Science at Scale Meetup - IBM and Hortonworks - Oct 2017
 
Beyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AIBeyond Big Data: Data Science and AI
Beyond Big Data: Data Science and AI
 
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets
Put Alternative Data to Use in Capital Markets

Put Alternative Data to Use in Capital Markets

 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
Optimizing your Hadoop Infastructure: An Industry Panel Presentation
Optimizing your Hadoop Infastructure: An Industry Panel PresentationOptimizing your Hadoop Infastructure: An Industry Panel Presentation
Optimizing your Hadoop Infastructure: An Industry Panel Presentation
 
Get Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber SolutionGet Started with Cloudera’s Cyber Solution
Get Started with Cloudera’s Cyber Solution
 
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
ING's Customer-Centric Data Journey from Community Idea to Private Cloud Depl...
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
 
Cloudera for Internet of Things
Cloudera for Internet of ThingsCloudera for Internet of Things
Cloudera for Internet of Things
 
Denodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data ServicesDenodo Design Studio: Modeling and Creation of Data Services
Denodo Design Studio: Modeling and Creation of Data Services
 
Sprint's Data Modernization Journey
Sprint's Data Modernization JourneySprint's Data Modernization Journey
Sprint's Data Modernization Journey
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 

Similar to Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with Disaggregated Compute and Storage

Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud World
Alluxio, Inc.
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
DataWorks Summit/Hadoop Summit
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
Cloudera, Inc.
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
Robin Fong 方俊强
 
Open Source DWBI-A Primer
Open Source DWBI-A PrimerOpen Source DWBI-A Primer
Open Source DWBI-A Primer
partha69
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Alluxio, Inc.
 
Unify Data at Memory Speed
Unify Data at Memory SpeedUnify Data at Memory Speed
Unify Data at Memory Speed
Alluxio, Inc.
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
Alluxio, Inc.
 
Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019
George Walters
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio, Inc.
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Dataconomy Media
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
Denodo
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Robb Boyd
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
Denodo
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
Alluxio, Inc.
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
Cloudera, Inc.
 

Similar to Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with Disaggregated Compute and Storage (20)

Achieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud WorldAchieving Separation of Compute and Storage in a Cloud World
Achieving Separation of Compute and Storage in a Cloud World
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
 
Open Source DWBI-A Primer
Open Source DWBI-A PrimerOpen Source DWBI-A Primer
Open Source DWBI-A Primer
 
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & MoreMeetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
Meetup at AI NextCon 2019: In-Stream data process, Data Orchestration & More
 
Unify Data at Memory Speed
Unify Data at Memory SpeedUnify Data at Memory Speed
Unify Data at Memory Speed
 
Alluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle MeetupAlluxio @ Uber Seattle Meetup
Alluxio @ Uber Seattle Meetup
 
Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019Customer migration to Azure SQL database, December 2019
Customer migration to Azure SQL database, December 2019
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
 
Achieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloadsAchieving compute and storage independence for data-driven workloads
Achieving compute and storage independence for data-driven workloads
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 

More from Alluxio, Inc.

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
Alluxio, Inc.
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
Alluxio, Inc.
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
Alluxio, Inc.
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio, Inc.
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
Alluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
Alluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
Alluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Alluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Alluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Alluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Alluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
Alluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
Alluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
Alluxio, Inc.
 

More from Alluxio, Inc. (20)

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 

Recently uploaded

Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
Rakesh Kumar R
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
ssuserad3af4
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
Ayan Halder
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
VALiNTRY360
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 

Recently uploaded (20)

Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
What next after learning python programming basics
What next after learning python programming basicsWhat next after learning python programming basics
What next after learning python programming basics
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
316895207-SAP-Oil-and-Gas-Downstream-Training.pptx
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfTop Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 

Accelerate and Scale Big Data Analytics and Machine Learning Pipelines with Disaggregated Compute and Storage

  • 1. Dipti Borkar | Head of Product, Alluxio Shailesh Manjrekar | Head of AI/ML Product and Solutions, SwiftStack NextGen Data Analytics Stack – Alluxio and SwiftStack Edge to Core to Cloud
  • 2. Unstoppable Data Growth – Edge to core to cloud Emphasis on capturing value and show return on investment (ROI) from data *, IDC Worldwide Storage in Big Data Forecast, 2015-2019, October 2015 and IDC Directions Value Capture is key 30B IoT Connected Devices* By 2020, 1010101010101 1010101010101 1010101010101 1010101010101 1010101010101 1010101010101 1010101010101 1010101010101 1010101010101 1010101010101 101010101010101 100-250EB Big Data Storage Capacity* 10101010101010 1010101010101010 101010101 44ZB data created*
  • 3. Status quo - Existing solutions Business leaders and storage architects struggling to show return on investment SILO 1 Compute + Storage SILO 3 Compute + Storage SILO 5 Compute + Storage SILO 2 SILO 4 DAS silos Poor Utilization DIY Fatigue High Capex High OpEx INEFFICIENT AND EXPENSIVE Data Gravity
  • 4. 4 big trends driving the need for a new data analytics stack Separation of Compute & Storage Hybrid – Multi cloud environments Self-service data across the enterprise Rise of the object store
  • 5. Customer Challenges with existing solutions Lack of Enterprise ready products and continued pressure of every increasing cloud OpEx Ever increasing operating expenditures on existing (a) poorly utilized DAS solutions or (b) cloud storage deployments costs at scale Need for high throughput stack with API compatibility to support batch, interactive and advanced analytical workloads Lack of enterprise ready and multi-cloud data lake systems – at scale deployments with lifecycle management, self healing, geo-replication and faster re-builds CHALLENGE 1 CHALLENGE 2 CHALLENGE 3
  • 6. Data Ecosystem - Beta Data Ecosystem 1.0 COMPUTE STORAGE STORAGE COMPUTE
  • 7. Co-located Big data journey and innovation options for enterprises Co-located compute & HDFS on the same cluster Disaggregated compute & HDFS on the same cluster MR / Hive HDFS Hive HDFS Disaggregated Burst HDFS data in the cloud, public or private Support Presto, Spark and other computes without app changes Enable & accelerate big data on object stores Transition to Object store HDFS for Hybrid Cloud Support more frameworks § Typically compute-bound clusters over 100% capacity § Compute & I/O need to be scaled together even when not needed § Compute & I/O can be scaled independently but I/O still needed on HDFS which is expensive 1 2 3 4 5
  • 8. Multi-cloud storage and Data Management Java File API HDFS Interface S3 Interface REST APIFUSE Interface HDFS Driver Swift Driver S3 Driver The SwiftStack Data Analytics Solution with Alluxio Accelerated Compute, Data accessibility and Elasticity
  • 9. SwiftStack Data Analytics Solution – business use cases Customer, Security and Fraud Analysis Precision Medicine and Bio-Informatics Customer Churn/ Sentiment Analysis Analytics As a Service, Operational Analytics Internet of Things / Everything Financial Services, FSI Healthcare and Life Sciences, Genomics Cloud Service Providers Oil and Gas, Industrial Internet and Manufacturing Media and Entertainment
  • 10. SwiftStack Data Analytics solution – Value to be Captured by Enterprises Data and analytics as a source of competitive advantage Source: IDC Directions “Organizations that analyze all relevant data and deliver actionable information will achieve extra $430B in productivity gains over less analytically oriented peers by 2020” IDC: Worldwide Bid Data and Analytics 2016 Predictions Value can be created in the following ways with some industry relevance: üImprove operational efficiency üReduce cost üNew product development üInsights into new services üBetter Customer Experience
  • 11. Alluxio and SwiftStack partnership Originated as Tachyon project, at the UC Berkley’s AMP Lab by then Ph.D. student & now Alluxio CTO, Haoyuan (H.Y.) Li.2014 2015 Open Source project established & company to commercialize Alluxio founded Goal: Orchestrate Data at Memory Speed for the Cloud for data driven apps such as Big Data Analytics, ML and AI. 2018 20192018
  • 12. Data Elasticity with a unified namespace Abstract data silos & storage systems to independently scale data on-demand with compute Run Spark, Hive, Presto, ML workloads on your data located anywhere Accelerate big data workloads with transparent tiered local data Data Accessibility for popular APIs & API translation Data Locality with Intelligent Multi-tiering Alluxio – Key innovations
  • 13. Alluxio MasterZookeeper / RAFT Standby Master WAN Alluxio Client Alluxio Client Alluxio Worker RAM / SSD / HDD Alluxio Worker RAM / SSD / HDD Alluxio Reference Architecture … … Application Application Under Store 1 Under Store 2
  • 14. “Infrastructure challenges are the primary inhibitor for broader adoption of AI/ML workflows. SwiftStack’s Multi-Cloud data management solution is first of it’s kind in the industry and effectively handles storage I/O challenges faced by edge to core to cloud, large scale AI/ML data pipelines” Amita Potnis, Research Director at IDC’s Infrastructure System Platform and Technologies Group
  • 15. Property of SwiftStack Inc. 15 Multi-Cloud Storage and Data Management Storage and multi-cloud data management for data-driven applications and workflows SwiftStack SwiftStack Storage SwiftStack 1space On-premises cloud storage Highest throughput performance Easy to deploy, operate, and scale From tens of terabytes to hundreds of petabytes Spans multiple geographic regions Proven platform to realize more value from data! Multi-cloud data management Transparent access to a single storage namespace Public and private infrastructure Policy-driven data placement Metadata search across the namespace Leverage unique services across clouds!
  • 16. Property of SwiftStack Inc. 16 SwiftStack Object Storage Architecture Continuous Auditing Automatic Replication Fault Tolerant • Automated storage system management for standard servers Replicas Erasure Codes Direct-Attached Storage • Masterless, quorum writes • Nearest reads • As-dispersed-as-possible data placement across nodes/zones/regions • Distributed partitions in a modified consistent hash ring Replication Reconstruction Auditing Device Inventory Management Storage System Metrics Collection Hardware Fault Detection Standard Servers, Drives & Networking Site 1 Site 2 Site 3 SwiftStack Storage
  • 18. 18 Data Analytics Hub – Total Cost Of Ownership (TCO) Analysis The 5-year TCO for the Hosted Private Cloud solution is 1/4th of the one hosted on a public cloud
  • 19. Cloud native applications with SwiftStack Data Analytics solution © 2016 Western Digital Corporation All rights reserved 3 key use cases - SwiftStack Data Analytics Solution Cloud bursting with SwiftStack Data Analytics solution; compute in any public cloud HDFS off-load to SwiftStack Data Analytics Solution
  • 20. 1. SwiftStack Data Analytics solution – On-premise deployment • Customers starting their on-premise analytic journeys • Benefits • Same performance as HDFS • No more HDFS: operational simplicity • Compute can be fully virtualized / containerized! • Durability++ (Erasure Coding) • Scale (billions of objects / racks / Geo) Alluxio Presto Alluxio Presto Dramatically speed-up big data on object stores on premise Same container / machine Alluxio Presto Alluxio Spark / Presto / Hive
  • 21. 2. Cloud bursting with SwiftStack Data Analytics solution •Hybrid workflow • customers hosting data on-premise and leveraging public cloud for economies of scale • Alluxio for data locality •Benefits • Data as strategic asset stays on-premise • Leverage cloud economies of scale for compute Hadoop cluster node Alluxio “alluxio//” Compute: Spark, Presto, Hive, … WAN Private Cloud Public Cloud
  • 22. 3. HDFS off-load to SwiftStack Data Analytics solution •HDFS off-load existing HDFS customers on DAS looking to move to S3 - needs migration leverage DistCp - Distributed copy as data mover and then the same workflow Using Alluxio •Benefits • Known and well understood process by administrators: existing HDFS workflows + rsync-like backup workflow Co-located environment ImpalaHive Spark Same data center / region Presto Enable big data on object stores across single or multiple clouds Spark Alluxio Alluxio
  • 23. Enterprises moving towards independent compute & storage
  • 25. 25 § Come, talk to us about analytics on SwiftStack § Data analytics Solution – Alluxio and SwiftStack deliver a winning combination of performance and capacity – “deliver on the promise of a future ready data lake” § Multiple use cases across industry verticals show success of highly scalable and lowest TCO big data solution Get started with the Data Analytics Solution