Data Orchestration for AI, Big Data, and Cloud
Haoyuan (HY) Li | Founder, Chairman, CTO | Alluxio
haoyuan@alluxio.com | @haoyuan
2019-06-21 @ O’Reilly AI Beijing
The journey to a fragmented data world
More people & teams need access to this data
More data generated every day
New compute & storage technologies created every 3-8 years
Data Silos are Inevitable
Single Data Lake
Limit Actively Managed Data
Abstract & Orchestrate Data
Data silos cross data centers, regions, clouds
[Diagram: data in disparate storage systems (HDFS, NFS, object store, S3) and compute spread across many different frameworks (Hive, Spark, TensorFlow, Presto). Other labels: WAN, Azure.]
Abstract & orchestrate data across data silos
[Diagram: data orchestration connecting compute spread across many different frameworks (Hive, Spark, TensorFlow, Presto, any data app) with data in disparate storage systems (HDFS, NFS, S3).]
Data Orchestration for the cloud
Data Locality, Accessibility & Elasticity for AI & Big Data
§ Accelerate time to insight by making hot data local to compute
§ Burst data elastically with compute, anytime, in any cloud environment
§ Reduce costs by eliminating time-consuming ETL and multiple persisted copies
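The first bullet above, keeping hot data local to compute, can be illustrated with a small read-through cache. This is a minimal sketch under stated assumptions, not how Alluxio is implemented: the class name `ReadThroughCache`, the `fetch_remote` callback, and the LRU eviction policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy LRU cache placed in front of a slow remote read function.

    Hypothetical illustration only; names and policy are assumptions.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int) -> None:
        self._fetch_remote = fetch_remote  # stands in for a remote storage read
        self._capacity = capacity          # maximum number of cached objects
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0              # counts trips to remote storage

    def read(self, path: str) -> bytes:
        if path in self._blocks:
            # Hot data: serve locally and mark as most recently used.
            self._blocks.move_to_end(path)
            return self._blocks[path]
        # Cold data: fetch once from remote storage, then keep a local copy.
        self.remote_reads += 1
        data = self._fetch_remote(path)
        self._blocks[path] = data
        if len(self._blocks) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._blocks.popitem(last=False)
        return data
```

Repeated reads of the same path reach remote storage only once while the entry stays cached, which is the effect the bullet describes.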
Data Orchestration for AI, Big Data, and Cloud
Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
Drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
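The layering above (several application-facing interfaces on top, per-storage drivers underneath) can be sketched as a mount table that routes one namespace to different backends. This is a toy illustration under stated assumptions, not Alluxio's API: `StorageDriver`, `InMemoryDriver`, and `UnifiedNamespace` are hypothetical names, and the in-memory driver stands in for a real HDFS, Swift, S3, or NFS driver.

```python
from typing import Dict, Protocol


class StorageDriver(Protocol):
    """Hypothetical minimal driver contract: read bytes at a backend path."""

    def read(self, path: str) -> bytes: ...


class InMemoryDriver:
    """Stand-in for a real storage driver; keeps file contents in a dict."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str) -> bytes:
        return self._files[path]


class UnifiedNamespace:
    """Routes a single logical path space to whichever driver is mounted."""

    def __init__(self) -> None:
        self._mounts: Dict[str, StorageDriver] = {}

    def mount(self, prefix: str, driver: StorageDriver) -> None:
        # Normalize so "/warehouse/" and "/warehouse" are the same mount point.
        self._mounts[prefix.rstrip("/")] = driver

    def read(self, path: str) -> bytes:
        # The longest matching mount prefix wins, so nested mounts work.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                backend_path = path[len(prefix):] or "/"
                return self._mounts[prefix].read(backend_path)
        raise FileNotFoundError(path)


# Usage: two separate "silos" appear under one namespace.
ns = UnifiedNamespace()
ns.mount("/warehouse", InMemoryDriver({"/events.csv": b"id,ts\n"}))
ns.mount("/models", InMemoryDriver({"/resnet.bin": b"\x00\x01"}))
print(ns.read("/warehouse/events.csv"))
```

The caller only sees logical paths; which backend serves a read is decided by the mount table, so a backend could be swapped without changing application code.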
Data Orchestration for Agility
Data Orchestration for Compute Bursting
Leading Hedge Fund
An Open Source Implementation of Data Orchestration
Started From UC Berkeley AMPLab
1000+ contributors & growing
4000+ GitHub Stars
Apache 2.0 Licensed
GitHub’s Top 100 Most Valuable Repositories Out of 96 Million
Join the conversation on Slack: slackin.alluxio.io
Companies Moving Towards Data Orchestration
(Including 8 of the Top 10 Internet Companies in China)
Read More
Embracing Data Silos – the Data Orchestration Approach
Join the Alluxio Open Source Community!
www.alluxio.io | @alluxio | slackin.alluxio.io
