Orchestrate a Data Symphony

Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019

Speaker:
Haoyuan Li, Alluxio

For more Alluxio events: https://www.alluxio.io/events/

  1. Data Orchestration Summit 2019: Orchestrate a Data Symphony. Haoyuan (H.Y.) Li | Founder & CTO | haoyuan@alluxio.com | @haoyuan
  2. The most valuable companies in the Data Era all depend on Data Infrastructure
  3. Data Infrastructure drives these innovations
  4. From Data to Value: Simple & Easy? (Diagram: Data → A Data-Driven Application → Value)
  5. The Reality
  6. Building the right data infrastructure is really hard!
  7. Fast-Moving Landscape: Explosion of Compute Frameworks; Storage Innovation; Life Cycle; Journey to Hybrid / Multi-Cloud
  8. Endless requirements: more opportunities come with more challenges. “How do I modernize my data infrastructure for the cloud?” “Why can’t we also support Presto for querying?” “Why can’t I train my model on a public cloud?” “This job is taking forever; can’t you add more resources?” “How do I access remote HDFS data in Google Dataproc?” …
  9. Data silos across data centers, regions, and clouds. (Diagram: compute spread across many different frameworks, such as Hive, Spark, Presto, and TensorFlow; data in disparate storage systems, such as HDFS, NFS, object stores, S3, and Azure, linked over WAN.) Complex. Error-prone. Time-consuming.
  10. Vision: Orchestrate Data for Applications
  11. A Data Orchestration Approach
  12. Data Orchestration Requirements: Structured Data Catalog, Intelligent Caching, Data Transformation, Data Management, Global Namespace
  13. Data Orchestration Characteristics: Global Namespace. Abstract data silos to make data elastic for data-driven applications.
  14. Data Orchestration Characteristics: Intelligent Caching. Enable data locality for fast performance regardless of where the data is stored.
  15. Data Orchestration Characteristics: Data Management. Seamless data movement across data lakes without app changes.
  16. Data Orchestration Characteristics: Structured Data Catalog. Provide a structured data service to greatly optimize SQL engines.
  17. Data Orchestration Characteristics: Data Transformation. Transform data to compute-optimized representations for easy consumption.
  18. A Data Orchestration Approach (Diagram: a data orchestration layer sits between any data app, such as Spark, Presto, Hive, or TensorFlow, and data in disparate storage systems, such as HDFS, NFS, and S3.)
  19. Alluxio: An Open-Source Implementation of Data Orchestration (Intelligent Caching, Data Management, Global Namespace)
  20. Announcing Alluxio Structured Data Service (Structured Data Catalog, Intelligent Caching, Data Transformation, Data Management, Global Namespace)
  21. Data Orchestration Ecosystem: many open-source projects are represented here today
  22. Thank You & Welcome to the Data Orchestration Summit