Running Presto with Alluxio on Amazon EMR

Alluxio Community Online Office Hours
Feb 12, 2020

For more Alluxio events: https://www.alluxio.io/events/

Speaker:
Alex Ma, Alluxio

Many organizations are leveraging EMR to run big data analytics on public cloud. However, reading and writing data to S3 directly can result in slow and inconsistent performance. Alluxio is a data orchestration layer for the cloud, and in this use case it caches data for S3, ensuring high and predictable performance as well as reduced network traffic.

In this online meetup, you will learn about:

- How to set up Alluxio with the EMR stack so that Presto jobs can seamlessly read from and write to S3
- How the performance of Presto on EMR compares with that of Presto and Alluxio on EMR
- An open session for discussion on any topic, such as solving the separation of compute and storage problem
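
As a rough illustration of the first point, the sketch below assembles an `aws emr create-cluster` command that installs Presto and Hive and runs a bootstrap action. The script location, bucket names, key pair name, release label, and instance type are all hypothetical placeholders, and the sketch assumes the bootstrap script takes the S3 under-storage URI as its first argument; check the Alluxio EMR documentation for the actual script and its arguments.

```python
import shlex
from typing import List


def build_create_cluster_command(
    bootstrap_script_uri: str,   # hypothetical: S3 path of the Alluxio bootstrap script
    root_ufs_uri: str,           # hypothetical: S3 location Alluxio should cache
    key_name: str,               # name of an existing EC2 key pair
    release_label: str = "emr-5.29.0",   # assumption: any release that ships Presto
    instance_type: str = "m5.xlarge",    # assumption: pick a size that fits the workload
    instance_count: int = 3,
) -> List[str]:
    """Return the argument list for `aws emr create-cluster` (nothing is executed)."""
    return [
        "aws", "emr", "create-cluster",
        "--name", "presto-alluxio",
        "--release-label", release_label,
        "--applications", "Name=Presto", "Name=Hive",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--ec2-attributes", f"KeyName={key_name}",
        "--use-default-roles",
        # Assumed convention: the under-storage URI is the script's first argument.
        "--bootstrap-actions",
        f"Path={bootstrap_script_uri},Args=[{root_ufs_uri}]",
    ]


if __name__ == "__main__":
    cmd = build_create_cluster_command(
        "s3://example-bucket/alluxio-emr.sh",
        "s3://example-bucket/data/",
        "example-key",
    )
    # Print a copy-pastable shell command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list and only printing it keeps the sketch free of side effects; the printed line can be reviewed before it is run with the AWS CLI.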

Running Presto with Alluxio on Amazon EMR

  1. Alluxio Open Source Community Office Hour: Running Presto with Alluxio - AWS EMR
  2. Open source project started at UC Berkeley's AMPLab, with strong open source momentum and a growing community ▪ 1,000+ contributors and growing ▪ 4,000+ GitHub stars ▪ Apache 2.0 licensed ▪ Millions of downloads
  3. Intro to EMR ▪ AWS-provided and managed Hadoop services ▪ Spark, HDFS, Presto, HCatalog, Hive, HBase, Flink, etc. ▪ Easy to configure and onboard ▪ Does the work for you ▪ Elastic and flexible ▪ EMRFS: consistent view and data encryption
  4. Sample Alluxio Use Cases ▪ Burst big data workloads in hybrid cloud environments ▪ Accelerate big data frameworks on the public cloud ▪ Dramatically speed up big data on object stores on premises (diagram: Alluxio runs in the same instance, container, or machine as Presto, Spark, or Hive)
  5. Data Locality with Intelligent Multi-tiering: local performance from remote data using multi-tier storage ▪ Hot, warm, and cold tiers on RAM, SSD, and HDD ▪ Read and write buffering, transparent to the application ▪ Policies for pinning, promotion/demotion, and TTL
  6. Alluxio-EMR Prerequisites and Design Considerations ▪ IAM account with the default EMR roles ▪ S3 bucket to host the bootstrap script and to act as a UFS ▪ Key pair for EC2 ▪ AWS CLI ▪ Leverage AWS Glue/RDS to persist Hive Metastore state ▪ Bootstrap scripts
  7. EMR Service Integration: Bootstrap Actions ▪ EMR provides hooks into the main configuration files for Hadoop services: hive-site.xml, core-site.xml, hadoop-env.sh, hive.properties ▪ Bootstrap actions
  8. Bootstrap Actions
  9. Alluxio Options
  10. Create table
  11. Use Crawler
  12. Use Transparent URI
  13. Use Transparent URI
  14. Demo
  15. 100+ Production Deployments ▪ Massive clusters deployed, many with 500+ nodes
  16. Additional Resources ▪ Running Presto with Alluxio: https://docs.alluxio.io/os/user/stable/en/compute/Presto.html ▪ Using Transparent URI: https://docs.alluxio.io/ee/user/stable/en/operation/Transparent-Uri.html ▪ Top 5 performance tips running Presto with Alluxio: https://www.alluxio.io/blog/top-5-performance-tuning-tips-for-running-presto-on-alluxio-1 ▪ Presto: Fast SQL on Anything by Starburst: https://www.slideshare.net/Alluxio/presto-fast-sql-on-anything ▪ Getting Started with EMR and Alluxio: https://github.com/Alluxio/alluxio/tree/master/integration/emr and https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html
  17. Questions? How are you using EMR? You are welcome to join the Alluxio Open Source Community! www.alluxio.io | @alluxio
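
The hot/warm/cold tiering on the multi-tiering slide can be pictured with a toy model. The sketch below is not Alluxio's implementation; the class and method names are hypothetical, capacities are counted in entries rather than bytes, and eviction is plain LRU. It only illustrates the idea that reads promote data to the hottest tier while overflow is demoted one tier down, with the remote store (S3 in this talk) as the source of truth.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class TieredCache:
    """Toy multi-tier cache: tiers are listed hottest first, each one an LRU."""

    def __init__(self, capacities: List[Tuple[str, int]],
                 fetch_remote: Callable[[str], bytes]) -> None:
        # Each tier is (name, capacity in entries, LRU-ordered store).
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]
        self.fetch_remote = fetch_remote

    def read(self, key: str) -> bytes:
        # Look through the tiers from hottest to coldest.
        for _, _, store in self.tiers:
            if key in store:
                value = store.pop(key)
                self._put(0, key, value)  # promotion: a hit moves to the hottest tier
                return value
        # Miss everywhere: go to the remote store and cache the result.
        value = self.fetch_remote(key)
        self._put(0, key, value)
        return value

    def _put(self, level: int, key: str, value: bytes) -> None:
        if level >= len(self.tiers):
            return  # fell off the coldest tier; the remote copy remains
        _, cap, store = self.tiers[level]
        store[key] = value
        store.move_to_end(key)
        if len(store) > cap:
            # Demotion: the least recently used entry moves one tier down.
            old_key, old_value = store.popitem(last=False)
            self._put(level + 1, old_key, old_value)

    def location(self, key: str) -> Optional[str]:
        """Return the name of the tier holding `key`, or None if uncached."""
        for name, _, store in self.tiers:
            if key in store:
                return name
        return None
```

For example, with one-entry `RAM`, `SSD`, and `HDD` tiers, reading `a` and then `b` leaves `b` in `RAM` and demotes `a` to `SSD`; reading `a` again promotes it back to `RAM` without another remote fetch.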
