Enterprise Distributed Query Service powered by Presto & Alluxio across clouds at WalmartLabs

Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019

Enterprise Distributed Query Service powered by Presto & Alluxio across clouds at WalmartLabs

Speaker:
Ashish Tadose, WalmartLabs

For more Alluxio events: https://www.alluxio.io/events/


  1. Enterprise Distributed Query Service powered by Presto & Alluxio across clouds @ WalmartLabs
     Ashish Tadose, Principal Engineer
  2. Agenda
     • Data stores @ Walmart Labs
     • Motivation for Presto as Distributed Query service
     • Multi-tenant managed Distributed Query service
     • Alluxio caching to optimize the performance
     • Architectural components
     • Alluxio to support query federation in hybrid
  3. Data stores @ Walmart Labs
     Access needs vary from team to team: one solution does not fit all.
  4. Motivation for Presto
     • DataLake cluster, powered by on-prem Hadoop/HDFS
     • Compute and storage co-location – GOOD
     • Need to ingest data from all diverse sources – CHALLENGING
     • Scaling out compute with growing needs – CHALLENGING
     • Need to separate storage & compute / support federated query capability – PRESTO
     • Isolated clusters in private cloud powering dedicated data-marts
     (Slide label: Data journey)
  5. Multi-tenant Query service: requirements
     • Simplified query access layer
     • Leverage cloud elastic compute
     • Better scalability & effective cluster utilization by auto-scaling
     • Performant query response times
     • Security
       – Authentication: LDAP
       – Authorization: work with existing policies
     • Handle sensitive data: encryption at rest & over the wire
     • Efficient monitoring & alerting
     • Dedicated quotas, SLA guarantees
     • Flexibility to configure query configuration per tenant
  6. Architectural components
     • Authentication
       – Presto LDAP
       – Custom authentication service
     • Authorization
       – Custom Presto Ranger plugin
       – Hadoop impersonation support
     • Quota
       – Presto resource groups
     • Query configuration tuning
       – Session property managers
       – Customizations to make it work for Unix groups
     • Query audit
       – Presto’s event listener framework
     • Auto-scaling in GCP
       – GCP instance group auto-scaling
       – Auto-scaling based on CPU load and queued queries
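
The slide names two auto-scaling signals, CPU load and queued queries, without giving the policy itself. The following is a minimal sketch of how such a decision could be written, not the implementation used at WalmartLabs. The `ClusterMetrics` type, the `desired_workers` function and every threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    workers: int         # current number of Presto workers
    avg_cpu: float       # average worker CPU utilisation, 0.0 to 1.0
    queued_queries: int  # queries waiting in the coordinator queue


def desired_workers(m: ClusterMetrics,
                    min_workers: int = 3,    # assumed floor
                    max_workers: int = 50,   # assumed ceiling
                    cpu_high: float = 0.75,  # assumed scale-out threshold
                    cpu_low: float = 0.30,   # assumed scale-in threshold
                    queue_high: int = 10) -> int:
    """Return the target worker count for the next scaling step."""
    target = m.workers
    if m.avg_cpu > cpu_high or m.queued_queries > queue_high:
        # Scale out by 50% (at least one worker) when either signal is hot.
        target = m.workers + max(1, m.workers // 2)
    elif m.avg_cpu < cpu_low and m.queued_queries == 0:
        # Scale in one worker at a time, and only when nothing is queued.
        target = m.workers - 1
    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, target))
```

Scaling out aggressively and scaling in slowly is a common choice for interactive workloads, because an undersized cluster shows up immediately as queued queries while an oversized one only costs money for a few extra minutes.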
  7. Presto & Alluxio work well together
     [Charts comparing Presto with Presto + Alluxio: small range query response time (lower is better), large scan query response time (lower is better), concurrency (higher is better)]
     • Query performance bottlenecks
       – Unpredictable network IO
       – Query pattern: datasets modelled in star schema could benefit from dimension table caching
     • Presto + Alluxio
       – Avoids unpredictable network IO
       – Consistent query latency
       – Higher throughput and better concurrency
  8. Presto + Alluxio: architectural components
     • Presto + Alluxio co-located cluster
     • Metadata sync components to automatically create Alluxio-backed tables and Alluxio mount points
     • Tweak auto-scaling to keep a minimum number of Alluxio workers
     • Pin frequently used dimension tables to avoid cache evictions
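
To see why pinning dimension tables matters, consider what a large fact-table scan does to a plain LRU cache: it pushes out everything else, including the small tables every query joins against. The toy model below illustrates that effect. It is not Alluxio's eviction code, and the `PinnedLRUCache` class, the table names and the sizes are made up for the example.

```python
from collections import OrderedDict


class PinnedLRUCache:
    """Toy LRU cache in which pinned keys are never chosen for eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> size, ordered oldest to newest
        self.pinned = set()
        self.used = 0

    def pin(self, key: str) -> None:
        self.pinned.add(key)

    def access(self, key: str, size: int = 1) -> bool:
        """Return True on a cache hit; on a miss, try to load the key."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        while self.used + size > self.capacity:
            # Evict the least recently used entry that is not pinned.
            victim = next((k for k in self.entries if k not in self.pinned), None)
            if victim is None:
                return False  # nothing evictable: read through without caching
            self.used -= self.entries.pop(victim)
        self.entries[key] = size
        self.used += size
        return False


cache = PinnedLRUCache(capacity=3)
cache.pin("dim_store")
cache.access("dim_store")
for i in range(1, 6):
    cache.access(f"fact_part_{i}")      # large scan over fact partitions
still_cached = cache.access("dim_store")  # True: the pinned table survived
```

Without the `pin` call, the third fact partition would evict `dim_store` and the final access would be a miss.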
  9. Presto + Alluxio: hybrid cloud
     • Ability to query datasets that could not be moved to public clouds
     • Alluxio greatly improved query performance by avoiding network hops for recurrent queries
     • Avoids creating copies of datasets in the cloud: Alluxio mounts pick up file metadata changes
     • Enabled query guards in Presto to avoid abuse of these connectors
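
The slide does not say how the query guards work. One plausible form, shown here purely as an assumption, is a pre-submission check that rejects unfiltered scans against catalogs whose data sits on the far side of the network link. The catalog name `onprem_hive` and the `check_query` function are hypothetical, and the text-based matching is deliberately crude; a production guard would more likely inspect the parsed query or its plan.

```python
import re
from typing import Tuple

# Hypothetical names of catalogs that reach back to on-prem storage.
GUARDED_CATALOGS = {"onprem_hive"}

# Matches fully qualified table names of the form catalog.schema.table.
TABLE_REF = re.compile(r"\b(\w+)\.\w+\.\w+\b")
HAS_WHERE = re.compile(r"\bwhere\b", re.IGNORECASE)


def check_query(sql: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a SQL string before it is submitted."""
    catalogs = {m.group(1).lower() for m in TABLE_REF.finditer(sql)}
    guarded = catalogs & GUARDED_CATALOGS
    if guarded and not HAS_WHERE.search(sql):
        return False, "unfiltered scan of guarded catalog: " + ", ".join(sorted(guarded))
    return True, "ok"
```

For example, `check_query("SELECT * FROM onprem_hive.sales.orders")` is rejected, while the same query with a `WHERE ds = '2019-11-07'` predicate passes.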
  10. Query service backed by Presto + Alluxio
     We provide analysts with a query tool for interactive ad-hoc analysis over different source systems through a unified SQL query interface.
     The query service, hosted in GCP and on-prem, is powered by Presto + Alluxio and is offered as a managed distributed service. We also help businesses optimize their SQL queries to make sure they run within the expected time.
     • Put ALL your data to work
     • SQL on Anything
     • Optimized performance
     • Improve data’s time to value
     • Increase Your Optionality
     • Never get deprived of cluster resources
     Supporting points from the slide:
     • Using the platform’s federated query capabilities, data can be queried and joined from multiple data sources
     • No fancy technology needed to query data. All you need is ANSI SQL
     • Performance boost compared to Hive
     • Queries can be executed across any number of data resources regardless of where the data resides
     • With auto-scaling in place, queries always get enough resources to perform fast
     • Choose the BI tool of your choice
  11. THANKS!