
The Elephant in the Clouds

The Elephant in the Clouds - Sanjay Radia

Published in: Technology


  1. The Elephant in the Clouds. Sanjay Radia, Chief Architect, Founder, Hortonworks
  2. Why Hadoop in the Cloud? (© Hortonworks Inc. 2011 – 2016. All Rights Reserved)
      • Unlimited Elastic Scale
      • Ephemeral & Long-Running
      • IT & Business Agility
      • No Upfront HW Costs
  3. Today’s Hadoop Cloud Solutions
      [Figure: The Forrester Wave™: Big Data Hadoop Cloud Solutions, Q2 2016. Get it at //aka.ms/forresterwave. Axes: Current Offering (weak to strong) and Strategy (weak to strong); legend: Market Presence. Bands: Challengers, Contenders, Strong Performers, Leaders. Vendors shown: Rackspace, Oracle, Altiscale, Qubole, Google, IBM, Amazon Web Services, Microsoft.]
  4. Key Architectural Considerations for Hadoop in the Cloud
      • Shared Data & Storage
      • On-Demand Ephemeral Workloads
      • Elastic Resource Management
      • Shared Metadata, Security & Governance
  5. Prescriptive On-Demand Ephemeral Workloads
      [Diagram: four on-demand ephemeral workloads (Data Science, ETL, Warehouse, Search), each labeled with R/W Tables and Compute Fabric.]
  6. Shared Data and Storage
      Understand and Leverage Unique Cloud Properties
      • Shared data lake is cloud storage accessible by all apps
      • Cloud storage segregated from compute
      • Built-in geo-distribution and DR
      Focus Areas
      • Address cloud storage consistency and performance
      • Enhance performance via memory and local storage
  7. Enhance Performance via Caching
      Tabular Data: LLAP Read + Write-thru Cache
      • Cache only the needed columns
      • Shared across jobs / apps and across engines
      • Spills to SSD when memory is full (anti-caching)
      • Read & Write-through cache
      • Security: Column-level and row-level
      HDFS Caching for Non-tabular Data
      • Cache data from cloud storage as needed
      • Write-through cache
      [Diagram labels: Workloads, Cloud Storage, LLAP, R/W Tables, HDFS, Files, Cache]
  8. Shared Data Requires Shared Metadata, Security, and Governance
      Shared Metadata Across All Workloads
      • Metadata considerations
        – Tabular data metastore
        – Lineage and provenance metadata
        – Pipeline and job management metadata
        – Add upon ingest
        – Update as processing modifies data
      • Access / tag-based policies and audit logs
      • Centrally stored to facilitate use across apps
        – Ex. backed by Cloud RDS (or shared DB)
      [Diagram labels: Classification, Prohibition, Time, Location; Streams, Pipelines, Feeds, Tables, Files, Objects; Shared Metadata; Policies]
  9. Elastic Resource Management in Context of Workload
      Workload Management vs. Cluster Management
      • Understand resource needs of different workload types
      • Add / remove resources to meet workload SLAs
      • Manage compute power and high-performance data-access (ex., LLAP)
      • Pricing-aware: instances (spot, reserved), data, bandwidth
  10. Demo of Cloud Tech Preview. Ram Venkatesh, Senior Director of Engineering, Hortonworks
      • Effectiveness of mobile ad spend (cross device attribution)
      [Diagram labels: Clickstream ETL; BI & Reporting; Data Science; Data, Metadata, Security; Cloud Control Plane]
  11. Vision: Connected Data Architecture Enables Enterprise Transformations
      [Diagram labels: CLOUD, DATA CENTER, Data in Motion, Data at Rest, Machine Learning, Deep Historical Analysis, Stream Analytics, Edge Data, Edge Analytics]
  12. Recommended Sessions… Thursday
      • Hadoop & Cloud Storage: Object Store Integration in Production
      • LLAP: Sub-Second Analytical Queries in Hive
      • Zeppelin + Livy: Bringing multi tenancy to interactive data analysis
      CHECK OUT HORTONWORKS CLOUD TECH PREVIEW! http://hortonworks.com/news-blogs/
  13. Thank You
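
Slide 7 describes a read and write-through cache in front of cloud storage that spills to SSD when memory is full. As a rough illustration of that idea only, here is a minimal Python sketch. It is not how LLAP or HDFS caching is implemented: the class name `WriteThroughCache`, the dictionary standing in for cloud storage, the second dictionary standing in for an SSD tier, and the least-recently-used spill policy are all assumptions made for this example.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy two-tier read + write-through cache (hypothetical, for illustration)."""

    def __init__(self, backing_store, memory_capacity):
        self.backing_store = backing_store      # stands in for cloud storage
        self.memory_capacity = memory_capacity  # max entries kept in memory
        self.memory = OrderedDict()             # hot tier, least recently used first
        self.ssd = {}                           # spill tier, stands in for local SSD

    def _admit(self, key, value):
        # Put the entry in memory; when memory is over capacity,
        # spill the least recently used entries to the SSD tier.
        self.ssd.pop(key, None)
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.ssd[old_key] = old_value

    def read(self, key):
        if key in self.memory:                  # memory hit
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.ssd:                     # SSD hit: promote back to memory
            value = self.ssd[key]
        else:                                   # miss: fetch from the backing store
            value = self.backing_store[key]
        self._admit(key, value)
        return value

    def write(self, key, value):
        # Write-through: the backing store is updated before the cache.
        self.backing_store[key] = value
        self._admit(key, value)
```

In this sketch a write always reaches the backing store synchronously, so losing the cache never loses data; that property is what makes such a cache a natural fit for ephemeral clusters whose durable data lives in cloud storage.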
