Cloud-Friendly Hadoop and Hive - StampedeCon 2013

At the StampedeCon 2013 Big Data conference in St. Louis, Shrikanth Shankar, Head of Engineering at Qubole, presented Cloud-Friendly Hadoop and Hive. The cloud lowers the barrier to entry into analytics for many small and medium-sized enterprises. Hadoop and related frameworks like Hive, Oozie, and Sqoop are becoming tools of choice for deriving insights from data. However, these frameworks were designed for in-house datacenters, which have different tradeoffs from a cloud environment, and making them run well in the cloud presents some challenges. In this talk, Shankar describes how working through these challenges taught us to extend Hadoop and Hive to exploit the new tradeoffs. Use cases will be presented that show how lessons from the challenges of large scale at Facebook now make it easy for a significantly smaller end user to leverage these technologies in the cloud.

CLOUD FRIENDLY HADOOP/HIVE
Shrikanth Shankar | Qubole
VP of Engineering
Thursday, July 25, 13
INTRODUCTION
• Hadoop has revolutionized big data processing
• Becoming the de-facto platform for new data projects
• Started as file system (HDFS) + programming framework (Map-Reduce). An ecosystem of
projects has sprung up on top of Hadoop
• Hive, Pig, Cascading etc. - Simple ways of processing data
• Sqoop, Flume etc. - Data movement into and out of HDFS
• Oozie, Azkaban etc. - Workflow scheduling
• However, these systems were all designed with an on-premise architecture in mind.
• The cloud is different enough - Some things can/should change.
ON-PREMISE HADOOP ARCHITECTURE
[Diagram labels: End Users; Hadoop Cluster (Namenode, JobTracker, multiple DN/TT nodes); Relational systems (Hive metastore etc.); IT control]
HADOOP ON-PREMISE
• Usually deployed on bare-metal nodes*
• HDFS is store of choice (3-way replication for safety). Locality of data
access is a big design point
• Clusters are mostly static - new machines are added on IT schedule*
• Static clusters mean users can focus on their tasks (MR jobs, Hive
queries) and not on cluster management
• IT bears the burden of managing clusters
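The replication and locality points above can be illustrated with a minimal Python sketch. It is not HDFS or JobTracker code: the node names, the `place_replicas` and `pick_node_for_task` helpers, and the 10 TB figure are all hypothetical, and rack awareness is left out. It only shows that each block lands on three distinct nodes, that a scheduler would prefer a free node already holding the block, and that raw capacity is three times the logical data size.

```python
import random

REPLICATION = 3  # 3-way replication, as stated on the slide


def place_replicas(block_id, nodes, replication=REPLICATION, seed=0):
    """Pick `replication` distinct nodes for one block (rack awareness omitted)."""
    rng = random.Random(f"{seed}-{block_id}")
    return rng.sample(nodes, replication)


def pick_node_for_task(replica_nodes, free_nodes):
    """Prefer a free node that already stores the block; otherwise use any free node."""
    for node in free_nodes:
        if node in replica_nodes:
            return node, True    # data-local read
    return free_nodes[0], False  # remote read over the network


# Hypothetical six-node cluster; every node runs a DataNode and a TaskTracker.
nodes = [f"dn{i}" for i in range(1, 7)]
replicas = place_replicas("blk_0001", nodes)
node, is_local = pick_node_for_task(replicas, free_nodes=nodes)

# Storing 10 TB of logical data (an assumed figure) needs 3x that in raw disk.
raw_tb_needed = 10 * REPLICATION
```

With every node free, the task always lands on one of the three replica holders, which is the locality benefit the slide refers to.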
HADOOP ON-PREMISE
• Partitioning of resources
• Static partitioning with different clusters for Batch and
Interactive workloads
• Within a cluster load balancing is done by the JT scheduler
• Capex costs are significant
• IT controlled - requires an Ops team (Hadoop ops, Sysadmin
etc.)
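A small arithmetic sketch of what static partitioning can cost, under assumed numbers. The cluster sizes, the demand figures, and the `static_partition` helper are hypothetical and not from the talk; the point is only that fixed Batch and Interactive clusters can leave one side idle while the other is short of capacity.

```python
def static_partition(batch_nodes, interactive_nodes, batch_demand, interactive_demand):
    """Return unmet demand and idle capacity (in nodes) for two fixed-size clusters."""
    unmet = max(0, batch_demand - batch_nodes) + max(0, interactive_demand - interactive_nodes)
    idle = max(0, batch_nodes - batch_demand) + max(0, interactive_nodes - interactive_demand)
    return {"unmet": unmet, "idle": idle}


# Assumed example: 60 batch nodes and 40 interactive nodes, with demand
# skewed toward batch (90 nodes' worth) and little interactive load (10).
result = static_partition(batch_nodes=60, interactive_nodes=40,
                          batch_demand=90, interactive_demand=10)
```

In this assumed case 30 nodes sit idle in the interactive cluster while the batch cluster is 30 nodes short, even though total capacity equals total demand.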
CLOUD ARCHITECTURE
HIGHLY AWS CENTRIC - BUT EVERYONE IS
FOLLOWING FAST
Thursday, July 25, 13
Ad

Recommended

The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012Joydeep Sen Sarma
 
Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
 Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling
Scylla Summit 2022: IO Scheduling & NVMe Disk ModellingScyllaDB
 
Scylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScylla Summit 2016: Graph Processing with Titan and Scylla
Scylla Summit 2016: Graph Processing with Titan and ScyllaScyllaDB
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1ScyllaDB
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScyllaDB
 
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...
Implementing a Distributed NoSQL Database in a Persistent Distributed Ledger ...ScyllaDB
 
Scylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScylla Summit 2022: How ScyllaDB Powers This Next Tech Cycle
Scylla Summit 2022: How ScyllaDB Powers This Next Tech CycleScyllaDB
 

More Related Content

What's hot

Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...OpenNebula Project
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...ScyllaDB
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
How SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformHow SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformScyllaDB
 
PLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob BirdPLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob BirdPROIDEA
 
Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)Enrique Catala Bañuls
 
Hadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsHadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsandrewdenty
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScyllaDB
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with ScyllaScyllaDB
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?ScyllaDB
 
Powering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoPowering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoAlluxio, Inc.
 
Casual mass parallel data processing in Java
Casual mass parallel data processing in JavaCasual mass parallel data processing in Java
Casual mass parallel data processing in JavaAltoros
 

What's hot (20)

March 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling HadoopMarch 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling Hadoop
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Compression talk
Compression talkCompression talk
Compression talk
 
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
How SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformHow SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy Platform
 
PLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob BirdPLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob Bird
 
Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)
 
Hadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsHadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAs
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDB
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with Scylla
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
 
BDAAS on the Cloud
BDAAS on the CloudBDAAS on the Cloud
BDAAS on the Cloud
 
Powering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoPowering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and Presto
 
Casual mass parallel data processing in Java
Casual mass parallel data processing in JavaCasual mass parallel data processing in Java
Casual mass parallel data processing in Java
 

Viewers also liked

Curriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNGCurriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNGSamdoo Jung
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014StampedeCon
 
Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015Carol Varney
 
How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015StampedeCon
 
Estrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una culturaEstrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una culturaoscar daniel naranjo aristizabal
 
PatrickJacks_Resume2
PatrickJacks_Resume2PatrickJacks_Resume2
PatrickJacks_Resume2Patrick Jacks
 
Economics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programmeEconomics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programmeLSE Enterprise
 
LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015LSE Enterprise
 
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012StampedeCon
 
Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015StampedeCon
 
4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensation4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensationfrekhtmanassociates
 
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...StampedeCon
 
2 kien pham cv en vn with project experience
2 kien pham cv en  vn with project experience2 kien pham cv en  vn with project experience
2 kien pham cv en vn with project experienceKien Pham
 
Estrous synchronization
Estrous synchronizationEstrous synchronization
Estrous synchronizationArmia Naguib
 

Viewers also liked (16)

Curriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNGCurriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNG
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014
 
Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015
 
CORRUPTION IN INDIA
CORRUPTION IN INDIACORRUPTION IN INDIA
CORRUPTION IN INDIA
 
How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015
 
Estrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una culturaEstrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una cultura
 
PatrickJacks_Resume2
PatrickJacks_Resume2PatrickJacks_Resume2
PatrickJacks_Resume2
 
Cattalugue2016
Cattalugue2016Cattalugue2016
Cattalugue2016
 
Economics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programmeEconomics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programme
 
LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015
 
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
 
Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015
 
4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensation4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensation
 
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
 
2 kien pham cv en vn with project experience
2 kien pham cv en  vn with project experience2 kien pham cv en  vn with project experience
2 kien pham cv en vn with project experience
 
Estrous synchronization
Estrous synchronizationEstrous synchronization
Estrous synchronization
 

Similar to Cloud-Friendly Hadoop and Hive - StampedeCon 2013

Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudRogue Wave Software
 
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...Mohamed Sayed
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Real-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and HadoopReal-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and HadoopRogue Wave Software
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerIke Ellis
 
Atom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at SupremindAtom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at SupremindAlluxio, Inc.
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkHentsū
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonInfinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonHentsū
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Scheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersScheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersAmjith Singh
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 

Similar to Cloud-Friendly Hadoop and Hive - StampedeCon 2013 (20)

Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloud
 
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Real-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and HadoopReal-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and Hadoop
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
 
Atom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at SupremindAtom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at Supremind
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonInfinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Scheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersScheduling scheme for hadoop clusters
Scheduling scheme for hadoop clusters
 
Hadoop
HadoopHadoop
Hadoop
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Qcon talk
Qcon talkQcon talk
Qcon talk
 

More from StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
Cloud-Friendly Hadoop and Hive - StampedeCon 2013

  • 1. CLOUD FRIENDLY HADOOP/HIVE Shrikanth Shankar | Qubole VP of Engineering Thursday, July 25, 13
  • 2. INTRODUCTION • Hadoop has revolutionized big data processing • Becoming the de facto platform for new data projects • Started as file system (HDFS) + Programming framework (Map-Reduce). An ecosystem of projects has sprung up on top of Hadoop • Hive, Pig, Cascading etc. - Simple ways of processing data • Sqoop, Flume etc. - Data movement into and out of HDFS • Oozie, Azkaban etc. - Workflow scheduling • However, these systems were all designed with an on-premise architecture in mind. • The cloud is different enough - Some things can/should change.
  • 3. ON-PREMISE HADOOP ARCHITECTURE [Diagram labels: End Users; Hadoop Cluster with Namenode, JobTracker, and multiple DN/TT nodes; Relational systems (Hive metastore etc.); IT control]
  • 4. HADOOP ON-PREMISE • Usually deployed on bare-metal nodes* • HDFS is store of choice (3-way replication for safety). Locality of data access is a big design point • Clusters are mostly static - new machines are added on IT schedule* • Static clusters mean users can focus on their tasks (MR jobs, Hive queries) and not on cluster management • IT bears the burden of managing clusters
  • 5. HADOOP ON-PREMISE • Partitioning of resources • Static partitioning with different clusters for Batch and Interactive workloads • Within a cluster load balancing is done by the JT scheduler • Capex costs are significant • IT controlled - requires an Ops team (Hadoop ops, Sysadmin etc.)
  • 6. CLOUD ARCHITECTURE HIGHLY AWS CENTRIC - BUT EVERYONE IS FOLLOWING FAST
  • 7. CLOUD COMPONENTS • Object Stores • Ephemeral compute nodes • Block Stores • PaaS Offerings (RDS, etc.)
  • 8. INFRASTRUCTURE CHARACTERISTICS • Running in a VM • Not that big a deal usually - except plan for performance variability • No locality information • Nodes are ephemeral - if you lose a node you will lose data on the node • AZ-wide correlated failures are to be expected. Region-wide failures are possible (but rare) • High capacity Object stores with high cross sectional bandwidth • High latency, Variability in perf, REMOTE*. Not POSIX compliant • Persistent block stores • REMOTE, Variable perf
  • 9. INFRASTRUCTURE CHARACTERISTICS • ELASTIC • Add 100 nodes on demand in a few minutes • Costs are Op-ex (largely). • Nodes are priced per hour (CPU + Disk), Storage is priced per GB • Cost management is a key challenge • Some interesting payment choices (On-demand, Spot, Reserved)
  • 11. STORAGE • From a cost perspective, using HDFS for long term storage means you pay for both CPU and disk. • It's also more expensive to make HDFS reliable (cross AZ, maybe even cross Region?) • Using an object store allows you to pay only for storage • With object stores you see latency issues since data is remote
  • 12. STORAGE • But node storage is still needed when jobs and queries are active • For intermediate job results (not all results should go back to S3 - e.g. stage outputs in Hive) • For intermediate data (mapper output) • Makes scaling nodes challenging • Also, since local performance is better, you may want to move remote data to HDFS before accessing it
  • 13. COMPUTE AND CLUSTERS • If you don't need Hadoop for persistent storage - when do you need a cluster? • Bring them up on demand - maybe for every job? • But that can be expensive - no multiplexing • Ideally you want to share Hadoop clusters as much as possible. Shut down the cluster when it is not being used
  • 14. COMPUTE AND CLUSTERS • If the cluster is dynamic and you need sharing - how do you ‘discover’ it? • How about cluster sizing? • Static is a leftover from on-premise • Be dynamic on the cloud. Hard for end users to do manually
  • 15. COMPUTE AND CLUSTER • Adding nodes needs to be done based on load • E.g. Most of the time jobs need < 5 nodes. A batch job comes in that needs 100 nodes. We should expand the cluster (for as long as needed) • Removing nodes is trickier • If we lose intermediate results, lots of work will be lost. • Job 1 uses 100 nodes, produces data spread over all of them. Job 2 consumes results but only needs 10 nodes. How do you give up 90 nodes?
  • 16. COMPUTE AND CLUSTER • Pricing choices are interesting • E.g. spot nodes average half the price of an on-demand node • But if the price spikes you lose all the spot nodes at once • Hadoop fault tolerance can retry failed jobs (but expensive) - what about data loss when you lose all the spot nodes?
  • 17. END USER EXPERIENCE • The cloud isn't just about cost - it's also about agility. To allow this we need to focus on the end user experience • End users would prefer to focus on higher level APIs • e.g. Run a Hadoop job or a Hive query - specifics of clusters should be hidden from them • Some things should be persistent (log files, results, ...) • They get this for free on premise
  • 18. BETTER END STATE • IT/dev ops/users should set high level controls • Usage governance (max cluster size, max bill, cpu hours used per month etc.) • End users should focus at the level they understand • Smart software should bridge the gap
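
The scaling questions raised on slides 15 and 18 can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the approach presented in the talk: the `Node` class, the `holds_intermediate_data` flag, and the functions `target_size` and `removable_nodes` are hypothetical names introduced here. It shows a load-driven cluster size being clamped to a governance limit (max cluster size), and a downscale step that releases only nodes holding no intermediate results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    # Hypothetical flag: True while the node stores intermediate results
    # (e.g. mapper output) that a running or pending job still needs.
    holds_intermediate_data: bool


def target_size(nodes_needed: int, min_nodes: int, max_nodes: int) -> int:
    """Clamp the load-driven node count to the governance limits."""
    return max(min_nodes, min(nodes_needed, max_nodes))


def removable_nodes(nodes: List[Node], target: int) -> List[Node]:
    """Return nodes that can be released without losing intermediate data.

    At most (current size - target) nodes are returned, and never a node
    that still holds intermediate results.
    """
    surplus = max(0, len(nodes) - target)
    idle = [n for n in nodes if not n.holds_intermediate_data]
    return idle[:surplus]


if __name__ == "__main__":
    # Job 1 ran on 100 nodes; assume 95 of them still hold its output.
    cluster = [Node(f"node-{i}", i < 95) for i in range(100)]
    # Job 2 only needs 10 nodes, within an assumed 2..200 node policy.
    wanted = target_size(10, min_nodes=2, max_nodes=200)
    released = removable_nodes(cluster, wanted)
    print(wanted, len(released))
```

In this example only the five nodes without intermediate data can be given up immediately, even though the target size is 10. That is the difficulty slide 15 points at: the other nodes stay until their data has been consumed or moved elsewhere.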