CLOUD FRIENDLY HADOOP/HIVE
Shrikanth Shankar | Qubole
VP of Engineering
Thursday, July 25, 13
INTRODUCTION
• Hadoop has revolutionized big data processing
• Becoming the de-facto platform for new data projects
• Started as a file system (HDFS) + programming framework (Map-Reduce). An ecosystem of
projects has sprung up on top of Hadoop
• Hive, Pig, Cascading etc. - Simple ways of processing data
• Sqoop, Flume etc. - Data movement into and out of HDFS
• Oozie, Azkaban etc. - Workflow scheduling
• However, these systems were all designed with an on-premise architecture in mind.
• The cloud is different enough - Some things can/should change.
ON-PREMISE HADOOP ARCHITECTURE
[Diagram: a Hadoop cluster (Namenode, JobTracker, and many DN/TT nodes) and relational systems (Hive metastore etc.) under IT control; multiple end users]
HADOOP ON-PREMISE
• Usually deployed on bare-metal nodes*
• HDFS is the store of choice (3-way replication for safety). Locality of data
access is a big design point
• Clusters are mostly static - new machines are added on IT schedule*
• Static clusters mean users can focus on their tasks (MR jobs, Hive
queries) and not on cluster management
• IT bears the burden of managing clusters
HADOOP ON-PREMISE
• Partitioning of resources
• Static partitioning with different clusters for Batch and
Interactive workloads
• Within a cluster load balancing is done by the JT scheduler
• Capex costs are significant
• IT controlled - requires an Ops team (Hadoop ops, Sysadmin
etc.)
CLOUD ARCHITECTURE
HIGHLY AWS CENTRIC - BUT EVERYONE IS
FOLLOWING FAST
CLOUD COMPONENTS
• Object stores
• Ephemeral compute nodes
• Block stores
• PaaS offerings (RDS, etc.)
INFRASTRUCTURE
CHARACTERISTICS
• Running in a VM
• Not that big a deal usually - except plan for performance variability
• No locality information
• Nodes are ephemeral - if you lose a node you will lose data on the node
• AZ-wide correlated failures are to be expected. Region wide are possible (but rare)
• High capacity Object stores with high cross sectional bandwidth
• High latency, Variability in perf, REMOTE*. Not POSIX compliant
• Persistent block stores
• REMOTE, variable perf
INFRASTRUCTURE
CHARACTERISTICS
• ELASTIC
• Add 100 nodes on demand in a few minutes
• Costs are Op-ex (largely).
• Nodes are billed per hour (CPU + disk); storage is billed per GB
• Cost management is a key challenge
• Some interesting payment choices (On-demand, Spot, Reserved)
LET'S PUT THESE WORLDS TOGETHER
STORAGE
• From a cost perspective using HDFS for long term storage
means you pay for both CPU and disk.
• It's also more expensive to make HDFS reliable (cross AZ,
maybe even cross Region?)
• Using an object store allows you to pay only for storage
• With object stores you see latency issues since data is remote
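The cost argument on this slide can be made concrete with a back-of-the-envelope sketch. Every price and size below is a hypothetical placeholder, not a quote from any provider; only the 3-way replication factor comes from the earlier slides. The function names are made up for illustration.

```python
import math

def hdfs_monthly_cost(data_gb, node_disk_gb, node_hourly_price,
                      replication=3, hours_per_month=730):
    """Keeping data in HDFS means keeping nodes up just to hold the disks."""
    raw_gb = data_gb * replication                 # replicas multiply the footprint
    nodes = math.ceil(raw_gb / node_disk_gb)       # nodes needed for disk alone
    return nodes * node_hourly_price * hours_per_month

def object_store_monthly_cost(data_gb, price_per_gb_month):
    """Keeping data in an object store means paying for stored bytes only."""
    return data_gb * price_per_gb_month

# Hypothetical inputs: 10 TB of data, 2 TB of disk per node.
hdfs = hdfs_monthly_cost(data_gb=10_000, node_disk_gb=2_000, node_hourly_price=0.50)
obj = object_store_monthly_cost(data_gb=10_000, price_per_gb_month=0.10)
print(f"HDFS: {hdfs:.2f}/month, object store: {obj:.2f}/month")
```

The sketch ignores request charges, data transfer, and the latency cost noted above, so it only illustrates the shape of the trade-off.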
STORAGE
• But node storage is still needed when jobs and queries are
active
• For intermediate job results (not all results should go back
to S3 - e.g. stage outputs in Hive)
• For intermediate data (mapper output)
• Makes scaling nodes challenging
• Also since performance is better - may want to move remote
data to HDFS before accessing
COMPUTE AND CLUSTERS
• If you don't need Hadoop for persistent storage - when do
you need a cluster?
• Bring them up on demand - maybe for every job?
• But that can be expensive - no multiplexing
• Ideally you want to share Hadoop clusters as much as
possible. Shut down cluster when not being used
COMPUTE AND CLUSTERS
• If the cluster is dynamic and you need sharing - how do you
'discover' it?
• How about cluster sizing?
• Static sizing is a leftover from on-premise
• Be dynamic on the cloud. Hard for end users to do manually
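One way to picture sharing, discovery, and shutdown together is a small registry that hands out a cluster by label and reaps it when idle. This is a minimal sketch under stated assumptions: `ClusterRegistry`, its callbacks, and the 600-second timeout are all hypothetical, and it treats "idle" as "not acquired recently" rather than tracking running jobs.

```python
class ClusterRegistry:
    """Hypothetical registry: users ask for a cluster by label, never by address."""

    def __init__(self, start_cluster, stop_cluster, idle_timeout_s=600):
        self._start = start_cluster          # callable(label) -> endpoint string
        self._stop = stop_cluster            # callable(endpoint) -> None
        self._idle_timeout_s = idle_timeout_s
        self._clusters = {}                  # label -> [endpoint, last_used_time]

    def acquire(self, label, now):
        """Return the shared cluster for this label, starting it on first use."""
        if label not in self._clusters:
            self._clusters[label] = [self._start(label), now]
        self._clusters[label][1] = now       # record the most recent use
        return self._clusters[label][0]

    def reap_idle(self, now):
        """Shut down clusters that nobody has acquired within the idle timeout."""
        for label, (endpoint, last_used) in list(self._clusters.items()):
            if now - last_used >= self._idle_timeout_s:
                self._stop(endpoint)
                del self._clusters[label]
```

Passing `now` explicitly keeps the sketch deterministic; a real service would read a clock and would also need to know about in-flight jobs before stopping anything.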
COMPUTE AND CLUSTERS
• Adding nodes needs to be done based on load
• E.g. Most of the time jobs need < 5 nodes. A batch job
comes in that needs 100 nodes. We should expand the cluster
(for as long as needed)
• Removing nodes is trickier
• If we lose intermediate results lots of work will be lost.
• Job 1 uses 100 nodes, produces data spread over all of them.
Job 2 consumes results but only needs 10 nodes. How do you
give up 90 nodes?
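The asymmetry between adding and removing nodes can be sketched in a few lines. This is an illustrative sketch, not a description of any particular scheduler: both function names and the `max_nodes` cap are assumptions, and "live data" stands for intermediate results a later job still has to read.

```python
def nodes_to_add(current_nodes, nodes_needed, max_nodes):
    """Scale up to meet demand, capped at a configured maximum (never negative)."""
    target = min(max(nodes_needed, current_nodes), max_nodes)
    return max(target - current_nodes, 0)

def removable_nodes(all_nodes, nodes_with_live_data, nodes_needed):
    """Surplus nodes that hold no data a later job still reads."""
    surplus = max(len(all_nodes) - nodes_needed, 0)
    safe = [n for n in all_nodes if n not in nodes_with_live_data]
    return safe[:surplus]

# The slide's example: a 5-node cluster receives a job that needs 100 nodes.
print(nodes_to_add(current_nodes=5, nodes_needed=100, max_nodes=200))

# Job 1 left data on all 100 nodes; Job 2 needs only 10, yet nothing is safe to release.
cluster = [f"node-{i}" for i in range(100)]
print(len(removable_nodes(cluster, set(cluster), nodes_needed=10)))
```

Under these assumptions the second call releases nothing, which is the problem the slide poses: the 90 surplus nodes only become removable once their data has been consumed or moved elsewhere.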
COMPUTE AND CLUSTERS
• Pricing choices are interesting
• E.g. spot nodes average half the price of an on-demand
node
• But if price spikes you lose all the spot nodes at once
• Hadoop fault tolerance can retry failed jobs (but expensive) -
what about data loss when you lose all the spot nodes?
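The spot trade-off can be sketched as two small calculations: what a mixed cluster costs, and which blocks disappear if every spot node is reclaimed at once. The 0.5 discount reflects the "half the price" average on the slide; the node names, block names, and both functions are hypothetical.

```python
def cluster_hourly_cost(on_demand_nodes, spot_nodes, on_demand_price, spot_discount=0.5):
    """Hourly cost of a mixed cluster, with spot priced as a fraction of on-demand."""
    spot_price = on_demand_price * spot_discount
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price

def blocks_lost(block_replicas, spot_nodes):
    """Blocks whose every replica sits on a spot node vanish if all spot nodes go."""
    spot = set(spot_nodes)
    return sorted(b for b, holders in block_replicas.items() if set(holders) <= spot)

# Hypothetical placement: blk_1 keeps one replica on an on-demand node, blk_2 does not.
placement = {
    "blk_1": ["od-1", "spot-1"],
    "blk_2": ["spot-1", "spot-2"],
}
print(cluster_hourly_cost(on_demand_nodes=10, spot_nodes=90, on_demand_price=1.0))
print(blocks_lost(placement, ["spot-1", "spot-2"]))
```

Seen this way, one possible mitigation is a placement rule that keeps at least one replica of every block on an on-demand node; that rule is an assumption of this sketch, not something the slides prescribe.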
END USER EXPERIENCE
• The cloud isn't just about cost - it's also about agility. To allow
this we need to focus on the end user experience
• End users would prefer to focus on higher level APIs
• e.g. Run a Hadoop job or a Hive query - specifics of
clusters should be hidden from them
• Some things should be persistent (log files, results, ...)
• They get this for free on premise
BETTER END STATE
• IT/dev ops/users should set high level controls
• Usage governance (max cluster size, max bill, CPU hours used
per month etc.)
• End users should focus at the level they understand
• Smart software should bridge the gap
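The high level controls listed above could be expressed as a small policy object that the cluster software consults before granting nodes. This is a minimal sketch: `UsagePolicy`, its field names, and the decision rule are assumptions chosen to mirror the three limits named on the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsagePolicy:
    """Limits set by IT/dev ops; end users never touch these directly."""
    max_cluster_size: int
    max_monthly_bill: float
    max_cpu_hours: float

def allowed_cluster_size(policy, requested_nodes, bill_so_far, cpu_hours_so_far):
    """Grant 0 nodes once a budget is exhausted, else cap at the size limit."""
    over_budget = (bill_so_far >= policy.max_monthly_bill
                   or cpu_hours_so_far >= policy.max_cpu_hours)
    if over_budget:
        return 0
    return min(requested_nodes, policy.max_cluster_size)

policy = UsagePolicy(max_cluster_size=50, max_monthly_bill=10_000.0, max_cpu_hours=20_000.0)
print(allowed_cluster_size(policy, requested_nodes=100, bill_so_far=2_500.0, cpu_hours_so_far=4_000.0))
```

The user still just submits a job or query; the policy check happens in the software that sizes the cluster on their behalf.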
QUESTIONS?

Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
StampedeCon
 

Cloud-Friendly Hadoop and Hive - StampedeCon 2013

  • 1. CLOUD FRIENDLY HADOOP/HIVE Shrikanth Shankar | Qubole VP of Engineering Thursday, July 25, 13
  • 2. INTRODUCTION
      • Hadoop has revolutionized big data processing
      • Becoming the de-facto platform for new data projects
      • Started as file system (HDFS) + programming framework (Map-Reduce). An ecosystem of projects has sprung up on top of Hadoop
      • Hive, Pig, Cascading etc. - Simple ways of processing data
      • Sqoop, Flume etc. - Data movement into and out of HDFS
      • Oozie, Azkaban etc. - Workflow scheduling
      • However, these systems were all designed with an on-premise architecture in mind.
      • The cloud is different enough - some things can/should change.
  • 3. ON-PREMISE HADOOP ARCHITECTURE [Diagram: a Hadoop cluster under IT control with a Namenode, a JobTracker, and DN/TT nodes; relational systems (Hive metastore etc.); end users]
  • 4. HADOOP ON-PREMISE
      • Usually deployed on bare-metal nodes*
      • HDFS is the store of choice (3-way replication for safety). Locality of data access is a big design point
      • Clusters are mostly static - new machines are added on IT's schedule*
      • Static clusters mean users can focus on their tasks (MR jobs, Hive queries) and not on cluster management
      • IT bears the burden of managing clusters
  • 5. HADOOP ON-PREMISE
      • Partitioning of resources
      • Static partitioning with different clusters for batch and interactive workloads
      • Within a cluster, load balancing is done by the JT scheduler
      • Capex costs are significant
      • IT controlled - requires an Ops team (Hadoop ops, Sysadmin etc.)
  • 6. CLOUD ARCHITECTURE: HIGHLY AWS CENTRIC - BUT EVERYONE IS FOLLOWING FAST
  • 7. CLOUD COMPONENTS [Diagram: object stores; ephemeral compute nodes; block stores; PaaS offerings (RDS, etc.)]
  • 8. INFRASTRUCTURE CHARACTERISTICS
      • Running in a VM
      • Not that big a deal usually - except plan for performance variability
      • No locality information
      • Nodes are ephemeral - if you lose a node you will lose data on the node
      • AZ-wide correlated failures are to be expected. Region-wide failures are possible (but rare)
      • High-capacity object stores with high cross-sectional bandwidth
      • High latency, variability in perf, REMOTE*. Not POSIX compliant
      • Persistent block stores
      • REMOTE, Variable perf,
  • 9. INFRASTRUCTURE CHARACTERISTICS
      • ELASTIC
      • Add 100 nodes on demand in a few minutes
      • Costs are Op-ex (largely).
      • Nodes are billed per hour (CPU + disk), storage is billed per GB
      • Cost management is a key challenge
      • Some interesting payment choices (On-demand, Spot, Reserved)
  • 11. STORAGE
      • From a cost perspective, using HDFS for long-term storage means you pay for both CPU and disk.
      • It's also more expensive to make HDFS reliable (cross AZ, maybe even cross Region?)
      • Using an object store allows you to pay only for storage
      • With object stores you see latency issues since data is remote
  • 12. STORAGE
      • But node storage is still needed when jobs and queries are active
      • For intermediate job results (not all results should go back to S3 - e.g. stage outputs in Hive)
      • For intermediate data (mapper output)
      • Makes scaling nodes challenging
      • Also, since node storage performs better, you may want to move remote data to HDFS before accessing it
  • 13. COMPUTE AND CLUSTERS
      • If you don't need Hadoop for persistent storage - when do you need a cluster?
      • Bring them up on demand - maybe for every job?
      • But that can be expensive - no multiplexing
      • Ideally you want to share Hadoop clusters as much as possible. Shut down the cluster when it is not being used
  • 14. COMPUTE AND CLUSTERS
      • If the cluster is dynamic and you need sharing - how do you 'discover' it?
      • How about cluster sizing?
      • Static sizing is a leftover from on-premise
      • Be dynamic on the cloud. Hard for end users to do manually
  • 15. COMPUTE AND CLUSTER
      • Adding nodes needs to be done based on load
      • E.g. most of the time jobs need < 5 nodes. A batch job comes in that needs 100 nodes. We should expand the cluster (for as long as needed)
      • Removing nodes is trickier
      • If we lose intermediate results, lots of work will be lost.
      • Job 1 uses 100 nodes and produces data spread over all of them. Job 2 consumes the results but only needs 10 nodes. How do you give up 90 nodes?
  • 16. COMPUTE AND CLUSTER
      • Pricing choices are interesting
      • E.g. spot nodes average half the price of an on-demand node
      • But if the price spikes you lose all the spot nodes at once
      • Hadoop fault tolerance can retry failed jobs (but that is expensive) - what about data loss when you lose all the spot nodes?
  • 17. END USER EXPERIENCE
      • The cloud isn't just about cost - it's also about agility. To allow this we need to focus on the end user experience
      • End users would prefer to focus on higher-level APIs
      • E.g. run a Hadoop job or a Hive query - specifics of clusters should be hidden from them
      • Some things should be persistent (log files, results, ...)
      • They get this for free on premise
  • 18. BETTER END STATE
      • IT/dev ops/users should set high-level controls
      • Usage governance (max cluster size, max bill, CPU hours used per month etc.)
      • End users should focus at the level they understand
      • Smart software should bridge the gap
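
To make the idea of "smart software" a little more concrete, here is a minimal Python sketch of a sizing decision that combines the load-based scaling from slide 15 with the governance limits from slide 18. It is an illustration under stated assumptions, not Qubole's implementation: the `Governance` and `Node` types, the `slots_per_node` parameter, and the `holds_intermediate_data` flag are all hypothetical names introduced here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Governance:
    """High-level limits set by IT/dev ops (hypothetical fields)."""
    min_nodes: int
    max_nodes: int


@dataclass
class Node:
    name: str
    # True if the node stores data a running job still needs (e.g. mapper output).
    holds_intermediate_data: bool


def target_size(pending_tasks: int, slots_per_node: int, limits: Governance) -> int:
    """Nodes needed for the current load, clamped to the governance limits."""
    needed = math.ceil(pending_tasks / slots_per_node)
    return max(limits.min_nodes, min(needed, limits.max_nodes))


def nodes_to_release(nodes: List[Node], target: int) -> List[Node]:
    """Pick surplus nodes to give up, skipping any that hold intermediate data."""
    surplus = len(nodes) - target
    if surplus <= 0:
        return []
    idle = [n for n in nodes if not n.holds_intermediate_data]
    return idle[:surplus]


if __name__ == "__main__":
    limits = Governance(min_nodes=2, max_nodes=100)
    # 1000 pending tasks at 4 slots per node would need 250 nodes; the cap is 100.
    print(target_size(pending_tasks=1000, slots_per_node=4, limits=limits))
```

Scaling up is a simple clamp, while scaling down is deliberately conservative: a node that still holds intermediate results is never released, so the cluster may stay above its target until those results have been consumed or copied elsewhere.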