1. Hadoop adoption has matured from initial small deployments to scaling up across enterprises, but configuring and managing large Hadoop environments can be difficult and expensive.
2. Hadoop as a Service (HaaS) provides an alternative where enterprises can deploy Hadoop in the cloud to avoid the challenges of managing large on-premise clusters.
3. HaaS allows enterprises to focus on data analysis rather than infrastructure while reducing costs and providing scalability, high availability, and self-configuration capabilities not easily achieved on-premise.
Why Hadoop as a Service?
1. 1
Hadoop as a Service
Kumar Ramamurthy
Vice President and Global Practice Head | Enterprise Information Management (EIM)
2. 2
Agenda
1 Hadoop and adoption maturity
2 Challenges faced while scaling up Hadoop
3 Characteristics of a well-run Hadoop environment
4 Hadoop on the cloud wagon
5 Market overview
6 Business benefits
3. 3
Hadoop and adoption maturity
Organizations have bought into the benefits of Hadoop and are now looking to deploy and maintain a “well-run Hadoop environment”:
Taking the first Hadoop step → Deploying small Hadoop clusters → Hadoop on multiple business use cases → Scaling up Hadoop to enterprise-wide operations
4. 4
Business pain points while scaling up on Hadoop
As enterprises seek effective and efficient ways to leverage Hadoop for direct and instant access to actionable business insights, one frequently asked question is: “Can we run Hadoop in the cloud?”
Deploying, configuring and managing Hadoop data clusters can be difficult, expensive and time-consuming:
Configuration & tuning
Infrastructure management
Capital expenditure
Provisioning & availability
[Diagram: Hadoop cluster with the pain points above called out]
5. 5
Characteristics of a well-run Hadoop environment
Satisfy the needs of data scientists and data administrators
Provide a run-it-yourself environment, with full support services for job monitoring and tuning
Store data at rest in always-on HDFS
Provide elasticity and non-stop operation
Provide self-configuration options
6. 6
Spectrum of Hadoop deployment options
On-premise full custom → Hadoop appliance → Hadoop hosting → Hadoop on the cloud → Hadoop as a service
(Spectrum from bare metal to cloud)
7. 7
Getting Hadoop on the cloud-wagon
Hadoop as a Service (HaaS) is a cloud computing solution that makes medium- and large-scale data processing accessible, easy, fast and cost-effective.
Data Sources → Hadoop Tools → Data Visualization and BI
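At the heart of the “Hadoop Tools” stage of that pipeline is batch map/reduce processing. A minimal, illustrative word-count sketch in the Hadoop Streaming style (the function names and the local composition of the two phases are assumptions for illustration, not any provider’s API):

```python
# Illustrative sketch only: a minimal word-count job in the Hadoop
# Streaming style, the kind of batch processing a HaaS platform runs
# at scale across a cluster.
import sys
from collections import Counter

def mapper(lines):
    """Emit (word, 1) pairs -- the 'map' phase."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum counts per word -- the 'reduce' phase."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    # Locally the two phases compose directly; on a cluster the
    # framework inserts a shuffle/sort between them and runs each
    # phase in parallel.
    print(reducer(mapper(sys.stdin)))
```

On a real cluster the mapper and reducer run as separate processes over HDFS blocks; the local composition here only illustrates the data flow.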
8. 8
Key drivers to consider before deploying a Hadoop cluster in the cloud
Analyze Security Criteria: Is your Hadoop cluster secure in the public cloud?
Evaluate Hadoop Distributions: Does your Hadoop distribution support the operating system standards of your enterprise?
Understand Vendor Specific Dependencies: Is your business performance impacted while adopting any vendor-specific platform/tools?
Consider the entire Hadoop Ecosystem: Are your analytical tools and platforms supported by the cloud platform?
Analyze all your Data Sources: Do you load data from your internal systems or from the cloud?
9. 9
Hadoop as a Service market landscape
HaaS market expected to reach $16.1 Bn by 2020
Key deployment methods in demand: Run it Yourself & Pure Play
North America is the highest revenue-generating region throughout 2020, with a value of $11.6 Bn
Global HaaS market to grow at a CAGR of 84.81% over the period 2014–2018
[Chart: Global Hadoop as a Service market by end user: Manufacturing, Financial services, Retail, Telecom, Healthcare, Media & Entertainment]
Source: Allied Market Research
10. 10
Leading providers of Hadoop as a Service
Altiscale: Purpose-built, petabyte-scale infrastructure that delivers Apache Hadoop as a cloud service
Amazon EMR: Managed Hadoop framework to distribute and process vast amounts of data across dynamically scalable Amazon EC2 (Elastic Compute Cloud) instances
Google: Google Cloud Storage connector for Hadoop to perform MapReduce jobs directly on data in Google Cloud Storage
HP Cloud: Elastic cloud computing and cloud storage platform to analyze and index large data volumes, in the hundreds of petabytes in size
IBM BigInsights: Provides Hadoop as a service on IBM’s SoftLayer global cloud infrastructure (a bare-metal design)
Microsoft: Scales to petabytes on demand; processes unstructured and semi-structured data; deploys on Windows or Linux; integrates with on-premises Hadoop clusters (if needed); and supports multiple development languages
Rackspace: Offers several options for running Apache Hadoop, including deploying Hadoop on Rackspace-managed dedicated servers, or spinning up Hadoop on Rackspace’s public cloud via virtual servers or dedicated bare-metal cloud servers
Verizon: Enterprise business inked a Cloudera partnership in 2013, and the IT services giant now offers Cloudera atop its cloud infrastructure
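To make the provider model concrete, the sketch below assembles a request for Amazon EMR’s RunJobFlow API, which provisions a managed Hadoop cluster. The instance types, release label, and IAM role names used here are assumptions that vary by account; treat this as an illustration, not a working deployment.

```python
# Illustrative sketch: assembling a RunJobFlow request for Amazon EMR.
# Instance types, release label and IAM role names below are assumptions;
# adjust them for your own account before launching anything.

def build_emr_request(name, core_nodes=2, release="emr-5.36.0"):
    """Build the request body for a small managed Hadoop cluster."""
    return {
        "Name": name,
        "ReleaseLabel": release,              # EMR release bundling Hadoop
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 1 + core_nodes,  # one master plus core nodes
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # EMR's default roles
        "ServiceRole": "EMR_DefaultRole",
    }

# With boto3 installed and AWS credentials configured, this would launch
# a cluster in minutes:
#   import boto3
#   boto3.client("emr").run_job_flow(**build_emr_request("haas-demo"))
```

Changing `core_nodes` and resubmitting (or calling the resize APIs) is how the “elastic Hadoop clusters” benefit on the next slide shows up in practice.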
11. 11
Enterprises realize multifold business benefits through HaaS
With HaaS, businesses can eliminate the operational challenges of running Hadoop and focus on business growth:
Paying only for the compute, without a large hardware acquisition cost
Scaling up and down as business needs change, through elastic Hadoop clusters
Deploying and launching Hadoop environments in minutes
Focusing on building applications and answering business questions rather than managing complex Hadoop clusters
Storing, processing and analyzing large volumes of both relational and non-relational data
Creating integrated backup and disaster recovery
Providing a distributed, fault-tolerant computing framework and resource management
12. 12
In summary …why HaaS?
Reducing the cost of innovation and focusing on critical business areas
Providing instant access to hardware resources to scale up or scale down as per business needs
Effectively optimizing and handling batch workloads
Managing variable resource requirements for different types of machines and workloads
Running computation closer to the data
Simplifying Hadoop operations