Driving Business Benefits with Hadoop

Driving Business Value with Hadoop:
MapR Customer Experiences
Carl Olofson
Research Vice President

Agenda
• The Promise and Challenge of Hadoop
• Choosing a Distributor
• MapR’s Key Differentiators
• IDC’s MapR Business Value Study
• Key Takeaways
• Conclusions/Recommendations
© IDC Visit us at IDC.com and follow us on Twitter: @IDC


The Promise and Challenge of Hadoop
Why Hadoop?
• Can collect any amount of data of any kind
• Open source and cheap to deploy
• Serves a variety of purposes
Challenges
• Involves many Apache projects
• Coordination of software is complex
• Management of clusters requires special expertise

The Problem of Hadoop Sprawl
• Hadoop is usually implemented, initially, as discrete, limited projects.
• As these projects proliferate, they consume more and more resources.
• Eventually, they require centralized management.
• As long as they remain discrete configurations, management is complex and resource consumption is excessive.
• Needed: a manageable Hadoop data platform that supports many projects in a single system with shared resources, embracing multi-tenancy.

Choosing a Distributor
• Choose a distributor whose software and service delivery best meet your needs.
• A distributor offers…
– Coordinated versions of related Apache software delivered for immediate use
– Software bundled in practical combinations for various business purposes
– Expertise, guidance, and support as part of the value proposition
• There are several leading distributors; competition breeds excellence

MapR’s Key Differentiators
• Software Components
– MapR-FS
– MapR-DB
– MapR Streams
• MapR Converged Data Platform
• Zeta Architecture

IDC’s MapR Business Value Study
IDC conducted research that explores the value and benefits of Apache Hadoop for MapR, based on interviews with MapR customers.
The project included nine qualitative and quantitative interviews with MapR customers. Based on its analysis, IDC has created a model that expresses the value and costs for these organizations of using Hadoop with MapR. These results will inform the study IDC is developing for MapR.

Firmographics of Interviewed Organizations
Firmographics                     | Average | Median | Range
Number of employees               | 10,410  | 500    | 8 to 65,000
Number of MapR users              | 762     | 150    | 0 to 3,500
Number of MapR applications       | 11      | 4      | 1 to 50
Number of PBs in MapR environment | 3.90    | 1.10   | 0.02 to 20
Countries: United States
Industries: Financial Services, Security, Professional Services, Cloud Services Provider, Advertising (Big Data)
N=9 interviewed organizations
Apache Hadoop for MapR

Executive Summary
Interviewed organizations reported that Hadoop with MapR provides the performance, efficiency, and scalability they need to use data analytics in a cost- and resource-effective way to drive their businesses and operations.
Hadoop with MapR has enabled data scientists and application developers to do their jobs more effectively and productively by providing more timely and superior analytical outputs. On average, interviewed organizations reported that data scientists and analysts improved their productivity by 31% and application developers by 39% with MapR.
Interviewed organizations reported leveraging Hadoop with MapR to drive their businesses. All interviewed organizations indicated earning more revenue, and six attributed a specific amount of revenue (an average of $26.7 million per year) to better serving their customers with improved analytical capabilities or improving their analytics-based products and services.
MapR provides its customers with a cost-effective and efficient Big Data platform. On average, interviewed organizations put the infrastructure and IT staff time cost of deploying and running MapR at 42% lower than if they had tried to do it by themselves.
IDC calculates that business benefits achieved by interviewed organizations are worth a discounted average of $19.4 million over three years, which results in a return on investment (ROI) of 382% and a payback period of eight months.

Quotes: Why MapR?
“We tested it and MapR was faster by 15% to 20%. The other reason that we went with MapR was security – MapR allows us to do really granular security, and that’s just not possible with some of the other offerings.”
“We needed a specific functionality - the ability to process TBs of data efficiently and cost effectively. The MapR version of Hadoop is faster and more reliable and has a great support infrastructure around it. . . The most significant benefits for us of MapR have been 4 nines in uptime and the ability to provide insight to our customers through the ability to get meaningful data from the data that we’re collecting.”
“We get dramatically better utilization out of our systems with MapR. Our testing showed a three to five times performance increase when we ran on MapR compared to other Hadoop, and compared to before it’s insane, it’s probably like probably 10 to 20 times faster.”

Quotes: Data-Related Staff Benefits of MapR
“We support over 150 data analysts, about 30 of those are what most people call data scientists. The data scientists have absolutely saved time with MapR - productivity wise for them because they have faster access to the data now and they can direct probably 50% more time to other activities, plus they’re avoiding hiring. On top of the 50% saved, we are probably also avoiding 10 hires.”
“MapR has given our data scientists the ability to do more stuff. They can do their work much faster – for example, if they had to develop a risk model, and compare the eight hours it used to take compared with now running in an hour, it makes them more productive – about 20% on average.”
“Internal business processes have become more efficient with MapR. We have reporting that’s available every 15 minutes to 30 minutes depending on what report you’re looking at. And before there was not an efficient reconciliation process. The reconciliation process, if it occurred, would take 1 to 2 months to figure out. And the reconciliation process now takes a day. . . I would say there are five people over saving 24 days collectively out of a month.”

Quotes: Business Benefits of MapR
“MapR absolutely has impacted our revenue. Our product is a superior product because of MapR Hadoop, and that means more customers and more revenue. . . We’ve added hundreds of customers and millions of dollars of revenue. I attribute that to having Hadoop with MapR being the core of our platform.”
“The quality of our outputs has increased dramatically with MapR – for example, we're able to process everything and don't have to do sampling. So we can provide our customers proper and accurate data. We can also do more test iterations to provide a better quality product, and that's helped us get more customers. . . MapR has supported business growth, dramatic growth of a product campaign we're running.”
“In our case, the choice of MapR was pretty simple. The alternative was that we just wouldn't do it with anything else, because we couldn't. It actually wouldn't have worked – we couldn't scale up enough to use a different solution. We wouldn't be in business.”

Quotes: Reliability and Cost Benefits of MapR
“We haven't had any downtime for a couple of years with MapR. Before, we were experiencing downtime about quarterly, and these incidents had an impact on revenue because it affects client retention, especially in the security space. When a client finds out that someone hacked into your network, it makes for a very unhappy client. We lost clients probably 20-30% of the time when we had downtime.”
“If we had built this out ourselves using relational databases, it would have been more expensive than MapR because of phenomenal storage costs. And then we'd have to pay for licenses – overall initial costs would have been a number of times more expensive than MapR.”
“Multi-tenancy with MapR is helping us save money, and we can tell a customer now about our multi-tenant capabilities and they will then not request a dedicated environment. And that saves us money. We don’t have to get new MapR servers, new MapR licenses just for their dedicated environment. We are able to leverage the existing cluster.”

Annual Average Benefits per Organization (chart, $ millions)
• Business Productivity Benefits: $2.98M
• Increased Revenue: $3.26M
• IT Staff Productivity Benefits: $2.02M
• Risk Mitigation - User Productivity Benefits: $51,000
Total: $8.31 Million

Annual Average Benefits per 100 TBs (chart)
• Business Productivity Benefits: $74,500
• Increased Revenue: $81,500
• IT Staff Productivity Benefits: $50,600
• Risk Mitigation - User Productivity Benefits: $1,300
Total: $207,900
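
The two benefit charts can be cross-checked with simple arithmetic. The sketch below sums the four data labels from each chart and divides the totals to estimate how many 100 TB units the average organization holds. The variable names are illustrative, and the 1 PB = 1,024 TB conversion is an assumption, not something the slides state.

```python
# Data labels copied from the two benefit charts (annual averages, US dollars).
per_org = [2_980_000, 3_260_000, 2_020_000, 51_000]
per_100tb = [74_500, 81_500, 50_600, 1_300]

total_org = sum(per_org)
total_100tb = sum(per_100tb)

# Ratio of the totals = implied number of 100 TB units per organization.
implied_units = total_org / total_100tb

# Assumption: 1 PB = 1,024 TB (the slides do not say which convention IDC used).
implied_pb = implied_units * 100 / 1024

print(f"Total per organization: ${total_org:,}")
print(f"Total per 100 TB: ${total_100tb:,}")
print(f"Implied average environment: {implied_pb:.2f} PB")
```

Under that assumption the implied average environment comes out at about 3.9 PB, in line with the 3.90 PB average in the firmographics table.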

Cost Benefit Analysis
[Chart: Investment, Benefits, and Cumulative Net Benefit in $ millions for the Initial period and Years 1 to 3. Data labels: $2.91M, $0.26M, $0.59M, $0.59M; $0, $4.28M, $9.54M, $9.69M. Cumulative Net Benefit: $20.6 Million]

Three-Year ROI Analysis | Average per Organization | Average per 100 TBs
Benefit (discounted)    | $19.4 Million            | $486,100
Investment (discounted) | $4.0 Million             | $100,800
Net Present Value       | $15.4 Million            | $385,300
ROI (NPV/Investment)    | 382%                     | 382%
Payback (Months)        | 8.2                      | 8.2
Discount Factor         | 12%                      | 12%
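
The ROI row follows directly from the rows above it: net present value is discounted benefit minus discounted investment, and ROI is NPV divided by investment. A minimal sketch of that arithmetic, using the per-100 TB column, is shown here. The function names are illustrative. The second part rests on an assumption: that the chart labels $2.91M, $0.26M, $0.59M, and $0.59M are the Investment series for the Initial period and Years 1 to 3, with the initial period left undiscounted.

```python
def present_value(cash_flows, rate):
    """Discount yearly cash flows; index 0 is the initial period (not discounted)."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def npv_and_roi(benefit, investment):
    """Return (net present value, ROI in percent) from discounted totals."""
    npv = benefit - investment
    return npv, 100 * npv / investment


# Per-100 TB column of the ROI table (discounted US dollars).
npv_100tb, roi_100tb = npv_and_roi(486_100, 100_800)

# Assumption: these four chart labels are the Investment series, in $ millions.
investment_pv = present_value([2.91, 0.26, 0.59, 0.59], 0.12)

print(f"NPV per 100 TB: ${npv_100tb:,.0f}")
print(f"ROI: {roi_100tb:.0f}%")
print(f"Discounted investment: ${investment_pv:.1f}M")
```

The per-100 TB figures reproduce the 382% ROI, and the discounted investment stream rounds to the table's $4.0 million. Repeating the ROI step with the rounded per-organization figures ($15.4 million over $4.0 million) gives roughly 385%, so the published 382% presumably comes from unrounded inputs.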

Key Takeaways
• Users found MapR’s distro to be much easier to work with than their prior configuration of Apache Hadoop, resulting in dramatic savings in staff time spent setting up, configuring, and managing the Hadoop clusters.
• MapR’s software for optimizing file I/O, database management, and cluster management delivered significant performance improvements.
• The rational way that the MapR Hadoop system is configured, as a converged platform, led to more efficient business processes and better business outcomes.

Conclusions/Recommendations
Conclusions
• There are multiple major Hadoop distributions, each with its own merits.
• This study examined only one of these: MapR.
• The clear outcome of this study is demonstrable benefits to customers derived from the use of MapR.
Recommendations
• Business application of Hadoop requires the use of a managed distribution of Hadoop.
• Users have several leading options in this regard.
• MapR deserves to be on the list of those receiving serious consideration.

© 2016 MapR Technologies
Driving Better Business Benefits with Hadoop

What Contributed to the ROI with MapR
Higher productivity derived from:
• Higher efficiency
– Greater scalability
– Higher performance
– Better utilization
• Greater reliability
• Multi-tenancy
• Real-time data access

Life Without a Converged Platform
[Diagram: separate systems sitting between Sources and Apps]
• Open Source Analytics (Hadoop, Spark)
• Operational Cluster (HBase, Other NoSQL)
• Streaming Cluster (TIBCO, IBM, Kafka)
• Enterprise Storage (system of record)
Data flows shown: Streaming, Real-time, Batch Loads

A Modern Big Data Architecture
Sources
• RELATIONAL, SAAS, MAINFRAME
• DOCUMENTS, EMAILS
• LOG FILES, CLICKSTREAMS
• SENSORS
• BLOGS, TWEETS, LINK DATA
• DATA WAREHOUSES, DATA MARTS
MapR Converged Data Platform
• Enterprise Storage (MapR-FS): emails, blogs, tweets, log files, unstructured data
• Database (MapR-DB): relational, time series, structured data
• Event Streaming (MapR Streams): event data, change data, IoT data
Platform capabilities
• Agile, self-service data exploration
• ETL into operational reporting formats (e.g., Parquet)
• Multi-tenancy: job/data placement control, volumes
• Access controls: file, table, column, column family, doc, sub-doc levels
• Auditing: compliance, analyze user accesses
• Snapshots: track data lineage and history
• Table Replication: global multi-master, business continuity

Higher Efficiency

MapR No-NameNode Architecture for Scale
[Diagram: a cluster of DataNodes with a single NameNode holding metadata A to F, compared with a cluster in which metadata A to F is spread across the nodes]
• No special configuration
• Metadata is persisted to disk
• Trillions of files (> 5000x advantage)

HDFS-Based Scale Issues
• Limited to 50 – 200 million files in single NameNode (in 128 MB chunks)
• Federation allows more files, but adds new single points of failure
• Federation plus Standby NameNodes lead to complex configuration
• 5 to 20 NameNodes required for 1 billion files
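
The "5 to 20 NameNodes" range is just the per-NameNode file limit divided into one billion files. A minimal sketch of that arithmetic follows; the function name is illustrative, and the per-NameNode limits are the 50 to 200 million figures quoted on the slide rather than measured values.

```python
import math


def namenodes_needed(total_files, files_per_namenode):
    """Smallest number of NameNodes whose combined limit covers total_files."""
    return math.ceil(total_files / files_per_namenode)


ONE_BILLION = 1_000_000_000

# Slide figures: each NameNode handles between 50 and 200 million files.
fewest = namenodes_needed(ONE_BILLION, 200_000_000)
most = namenodes_needed(ONE_BILLION, 50_000_000)

# One trillion files compared with the 200 million upper limit of one NameNode.
scale_factor = 1_000_000_000_000 // 200_000_000

print(f"NameNodes for 1 billion files: {fewest} to {most}")
print(f"Trillion-file scale vs. one NameNode: {scale_factor}x")
```

The same division explains the "> 5000x advantage" on the No-NameNode slide: one trillion files is 5,000 times the 200 million upper limit of a single NameNode.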

The Architectural Key to MapR Scale and Speed
Blocks are inside chunks; chunks are inside containers.
Containers - 32 GB
• High scale to trillions of files
• Default size self adjusts
Chunks - 256 MB
• File sharding for parallelism
• Default size can adjust by directory
Blocks - 8 KB
• Raw device I/O for random reads/writes
• Small size advantage for snapshot and mirroring deltas
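
The three sizes on this slide nest neatly, which a few lines of arithmetic make concrete. This is only a sizing sketch under the assumption of binary units (1 KB = 1,024 bytes) and default sizes; it says nothing about how MapR-FS actually places chunks in containers, and the helper name is hypothetical.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

BLOCK_SIZE = 8 * KB        # unit of raw device I/O and of snapshot/mirror deltas
CHUNK_SIZE = 256 * MB      # unit of file sharding for parallelism
CONTAINER_SIZE = 32 * GB   # unit that holds chunks

blocks_per_chunk = CHUNK_SIZE // BLOCK_SIZE
chunks_per_container = CONTAINER_SIZE // CHUNK_SIZE


def chunks_for_file(file_size_bytes):
    """Number of default-size chunks a file of this size is sharded into."""
    return -(-file_size_bytes // CHUNK_SIZE)  # ceiling division


print(f"Blocks per chunk: {blocks_per_chunk}")
print(f"Chunks per container (by size): {chunks_per_container}")
print(f"Chunks for a 1 GB file: {chunks_for_file(1 * GB)}")
```

With these defaults a 1 GB file is split into four chunks that can be read in parallel, while a change to a few bytes touches a single 8 KB block.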

Disk I/O Throughput Performance (by Samsung)
Samsung Flash Memory Summit 2015 Keynote: https://www.youtube.com/watch?v=fOT63zR7PvU#t=26m45s
I/O throughput using upcoming performance enhancements on high performance flash

Optimized Resource Consumption
Conventional stack: every layer contends for more CPU and memory
• HBase (excessive writes) on a Java Virtual Machine
• Kafka (separate cluster) on a Java Virtual Machine
• HDFS (append-only) on a Java Virtual Machine
• Linux File System (general purpose, slower than MapR-FS, leaves HA up to other engines)
• Storage Hardware
MapR stack: efficient architecture frees up resources; shared HA, DR, and I/O systems
• MapR-FS + MapR-DB + MapR Streams
• Storage Hardware
Changes called out in the diagram
• Replace with speed, connectivity, HA/DR
• Replace with less I/O and RAM consumption
• Replace with full read-write
• Eliminate layer (two layers)
• Fast, efficient, direct I/O

Data/Job Placement Control for Resource Management
Example topology for (optionally) dedicating specific nodes to specific workloads in a single MapR cluster:
• Operational workload on largest servers with SSDs
• Production analytics on standard servers
• Self-service data exploration on standard servers
• Archived data on lower power, high disk density nodes

Containerized Enterprise Using Docker, Mesos & Myriad
• Unified, shared platform for enterprise apps & data processing
• Unified application ecosystem
• Shared, persistent, high performance storage for all apps
• Multi-tenant, with choice of
– YARN + Services per tenant
– Single, shared YARN for all tenants
[Diagram: Mesos and Myriad hosting YARN instances that run Spark, Hive, and MapReduce for Tenant #1 and Tenant #2]

Greater Reliability

High Availability (HA) Everywhere
• No NameNode architecture: easy HA at massive scale
• MapReduce/YARN HA: jobs are not impacted by failures
• NFS HA: fast, resilient NFS access
• Instant recovery: replicas available within seconds of a node failure
• Rolling upgrades: upgrade the software with no downtime
• HA is built in: no special configuration to enable HA

MapR Mirroring for Disaster Recovery
• Flexible
– Choose the volumes/directories to mirror
– Scheduled/incremental to set low RPO
– Promotable mirrors to set low RTO
• Fast
– No performance impact
– Block-level (8KB) deltas
• Safe
– Point-in-time consistency, checksums
• Easy
– Takes less than two minutes to configure!
[Diagram: Production and Research clusters in Datacenter 1, Datacenter 2, and EC2, linked over WAN]
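
The "block-level (8KB) deltas" idea can be illustrated in a few lines: compare two versions of the same data block by block and ship only the blocks whose checksums differ. The sketch below is an illustration of the general technique, not MapR's implementation. The function names are hypothetical, SHA-256 is an arbitrary checksum choice, and truncation (a new version shorter than the old one) is not handled.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # 8 KB, the delta granularity quoted on the slide


def block_checksums(data):
    """Checksum of each 8 KB block of a byte string."""
    return [
        hashlib.sha256(data[offset:offset + BLOCK_SIZE]).hexdigest()
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(old, new):
    """Indexes of blocks in `new` that are new or differ from `old`."""
    old_sums = block_checksums(old)
    new_sums = block_checksums(new)
    return [
        index
        for index, checksum in enumerate(new_sums)
        if index >= len(old_sums) or old_sums[index] != checksum
    ]


# Three zero-filled blocks; flip one byte inside the second block.
old_version = bytes(3 * BLOCK_SIZE)
edited = bytearray(old_version)
edited[BLOCK_SIZE + 5] = 0xFF
delta = changed_blocks(old_version, bytes(edited))

print(f"Blocks to send: {delta} ({len(delta) * BLOCK_SIZE} bytes)")
```

In this example only one of three blocks changes, so an incremental mirror would transfer 8 KB rather than the full 24 KB.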

MapR-DB Table Replication for Disaster Recovery
Multi-master (aka, active/active) replication
Active Read/Write
End Users
• Reduced risk of data loss
• Application failover
• Faster data access

Snapshots for Online Consistent Backups
• Point-in-time recovery
• Consistent
• Efficient
• Fast

Multi-Tenancy

Multi-Tenancy
Efficient single cluster with:
• Isolation
• Quotas
• Security and delegation
• Reporting

Real-Time Data Access

MapR Platform Services: Open API Architecture Assures Interoperability, Avoids Lock-in
• HDFS API
• POSIX NFS
• SQL, HBase API
• JSON API
• Kafka API

MapR NFS: Unique Advantage: Direct Integration with Enterprise
• Real-time applications
• NFS for file-based applications
• Hadoop APIs for Hadoop applications
• ODBC & JDBC for SQL-based applications
• Mission critical and SLA dependent applications
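
The practical meaning of "NFS for file-based applications" is that ordinary file I/O works without Hadoop client libraries once the cluster is mounted. A minimal sketch, assuming the cluster is exposed as a normal NFS mount on the client: the file name is hypothetical, and a temporary directory stands in for the mount point so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count lines in a text file using plain file I/O (no Hadoop client)."""
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


# Stand-in for an NFS mount of the cluster; on a real client this would be
# whatever directory the administrator mounted (an assumption, not from the slides).
with tempfile.TemporaryDirectory() as mount_point:
    log_file = Path(mount_point) / "events.log"  # hypothetical file name
    log_file.write_text("login\nclick\nlogout\n", encoding="utf-8")
    line_count = count_lines(log_file)

print(f"Lines read through the file interface: {line_count}")
```

The same function would work unchanged against a path on the mounted cluster, which is the interoperability point the slide makes.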

Thank You!
@mapr
dalekim@mapr.com
Engage with us!
maprtech
mapr-technologies
https://www.mapr.com/get-started-with-mapr
https://www.mapr.com/training
https://www.mapr.com/ebooks/big-data-all-stars/

Q & A
