NetApp is in the process of moving a petabyte-scale database of customer support information from a traditional relational data warehouse to a Hadoop-based application stack. This talk will explore the application requirements and the resulting hardware and software architecture. Particular attention will be paid to trade-offs in the storage stack, with data on the various approaches considered and benchmarked, and on the resulting final architecture. Attendees will learn about the range of architectures available when contemplating a large Hadoop project and the process NetApp used to choose among the alternatives.
2. Agenda
NetApp: Drowning in Data
Technology Assessment
Business Drivers to Choose E-Series
Solution Architecture
Performance Benchmarks
Best Practices
Questions
3. The AutoSupport Family
The foundation of NetApp Support strategies
Catch issues before they become critical
Secure automated “call-home” service
System monitoring and nonintrusive alerting
RMA requests without customer action
Enables faster incident management
“My AutoSupport Upgrade Advisor tool does all the hard work for
me, saving me 4 to 5 hours of work per storage system and
providing an upgrade plan that’s complete and easy to follow.”
4. AutoSupport Capabilities
Diagram: NetApp storage systems send AutoSupport messages (HTTPS) and customer messages (email) into the AutoSupport database, which drives a risk detection & automation engine. Reactive outputs: automatic replacement-parts dispatch and automatic case creation. Proactive outputs: customer environment assessment and optimization, plus sizing and modeling, drawing on the customer install base and NetApp and partner usage. Storage administrators use the My AutoSupport customer portal (proactive and predictive).
5. Business Challenges
Gateways:
– 600K ASUPs every week
– 40% arrive over the weekend
– 0.5% growth week over week
ETL:
– Data must be parsed and loaded within 15 minutes
Data Warehouse:
– Only 5% of the data goes into the data warehouse; the rest is unstructured, yet it is growing 6-8 TB per month
– The Oracle DBMS is struggling to scale; maintenance and backups are challenging
– No easy way to access this unstructured content
Reporting:
– Numerous mining requests currently go unsatisfied
– Huge untapped potential of valuable information for lead generation, supportability, and BI
Finally, the incoming load doubles every 16 months!
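The 16-month doubling time is consistent with the ~68% annual data-growth rate cited later in the deck. As a quick sanity check (my arithmetic, not a figure from the talk):

\[ (1+g)^{16/12} = 2 \quad\Rightarrow\quad 1+g = 2^{3/4} \approx 1.68 \quad\Rightarrow\quad g \approx 68\%\ \text{CAGR}. \]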
6. Incoming AutoSupport Volumes and TB Consumption
Chart: flat-file storage requirement, total and projected usage in TB, Jan 2005 through Jan 2016 (0-3500 TB scale), doubling every 16 months.
At the projected rate of growth, the total storage requirement will double every 16 months.
As of June 2011:
– ~600,000 events archived each week
– ~3 TB of disk space used each week
– Events growing at 40% year over year; disk use growing faster
– Expanding products & features
Cost model: > $15M per year in ecosystem costs
7. Big Data is Expensive
Growth Rates (CAGR)
– Data: +68%
– Cost/byte: -30%
– Net cost: +30%
Budget is flat
8. Problem Summary
1. Data Growing at 68% CAGR
2. Current implementation will not survive
much longer
– We will fail to meet SLAs on ingest of new data
– To meet business-critical SLAs, we would have to limit the scope of the data warehouse
3. Many new opportunities / requirements
9. New Functionality Needed
Chart: new use cases plotted by required latency (weeks down to seconds) and data scale (gigabytes up to petabytes): product analysis, service performance planning, cross-sell & up-sell, customer intelligence, sales, license management, proactive support, customer self-service, and product development.
12. Requirements used for POC & RFP
Cost Effective
Highly Scalable
Adaptive
New Analytical Capabilities
13. POC Tests
Log Data: report analysis for an event across the entire install base, spanning 6 months to 1 year of logs (benchmarks used 25% of the install base and 2 months of data)
– I/O bound
Counter Manager: analysis generally restricted to one system or one cluster for a single month (benchmarks used 2 days of data from 25% of the install base)
– Trending across the install base is generally rare and ad hoc
– More CPU bound (some tools query large numbers of counters)
16. Prime Hadoop Use Cases in POC
Use case: Logs (EMS) – find occurrences of a pattern across all log files from the last 6 months
– Workload type: I/O bound
– Current capabilities: one month of data is worth 24 B records; of these, some 100 M records are loaded into the DW per month, and it takes 4 days to load a week; no ad-hoc capability exists to mine the pending records
– How Hadoop can help: the POC shows a 10-node cluster could process one month of data within 20 minutes

Use case: CM – find hot disks by disk type, system model, etc.
– Workload type: CPU bound
– Current capabilities: up to 10 M records in a single CM file; 200 B records in a month; no capability exists today in the back-end infrastructure to process these
– How Hadoop can help: achieved a throughput of 3 M records per second during the POC; a 100-node cluster is projected to process one month of data in 1.8 hours
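To make the EMS log workload concrete, here is a minimal sketch of a Hadoop MapReduce job that counts occurrences of a regular-expression pattern across log files. This is an illustration, not code from the talk: the class names, the ems.pattern property, and the example paths are hypothetical.

import java.io.IOException;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Counts log lines matching a regex across every file under the input path.
public class EmsPatternCount {

  // Map: emit (pattern, 1) for each line that matches.
  public static class MatchMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private Pattern pattern;
    private final LongWritable ONE = new LongWritable(1);

    @Override
    protected void setup(Context context) {
      pattern = Pattern.compile(context.getConfiguration().get("ems.pattern"));
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      if (pattern.matcher(line.toString()).find()) {
        context.write(new Text(pattern.pattern()), ONE);
      }
    }
  }

  // Reduce (and combine): sum the match counts.
  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> counts, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable c : counts) {
        sum += c.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("ems.pattern", args[0]);                     // e.g. "disk\\.failed"
    Job job = new Job(conf, "ems-pattern-count");
    job.setJarByClass(EmsPatternCount.class);
    job.setMapperClass(MatchMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[1])); // hypothetical log dir
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}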
18. ASUP.Next Hadoop Architecture
Diagram: ASUPs, logs, performance data, and raw config are ingested through Flume into HDFS; Pig jobs analyze the data; a REST layer provides lookup and subscribe interfaces for config tools and downstream consumers of metrics, analytics, and BI.
19. NetApp Open Solution for Hadoop
Easy to Deploy, Manage, Scale
Performance; Resilience; Density
Performance
– Bandwidth for streaming
– IOPS for metadata
– Reduced cluster network congestion
Capacity and density
– 4 servers and 120 TB fit in 8U
– Fully serviceable storage system
Reliability
– Hardware RAID and hot swap prevent job restarts on media failure
– Reliable metadata (NameNode)
– Enterprise-class fit and finish
Enterprise-Class Hadoop
21. NetApp Storage Solution Architecture
Key Attributes:
– Storage is protected by in-box RAID
  Shared spare pool defers replacement of drives
  Rebuild does not consume network bandwidth
– Storage is striped
  Maximizes performance by minimizing unequal storage utilization
– Reliable storage allows an HDFS replication count of 2 (a configuration sketch follows)
  Fewer disks
  Less space, power, cooling, cost, …
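A minimal sketch of how the reduced replication count can be requested, using the standard dfs.replication client property (illustrative code, not from the talk; the class name and argument are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TwoReplicaWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The array's hardware RAID already protects against media failure,
    // so HDFS replication can drop from the default of 3 to 2.
    conf.set("dfs.replication", "2");
    FileSystem fs = FileSystem.get(conf);
    // Files created through this handle are written with 2 replicas.
    fs.create(new Path(args[0])).close();
  }
}

Cluster-wide, the same property would normally be set in hdfs-site.xml; the hardware RAID underneath is what makes two replicas, rather than HDFS's default of three, acceptable.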
24. NetApp Storage Solution Architecture
RESULTS ARE PRELIMINARY
Performance concerns
– Initial testing has focused on TestDFSIO
Per-disk comparison
– 14 disks/server in the array configuration
– 6 disks/server direct-attached
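TestDFSIO is the stock HDFS throughput benchmark shipped in Hadoop's test jar: it writes or reads a set of files and reports aggregate throughput and average I/O rate. A typical invocation looks like the following (file counts and sizes are illustrative; the exact jar name varies by Hadoop version):

hadoop jar hadoop-test.jar TestDFSIO -write -nrFiles 16 -fileSize 1000
hadoop jar hadoop-test.jar TestDFSIO -read -nrFiles 16 -fileSize 1000
hadoop jar hadoop-test.jar TestDFSIO -clean

-fileSize is in MB, so this writes and then re-reads 16 files of 1 GB each before cleaning up the benchmark's working directory.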
25. NetApp Storage Solution Architecture
Minimizing TCO
– Disk rebuild
Handled in the controller
Minimal impact to performance
No network bandwidth consumed
– Server uptime
Very high
– Hardware maintenance
Swap out dead disks as routine, not exception
Swap out of stateless servers is painless
27. Take Aways
NetApp assessed multiple traditional DB technologies to solve its Big Data problem and determined Hadoop was the best fit
Moved from direct-attached disks to array-based storage to improve TCO
The overall architecture supports scale-out growth
AutoSupport (resident in the Data ONTAP operating system of every NetApp storage system) constantly monitors, troubleshoots, and reports on the health of NetApp systems. In addition to using AutoSupport for case generation and part dispatch, NetApp's risk-prognosis ecosystem (developed through innovations in people, process, and technology) delivers exemplary storage uptime and customer satisfaction. Risks handled include issues in the areas of configuration, interoperability, and other errors induced in the storage system by unintentional operations. The NetApp support site has knowledgebase articles and support bulletins to help SAMs (Support Account Managers) and FSEs (Field Support Engineers) drive adoption and awareness and help customers actively mitigate risks.
The current data warehouse will reach the limits of its capacity and processing capability for future Data ONTAP releases, leading to missed SLAs. The current environment has limited reporting capabilities, despite large demand for ASUP reporting. Processing all performance data for analysis is not possible due to the size and scale of the data, and the data is doubling every 16 months.