Every second of every day, electronic systems create ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer and competitive advantage. In this session, hear Vas Kapsalis, NetApp Big Data Business Development Manager, discuss his views and experience on the wider world of Big Data.
3. Convergence of Technology Disrupters Create Opportunity
Cloud
Mobile
Big Data
Social
Internet of Things
NetApp Confidential - Internal Use Only
4. Unstructured Data Growth Dominates
Revenue Share by Segment
Traditional structured
Traditional unstructured
Traditional replicated
Content depots / public cloud
Traditional Structured and Replicated Data mix shift is driven by:
− Efficiency (Dedup, Compr, Thin Prov, SATA)
− Growth in new category of storage consumers using cloud / content depots
Unstructured Data (files and objects in traditional storage + Content depots / Cloud) will be the largest storage category by 2014
− Content depots / Cloud expected to be 95% unstructured data
5. Not Even to The “Peak”
[Hype cycle chart: VISIBILITY against TIME. Phases: Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity]
40 Zettabytes: Estimated size of the digital universe in 2020
5 Billion: Smart phones
30 Billion: Pieces of new content to Facebook per month
80%: Unstructured data
6. Big Data Is All Data From Everywhere
Fundamentally changes your business
Transactional Data
The Jet way
Machine Data
Social Data
Enterprise Content
The Call Center
7. Big Data Vendor Landscape
A Lot of Hype and Buzz – Everyone is Jumping In
[Chart: Funding for Hadoop and NoSQL, source 451 Research; vertical axis 0 to 400, horizontal axis Jan-08 to Nov-11. Funding rounds labeled: Cloudera series B, C and D; MapR series A and B; 10gen series D; DataStax series B; Neo Technology series A; Opera Solutions series A; Platfora series A; Couchbase series C]
Market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015
NoSQL $2Bn PA by 2015
Most firms are taking a pragmatic approach
Big data is in the very early stages of maturity
"The Big Data market is expanding rapidly … For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation. Use cases are already present across industries and geographic regions."
Dan Vesset, Vice President, IDC
Best practices are not mature
IDC Big Data Survey
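The market forecast on this slide implies a compound annual growth rate. A minimal sketch of that arithmetic, assuming only the two figures quoted ($3.2 billion in 2010, $16.9 billion in 2015) and steady compounding between them; the function name `cagr` is illustrative and not part of the presentation:

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the slide: $3.2 billion (2010) to $16.9 billion (2015).
rate = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the implied growth is a little under 40% a year.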
8. Data Growth Impact on Business
Complexity, Speed, Volume
“Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze
[Chart: Business Velocity, 2010 to 2020. Labels: Inflection Point; Data Becomes a Burden to IT Infrastructure; Information Becomes a Propellant to Business]
9. Why Should You Care?
It’s the Value of Your Data
Top line revenue
– Leverage their data assets into business advantage
5 Billion Records
Anywhere, Anytime
Faster time to market
50% Increase in Revenue
Over 1PB of data
Growth of 175% YOY
90 days of data within 24 hours of a failure
Bottom Line savings
– Lower the cost of compliance
– Manage ever growing data efficiently
11. Why NetApp?
Practical solutions that solve today’s problems
Get Control: NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.
Break Through: Break through the limits. With NetApp, you can take on even the most massive and complex data projects.
Gain Insight: Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably.
12. Experience Managing Data at Scale
NetApp’s Largest Customer
100 PB
4 Customers
50 PB
10 Customers
20 PB
50 Customers
10 PB
100 Customers
13. NetApp Big Data Strategy
Open
Best-of-Breed
Choice
Best of breed storage for Big Data Applications
Create deep integration and value add
Build on open standards with best-in-class partnerships
Validate with Ecosystem Leaders
– Complete server, network and storage “Racks”
– Delivered via trusted high-value partners
14. Industry-Leading Storage Innovation
Corporate Data Centers
Cloud Data Centers
Flash Arrays for ultra-high performance
Clustered Data ONTAP for Shared Infrastructure
E-Series for price-performance at scale
StorageGRID for web scale object storage
15. Big Data Building Blocks
Applications
Big Bandwidth
Big Analytics
Ingest, Process, Stream
Reduce, Analyze, Report
Retain, Distribute
Retain, Distribute
Extract
Big Content
Retain forever, multi-site distribution
Store
Retrieve
Cloud
Private/Public
17. Analytics Oriented Business Processing
Business Applications
Query-based Retrieval
Commit
Transaction Processing
Transaction granular data resilience, recoverability & protection at line speeds
Memory Ingest
Disk/Flash Tier
Performance optimized query service
Realtime Analytics
Federated Database Store (Build/Buy/Partner)
Persisted Commit
Data organization optimized by query interface
RDBMS: General Purpose DB
– Data organized to align with schemas
– Fixed consistency model
– Complex queries supported
– Volume based data management
Columnar DB: Analytics Oriented
– Data organized in column files
– Tabular interface without rigid schemas
– Fast column scans
– Multiple consistency models
– Transaction granular data management
Document Store: Transaction Oriented
– Data organized in data structures in memory
– Schemaless transaction store for structured data
– High transactional performance
K-V Store: Metadata Service Oriented
– Data organized in key value pairs
– Suitable for metadata services with CMS’
– Associated with object services
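The difference between the row-oriented, columnar and key-value organizations listed on this slide can be shown with plain Python containers. This is a toy sketch, not any vendor's storage format; the records and field names are invented for illustration, and the document store is left out for brevity:

```python
# Three toy records; field names are illustrative, not from the slide.
records = [
    {"id": "a1", "region": "EMEA", "bytes": 120},
    {"id": "a2", "region": "APAC", "bytes": 300},
    {"id": "a3", "region": "EMEA", "bytes": 80},
]

# Row-oriented (RDBMS style): each record is stored together.
row_store = [tuple(r.values()) for r in records]

# Column-oriented (columnar DB style): each field is stored together,
# so an aggregate over one field touches only that column.
column_store = {field: [r[field] for r in records] for field in records[0]}

# Key-value style: a value looked up by a single key.
kv_store = {r["id"]: r for r in records}

# Column scan: the total reads one list, not every full row.
total_bytes = sum(column_store["bytes"])

# Same answer from the row store, but every row is visited in full.
total_bytes_rows = sum(row[2] for row in row_store)

# Key lookup: one record fetched directly by its key.
record_a2 = kv_store["a2"]

print(total_bytes, total_bytes_rows, record_a2["region"])
```

The column scan and the row scan return the same total; the point is only that the columnar layout reads one field's values contiguously, which is what "fast column scans" refers to.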
18. Analytics Technologies to look out for!
Old World
New World
Graph DBs (Niche)
Key-Value Stores (Content/Object Service)
Row-oriented RDBMS’
Document Stores (Transaction Oriented)
Columnar DBs (Analytics Oriented)
Datacenter, Multi-Datacenter
Relational DBs
• ACID constrained
• Complete query set
• Limited availability
• High consistency
• Rich query set
• Good availability
• Tuneable consistency
• Limited query set
• Highest/WAN availability
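"Tuneable consistency" is often implemented with read and write quorums across replicas. The sketch below assumes a Dynamo-style quorum model, which the slide does not name: with N replicas, a read that consults R replicas is guaranteed to see a write acknowledged by W replicas when R + W > N. The function name and the settings are hypothetical:

```python
def quorum_overlaps(n_replicas: int, read_quorum: int, write_quorum: int) -> bool:
    """True when every read quorum must share a replica with every write quorum."""
    if not (1 <= read_quorum <= n_replicas and 1 <= write_quorum <= n_replicas):
        raise ValueError("quorum sizes must be between 1 and n_replicas")
    return read_quorum + write_quorum > n_replicas


# Three replicas, two hypothetical settings.
strong = quorum_overlaps(3, read_quorum=2, write_quorum=2)   # 2 + 2 > 3
relaxed = quorum_overlaps(3, read_quorum=1, write_quorum=1)  # 1 + 1 <= 3
print(strong, relaxed)
```

Smaller quorums trade consistency for availability and latency, which matches the slide's pairing of tuneable consistency with the highest, WAN-level availability.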
19. Analytics & Enterprise Apps Environment
Reporting/Dashboard/Visualization
Applications
OLAP
Analytics
ETL
Data Management
ETL
OLAP
OLTP
Storage File Systems
Mobile Devices
Location/GPS
Logs
Sensors
Applications
Other Data Sources
Content Repositories
Shared Storage Infrastructure
Storage
Data Management
NFS/sNFS/pNFS
Storage (All other storage, i.e. internal DAS)
NetApp Confidential – Limited Use
20. Some problems require an Enterprise Class Hadoop solution
[Diagram axes: Compute Power, Storage Capacity]
Enterprise Class Hadoop: packaged ready-to-deploy modular compute intensive Hadoop cluster
– Compute intensive applications
– Video, imaging analysis
– Extremely tight Service Level expectations
– Severe financial consequences if the data analytic application or service runs late
Commodity, Off the Shelf Hadoop
– Values associated with early adopters of Hadoop
– Social Media Space
– Contributors to Apache
– Strong bias to JBOD
– Skeptical of ALL vendors
Enterprise Class Hadoop: packaged ready-to-deploy modular Hadoop cluster
– The data has intrinsic value $$$
– Capacity and compute requirements expanding very fast
– Higher storage performance
– Real human consequences if the system fails (Threats, treatments, financial losses)
– System has to allow for asymmetric growth
Enterprise Class Hadoop: packaged ready-to-deploy modular storage intensive Hadoop cluster
– Storage intensive applications
– Additional CPUs do not help run time
– Financial ticker data analysis
– Extremely tight Service Level expectations
– Need deeper storage per datanode
21. NetApp Open Solution for Hadoop
Easy to Deploy, Manage and Scale
Uses High Performance storage
– Resilient and Compact
– RAID Protection of Data
– Less Network Congestion
Raw Capacity and density
– 120TB or 180TB in 4U
– Fully serviceable storage system
Reliability
– Hardware RAID & hot swap prevent job restarts due to a node going off-line in case of media failure
– Reliable metadata (Name Node)
[Diagram labels: HDFS; NameNode; Secondary NameNode; FAS2040; MapReduce; JobTracker; DataNodes / TaskTracker; E2660; 4 separate shared nothing partitions]
Enterprise Class Hadoop
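The capacity figures on this slide (120TB or 180TB in 4U, split into 4 separate shared nothing partitions) give a rough raw capacity per partition. A minimal sketch, assuming the enclosure is divided evenly and ignoring RAID and file system overhead, neither of which the slide quantifies; the function name is hypothetical:

```python
def raw_tb_per_partition(enclosure_raw_tb: float, partitions: int = 4) -> float:
    """Raw capacity of each partition, assuming an even split."""
    return enclosure_raw_tb / partitions


# Enclosure sizes and partition count taken from the slide.
for raw_tb in (120, 180):
    print(raw_tb, "TB enclosure:", raw_tb_per_partition(raw_tb), "TB raw per partition")
```

Usable capacity will be lower once RAID protection is accounted for.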
22. NetApp Open Solution for Hadoop
Validated Benefits for the Enterprise
Improved cluster performance by 62%
Completed jobs 200% faster under drive failure
Delivered linear performance scalability as nodes, data grew
Per-server capacity increase of 1.5x
"The NetApp Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment."
- ESG, 2012
23. Optimizing Performance and Staying Healthy
Source: Cisco: http://bit.ly/yL54Ts
Availability and Resiliency
Burst Handling and Queuing
Oversubscription Ratio
Network Overhead
Data Node Network Speed
Network Latency
Useful Work
Source: Garrett, Brian and Lockner, Julie, “NetApp Open Solution for Hadoop”, ESG Report, May 2012, http://bit.ly/LyYG0t
25. Case Study: ASUP NetApp Analytics
[Diagram: Gateways; ETL (Extract, Transform, Load); Data Warehouse; Data Marts; Reporting]
• 800K ASUPs every week
• 40% coming over the weekend
• Data needs to be parsed and loaded in 15 minutes
• Only 5% of data goes into the data warehouse, rest unstructured, yet it’s growing 7-10 TB per month
• No easy way to access this unstructured content
Reporting
• Numerous mining requests are not satisfied currently
• Huge untapped potential of valuable insight
Finally, the incoming load doubles every 16 months!
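A load that doubles every 16 months grows by a factor of 2^(t/16) after t months. The sketch below projects that forward, assuming smooth exponential growth and assuming that "incoming load" can be measured as the 800K ASUPs per week quoted on this slide; the function name is hypothetical:

```python
def projected_load(current: float, months_ahead: float, doubling_months: float = 16.0) -> float:
    """Project a load that doubles every `doubling_months` months."""
    return current * 2 ** (months_ahead / doubling_months)


weekly_asups_now = 800_000  # from the slide
for months in (16, 32, 48):
    print(months, "months:", round(projected_load(weekly_asups_now, months)), "ASUPs per week")
```

At that rate the weekly volume is eight times larger after four years, which is why the 15-minute parse-and-load window becomes hard to hold.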
NetApp Proprietary - Limited Use Only
26. Case Study: NetApp Large-Scale Analytics
CHALLENGE
– 4 weeks to run a query on 24 billion unstructured records
– Impossible to run a query on 240 billion unstructured records
NETAPP SOLUTION
– 10-node Hadoop Cluster
BENEFITS
– Time reduced from 4 weeks to 10.5 hours
– Previously impossible, now achievable in just 18 hours
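The figures in this case study can be turned into a speedup and a rough throughput. This sketch assumes "4 weeks" means four full calendar weeks of wall-clock time and that the 10.5 hour result is for the same 24 billion record query; the variable names are illustrative:

```python
HOURS_PER_WEEK = 7 * 24

# Figures from the slide.
before_hours = 4 * HOURS_PER_WEEK  # 4 weeks for 24 billion records
after_hours = 10.5                 # same query on the 10-node Hadoop cluster
large_query_hours = 18.0           # 240 billion records

speedup = before_hours / after_hours
records_per_hour_24b = 24e9 / after_hours
records_per_hour_240b = 240e9 / large_query_hours

print(f"Speedup on 24 billion records: {speedup:.0f}x")
print(f"Throughput, 24 billion records: {records_per_hour_24b:.2e} records/hour")
print(f"Throughput, 240 billion records: {records_per_hour_240b:.2e} records/hour")
```

Under these assumptions the 24 billion record query runs 64 times faster, and the larger query processes ten times the data in less than twice the time.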
27. Integrated Big Data Solutions and Expertise
Planning and implementation expertise for Big Data
Turn-key solution stacks and Big Data services
Big Data System Integrators Solutions Built on NetApp®
28. Next Steps - Team with the Experts
Strategic Assessment
– Business goals
– Data growth needs
– Use case discovery (partner delivery)
Consult
– Solution architecture and design (NetApp delivery)
Deploy
– Installation and implementation (NetApp delivery)
– Solution implementation (partner delivery)
Support options: Global support available from NetApp and partners