This document discusses the real costs of storing and analyzing big data. Unstructured data is growing rapidly, with over 70% growth expected between 2013 and 2017, and Hadoop has become a popular platform for big data analytics. The document examines the total cost of ownership (TCO) for big data storage and argues that costs can be significantly reduced by combining scale-out NAS such as EMC Isilon with RainStor's analytical archive software. The case studies report banks and financial institutions cutting storage costs by 90% or more while getting faster query performance.
6. Cost of Storing Big Data - TCO
Source: Winter Corp Report: Big Data – What Does it Really Cost? 2014
7. Cost of Storing Big Data – 5 yrs
Source: Winter Corp Report: Big Data – What Does it Really Cost?
8. Big Data – Cost to Scale vs. Performance
[Figure: query response (Hrs, Mins, Secs) versus data volume (TB, 10TB, 200TB, PB); labels: Traditional (Row/Columnar) Data Warehouse, Hadoop, Low Cost to Scale]
Big Data Volume (50TB - PB)
Fast Data Load & Massive Scale
Fast Query Across Large Scale
Flexible Deployment Options
??
12. EMC Isilon Scale-Out NAS Environment
Client/Application Layer: Clients and Applications
Ethernet Layer: Gig-e, 10 Gig-e Network
Protocols (Multi-Protocol): SMB, NFS, FTP, HTTP, HDFS for Hadoop, REST for Object
RESTful API: GET, PUT, POST, DELETE
OneFS Operating Environment
Intra-cluster Communication
13. EMC Isilon - Industry Recognition
Isilon Systems is a successful acquisition for EMC
IDC Marketscape names EMC Isilon a Leader
in Scale-Out File Storage Market
- Worldwide Scale-Out File-Based Storage, December 2012
- Critical Capabilities for Scale-Out File System Storage, January 2013
EMC Isilon “Outstanding” in Critical Capabilities
for Scale-Out File
- Vendor Rating – EMC, May 2014
15. Solutions: Analytical Archive | Compliance Archive
Analytical Archive (DW Offload): Data In, Store, Query, Govern
Compliance Archive (Tape Avoidance): Data In, Store, Query, Govern, Comply (WORM; SEC 17a-4; Dodd Frank)
Sources shown: Teradata, Netezza, Oracle Ex, Sybase IQ; Source App, EDW, DB, Tape
16. Analytical Archive: End-to-end
IN: Load/Validate (billions of records/day); move data from DW and source systems
STORE: Compress 10-40X (90%+); Scale – EMC Isilon; Availability – Replication
QUERY: Query/Analyze – SQL; BI Tools; Hive, MapReduce
GOVERN: Retain/Dispose – Rules Based
SECURE - Enterprise-grade
17. Database Storage - Compression: Up to 40X
Source: Ratios vs. Raw – RainStor Benchmarks using customer data (2012-13)
[Bar chart: compression ratio vs. raw data, axis 0 to 50. Categories: Hadoop LZO Compressed; Relational (e.g. Oracle); Flatfile Gzip; Columnar (e.g. Vertica); RainStor. Bar labels: 3X, 6X, 7X, 8X, and 40X (RainStor)]
18. Simplicity and Ease of Use
Single volume and file system that spans nodes
– Directories and files striped across the cluster
Automation:
– NO manual intervention
– NO reconfiguration
– NO server or client mount point or application changes
– NO data migrations
– NO RAID
EFFICIENCY
19. More scalable than traditional storage systems
Largest and Most Scalable File System
OneFS scales from 18 TB to 20 PB in a single file system, single volume
Under 1 min to scale with no downtime
20. Query - Pick the Best Tool for the Job
Document Query: XQUERY
Ad-Hoc Query (Interactive): SQL-92, SQL 2013
BI Analytics: BI TOOLS, DASHBOARD
Hadoop Tools (Hadoop on Scale-out NAS): MAPREDUCE, PIG, HIVE
21. Hadoop & Big Data
LOW VALUE DATA
Recommendation Engines
Data Sandboxing
Log Processing
Audits
Regulatory Reporting (e.g. SEC, SOX)
Lawful Intercept
Social Media
Logs
Clickstreams
Credit Card
Trade
Personal Information
HIGH VALUE DATA
SECURITY?
22. Security Capabilities & Features
Secure Large Volumes of Data on Hadoop
Privacy: Data Encryption, Data Masking, Views
Trust: Kerberos Authentication, Authorization, LDAP / Active Directory, Linux PAM Support
Integrity: Tamper-proofing, Audit Trail, Record-level Delete, Data Disposition
23. RainStor-Isilon Architecture Overview
Stack columns: Apache Projects | RainStor
Top of Stack (Visualization Layer): BI Tools, Dashboards (ODBC/JDBC Connectivity)
Programming Languages: Hive, Pig, Java | Standard SQL (with Oracle, SQLServer, SybaseIQ extensions)
Computation: MapReduce – Batch (Distributed Programming Framework)
Security: Security and Compliance (Encryption, Masking, Audit Trail, Data Disposition, Kerberos, LDAP/Active Directory, Immutable)
Database Storage: HDFS (Hadoop Distributed File System) | RainStor Database (up to 40X Data Compression)
Object/Hardware Storage: NAS, SAN, CAS, NFS (On-premise, Cloud)
Vendor Specific: EMC Isilon
24. RainStor: Hadoop 2.0 Distro Certifications
Cloudera CDH 5.0
– Certified April 2014
Hortonworks HDP 2.1
– April 2014
“We are delighted with the wide range of technology solution partners that have certified on CDH 5 … it is testament to the maturity of the platform but also the overall market demand,”
– Tim Stevens, VP of Business & Corporate Development
26. SEC 17a-4(f) Compliance Archive Requirements
Records stored in non-erasable media (WORM)
Recording process must be verifiable
Fully Accessible to Authorities & Backed-up
Records should be Recognizable & Identifiable
Downloadable to any acceptable medium
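The first two requirements above (non-erasable storage and a verifiable recording process) can be illustrated with a toy write-once store. This is only a sketch of the idea, not a compliant implementation and not how RainStor or EMC storage works: the class name `WriteOnceStore` and its methods are hypothetical, the store is in-memory, and a SHA-256 digest stands in for whatever verification a real WORM system performs.

```python
import hashlib


class WriteOnceStore:
    """Toy write-once, read-many store (hypothetical, for illustration only)."""

    def __init__(self):
        # record_id -> (payload, sha256 digest taken at write time)
        self._records = {}

    def put(self, record_id: str, payload: bytes) -> str:
        """Store a record once; a second write to the same id is refused."""
        if record_id in self._records:
            raise PermissionError(f"record {record_id!r} is write-once")
        digest = hashlib.sha256(payload).hexdigest()
        self._records[record_id] = (bytes(payload), digest)
        return digest

    def get(self, record_id: str) -> bytes:
        """Return the record after re-checking it against its stored digest."""
        payload, digest = self._records[record_id]
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError(f"record {record_id!r} failed verification")
        return payload


store = WriteOnceStore()
receipt = store.put("trade-0001", b"BUY 100 XYZ")
print(receipt[:16], store.get("trade-0001"))
```

Note that the sketch deliberately has no update or delete method; retention and disposition rules would sit on top of such a store.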
28. Compliance Archiving: Global Investment Bank
Lower Compliant Data Retention Costs by a Factor of 10
Challenges
Cost: Data volumes in disparate trading applications growing at 70-100% / Year - Storage costs rising @ 60% / Year
Compliance: Must provide high performance EBS and other queries for SEC
Solution
A RainStor Archive for storing and reporting against historical trade data
13 years of history loaded from Sybase IQ
Daily feed from trading application to RainStor
Runs on low-cost NAS Tier 3 storage and VMs
RainStor completely replaced Sybase IQ
90% cost savings - $5MM ROI
6 Projects live - 13 more in Progress
BENEFITS
Enterprise Standard for Data Retention with Faster Analytics
90% Storage Cost Reduction
30X Data Compression
3X Faster Query Compared to Sybase
“It’s like shrink-wrapping your data…forever!”
– VP, Technology
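To see why the growth rates quoted in this case study (data volumes growing 70-100% per year, storage costs rising 60% per year) become a cost problem, it helps to compound them over a planning horizon. The sketch below is plain arithmetic; the function name `project`, the five-year horizon, and the normalized starting value of 1.0 are assumptions chosen for illustration, not figures from the case study.

```python
from typing import List


def project(start: float, annual_growth: float, years: int) -> List[float]:
    """Value at the end of each year under constant compound growth.

    annual_growth is a fraction: 0.7 means 70% growth per year.
    """
    return [start * (1.0 + annual_growth) ** y for y in range(years + 1)]


YEARS = 5  # assumed horizon, matching the 5-year TCO framing
for label, rate in [("data volume @ 70%/yr", 0.70),
                    ("data volume @ 100%/yr", 1.00),
                    ("storage cost @ 60%/yr", 0.60)]:
    final = project(1.0, rate, YEARS)[-1]
    print(f"{label}: {final:.1f}x the starting level after {YEARS} years")
```

At 100% annual growth the volume doubles each year, so after five years it is 32 times the starting level.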
29. Analytical Archiving : Large Multi-national Bank
Retain Trading Data, Stay Compliant at Lowest Cost
Challenges
Cost: Fast data growth and costly EDWs (Teradata & Netezza) - offload history
Compliance: Must meet SEC compliance and retain equities data for query - run on approved WORM / CAS Storage (EMC)
Avoid data on offline tape - reinstate older Teradata data (BAR) and stay compliant.
Solution
43 Equities apps (Oracle; SQL Server) offload history to RainStor
History offload from Netezza - run on WORM
Reinstate tape and bring online for audits.
BENEFITS
Enterprise Standard for Compliance Driven Analysis
Runs on EMC Centera & Isilon (WORM)
25X Compression
Meets Query SLAs
Tape Avoidance
[Diagram labels: RainStor Active Archive; Equities; BAR; 400TB; FastForward™; FastConnect™; Trades; 200TB; 43 Apps; EMC WORM Storage]
30. RainStor + Isilon + Hadoop – TCO
Compression rate: 32X (>96% cost savings)
Utilization rate: >80%
Scalability: Up to 20 PB per cluster
Query performance: >= Hadoop on DAS
RainStor + Hadoop + Isilon = Lowest 5yr TCO!
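The ">96%" figure quoted for 32X compression follows from the ratio alone: at N-fold compression the stored footprint is 1/N of the raw data, so the capacity saved is 1 - 1/N. A minimal sketch of that arithmetic follows; the function name `storage_reduction` is illustrative, and the calculation covers raw capacity only, not utilization, replication, licensing, or other TCO components.

```python
def storage_reduction(compression_ratio: float) -> float:
    """Fraction of raw capacity saved at a given compression ratio (32 means 32X)."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


# Ratios mentioned in the deck: 10-40X range, 25X and 30X case studies, 32X here.
for ratio in (10, 25, 30, 32, 40):
    print(f"{ratio}X compression -> {storage_reduction(ratio):.1%} less capacity")
```

The same formula shows why the deck pairs "10-40X" with "90%+": 10X is the point at which the reduction reaches 90%.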
31. Why RainStor-Isilon?
Flexible Architecture – Hadoop, Cloud
Extract EDW data for Active Archiving
Lower Storage Costs by at least 90%
Gain Deeper Insights – SQL, Hive, Pig, Search, BI tools
Reliable – High Availability, Disaster Recovery
Purpose-built Security and Compliance features
First SQL Compatible, Enterprise-grade Database (native to Hadoop) to run on Isilon Scale-out NAS.