Modern infrastructure for business data lake
- 1. © Copyright 2015 EMC Corporation. All rights reserved.
© Copyright 2014 EMC Corporation. All rights reserved.
- 2.
Scale-out Converged Solutions for Analytics
Julianna DeLua, VCE
Dan Beres, EMC Isilon
- 3.
AGENDA
History of Analytic Infrastructure
Why Scale-Out, Converged Solutions
Analytic Workflow
vHadoop
Test Results
Customer Use Cases and Feedback
Conclusion / Next Steps
- 4.
A Brief History of Analytic Infrastructure
- 5.
VCE Confidential © 2015 VCE Company, LLC. All rights reserved.
2013 – Shared infrastructure? Let me know when you know “for sure” it works. In the meantime, a few industry pioneers/early adopters start POCs with EMC/VCE.
2014 – Extend converged system benefits with Isilon scale-out: augment enterprise apps/data with Hadoop, Splunk, NoSQL. Great performance!
2015 – Internet of Things initiatives accelerate. Rapid technological advancements with architectural flexibility: VCE VScale.
- 6.
ANATOMY OF A MODERN DIGITAL BUSINESS
BUSINESS DRIVERS
• New systems of engagement
• New business models
• Internet of Things
CAPABILITIES NEEDED
Compelling, Unique User Experience/Model
Platform
Data
Algorithms (Code)
“Catch people or things in the act and affect the outcome” = $$$
ENABLED BY
The Private/Public Cloud: “Infinite, inexpensive compute and storage”
Agile Product Development Culture
Existing Systems
A MAJOR PRESSING CHALLENGE
Analytics/BI
• How do we architect for agile data-driven business?
• Can we manage big, fast data?
• Value driven
CIO
• Meet future business needs while simplifying and taking cost out of legacy?
• Avoid lock-in again?
• People and organization
CEO/CMO
• How do we become an agile, digital business?
• Anticipate and delight customers?
• Partner collaboration
• Where/how do we start?
- 7.
Typical Customer Pains and Technical Problems
CUSTOMER PAINS
• Long cycles for accessing and sharing information locked in unstructured data.
• Cannot rapidly create value via technology-enabled XaaS.
• Need to explicitly demonstrate security, compliance, and governance.
• Inability to plan a system progression that combines structured and unstructured data, exacerbating silos of appliances and hardware.
• Insufficient posture against outages and peak periods of IT use.
• Escalating deployment, management, and maintenance costs for growing data.
TECHNICAL PROBLEMS
• Sub-optimal environment: data locked in by high volume, variety, or velocity.
• Lack of service enablement: difficulties in optimizing a virtualized, multi-tenant service approach.
• Compliance/security exposure: lack of encryption, data exposure, and data loss.
• Limited standardization: not using data center standards.
• Downtime/SLA issues: not readily configurable to handle mixed workloads.
• System utilization: inefficient islands of storage and systems, inability to reuse data for multiple solutions.
- 8.
CONVERSATIONS LEAD TO PLATFORM EVOLUTION
Conversations
Downtime and response time issues are missing business SLAs
• Increased flash use
• Continuous need for migration
• Network scale points
• Data mobility
• Hadoop, Splunk, PaaS, Cassandra, MongoDB, legacy DB
• Aggregate/disaggregate pool of resources
• Control required for application proliferation
Faster time to drive value from innovation across a multitude of applications
• Mobile and social offers
• Turn 360-degree insight into customer acquisitions
• Fulfillment, inventory and customer management
AWS is costing too much, but the business wants faster go-live and flexibility
- 9.
VCE VSCALE™ ARCHITECTURE
FLEXIBLE SCALE-OUT THROUGH EXPANDED MULTI-SYSTEM ARCHITECTURE
[Diagram: workloads on the VScale architecture: MPP DB; Hadoop PROD & DR; in-memory DB; BI / DW; Enterprise App – SAP; Microsoft email, collaboration; Hadoop POC; Pivotal Cloud Foundry; video surveillance.]
- 10.
Edge & Central Analytics Workflow
Isilon OneFS
• Easy to grow, manage & administer
• Additional clients to more content
• Multiprotocol access to the same data
[Diagram: OneFS clusters replicated with SyncIQ over internal and external WANs; clients reach the same data over Swift, HTTP, RAN, DAV, FTP, HDFS, NFS and SMB; other labeled components include Log, Glance, Oracle, Mediation and App Server.]
- 11.
vHadoop+Isilon Install & Deployment Guide
- 12.
“Fix These Problems… Prove It Out!”
Expensive and Won’t Scale
– Hundreds of Servers to support less than 2PB Usable Storage (1:7 ratio)
– “We have a guy with shopping carts walking down the rows replacing parts”
– Additional Staging Area for Data before Ingesting into Hadoop
– Can’t Scale Storage without Compute – Locked & Not Elastic
Lacks Enterprise Features
– No Cost Effective Data Redundancy
– Limited File-system Security, only Simple Authentication
– Multiple Points of Failure
– Maintaining Hadoop “PODs” involves significant downtime
Time To Results
– Requires Significant Time to Ingest and Copy Data
– Building Production Hadoop “PODs” can take months
– Network Infrastructure Saturation & Expense
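To make the redundancy cost concrete, here is a minimal sketch comparing the raw capacity needed for the same usable capacity under HDFS's default three-way block replication (`dfs.replication` = 3) and under a parity-based (erasure-coded) layout. The 16+2 stripe width is a hypothetical example, not a specific Isilon protection level, and the 2,000 TB usable figure only mirrors the roughly 2PB mentioned above.

```python
def raw_for_replication(usable_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times (HDFS default is 3)."""
    return usable_tb * replicas


def raw_for_parity(usable_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity needed when each stripe of `data_units` blocks carries `parity_units` parity blocks."""
    return usable_tb * (data_units + parity_units) / data_units


usable = 2000.0  # ~2 PB usable (illustrative)

replicated = raw_for_replication(usable)                        # three full copies
parity = raw_for_parity(usable, data_units=16, parity_units=2)  # hypothetical 16+2 stripe

print(f"3x replication: {replicated:,.0f} TB raw ({usable / replicated:.0%} efficient)")
print(f"16+2 parity:    {parity:,.0f} TB raw ({usable / parity:.0%} efficient)")
```

Both layouts in this sketch survive the loss of any two copies or units, so the comparison is between schemes with similar failure tolerance; real protection settings, spare capacity and metadata overhead would shift the numbers.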
- 13.
EMC Isilon Enabled Workflows
[Diagram: NFS, SMB, Swift, HDFS, RAN and FTP workflows served by Isilon.]
- 14.
EMC Isilon Enabled Hadoop
[Diagram: in traditional Hadoop, data arriving over SMB, NFS, HTTP or FTP is copied into HDFS, where name nodes track the node information and replies from datanodes, and Map and Reduce tasks run alongside the data. With Isilon, OneFS keeps the original data accessible over NFS, SMB and HDFS, separating compute from data and storing the data once (labeled 1X).]
- 15.
Hadoop Test Findings
Created and tuned Hadoop VMs to maximize throughput
– >90% CPU utilization for compute
– Reduced memory footprint (memory page sharing across VMs)
– Hadoop 2.0 with YARN does not need flash for HDFS
Incremental testing to validate scalability
– Validated a 2:1 ratio of compute nodes to Isilon nodes (can also support 3:1)
– 2 VMs per compute node for optimal performance on dual-socket servers
– Linear performance scaling by incrementally adding compute
Validated enterprise/production readiness
– Greater security with AD authorization and access control
– No need to anonymize data
Whitepaper created
– Deployment and upgrade of hardware and software in hours, not days/weeks
– Validated reduced data-center footprint and environmentals with UCS blade servers, vHadoop and Isilon
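A back-of-the-envelope sizing helper, assuming the validated 2:1 compute-to-Isilon ratio, 2 VMs per dual-socket compute node, and the linear-scaling finding. The 85 MB/s per-node default reuses the Isilon figure from the 1TB Terasort test in this deck as an illustrative input; the function and class names are hypothetical, and real throughput depends on the workload.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    isilon_nodes: int
    compute_nodes: int
    hadoop_vms: int
    est_throughput_mbps: float


def plan_cluster(isilon_nodes: int,
                 compute_per_isilon: int = 2,
                 vms_per_compute: int = 2,
                 mbps_per_compute: float = 85.0) -> ClusterPlan:
    """Hypothetical sizing helper: node counts follow fixed ratios and throughput scales linearly."""
    compute_nodes = isilon_nodes * compute_per_isilon
    return ClusterPlan(
        isilon_nodes=isilon_nodes,
        compute_nodes=compute_nodes,
        hadoop_vms=compute_nodes * vms_per_compute,
        # Linear-scaling assumption: each added compute node contributes the same throughput.
        est_throughput_mbps=compute_nodes * mbps_per_compute,
    )


for n in (4, 8):
    print(plan_cluster(n))
```

Passing `compute_per_isilon=3` models the 3:1 ratio that the testing also supported.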
- 16.
1TB Hadoop Job Cycle Comparison
Isilon Significantly Reduces Time To Results
Traditional Hadoop+DAS: 17:32, 30:18, 20:50, 20:50 (mm:ss per stage; total 89:30)
Isilon Enabled vHadoop: 18:51 (mm:ss)
Terasort Test on 1TB
Metric             DAS      Isilon   Benefit
MB/s per node      55       85       +55%
Compute (mm:ss)    30:18    18:51    -38%
TTR (mm:ss)        89:30    18:51    -79%
Isilon Advantages
• Eliminates All Data Movement
• Allows for Virtualized Compute
• Significantly Less Cost
• 79% Shorter TTR!
DAS TTR: 89:30 (89.5 minutes)!
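The percentages above can be checked with a short script. It assumes the four Traditional Hadoop+DAS timings are consecutive stages whose sum is the time to results (they do add up to 89:30), and converts the mm:ss values to seconds before computing changes. Reading 30:18 and 18:51 as decimal minutes (30.18 and 18.51) gives about -39% for compute; converting to seconds gives about -38%.

```python
def to_seconds(mmss: str) -> int:
    """Convert an "mm:ss" timing from the slide into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before * 100


das_stages = ["17:32", "30:18", "20:50", "20:50"]   # Traditional Hadoop+DAS stage timings
das_ttr = sum(to_seconds(s) for s in das_stages)    # 5370 s, i.e. 89:30
das_compute = to_seconds("30:18")                   # 1818 s
isilon_ttr = to_seconds("18:51")                    # 1131 s: compute only, no data movement

print(f"DAS TTR: {das_ttr // 60}:{das_ttr % 60:02d}")                       # 89:30
print(f"Compute time change: {pct_change(das_compute, isilon_ttr):.1f}%")   # -37.8%
print(f"TTR change: {pct_change(das_ttr, isilon_ttr):.1f}%")                # -78.9%
print(f"Throughput change: {pct_change(55, 85):+.1f}%")                     # +54.5%
```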
- 17.
EMC Isilon – Only Security Compliant Datastore for Hadoop
Highly resilient architecture
– Robust data protection options (DR, snapshots, SyncIQ)
– Clustered multi-point name node with Kerberos
– SEC 17a-4 compliant WORM
– Hadoop multi-tenancy with dedicated network and access zones
Hadoop on Isilon provides full ACLs for NFS, SMB, and HDFS
– Each file/directory has an Access Control List (ACL) consisting of one or more Access Control Entries (ACEs).
– Each ACE assigns a set of permissions (read, write, delete) to a specific security identifier (user or group).
– Deny ACEs remove permissions and override any Allow ACEs.
Standard Hadoop, by default, provides only basic Unix-style “Simple” permissions
– Effective permissions are determined by the file's owner, group, and other classes (single user, single group, other/world)
– Read and/or write permissions can be assigned to the owner, the group, and “everyone else”
– What do you do when you need to assign read access to multiple groups (A, B & C)?
– What do you do when you need to assign read access to group A and read+write access to group B?
– How do you maintain permissions when files are copied from Windows NTFS shares?
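The difference between the two permission models can be sketched in a few lines. This is a simplified model, not OneFS's actual ACL evaluation (which follows Windows-style ordered ACE processing): here the effective permissions are the union of all matching allow ACEs minus any matching deny ACEs. The principal names (`group:A`, `user:carol`, the `etl` owner) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACE:
    kind: str            # "allow" or "deny"
    principal: str       # hypothetical identifiers such as "group:A" or "user:carol"
    perms: frozenset     # subset of {"read", "write", "delete"}


def acl_effective(acl: list, principals: set) -> set:
    """Simplified ACL model: union of matching allow ACEs minus any matching deny ACEs."""
    allowed, denied = set(), set()
    for ace in acl:
        if ace.principal in principals:
            target = allowed if ace.kind == "allow" else denied
            target.update(ace.perms)
    return allowed - denied


def posix_effective(mode: int, owner: str, group: str, user: str, user_groups: set) -> set:
    """Classic owner/group/other bits: exactly one class applies to each user."""
    if user == owner:
        bits = (mode >> 6) & 0o7
    elif group in user_groups:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    perms = set()
    if bits & 0o4:
        perms.add("read")
    if bits & 0o2:
        perms.add("write")
    return perms


# Read for groups A and C, read+write for group B, and a deny that removes carol's write.
acl = [
    ACE("allow", "group:A", frozenset({"read"})),
    ACE("allow", "group:B", frozenset({"read", "write"})),
    ACE("allow", "group:C", frozenset({"read"})),
    ACE("deny", "user:carol", frozenset({"write"})),
]

print(acl_effective(acl, {"user:alice", "group:A"}))              # read only
print(acl_effective(acl, {"user:bob", "group:B"}))                # read and write
print(acl_effective(acl, {"user:carol", "group:B", "group:C"}))   # deny removes write

# With mode 0o750 (owner etl, group A), a member of group B falls into "other" and gets nothing.
print(posix_effective(0o750, "etl", "A", "bob", {"B"}))
```

With only one owning group per file, the owner/group/other model cannot express "read for A and C, read+write for B" on a single file, which is the gap the questions above point at.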
- 19.
HCFS Certification: Process Detail
Partner Prep
– Partner defines the HDP test matrix (platforms, HDP components, HDFS APIs, HDP version and partner product version)
– Partner provides a sample product to Hortonworks so that Engineering and Field teams are familiar with the partner technology
Testing
– HDFS Test Suite training at Hortonworks HQ and online
– Partner deploys, runs, analyzes, and reports on the HDFS Test Suite with technical support from Hortonworks
– HDP Core Test Suite (MapReduce, YARN, Tez, HBase, Hive and Pig) training at Hortonworks HQ and online
– Partner deploys, runs, analyzes, and reports on the HDP Core Test Suite with technical support from Hortonworks
– Partner deploys, runs, analyzes, and reports on the remaining HDP Component Test Suites with technical support from Hortonworks
– Duration: testing time allocation
Documentation
– Joint review of test suite execution results
– Hortonworks creates a functional gap analysis document, which needs partner sign-off
– Duration: documentation time allocation
Validation
– Hortonworks validates test suite execution results and certifies HCFS for the specified HDP version and partner product version
Total certification time allocation: 90–180 days
- 20.
Scale-out Isilon for Scale-out Hadoop
[Diagram: compute nodes connected to Isilon nodes.]
• Isilon is a scale-out system; Hadoop HDFS is similar in part
• HDFS on Isilon functions as a parallel file system
• Each compute node performs I/O on every Isilon node in the rack
• I/O bandwidth and storage capacity can be increased linearly simply by adding Isilon nodes
• Compute can be increased or decreased on the fly and can easily be virtualized
• With a mesh network that is faster than the disks, data locality is irrelevant
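The linear-scaling point can be illustrated with a simple bottleneck model: aggregate HDFS read bandwidth is the smaller of what the compute nodes' NICs can receive and what the Isilon nodes can serve. The 10 Gbps NIC speed and 500 MB/s per Isilon node are hypothetical inputs, and the model ignores protocol overhead, the InfiniBand back end, and disk contention.

```python
def aggregate_read_mbps(compute_nodes: int,
                        isilon_nodes: int,
                        nic_gbps: float = 10.0,
                        isilon_node_mbps: float = 500.0) -> float:
    """Hypothetical bottleneck model for HDFS reads served by Isilon nodes."""
    network_mbps = compute_nodes * nic_gbps * 1000 / 8   # Gbit/s per NIC -> MB/s, summed
    storage_mbps = isilon_nodes * isilon_node_mbps       # grows linearly with Isilon nodes
    return min(network_mbps, storage_mbps)


# Adding Isilon nodes raises bandwidth linearly until the compute network becomes the limit.
for isilon in (8, 16, 48):
    print(isilon, aggregate_read_mbps(compute_nodes=16, isilon_nodes=isilon))
```

In this model the compute side and the storage side are sized independently, which is the property the slide contrasts with DAS, where both grow only together.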
- 21.
Hadoop Architecture – Traditional DAS
Dozens of Hadoop racks require significant investment in network infrastructure
[Diagram: each rack has a rack Ethernet switch connecting compute nodes at 10 Gbps; every compute node handles both shuffle and HDFS I/O on local SATA disks; racks uplink at 10+ Gbps to a core Ethernet switch.]
• The ratio of compute to disk space/performance is fixed.
• Non-local HDFS I/O (30–90% of HDFS I/O) goes through Ethernet.
• Local disk usage is shared between shuffle I/O (60% of all I/O during terasort) and HDFS I/O.
• Core network switches are an additional cost for Hadoop+DAS (more network traffic required).
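Using the percentages on this slide, a small sketch shows how a DAS cluster's I/O splits between local disks and Ethernet. The 1,000 GB of total I/O is a hypothetical job size, and the model counts only HDFS reads/writes that miss the local node as network traffic; the shuffle copy phase's own network transfer is deliberately left out.

```python
def das_io_split(total_gb: float,
                 shuffle_share: float = 0.60,
                 nonlocal_hdfs_share: float = 0.30) -> dict:
    """Split total I/O on a Hadoop+DAS cluster into shuffle, local HDFS, and HDFS-over-Ethernet."""
    shuffle_gb = total_gb * shuffle_share          # shuffle I/O on local disks
    hdfs_gb = total_gb - shuffle_gb                # remaining I/O is HDFS
    hdfs_network_gb = hdfs_gb * nonlocal_hdfs_share
    return {
        "shuffle_local_disk_gb": shuffle_gb,
        "hdfs_local_disk_gb": hdfs_gb - hdfs_network_gb,
        "hdfs_over_ethernet_gb": hdfs_network_gb,
    }


for share in (0.30, 0.90):   # the 30-90% non-local range quoted on the slide
    print(share, das_io_split(1000.0, nonlocal_hdfs_share=share))
```

Even at the low end of the range, the local disks serve both shuffle and HDFS traffic, while a growing share of HDFS I/O loads the rack and core switches.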
- 22.
Hadoop Architecture – Isilon for HDFS
Reduced traffic across the core Ethernet switch: HDFS traffic travels only within a rack and across InfiniBand (IB).
[Diagram: each rack has a rack Ethernet switch connecting compute nodes at 10 Gbps and a rack-local Isilon node serving HDFS; compute nodes keep shuffle data on local SATA disks; Isilon nodes interconnect through an Isilon InfiniBand switch; 10+ Gbps uplinks to the core Ethernet switch are used only for the MapReduce copy phase.]
• The number of compute and Isilon nodes can be adjusted independently to achieve the optimal ratio of compute and I/O bandwidth.
• HDFS I/O ALWAYS comes through a rack-local Isilon node, which collects data blocks from all other Isilon nodes across the InfiniBand fabric.
• Shuffle I/O (65% of all I/O during terasort) remains on local storage.
- 25.
ESG LAB REVIEW – VBLOCK SYSTEMS WITH VCE TECHNOLOGY EXTENSION FOR EMC ISILON
• Objectives
• Underscore business challenges and opportunities for progressing to enterprise Hadoop
• Establish requirements to be ready for production
– Extensibility, Governance, Security, Availability, Performance and Multi-Use
• Benchmark Vblock System 340 with EMC Isilon using the TeraGen suite
“By leveraging an industry-proven Integrated computing platform (ICP) in VCE Vblock Systems and combining it with EMC Isilon and VMware vSphere Big Data Extensions, organizations get a fully integrated platform that meets and grows with their big data and analytics requirements.”
– Tony Palmer, Senior Lab Analyst, ESG
- 26.
ESG LAB OBSERVATION ON TERAGEN BENCHMARKS
Comparing Performance of Traditional Hadoop to VCE Vblock System with EMC Isilon (TeraSort Suite)
[Chart: job duration (seconds, axis 0–1,400) for TeraGen, TeraSort and TeraValidate; series: 16 traditional Hadoop nodes (combined compute and DAS) vs. 16 VCE compute nodes with EMC Isilon storage.]
- 28.
VCE LOWERS OPERATIONAL COSTS
[Chart: IT staff cost, facilities and infrastructure costs before vs. after Vblock System deployment; values labeled 41%, 13% and 38%.]
IDC Research Study of VCE Customers, September 2013
- 29.
GAS AND UTILITY LEADER
• Situation
• Largest provider of gas and electric energy in the US. Innovating to drive a clean, sustainable future. Better management of costs and risks using predictive models. Operational improvement and compliance management. Expected data growth and application complexity with smart meter data management.
• Solution
• Vblock System 340 to be used for private and public cloud in a hybrid cloud model, keeping custom applications and sensitive data in-house while pushing others to the public cloud. Initiated work with Pivotal to become a software-led company with Pivotal CF.
• Anticipated Business Benefits
– Increase agility for application deployment using Platform as a Service (PaaS) and a big data solution
– Support the 600+ new applications planned annually, faster and at lower cost
– Improve disaster recovery readiness and data protection
– Lower costs and detect issues by enabling field personnel
– Increase customer satisfaction, including cost savings via meter data
Drive toward clean energy transformation while managing cost and risk
Differentiators: Suited to the hybrid cloud model and future expansion (upgrades and scaling). Extends the VCE-Pivotal-EMC relationship while remaining open to tapping the ecosystem.
- 30.
FOOD AND BEVERAGE GIANT
• Situation
• Global food and beverage conglomerate seeking to accelerate financial reporting and reflect customer behaviors. Seeking a better alternative to a third-party cloud-based model. Operational improvement and customer intimacy with leading brand recognition throughout the world. Data loading, processing and end-user impact are crucial.
• Solution
• Use Vblock System for a shuffle and extend with VCE technology extension for EMC Isilon to run Pivotal Hadoop and HAWQ. For Pivotal Greenplum, use VCE technology extension for compute (Cisco C240). Bring some of the core applications back to corporate IT.
• Anticipated Business Benefits
– Streamline the financial reporting process for goods coming from multiple geographies while keeping up to date and supporting broadening user queries
– Exploit mobile applications for customer preferences and inventory management
– Support product launches and marketing campaigns based on consumption logs, brand preferences and social media
– Improve disaster recovery readiness and data protection
– Start with one project and gain momentum while ensuring readiness for the future
Financial reporting and marketing analysis: back to private cloud
Differentiators: Ability to match architecture to workloads. Reuse of the existing environment. Extensible for future growth.
- 31.
WHY VCE AND EMC FOR A SCALE-OUT CONVERGED ANALYTIC SOLUTION?
• Adaptable, modular, and mission-critical
• Incremental scaling with your demand from the broad VCE and EMC portfolio
• Pre-tested, validated and certified by EMC and VCE
• Exploit end-to-end analytics on the SAME VCE and EMC platform
• Take advantage of the broadening EMC partner ecosystem
• Contact your EMC or VCE representatives
• Contacts: EMC – dan.beres@isilon.com; VCE – julianna.delua@vce.com