This document discusses how hyperconverged infrastructure from Nutanix can help kick-start big data projects. It provides an overview of Nutanix's capabilities, including its web-scale design principles, use of open source technologies like Cassandra and ZooKeeper, local flash storage, data locality, automatic disk balancing, snapshots, compression, and erasure coding. It also discusses how Nutanix can simplify management and provide analytics. Specific big data workloads that can benefit, including Hadoop, NoSQL, Splunk, and databases, are also covered.
Big Data LDN 2016: Kick Start your Big Data project with Hyperconverged Infrastructure
1. KICK START YOUR BIG DATA PROJECT WITH
HYPERCONVERGED INFRASTRUCTURE
Ray Hassan, Solutions & Performance Engineering
ray@nutanix.com
@cannybag
2. 2
About Nutanix
3750+ customers
Over 70 countries
6 continents
Founded in 2009
1980+ employees
Make datacenter infrastructure invisible,
elevating IT to focus on applications and services
8. 8
Local (Flash + HDD)
Single Storage Pool
Evolution of the Datacenter
High Perf and Capacity Optimization
Data Protection and DR
Resilience
Security
[Diagram: Node 1 through Node N, each an x86 server running a hypervisor and a Nutanix CVM over local flash + HDD, pooled into a single storage pool.]
9. 9
Data Locality
Keep data on the same node
as VM
All read operations localized
on same node
ILM transparently moves
remote data to local
controller
Reduces network chattiness
significantly
Data follows VM during
vMotion/Live Migration
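To make the read-path preference described above concrete, here is a minimal Python sketch of the idea; the Extent class and choose_read_source helper are hypothetical names for illustration, not anything in the actual NDFS code:

# Illustrative sketch only: a toy model of read locality, not Nutanix code.
from dataclasses import dataclass

@dataclass
class Extent:
    extent_id: str
    replica_nodes: list          # nodes currently holding a copy of this extent

def choose_read_source(extent: Extent, vm_node: str) -> str:
    """Prefer the replica on the VM's own node; otherwise read remotely."""
    if vm_node in extent.replica_nodes:
        return vm_node           # local read, no network hop
    # Remote read; ILM would then migrate a copy onto vm_node so that
    # subsequent reads become local again.
    return extent.replica_nodes[0]

extent = Extent("e1", replica_nodes=["node-2", "node-3"])
print(choose_read_source(extent, "node-2"))   # node-2 (local)
print(choose_read_source(extent, "node-1"))   # node-2 (remote, then migrated)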
10. 10
Flash Made Easy
• Advanced Auto-Tiering reduces complicated configuration
• Ability to bypass the flash tier
• All flash is made accessible to all of the nodes
• Ability to handle a variety of Big Data workloads – Hadoop, NoSQL, Kafka,
Spark, Splunk….
[Diagram: NDFS/EDE I/O path. Writes are fingerprinted, with the fingerprint stored in metadata; random writes land in the OpLog (SSD) and drain into the Extent Store, while sequential writes go straight to the Extent Store. Reads are served from the Content Cache (memory + SSD), with HDD and extensible targets (cloud, NAS, etc.) behind it.]
11. 11
Automatic Disk Balancing
Real-time balancing of storage within the cluster/nodes
Supports heterogeneous and homogeneous node types (compute heavy/storage heavy)
Uniform distribution of data
Leverages the MapReduce framework
Requires no manual intervention
Balancing happens both at runtime, via node/disk placement, and via the background Curator process
Larger, "storage heavy" nodes have larger capacities and hence hold more data
After disk balancing has run, utilization is uniform
[Diagram: NDFS spanning three nodes, each running a hypervisor, VMs, and a CVM over local storage; after balancing, each node sits at roughly 35% utilization.]
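As a rough illustration of what "uniform distribution" means in practice, here is a hedged Python sketch; the 5% threshold and the plan_rebalance helper are made up for the example and are not the Curator's actual logic:

# Illustrative sketch only: rebalancing when node utilization is skewed.
def plan_rebalance(utilization: dict, threshold: float = 0.05):
    """Return (source, destination, amount) moves that bring every node
    close to the cluster-wide mean utilization."""
    mean = sum(utilization.values()) / len(utilization)
    donors = {n: u - mean for n, u in utilization.items() if u - mean > threshold}
    receivers = {n: mean - u for n, u in utilization.items() if mean - u > threshold}
    moves = []
    for src, excess in donors.items():
        for dst, room in receivers.items():
            if room <= 0 or excess <= 0:
                continue
            shift = min(excess, room)
            moves.append((src, dst, round(shift, 3)))
            excess -= shift
            receivers[dst] -= shift
    return moves

# Storage-heavy node-3 sits at 75% while the others are lighter;
# node-3 sheds data to node-1 and node-2 until all sit near the mean.
print(plan_rebalance({"node-1": 0.30, "node-2": 0.35, "node-3": 0.75}))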
12. 12
Nutanix Local Snapshots (Time Stream)
RPO: minutes
RTO: minutes
Use Cases
Protection against Guest OS corruption
Snapshot VM environments
Self-Service File Level Restore
Points of differentiation
VM granularity
No performance impact
VM and application level consistency
Lower $ / GB with storage heavy/only
nodes
[Diagram: local, VM-centric snapshots of vdisks within the primary cluster. Nutanix snapshots have byte-level resolution.]
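A minimal sketch of the arithmetic behind "RPO: minutes", assuming a simple fixed snapshot interval (the meets_rpo helper is hypothetical):

# Illustrative sketch only: worst-case data loss equals the snapshot interval,
# so the interval must not exceed the RPO.
from datetime import timedelta

def meets_rpo(snapshot_interval: timedelta, rpo: timedelta) -> bool:
    return snapshot_interval <= rpo

print(meets_rpo(timedelta(minutes=15), rpo=timedelta(minutes=30)))  # True
print(meets_rpo(timedelta(hours=4), rpo=timedelta(minutes=30)))     # False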
13. 13
Compression
Inline and post-process compression
Inline: Data compressed as it’s written
MapReduce: Data compressed after “cold”
data is migrated to lower-performance
storage tiers
No impact to normal IO path
Ideal for random batch workloads
Uses Snappy algorithm
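The Snappy algorithm named above can be experimented with via the third-party python-snappy package (pip install python-snappy); a quick sketch of the lossless round trip:

# Illustrative sketch only; uses the python-snappy bindings, not Nutanix code.
import snappy

data = b"10100101" * 100_000            # highly repetitive data compresses well
compressed = snappy.compress(data)

print(f"{len(data)} -> {len(compressed)} bytes "
      f"({len(data) / len(compressed):.1f}x smaller)")

assert snappy.decompress(compressed) == data   # lossless round trip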
14. 14
Erasure Coding (EC-X)
RAID-5, RAID-6, RAID-DP on disks:
• Bottlenecked by single disk
• Hardware defined
• Hot spares waste space
• Decreased write performance
• Slow rebuilds
Erasure Coding across nodes:
• Storage optimization, keeping resiliency unchanged
• Optimizes availability (fast rebuilds)
• Uses the power of the entire cluster
• Up to 75% increase in usable capacity
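A back-of-the-envelope sketch of where capacity savings like the figure above come from, comparing two-copy replication with a hypothetical 4 data + 1 parity stripe (actual stripe widths vary with cluster size):

# Illustrative sketch only: usable fraction of raw capacity under each scheme.
def usable_fraction_replication(copies: int) -> float:
    return 1 / copies

def usable_fraction_ec(data_blocks: int, parity_blocks: int) -> float:
    return data_blocks / (data_blocks + parity_blocks)

rf2 = usable_fraction_replication(2)   # 0.50 of raw capacity usable
ec = usable_fraction_ec(4, 1)          # 0.80 of raw capacity usable

# 4+1 already gives a 60% uplift; wider stripes approach the quoted 75%.
print(f"{ec / rf2 - 1:.0%} more usable capacity than 2-copy replication")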
15. 16
Save on Archiving and Licensing
NX-8150 Series
Compute and storage
NX-6035C Series
Storage only
[Diagram: NX-8150 (IOPS tier) and NX-6035C (storage tier) connected over 10 Gbps Ethernet.]
18. 19
Enterprise Databases Meet Web-Scale
Transactional Databases
(OLTP)
• Localized I/O for low latency random
operations
• SSD for working set, indexes and key
database files
• Ability to automatically tier data
depending on usage
Analytical Databases
(OLAP/DSS)
• Local read I/O for high-performance
queries and reporting
• Abundant sequential write and read
throughput
• Scalable performance and capacity
[Diagram: Distributed Storage Fabric with intelligent tiering, VM-centric management, snapshots, clones, compression, and deduplication, pooling local + remote flash and HDD across Node 1 through Node N; each x86 node runs ESXi, Hyper-V, or AHV plus a Nutanix Controller VM (one per node), with the Acropolis App Mobility Fabric and Tier 1 workloads running on all nodes.]
19. 20
Splunk on Nutanix
Ability to ingest GBs of data per day: 1 TB+/day of data ingest, ample for most deployments
Quick search capabilities for mission-critical applications: accelerated search through server-side flash
Ability to support growth in data ingest rates: predictable, linear performance through distributed architecture
Self-contained deployments for data security and privacy: quick, manageable deployment through the appliance
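A rough, hedged sizing sketch for an ingest-driven workload like this; the 0.5 on-disk factor below is an assumption for the example, not a Splunk or Nutanix figure:

# Illustrative sketch only: estimate usable capacity for a given ingest rate.
def storage_needed_tb(daily_ingest_gb: float, retention_days: int,
                      on_disk_factor: float = 0.5) -> float:
    """Raw data plus indexes typically end up smaller than the ingested
    volume once compressed; on_disk_factor models that (assumed here)."""
    return daily_ingest_gb * retention_days * on_disk_factor / 1024

# 1 TB/day of ingest retained for 90 days
print(f"{storage_needed_tb(1024, 90):.1f} TB of usable capacity")   # 45.0 TB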
20. 21
Nutanix Big Data Resources
Big Data on Nutanix:
http://www.nutanix.com/solutions/big-data/
Reference Architectures:
http://go.nutanix.com/virtualizing-splunk-on-nutanix-AHV.html
http://go.nutanix.com/hadoop-virtualization.html
Solution Notes:
http://go.nutanix.com/virtualizing-elastic-stack-on-ahv.html
Best Practice Guides:
http://go.nutanix.com/best-practices-to-virtualizing-mongoDB.html
http://www.nutanix.com/go/docker-container-best-practices-guide-with-AHV.html
Points to land:
Goal is to make IT infrastructure invisible
One of the fastest growing infrastructure companies in the last decade, if not the fastest. Founded in 2009, we have been selling for over 4 years now.
Numbers are not current, but over 1,750 customers, growing at a phenomenal rate across 70 different countries. Focused on international presence very early.
Over 1100 employees across 70 different countries.
Purpose of the Slide:
Talk about where and how web-scale IT originated, and what some of the commonalities are between different web-scale data centers. Web-scale data centers embody many of the principles of invisible infrastructure, and serve as the inspiration for Nutanix enterprise computing solutions.
Main Points:
Web companies like Google and Facebook started pushing the limits of existing infrastructure systems and processes in ways that traditional businesses did not. They needed infrastructure that could support their business requirements (rapid application development cycles, scale on demand, cost containment). They tried using existing infrastructure solutions, but quickly realized that legacy infra was a poor fit for their needs.
Over time, these companies developed an alternate approach to IT that enabled them to get past limitations in infrastructure. Some common traits of web-scale IT:
Infrastructure built from commodity server hardware pooled together using intelligent software. This allows customers to start small and scale one server at a time – true scale-out
The software in the system is distributed across all the nodes. You don’t have central metadata servers or name nodes. You don’t see controller bottlenecks
Embarrassingly parallel operations – everything in the system, including storage functions like deduplication and metadata management and system cleanup, is distributed across all nodes. There are no hotspots or bottlenecks, allowing for massive scale
Compute and storage sit very close to each other. Data does not have to go back and forth between storage and compute over a network. Data has gravity, so co-locating storage and compute eliminates network bottlenecks and system slowdown
Heavy automation eliminates the need for expensive, error-prone manual operations
Hyperconvergence solves this issue and a lot more. We still have servers with direct-attached storage, but we pool the storage from all the servers in a cluster into a shared storage pool, so that storage from all the independent nodes is available to every node in the cluster and its associated VMs.
You can start off with as little as 3 nodes in a cluster and incrementally grow from there. When you need additional capacity, you just add a node, either compute heavy or storage heavy depending on what you need.
All the enterprise storage capabilities that shared storage solutions such as NetApp and EMC provide, Nutanix provides as well.
Now, addressing the bottleneck/hotspot problem in shared storage: the storage controller is virtualized and is present on every node in the system. When you can virtualize mission-critical workloads such as Oracle and SQL, why not virtualize storage as well? To us, storage is an app too, and it should be the first app to be virtualized.
Every time a node is added, a CVM gets added. All the requests from a user VM are handled by the CVM that sits on the same node, so requests typically don't have to go over the network.
Rebuilds take less time because the work is spread across the entire cluster.
Purpose of this slide: Emphasize the capacity reduction made possible through optimization
Key Points:
Nutanix Virtual Computing Platform delivers data reduction functionality to help drive down the cost of storage. This includes
Inline compression
Data compressed as it's written (synchronously)
Ideal for archival data such as Exchange logs for compliance
High performance for sequential workloads (OLAP databases)
Post-process compression
Data compressed after “cold” data is migrated to lower-performance storage tiers
Processed only when data and compute resources are available
No impact to normal IO path
Ideal for random batch workloads
Purpose-built for virtualization
Increased usable capacity across all storage tiers
Compression policies align with VM-centric workflows
Maximum compression/decompression performance with Snappy algorithm
Sub-block compression for granularity and maximum efficiency
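To illustrate the inline versus post-process split above, here is a small sketch of an age-based "cold data" rule; the ColdDataPolicy name and the one-day threshold are assumptions for the example only, not the platform's actual policy:

# Illustrative sketch only: decide whether data is cold enough to compress
# in the background, keeping the inline write path unaffected.
from datetime import datetime, timedelta
from typing import Optional

class ColdDataPolicy:
    def __init__(self, cold_after: timedelta = timedelta(days=1)):
        self.cold_after = cold_after

    def eligible(self, last_access: datetime, now: Optional[datetime] = None) -> bool:
        now = now or datetime.utcnow()
        return now - last_access > self.cold_after

policy = ColdDataPolicy()
print(policy.eligible(datetime.utcnow() - timedelta(hours=36)))    # True: compress
print(policy.eligible(datetime.utcnow() - timedelta(minutes=5)))   # False: leave hot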
The 6035C will migrate data off compute nodes and save on licensing
Simplicity
Prism brings simplicity to infrastructure management, and features innovative One-Click technology that streamlines time-consuming IT tasks
One-Click Infrastructure Management - Nutanix Prism gives administrators an easy way to manage virtual environments running on Acropolis. It simplifies and streamlines common workflows for hypervisor and virtual machine (VM) management, from VM creation and migration to virtual network setup and hypervisor upgrades. Rather than replicating the full set of features found in other virtualization solutions, virtualization management in Prism has been designed for an uncluttered, consumer-grade experience.
One-click Operational Insight - Prism provides rich insights about day-to-day operations of the datacenter. Prism is powered by advanced machine learning technology with built-in heuristics and business intelligence to easily and quickly mine large volumes of system data and generate actionable insights for optimizing all aspects of infrastructure performance.
One-Click Remediation - Nutanix Prism reduces mean time to resolution significantly by proactively analyzing trends and predicting when certain failures may occur. In addition, Prism provides customized recommendations to resolve issues with a single click. Instead of showing individual alerts, Prism intelligently maps alerts to specific services and shows each error's impact on applications, helping IT admins gain more confidence about how their applications are running.
The Nutanix platform is built for mixed workloads and supports the different I/O footprints those workloads and use cases require. It can simultaneously handle the random I/O footprint of VDI, the mixed random and sequential I/O of SQL Server, and the sequential I/O of general server workloads, while eliminating the 'noisy neighbor' problem.
In the case of SQL Server, it comes down to two scenarios:
1. Transactional databases (OLTP): Minimizing latency is very important for OLTP databases. Because the Nutanix platform has a local storage controller model, I/O requests do not go over the network, which reduces latency; in a traditional architecture the network lies in the data path, which adds latency. The platform also has server-attached flash plus in-memory and SSD caches, so indexes and heavily accessed files can live on SSD for accelerated performance while less frequently accessed files reside on the HDD tier.
The Nutanix platform handles both random and sequential I/O equally well. The random I/O of database data files is served from the flash tier, latency-sensitive sequential log files and tempdb stay on the SSD tier, and standard sequential data can be placed on the HDD tier.
2. Analytical databases (OLAP/DSS): These databases involve heavy read workloads, batch reporting, and bulk insertion. Nutanix serves read I/O for these reports from local controllers, which accelerates queries and stored procedures and leads to faster reporting times for end users. The platform also has ample sequential and random throughput: whether you need to write or load terabytes of data or read gigabytes to terabytes, the platform can handle it.
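A minimal sketch of the tiering decision described in these two scenarios, assuming a hypothetical choose_tier helper rather than the platform's actual ILM logic:

# Illustrative sketch only: pick a tier from a workload's I/O pattern.
def choose_tier(io_pattern: str, latency_sensitive: bool) -> str:
    """Random or latency-sensitive I/O stays on flash; cold sequential data
    can live on the HDD tier."""
    if io_pattern == "random" or latency_sensitive:
        return "SSD"
    return "HDD"

print(choose_tier("random", latency_sensitive=True))        # OLTP data file -> SSD
print(choose_tier("sequential", latency_sensitive=True))     # log / tempdb  -> SSD
print(choose_tier("sequential", latency_sensitive=False))    # OLAP bulk data -> HDD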
As the quantity of data grows over time, you will need to scale your compute and/or storage infrastructure. Nutanix simplifies scaling with a portfolio of turnkey appliances you can choose from based on your needs. And because of the Nutanix distributed architecture, additional storage resources are seamlessly pooled into existing resources and compute capacity is added to your virtualization pool.
Goal: Explain that Splunk has a set of requirements that the Nutanix distributed architecture caters well to.
- Ability to ingest GBs of data per day – Splunk is collecting and storing large amounts of machine data from many sources
- Given Splunk is often used for making mission-critical decisions, quick search capability is key. Nutanix server-side flash and local data accelerate search.
- As enterprises and business users get more comfortable with Splunk, the infrastructure needs to easily support growth. Nutanix provides predictable, linear scale-out through the storage controller that runs on every single node.
- Given the sensitivity of the data collected and the number of business units within an org using Splunk, it is important to have self-contained deployments.