Customers are preparing to analyze and manage an increasing quantity of structured and unstructured data. Business leaders introduce new analytical workloads faster than IT departments can handle, and legacy IT infrastructure needs to evolve to deliver operational improvements and cost containment while increasing flexibility to meet future requirements. By providing HDP on IBM Power Systems, Hortonworks and IBM give customers more choice in selecting the architectural platform that is right for them. In this webinar, we discuss some of the challenges of deploying big data platforms and how solutions built with HDP on IBM Power Systems offer tangible benefits and the flexibility to accommodate changing needs.
In 2017, more and more corporations are looking to reduce operational overhead in their enterprise data warehouse (EDW) installations. Hortonworks has just launched the industry's first turnkey EDW optimization solution together with our partners Syncsort and AtScale. Join Hortonworks CTO Scott Gnau to learn more about this solution and its three use cases.
View the recording:
http://hortonworks.com/webinar/accelerating-real-time-data-ingest-hadoop/
Hadoop didn’t disrupt the data center. The exploding amounts of data did. But, let’s face it, if you can’t move your data to Hadoop, then you can’t use it in Hadoop. The experts from Hortonworks, the #1 leader in Hadoop development, and Attunity, a leading data management software provider, cover:
- How to ingest your most valuable data into Hadoop using Attunity Replicate
- How customers are using Hortonworks DataFlow (HDF), powered by Apache NiFi
- How to combine real-time change data capture (CDC) technology with Connected Data Platforms from Hortonworks
We discuss how Attunity Replicate and Hortonworks DataFlow (HDF) work together to move data into Hadoop.
How to Use Apache Zeppelin with HWX HDB (Hortonworks)
Part five in a five-part series, this webcast demonstrates the integration of Apache Zeppelin and Pivotal HDB. Apache Zeppelin is a web-based notebook that enables interactive data analytics: you can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more. This webinar demonstrates the configuration of the psql interpreter and the basic operations of Apache Zeppelin when used in conjunction with Hortonworks HDB.
Double Your Hadoop Hardware Performance with SmartSense (Hortonworks)
Hortonworks SmartSense provides proactive recommendations that improve cluster performance, security and operations. Since 30% of issues are configuration related, Hortonworks SmartSense makes an immediate impact on Hadoop system performance and availability, in some cases boosting hardware performance by 2x. Learn how SmartSense can help you increase the efficiency of your Hadoop hardware through customized cluster recommendations.
View the on-demand webinar: https://hortonworks.com/webinar/boosts-hadoop-hardware-performance-2x-smartsense/
Hortonworks Data In Motion Webinar Series Pt. 2 (Hortonworks)
Learn how Hortonworks DataFlow (HDF), powered by Apache NiFi, MiNiFi, Kafka and Storm, and its associated HDF Certification Program make it easier and faster to integrate different systems. Includes highlights of the latest partner integrations from HPE, SAS, Attunity, Impetus Technologies, Kepware and Midfin Systems.
Watch the webinar on-demand: http://hortonworks.com/webinar/make-big-data-ecosystem-work-better/
HDF Partner certification program: http://hortonworks.com/partners/product-integration-certification/#hdf-integration
Scaling real time streaming architectures with HDF and Dell EMC Isilon (Hortonworks)
Streaming analytics is the new normal, and customers are exploring use cases that have quickly transitioned from batch to near real time. Hortonworks DataFlow (HDF) / Apache NiFi and Isilon provide a robust, scalable architecture for real-time streaming. Explore our use cases and a demo of how HDF and Isilon can empower your business for real-time success.
Webinar Series Part 5: New Features of HDF 5 (Hortonworks)
An overview of the newest features of Hortonworks DataFlow, highlighting the new processors, the new user interface, edge intelligence powered by Apache MiNiFi, new support for multi-tenancy, and the new zero-master clustering architecture.
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ... (Hortonworks)
Companies in every industry look for ways to explore new data types and large data sets that were previously too big to capture, store and process. They need to unlock insights from data such as clickstream, geo-location, sensor, server log, social, text and video data. However, becoming a data-first enterprise comes with many challenges.
Join this webinar organized by three leaders in their respective fields and learn from our experts how you can accelerate the implementation of a scalable, cost-efficient and robust Big Data solution. Cisco, Hortonworks and Red Hat will explore how new data sets can enrich existing analytic applications with new perspectives and insights and how they can help you drive the creation of innovative new apps that provide new value to your business.
Unlocking a fully integrated Spark experience within your enterprise Hadoop environment that is manageable, secure and deployable anywhere.
Presented at the Spark Summit by Arun C Murthy (co-Founder, Hortonworks) on Monday, June 15, 2015.
This workshop is a hands-on session to quickly deploy Hadoop and Streaming on AWS / Azure / Google Cloud.
Cloudbreak simplifies the deployment of Hadoop in cloud environments. It enables the enterprise to quickly run big data workloads in the cloud while optimizing the use of cloud resources.
Objective
To provide a short, hands-on introduction to Hadoop on the cloud and to review the key benefits of cluster deployment automation.
This lab uses Cloudbreak to quickly and effortlessly stand up Hadoop and Streaming clusters in a cloud provider of your choice. It shows how Ambari blueprints act as declarative definitions of your Hadoop or Streaming clusters, and walks through the steps to dynamically change these blueprints and to use external databases and external authentication sources, in essence providing shared authentication, authorization and audit across ephemeral and long-lasting clusters. The lab is not limited to custom blueprints: it also shows how Cloudbreak's easy-to-use custom scripts, called recipes, can be executed before or after Ambari start, or after cluster installation.
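As a rough illustration of the declarative approach described above, an Ambari blueprint is a JSON document that names a stack version and declares host groups with their components. The blueprint name, cardinalities, and component lists below are hypothetical, not taken from the lab:

```python
import json

# A minimal, hypothetical Ambari blueprint: each host group declares which
# components land on that class of nodes; Cloudbreak submits JSON like this
# to Ambari when provisioning the cluster.
blueprint = {
    "Blueprints": {
        "blueprint_name": "hdp-streaming-demo",  # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.6",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "3+",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
}

print(json.dumps(blueprint, indent=2))
```

Because the cluster shape lives in data rather than in manual steps, changing the topology (say, adding a Kafka broker component to a host group) is an edit to this document, not a re-install.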
Pre-requisites
Registrants must bring a laptop for the lab. These labs will be done in the cloud; please follow the steps below to set up an AWS or Azure account before the session starts.
a. Before launching Cloudbreak on AWS, you must meet the AWS prerequisites.
b. Before launching Cloudbreak on Azure, you must meet the Azure prerequisites.
Speaker: Santosh Gowda
Apache Hive has been continuously evolving to support a broad range of use cases, bringing it beyond its batch processing roots to its current support for interactive queries with sub-second response times using LLAP. However, the development of its execution internals is not sufficient to guarantee efficient performance, since poorly optimized queries can create a bottleneck in the system. Hence, each release of Hive has included new features for its optimizer aimed to generate better plans and deliver improvements to query execution. In this talk, we present the development of the optimizer since its initial release. We describe its current state and how Hive leverages the latest Apache Calcite features to generate the most efficient execution plans. We show numbers demonstrating the improvements brought to Hive performance, and we discuss future directions for the next-generation Hive optimizer, which include an enhanced cost model, materialized views support, and complex query decorrelation.
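To make the idea of cost-based planning concrete, here is a toy sketch (plain Python, not Hive or Calcite code; the table sizes and selectivity are invented) of how an optimizer compares join orders by estimating the size of intermediate results:

```python
from itertools import permutations

# Toy cost model: estimate the size of each intermediate join result and
# pick the join order with the lowest total cost. Real optimizers such as
# Apache Calcite use much richer statistics, but the principle is the same.
table_rows = {"orders": 1_000_000, "customers": 50_000, "regions": 10}
selectivity = 0.0001  # assumed fraction of the cross product each join keeps

def plan_cost(order):
    rows, cost = table_rows[order[0]], 0
    for t in order[1:]:
        rows = rows * table_rows[t] * selectivity  # estimated intermediate size
        cost += rows                               # pay for materializing it
    return cost

# Exhaustively compare all join orders; small tables first keeps
# intermediates small, so the large table ends up joined last.
best = min(permutations(table_rows), key=plan_cost)
print(best, plan_cost(best))
```

This is why poorly optimized queries bottleneck the system regardless of execution-engine speed: a bad join order can make intermediates orders of magnitude larger than the final result.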
Learn more: http://hortonworks.com/hdf/
Log data can be complex to capture; it is typically collected in limited amounts and is difficult to operationalize at scale. HDF expands the integration options for log analytics, enabling easy and secure edge analytics of log files in the following ways:
More efficient collection and movement of log data by prioritizing, enriching and/or transforming data at the edge to dynamically separate critical data. The relevant data is then delivered into log analytics systems in a real-time, prioritized and secure manner.
Cost-effective expansion of existing log analytics infrastructure by improving error detection and troubleshooting through more comprehensive data sets.
Intelligent edge analytics to support real-time content-based routing, prioritization, and simultaneous delivery of data into Connected Data Platforms, log analytics and reporting systems for comprehensive coverage and retention of Internet of Anything data.
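The prioritize-and-route pattern described above can be sketched in a few lines. This is an illustration of the logic, not HDF configuration; in HDF it would be expressed with NiFi processors rather than Python, and the priority markers here are invented:

```python
# Illustrative content-based routing for log records at the edge:
# critical records are forwarded to the log-analytics system immediately,
# while the rest are batched for cheaper bulk delivery.
CRITICAL_MARKERS = ("ERROR", "FATAL", "SECURITY")

def route(record: str) -> str:
    """Classify a log line for delivery: 'realtime' or 'batch'."""
    if any(marker in record for marker in CRITICAL_MARKERS):
        return "realtime"   # deliver immediately, prioritized
    return "batch"          # queue for periodic bulk transfer

logs = [
    "2017-03-01 INFO service started",
    "2017-03-01 ERROR disk quota exceeded",
    "2017-03-01 SECURITY failed login from 10.0.0.7",
]
routed = {line: route(line) for line in logs}
```

Separating critical from routine data at the edge is what lets the same pipeline feed real-time alerting and cost-effective bulk retention at once.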
Powering Big Data Success On-Prem and in the Cloud (Hortonworks)
How do you optimize Apache Spark workloads in the cloud? How do you tune your resources for maximum performance and efficiency? Find out how the new Hortonworks Flex Support Subscription enables IT agility and success in the cloud. We will cover:
* Options for running Data Science, Analytics and ETL workloads in the cloud
* Hortonworks support offerings including new Flex Support Subscription
* How to run Cloud workloads more efficiently with SmartSense
* Case study on the impact of SmartSense
https://hortonworks.com/webinar/powering-big-data-success-cloud/
Verizon Centralizes Data into a Data Lake in Real Time for Analytics (Hortonworks)
Verizon Global Technology Services (GTS) was challenged by a multi-tier, labor-intensive process when trying to migrate data from disparate sources into a data lake to create financial reports and business insights.
View the webinar on-demand here: https://hortonworks.com/webinar/verizon-centralizes-data-into-data-lake/
Hortonworks Data In Motion Series Part 3 - HDF Ambari (Hortonworks)
How To: Hortonworks DataFlow 2.0 with Ambari and Ranger for integrated installation, deployment and operations of Apache NiFi.
On demand webinar with demo: http://hortonworks.com/webinar/getting-goal-big-data-faster-enterprise-readiness-data-motion/
Deep Learning on YARN: Running Distributed TensorFlow etc. on a Hadoop Cluster (DataWorks Summit)
Deep learning is useful for enterprise tasks in the fields of speech recognition, image classification, AI chatbots, and machine translation, just to name a few.
To train deep learning and machine learning models, frameworks such as TensorFlow, MXNet, Caffe, and XGBoost can be leveraged, and sometimes these frameworks are used together to solve different problems.
To make distributed deep learning/machine learning applications easy to launch, manage, and monitor, we introduced YARN native services in Apache Hadoop 3.x, along with other improvements such as first-class GPU support, container DNS support, and scheduling improvements. These improvements make distributed deep learning/machine learning applications as simple to run on YARN as locally, letting machine learning engineers focus on algorithms instead of worrying about the underlying infrastructure. With these improvements, YARN can also better manage a shared cluster that runs deep learning/machine learning workloads alongside other services and ETL jobs.
In this session, we will take a closer look at these improvements and show, with demos, how to run these applications on YARN. Attendees can start running these applications on YARN after this talk.
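The YARN native services mentioned above are driven by a JSON service spec (a "Yarnfile"). As a rough sketch of what a distributed TensorFlow job might declare, with one parameter server and several workers: the service name, commands, and resource numbers below are hypothetical, and field names should be verified against your Hadoop 3.x version before use:

```python
import json

# Illustrative YARN native-services spec ("Yarnfile") for a distributed
# TensorFlow job. Each component gets its own container count, launch
# command, and resource request; YARN handles placement and monitoring.
service = {
    "name": "distributed-tf-demo",   # hypothetical service name
    "version": "1.0",
    "components": [
        {
            "name": "ps",            # parameter server
            "number_of_containers": 1,
            "launch_command": "python train.py --role=ps",
            "resource": {"cpus": 2, "memory": "4096"},
        },
        {
            "name": "worker",
            "number_of_containers": 4,
            "launch_command": "python train.py --role=worker",
            "resource": {"cpus": 4, "memory": "8192"},
        },
    ],
}

print(json.dumps(service, indent=2))
```

Container DNS support is what makes this practical: workers can address the parameter server by a stable hostname instead of discovering dynamically assigned container addresses themselves.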
Speakers
Wangda Tan, Staff Software Engineer, Hortonworks
Sunil Govindan, Staff Engineer, Hortonworks
Hortonworks Technical Workshop: What's New in HDP 2.3 (Hortonworks)
The recently launched HDP 2.3 is a major advancement of Open Enterprise Hadoop. It represents the best of community-led development, with innovations spanning Apache Hadoop, Apache Ambari, Ranger, HBase, Spark and Storm. In this session we will provide an in-depth overview of new functionality and discuss its impact on new and ongoing big data initiatives.
Apache Hadoop YARN is the modern Distributed Operating System. It enables the Hadoop compute layer to be a common resource-management platform that can host a wide variety of applications. Multiple organizations are able to leverage YARN in building their applications on top of Hadoop without themselves repeatedly worrying about resource management, isolation, multi-tenancy issues etc.
In this talk, we’ll first hit the ground with the current status of Apache Hadoop YARN – how it is faring today in deployments large and small. We will cover different types of YARN deployments, in different environments and scale.
We'll then move on to the exciting present and future of YARN: features that are further strengthening YARN as the first-class resource-management platform for datacenters running enterprise Hadoop. We'll discuss the current status as well as the future promise of features and initiatives such as 10x scheduler throughput improvements, Docker container support on YARN, native support for long-running services (alongside applications) without any changes, seamless application upgrades, fine-grained isolation for multi-tenancy using cgroups on disk and network resources, powerful scheduling features like application priorities and intra-queue preemption across applications, and operational enhancements including insights through Timeline Service V2, a new web UI and better queue management.
Speaker:
Sunil Govindan, Senior Software Engineer, Hortonworks
Rohith Sharma K S, Senior Software Engineer, Hortonworks
Analytics Modernization: Configuring SAS® Grid Manager for Hadoop (Hortonworks)
Improve the efficiency and accelerate job execution by moving traditional SAS workloads into Hadoop to modernize and optimize SAS analytics. How can we run traditional SAS® jobs, including SAS® Workspace Servers, on Hadoop worker nodes? The answer is SAS® Grid Manager for Hadoop, which is integrated with the Hadoop ecosystem to provide resource management, high availability and enterprise scheduling for SAS customers. By moving SAS workloads inside the Hadoop cluster, efficiency is improved and job execution is accelerated. We will also cover the role of Hadoop YARN, Hadoop Distributed File System (HDFS) storage, and Hadoop client services. We review SAS metadata definitions for SAS Grid Manager, SAS® Object Spawner, and SAS® Workspace Servers.
Audio broadcast: https://hortonworks.com/webinar/configuring-sas-grid-manager-hadoop/
How Universities Use Big Data to Transform Education (Hortonworks)
Student performance data is increasingly being captured as part of software-based and online classroom exercises and testing. This data can be augmented with behavioral data captured from sources such as social media, student-professor meeting notes, blogs, student surveys, and so forth to discover new insights to improve student learning. The results transcend traditional IT departments to focus on issues like retention, research, and the delivery of content and courses through new modalities.
Hortonworks is partnering with Microsoft to show you how the Hortonworks Data Platform (HDP) running on the Microsoft stack enables you to develop a “single view of a student”.
The path to a Modern Data Architecture in Financial Services (Hortonworks)
Delivering Data-Driven Applications at the Speed of Business: Global Banking AML use case.
Chief Data Officers in financial services have unique challenges: they need to establish an effective data ecosystem under strict governance and regulatory requirements. They need to build the data-driven applications that enable risk and compliance initiatives to run efficiently. In this webinar, we will discuss the case of a global banking leader and the anti-money laundering solution they built on the data lake. With a single platform to aggregate structured and unstructured information essential to determine and document AML case disposition, they reduced mean time for case resolution by 75%. They have a roadmap for building over 150 data-driven applications on the same search-based data discovery platform so they can mitigate risks and seize opportunities, at the speed of business.
Dynamic Column Masking and Row-Level Filtering in HDP (Hortonworks)
As enterprises around the world bring more of their sensitive data into Hadoop data lakes, balancing the need for democratization of access to data without sacrificing strong security principles becomes paramount. In this webinar, Srikanth Venkat, director of product management for security & governance will demonstrate two new data protection capabilities in Apache Ranger – dynamic column masking and row level filtering of data stored in Apache Hive. These features have been introduced as part of HDP 2.5 platform release.
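To illustrate what these two policies do, the sketch below shows their semantics in plain Python. This is not Ranger's API or Hive code, and the table, column names, and entitlements are invented; in HDP the policies are defined in Ranger and enforced transparently inside Hive:

```python
# Plain-Python illustration of the two Apache Ranger policy effects on a
# Hive table read: row-level filtering drops rows the user is not entitled
# to see, then column masking rewrites the sensitive column on the rest.
rows = [
    {"account": "4539876512340001", "region": "EU", "balance": 1200},
    {"account": "4716000011112222", "region": "US", "balance": 900},
]

def mask_account(value: str) -> str:
    # "show last 4" style masking, as in Ranger's built-in mask types
    return "x" * (len(value) - 4) + value[-4:]

def read_table(user_regions):
    """What a query sees: filtered rows with the account column masked."""
    return [
        {**r, "account": mask_account(r["account"])}
        for r in rows
        if r["region"] in user_regions
    ]

visible = read_table(user_regions={"EU"})
```

The key property is that both effects apply dynamically at query time, per user, without creating masked copies of the underlying data.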
Hortonworks Data in Motion Webinar Series Part 7: Apache Kafka NiFi Better Tog... (Hortonworks)
Apache NiFi, Storm and Kafka augment each other in modern enterprise architectures. NiFi provides a coding free solution to get many different formats and protocols in and out of Kafka and compliments Kafka with full audit trails and interactive command and control. Storm compliments NiFi with the capability to handle complex event processing.
Join us to learn how Apache NiFi, Storm, and Kafka can augment each other to create a new data plane connecting multiple systems within your enterprise with ease, speed, and increased productivity.
https://www.brighttalk.com/webcast/9573/224063
Pivotal - Advanced Analytics for Telecommunications Hortonworks
Innovative mobile operators need to mine the vast troves of unstructured data now available to them to help develop compelling customer experiences and uncover new revenue opportunities. In this webinar, you’ll learn how HDB’s in-database analytics enable advanced use cases in network operations, customer care, and marketing for better customer experience. Join us, and get started on your advanced analytics journey today!
Top 5 Strategies for Retail Data AnalyticsHortonworks
It’s an exciting time for retailers, as technology is driving a major disruption in the market. Whether you are just beginning to build a retail data analytics program or you have been gaining advanced insights from your data for quite some time, join Eric and Shish as we explore the trends, drivers, and hurdles in retail data analytics.
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...Hortonworks
Only 23% of businesses can integrate customer insights in real-time. Learn how to change that. Join us to hear from industry experts on how to transform your organization’s data into the best omnichannel customer experience. Through this webinar, participants will hear how one retailer, with over 5 million customers and 750 brands, developed precise customer lifetime models using trusted data and delivered personalized promotions at scale. Through a single customer view and customer analytics, the retailer was able to quickly learn what changes needed to be made to improve the customer buying journey, and make those changes rapidly and effectively.
Presenters : Dan Mitchell, Director of Global Retail and CPG Practice at SAS, Eric Thorsen, VP Retail at Hortonworks
Real time trade surveillance in financial marketsHortonworks
Who’s winning the deep forensic analysis ‘arms race’ for compliance? Real-time trade surveillance in global financial markets has created a data tsunami. With greater volumes of data comes greater compliance risk. CNBC reports U.S. Banks have been fined over $200B since the financial crisis. How are compliance teams fighting back to make more of the data and stay out of regulatory hot water? Rapid response to suspect trades means compliance teams need to access and visualize trade patterns, real time and historic data, to navigate the data in depth and flag possible violations. Join Hortonworks and Arcadia for this live webinar: we’ll cover the use case at a top 50 Global Bank who now has deep forensic analysis of trade activity. The result: interactive, ad hoc data visualization and access across multiple platforms – without limits on historic data – to detect irregularities as they happen. In-depth expert presentations by:
Shailesh Ambike, Executive Co-Chair of Compliance & Legal Section (CLS) Education Sub-Committee of the Investment Industry Regulatory Organization of Canada (IIROC)
Vamsi K Chemitiganti, GM – Financial Services at Hortonworks
Enabling the Real Time Analytical EnterpriseHortonworks
Combining IoT, customer experience, and real-time enterprise data within Hadoop. What if you could derive real-time insights using ALL of your data? Join us for this webinar and learn how companies are combining “new” real-time data sources (e.g., IoT, social, web logs) with continuously updated enterprise data from SAP and other enterprise transactional systems, providing deep and up-to-the-second analytical insights. This presentation includes a demonstration of how this can be achieved quickly, easily, and affordably using a joint solution from Attunity and Hortonworks.
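At its core, change data capture (CDC) delivers a stream of insert/update/delete events rather than repeated full-table snapshots, and the target keeps itself current by replaying that stream. A toy sketch of applying such a stream to an in-memory "target table" (the event shape is invented for illustration; Attunity Replicate's actual wire formats differ):

```python
def apply_changes(table, events):
    """Apply CDC events (op, key, data) to an in-memory target table."""
    for op, key, data in events:
        if op in ("insert", "update"):
            table[key] = data          # upsert the changed row
        elif op == "delete":
            table.pop(key, None)       # remove the deleted row
    return table

target = {1: {"balance": 100}}
events = [
    ("update", 1, {"balance": 250}),   # balance changed at the source
    ("insert", 2, {"balance": 75}),    # new row at the source
    ("delete", 1, None),               # row removed at the source
]
print(apply_changes(target, events))   # {2: {'balance': 75}}
```

Because only changes cross the wire, the target stays up-to-the-second without re-extracting the whole source table.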
Hortonworks Technical Workshop: Operations with AmbariHortonworks
Ambari continues on its journey of provisioning, monitoring, and managing enterprise Hadoop deployments. With 2.0, Apache Ambari brings a host of new capabilities, including updated metrics collection, Kerberos setup automation, and developer views for big data developers. In this Hortonworks Technical Workshop session we provide an in-depth look at Apache Ambari 2.0 and showcase security setup automation using Ambari 2.0. View the recording at https://www.brighttalk.com/webcast/9573/155575. View the GitHub demo work at https://github.com/abajwa-hw/ambari-workshops/blob/master/blueprints-demo-security.md. Recorded May 28, 2015.
The Apache Hadoop community is gearing up for the upcoming release of Apache Hadoop 0.23, the first major release since 0.20 in 2009. This release brings major enhancements to Hadoop, such as HDFS Federation for hyper-scale and a next-generation MapReduce framework. Arun, the Apache Hadoop release master for 0.23, will cover the highlights of the release and the efforts undertaken to test, stabilize, and release Hadoop.next. The talk covers the timelines for the release, plans for compatibility, and upgrade paths for existing users of Hadoop.
Presented at Bay Area Hadoop User Group at Yahoo on 8/25/2011.
see the recording: http://youtu.be/qdhF1sfef10
Ofer Mendelevitch, Director of Data Science at Hortonworks, and Michael Zeller, Founder and CEO of Zementis, present key lessons on what drives successful implementations of big data analytics projects. Their knowledge comes from working with dozens of companies, from small cloud-based start-ups to some of the largest companies in the world.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is that a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation covers. Be prepared to discover the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you understand not only each product but also how they all fit together, so you can be the hero who builds your company’s big data solution.
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
How can we use technology to help the organization make data-driven decision-making part of its organizational DNA, while retaining the context of the business as a whole? How can we imprint data in the culture of the organization and make it easily accessible to everyone? Microsoft directly empowers businesses to derive insights and value from little and big data, through its release of user-friendly analytics through Azure Machine Learning (ML) combined with its acquisition of Revolution Analytics. Power BI can be used to create compelling visual stories around the analysis so that the work is not left to the data consumer. Together, these technologies can be used to make data and analytics part of the organization's DNA.
There are no prerequisites, but attendees are welcome to follow along with the demo if they have an Azure ML and Power BI account and R installed. Files will be released before the session.
This is a 200-level run-through of the Microsoft Azure big data analytics cloud data platform, based on the Cortana Intelligence Suite offerings.
The first issue in the « Livrets de la France insoumise » collection details the emergency measures and broad policy directions on agriculture and food. It was prepared by a working group led by Laurent Levard, agro-economist, and Eve Saymard, agronomist.
https://avenirencommun.fr/livrets-thematiques/
Calista Redmond from IBM presented this deck at the Switzerland HPC Conference.
“The OpenPOWER Foundation was founded in 2013 as an open technical membership organization that will enable data centers to rethink their approach to technology. Today, nearly 200 member companies are enabled to customize POWER CPU processors and system platforms for optimization and innovation for their business needs. These innovations include custom systems for large or warehouse-scale data centers, workload acceleration through GPU, FPGA or advanced I/O, platform optimization for SW appliances, or advanced hardware technology exploitation. OpenPOWER members are actively pursuing all of these innovations and more, and welcome all parties to join in moving the state of the art of OpenPOWER systems design forward.”
Watch the video presentation: http://insidehpc.com/2016/03/openpower-foundation/
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this video from Moabcon 2013, Dick Bland and Jérôme Labat from HP present: The New Style of IT: HP Update for Moabcon 2013.
"Cloud, Mobility, Security, and Big Data are transforming what the business expects from IT resulting in a “New Style of IT.” The result of alternative thinking from a proven industry leader, HP Moonshot is the world’s first software defined server that will accelerate innovation while delivering breakthrough efficiency and scale."
While the first spin of Moonshot is not targeted at HPC, Bland said that HP will be able to spin up new modules for the platform that could include FPGAs and ARM-based nodes more suited to high performance computing.
Learn more at: http://www.adaptivecomputing.com/company/news-and-events/events/moabcon-2013/moabcon-2013-full-agenda/
You can watch the video of this talk at this URL: http://inside-cloud.com/2013/04/video-the-new-style-of-it-hp-moonshot-update-for-moabcon-2013/
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
So you built your Hadoop cluster. How do you get data from hundreds of database tables, streaming Kafka sources, and data shared by 20-year-old COBOL programs, all in there and working together quickly, efficiently and securely? With many customers asking this same question, Hortonworks recently expanded its partnership with Syncsort to provide optimized ETL onboarding for Hadoop. During this talk, we'll discuss how a next-generation ETL tool, built on contributions to the open source community and natively integrated in Hadoop, can drive lasting value for your organization. 1) Seamlessly onboard data from all your enterprise sources – batch and streaming -- into Hadoop for fast and easy analytics. 2) Stay agile and simplify your environment with a "design once, deploy anywhere" approach that minimizes disruption and risk in the face of a rapidly evolving big data ecosystem. 3) Secure, govern and manage your data with full integration with Apache Ambari, Apache Ranger, and more. These benefits come to life with real customer case studies. Learn how a national insurance company and global hotel chain are using Hortonworks HDP and Syncsort DMX-h to get bigger insights from their enterprise data, securely, efficiently, and cost-effectively, without spending hundreds of man-hours.
Red Hat Summit 2015: Red Hat Storage Breakfast sessionRed_Hat_Storage
See the presentation shared during a special breakfast session during Red Hat Summit 2015. Learn about our mission, what areas and communities are seeing strong growth, and much more.
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...Anand Haridass
An unprecedented increase in the use of digital devices is causing an explosion in the amount of data generated and captured by businesses. The need to extract economic value from all this "big data", which has the potential to transform businesses completely, is immense and drives a whole slew of new workloads. Organizations need to continuously align strategy, business processes, and infrastructure investments to derive these insights. This session will discuss how solutions based on POWER deliver this in a cost-effective, open, scalable, high-performing, and reliable manner.
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
A modern, flexible approach to Hadoop implementation incorporating innovations from HP Haven
Jeff Veis
Vice President
HP Software Big Data
Gilles Noisette
Master Solution Architect
HP EMEA Big Data CoE
Red Hat Enterprise Linux 7.1 on IBM Power Systems helps you improve performance, increase reliability and reduce costs. See how Power Systems helps you put your data to work. Learn more: http://ibm.co/1JMO9MV
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
The HDF 3.3 release delivers several exciting enhancements and new features, the most noteworthy of which is the addition of support for Kafka 2.0 and Kafka Streams.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-3-taking-stream-processing-next-level/
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
Forrester forecasts* that direct spending on the Internet of Things (IoT) will exceed $400 Billion by 2023. From manufacturing and utilities, to oil & gas and transportation, IoT improves visibility, reduces downtime, and creates opportunities for entirely new business models.
But successful IoT implementations require far more than simply connecting sensors to a network. The data generated by these devices must be collected, aggregated, cleaned, processed, interpreted, understood, and used. Data-driven decisions and actions must be taken, without which an IoT implementation is bound to fail.
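The collect/clean/aggregate steps above can be sketched in a few lines. The sensor readings and the validity range below are made up for illustration; a real pipeline would do this continuously over streaming data:

```python
# Raw (sensor, temperature) readings as they might arrive from devices.
readings = [
    ("pump-1", 42.0), ("pump-1", 43.0), ("pump-1", 999.0),  # 999.0: sensor glitch
    ("pump-2", 36.0), ("pump-2", 37.0),
]

def clean(rows, low=-50.0, high=150.0):
    """Drop readings outside a plausible physical range."""
    return [(sensor, v) for sensor, v in rows if low <= v <= high]

def aggregate(rows):
    """Average reading per sensor."""
    totals = {}
    for sensor, v in rows:
        s, n = totals.get(sensor, (0.0, 0))
        totals[sensor] = (s + v, n + 1)
    return {sensor: s / n for sensor, (s, n) in totals.items()}

print(aggregate(clean(readings)))
# {'pump-1': 42.5, 'pump-2': 36.5}
```

Without the cleaning step, the single glitched reading would pull pump-1's average wildly off, which is exactly the kind of silent failure that sinks IoT implementations.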
https://hortonworks.com/webinar/iot-predictions-2019-beyond-data-heart-iot-strategy/
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
Cloudbreak, a part of Hortonworks Data Platform (HDP), simplifies provisioning and cluster management within any cloud environment, helping your business on its path to a hybrid cloud architecture.
https://hortonworks.com/webinar/getting-data-cloud-cloudbreak-live-demo/
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
In this webinar, we talk with experts from Johns Hopkins as they share techniques and lessons learned in real-world Apache Hadoop implementation.
https://hortonworks.com/webinar/johns-hopkins-using-hadoop-securely-access-log-events/
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
Cybersecurity today is a big data problem. There’s a ton of data landing on you faster than you can load it, let alone search it. To make sense of it, we need to act on data-in-motion, using both machine learning and the most advanced pattern-recognition system on the planet: your SOC analysts. Advanced visualization makes your analysts more efficient, helping them find the hidden gems, or bombs, in masses of logs and packets.
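As a toy illustration of the kind of pattern a SOC pipeline flags on data-in-motion, consider counting failed logins per source IP over a window and flagging bursts. The log format and threshold are invented for the sketch; real detection combines many such signals:

```python
from collections import Counter

log_lines = [
    "FAILED login user=admin src=10.0.0.5",
    "FAILED login user=root src=10.0.0.5",
    "OK login user=alice src=10.0.0.7",
    "FAILED login user=admin src=10.0.0.5",
]

def flag_bursts(lines, threshold=3):
    """Flag source IPs with at least `threshold` failed logins in the window."""
    fails = Counter(
        line.split("src=")[1] for line in lines if line.startswith("FAILED")
    )
    return [ip for ip, n in fails.items() if n >= threshold]

print(flag_bursts(log_lines))  # ['10.0.0.5']
```

A rule this simple drowns analysts in false positives on its own; the visualization layer described above is what lets a human quickly separate a brute-force attempt from a user who forgot a password.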
https://hortonworks.com/webinar/catch-hacker-real-time-live-visuals-bots-bad-guys/
We have introduced several new features as well as delivered some significant updates to keep the platform tightly integrated and compatible with HDP 3.0.
https://hortonworks.com/webinar/hortonworks-dataflow-hdf-3-2-release-raises-bar-operational-efficiency/
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
With the growth of Apache Kafka adoption in all major streaming initiatives across large organizations, the operational and visibility challenges associated with Kafka are on the rise as well. Kafka users want better visibility in understanding what is going on in the clusters as well as within the stream flows across producers, topics, brokers, and consumers.
With no tools in the market that readily address the challenges of the Kafka Ops teams, the development teams, and the security/governance teams, Hortonworks Streams Messaging Manager is a game-changer.
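One of the core visibility metrics such a tool surfaces is consumer lag: how far a consumer group's committed offsets trail the newest offsets in each partition. The arithmetic itself is simple (the offset numbers below are invented; a real monitor reads them from the brokers):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = newest offset in the log minus committed offset."""
    return {
        p: log_end_offsets[p] - committed_offsets.get(p, 0)
        for p in log_end_offsets
    }

log_end = {0: 1500, 1: 1480}     # newest offset per partition (broker side)
committed = {0: 1500, 1: 1200}   # consumer group's committed offsets

print(consumer_lag(log_end, committed))  # {0: 0, 1: 280}
```

A steadily growing lag on one partition is exactly the "blindness" symptom: the consumer is falling behind, and without per-partition visibility the Ops team only sees the downstream effects.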
https://hortonworks.com/webinar/curing-kafka-blindness-hortonworks-streams-messaging-manager/
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
The healthcare industry—with its huge volumes of big data—is ripe for the application of analytics and machine learning. In this webinar, Hortonworks and Quanam present a tool that uses machine learning and natural language processing in the clinical classification of genomic variants to help identify mutations and determine clinical significance.
Watch the webinar: https://hortonworks.com/webinar/interpretation-tool-genomic-sequencing-data-clinical-environments/
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
Last year IBM and Hortonworks jointly announced a strategic and deep partnership. Join us as we take a close look at the partnership accomplishments and the conjoined road ahead with industry-leading analytics offers.
View the webinar here: https://hortonworks.com/webinar/ibmhortonworks-transformation-big-data-landscape/
In this exclusive Premier Inside Out, you will hear from Druid committer Slim Bouguerra, Staff Software Engineer and Product Manager Will Xu. These Hortonworkers will explain the vision of these components, review new features, share some best practices and answer your questions.
View the webinar here: https://hortonworks.com/webinar/hortonworks-premier-apache-druid/
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
Gaining business advantages from big data is moving beyond just the efficient storage and deep analytics on diverse data sources to using AI methods and analytics on streaming data to catch insights and take action at the edge of the network.
https://hortonworks.com/webinar/accelerating-data-science-real-time-analytics-scale/
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
Thanks to sensors and the Internet of Things, industrial processes now generate a sea of data. But are you plumbing its depths to find the insight it contains, or are you just drowning in it? Now, Hortonworks and Seeq team to bring advanced analytics and machine learning to time-series data from manufacturing and industrial processes.
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
Trimble Transportation Enterprise is a leading provider of enterprise software to over 2,000 transportation and logistics companies. They have designed an architecture that leverages Hortonworks Big Data solutions and Machine Learning models to power up multiple Blockchains, which improves operational efficiency, cuts down costs and enables building strategic partnerships.
https://hortonworks.com/webinar/blockchain-with-machine-learning-powered-by-big-data-trimble-transportation-enterprise/
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
For years, the healthcare industry has had problems with data scarcity and latency. Clearsense solved the problem by building an open-source Hortonworks Data Platform (HDP) solution while providing decades’ worth of clinical expertise. Clearsense delivers smart, real-time streaming data to its healthcare customers, enabling mission-critical data to feed clinical decisions.
https://hortonworks.com/webinar/delivering-smart-real-time-streaming-data-healthcare-customers-clearsense/
Making Enterprise Big Data Small with EaseHortonworks
Every division in an organization builds its own database to keep track of its business. As the organization grows, those individual databases grow as well, and the data in each becomes siloed, disconnected from the data in the other databases.
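Breaking down such silos usually starts with joining each division's records on a shared key, such as a customer ID. A minimal sketch with invented sales and support "databases":

```python
# Two siloed per-division stores, keyed by a shared customer ID.
sales = {"C001": {"orders": 12}, "C002": {"orders": 3}}
support = {"C001": {"tickets": 1}, "C003": {"tickets": 4}}

def join_silos(*silos):
    """Merge per-customer records from several siloed stores into one view."""
    merged = {}
    for silo in silos:
        for key, record in silo.items():
            merged.setdefault(key, {}).update(record)
    return merged

print(join_silos(sales, support))
# {'C001': {'orders': 12, 'tickets': 1}, 'C002': {'orders': 3}, 'C003': {'tickets': 4}}
```

In practice the hard part is not the join but agreeing on the shared key and reconciling conflicting records, which is why this is a data management problem as much as a technical one.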
https://hortonworks.com/webinar/making-enterprise-big-data-small-ease/
Driving Digital Transformation Through Global Data ManagementHortonworks
Using your data smarter and faster than your peers could be the difference between dominating your market and merely surviving. Organizations are investing in IoT, big data, and data science to drive better customer experience and create new products, yet these projects often stall in the ideation phase due to a lack of global data management processes and technologies. Your new data architecture may be taking shape around you, but your goal of globally managing, governing, and securing your data across a hybrid, multi-cloud landscape can remain elusive. Learn how industry leaders are developing their global data management strategies to drive innovation and ROI.
Presented at Gartner Data and Analytics Summit
Speaker:
Dinesh Chandrasekhar
Director of Product Marketing, Hortonworks
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
Hortonworks DataFlow (HDF) is the complete solution that addresses the most complex streaming architectures of today’s enterprises. More than 20 billion IoT devices are active on the planet today, and thousands of use cases across IIoT, healthcare, and manufacturing warrant capturing data-in-motion and delivering actionable intelligence right NOW. “Data decay” happens in a matter of seconds in today’s digital enterprises.
To meet all the needs of such fast-moving businesses, we have made significant enhancements and new streaming features in HDF 3.1.
https://hortonworks.com/webinar/series-hdf-3-1-technical-deep-dive-new-streaming-features/
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
Join the Hortonworks product team as they introduce HDF 3.1 and the core components for a modern data architecture to support stream processing and analytics.
You will learn about the three main themes that HDF addresses:
Developer productivity
Operational efficiency
Platform interoperability
https://hortonworks.com/webinar/series-hdf-3-1-redefining-data-motion-modern-data-architectures/
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
Apache NiFi is an easy-to-use, powerful, and reliable system for processing and distributing data. It provides an end-to-end platform that can collect, curate, analyze, and act on data in real time, on-premises or in the cloud, with a drag-and-drop visual interface. It is being used across industries on large amounts of data that had been stored in isolation, which made collaboration and analysis difficult.
Join industry experts from Hortonworks and Attunity as they explain how Apache NiFi and streaming CDC technology provides a distributed, resilient platform for unlocking the value of data in new ways.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow, manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
8. Apache Hadoop Committers
Hortonworks influences the Apache community:
- We employ the committers: one third of all committers to the Apache® Hadoop™ project, and a majority in other important projects.
- Our committers innovate and expand Open Enterprise Hadoop.
- We influence the Hadoop roadmap by communicating important requirements to the community through our leaders.
11. Client Value Proposition for HDP on Power
- 100% open source Hadoop running on the OpenPOWER hardware platform
  - OpenPOWER is an open-source hardware solution for open source software
  - Intel x86 is perceived as the default/commodity option, but x86 is not open
  - Open means no vendor lock-in, and flexibility
- Processor and server architecture optimized for big data processing
  - 2x per-core performance compared to Intel x86: fewer cores and servers needed (containing server sprawl) and improved hardware price/performance
  - Higher server and data reliability: designed to run the enterprise
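The 2x per-core performance claim translates directly into server-count arithmetic. A back-of-envelope sketch (all numbers are illustrative assumptions, not benchmarks; real sizing depends heavily on the workload):

```python
def servers_needed(total_work, perf_per_core, cores_per_server):
    """Cores, then servers, needed to cover a fixed amount of work."""
    cores = -(-total_work // perf_per_core)       # ceiling division
    return -(-cores // cores_per_server)          # ceiling division

work = 2000  # arbitrary units of total work

# Baseline: 1 unit of work per core, 20 cores per server.
x86_servers = servers_needed(work, perf_per_core=1, cores_per_server=20)
# Hypothetical 2x per-core performance at the same core count per server.
power_servers = servers_needed(work, perf_per_core=2, cores_per_server=20)

print(x86_servers, power_servers)  # 100 50
```

Doubling per-core throughput halves the core count, and therefore the server count, for a fixed workload, which is the "contain server sprawl" argument in the slide.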
12. IBM is investing in the Linux ecosystem and open innovation
- 1999: 5 IBMers contributing to Linux and Apache projects
- 2016: 50k+ IBMers contributing to 150+ open organizations, spanning blockchain (Hyperledger), open source databases, and more; 270+ OpenPOWER-based innovations under way
Source: https://developer.ibm.com/start/
13. OpenPOWER, a Catalyst for Open Innovation
OpenPOWER is an open development community, using the POWER Architecture to serve the evolving needs of customers.
Market shifts:
- Moore’s law no longer satisfies performance gains
- Numerous IT consumption models
- Growing workload demands
- Mature open software ecosystem
OpenPOWER strategy:
- Accelerated innovation through collaboration of partners
- Amplified capabilities driving industry performance leadership
- Vibrant ecosystem through open development
Target markets (industry adoption, open choice): cloud computing; hyperscale and large-scale datacenters; high performance computing and analytics; domestic IT agendas.
14. Fueling an Open Development Community
• Chip / SOC
• Boards
• Systems
• I/O / Storage / Acceleration
• System / Software Integration
• Implementation / HPC / Research
14
16. Introducing the IBM Power Systems LC Line
S822LC For High
Performance Computing
•Incorporates the new
POWER8 processor with
NVIDIA NVLink
•Delivers 2.8X the bandwidth to GPU accelerators
•Up to 4 integrated NVIDIA
“Pascal” GPUs
S822LC For Big Data
•Ideal for storage-centric and high data throughput workloads
•Brings 2 POWER8
sockets for Big Data
workloads
•Big data acceleration with CAPI and GPUs
S821LC
•Storage rich single
socket system for big
data applications
•Memory Intensive
workloads
S822LC
•2X memory bandwidth
of Intel x86 systems
•Memory Intensive
workloads
S812LC
•2 POWER8 sockets in a
1U form factor
•Ideal for environments
requiring dense
computing
NEW
NEW
NEW
Big Data
High Performance Computing
Compute Intensive
Announce 9/8, GA 9/26
Announce and GA 9/8
Announce and GA 9/8
OpenPOWER servers for cloud and cluster deployments that are different by design
17. Innovation Pervasive in the Design
Power Systems S822LC for
Big Data
NVIDIA:
Tesla K80 GPU Accelerator
Red Hat, Ubuntu, SUSE:
Linux OS
Mellanox: InfiniBand/Ethernet
Connectivity in and out of server
HGST: Optional NVMe Adapters
Alpha Data with Xilinx FPGA:
Optional CAPI Accelerator
Broadcom: Optional PCIe Adapters
QLogic: Optional Fiber Channel PCIe
Samsung: SSDs & NVMe
Hynix, Samsung, Micron: DDR4
IBM: POWER8 CPU
17
18. Leading Operational DBMSs Available & Optimized for Linux on Power
In-Memory, NoSQL
SAP HANA
MongoDB
Neo4j
EnterpriseDB
MariaDB
Open Source
IBM DB2 BLU
RedisLabs
Cassandra
PostgreSQL
18
19. IBM is your single, trusted vendor to support and help you manage your Linux infrastructure
1. Based on IBM internal data. 2. Original equipment manufacturer (OEM)
20. Today’s challenges demand innovation
[Chart: price/performance, 2000–2020. Gains from Moore’s Law and processor technology alone flatten (“you are here”); continued improvement requires full-system and full-stack open innovation across firmware/OS, accelerators, software, storage and network.]
[Chart: data growth, 2010–2020, reaching 44 zettabytes, dominated by unstructured over structured data. Data holds competitive value.]
20
25. YCSB running MongoDB on POWER8 delivers leadership performance and 2.2X
better price-performance than Intel Xeon E5-2690 v3 Haswell
                                     IBM Power S822LC      HP DL380 Gen9
                                     (16-core, 256GB)      (24-core, 256GB)
Server web price (3-year warranty)   $16,295               $24,615
System cost (server + RHEL OS +      $29,584               $37,904
MongoDB annual subscription)         ($16,295 + $1,299     ($24,615 + $1,299
                                     + $11,990)            + $11,990)
MongoDB YCSB
(total operations per second)        297.5 k ops           169.5 k ops
$ per 1,000 ops/sec                  100                   223

2.2X better price-performance
33% lower HW costs and maintenance
75% more performance per server
•Based on IBM internal testing of a single system and OS image running the Yahoo! Cloud Serving Benchmark (YCSB) 0.6.0, 1M record workload at 50/50 read/write factor. Conducted under laboratory conditions; individual results can vary based on workload size, use of storage subsystems & other conditions.
• IBM Power System S822LC; 16 cores (2 x 8c chips) / 128 threads, POWER8; 3.3 GHz, 256 GB memory, MongoDB 3.3.4, RHEL 7.2. Competitive stack: HP Proliant DL380 Gen9; 24 cores (2 x 12c chips) / 48 threads; Intel E5-2690 v3; 2.6 GHz; 256 GB memory, MongoDB 3.3, RHEL 7.2. Both servers priced with 2 x 1TB SATA 7.2K rpm HDD, 1 Gb 2-port, 2 x 16 Gbps FCA. Configurations represent the highest processor frequency for that specific processor, running the MongoDB server on one socket & the YCSB application workload on the second socket. RAM disk was used to focus testing on processor technology differences.
Pricing is based on web pricing for the S822LC http://www-03.ibm.com/systems/power/hardware/s822lc-commercial/buy.html and HP DL380 Gen9 https://h22174.www2.hp.com/SimplifiedConfig/Index MongoDB https://www.mongodb.com/compare/mongodb-oracle
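The slide’s three callouts can be re-derived from the quoted system costs and YCSB throughput alone; a quick sanity check (the “$ / Op per Sec” row appears to be dollars per 1,000 ops/sec):

```python
# Re-deriving the slide's price-performance callouts from the figures above.
power_cost, power_kops = 29_584, 297.5   # S822LC system cost, YCSB k-ops/sec
intel_cost, intel_kops = 37_904, 169.5   # DL380 Gen9 system cost, YCSB k-ops/sec

power_dollars_per_kops = power_cost / power_kops   # ~99.4  (slide: 100)
intel_dollars_per_kops = intel_cost / intel_kops   # ~223.6 (slide: 223)

price_perf = intel_dollars_per_kops / power_dollars_per_kops  # ~2.25 -> "2.2X"
perf_per_server = power_kops / intel_kops - 1                 # ~0.755 -> "75% more"
hw_cost_saving = 1 - 16_295 / 24_615                          # ~0.338 -> "33% lower"

print(round(price_perf, 2), round(perf_per_server, 2), round(hw_cost_saving, 2))
```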
27. Future Proof Your Hadoop Infrastructure
• Total Cost of Ownership benefits of a Linux on Power decision
– Less infrastructure means reduced costs in many areas:
o Energy, cooling, server administration, floor space, SW licensing
– Position for future growth, avoid hitting the data center wall with cluster sprawl
– As your workloads evolve, POWER8 gives you options:
o Scale up each node by exploiting the memory bandwidth and multi-threading
o Add new workload optimized servers to the cluster (such as GPUs with NVLink)
27
TALK TRACK
This presentation provides an overview of the Hortonworks and IBM Power partnership and how it will enable clients’ rapid access to open innovation and drive key business value on their analytics journey.
TALK TRACK
1st I will review the partnership announced at IBM Edge in Sept 2016
2nd I will look at an example of a client’s planned journey with HDP and Power Systems and the renovate to innovate philosophy
Next I will review how Hortonworks leads the open community innovation for Hadoop and supports their client with business success
Lastly I will overview how OpenPOWER innovation delivers outstanding time to insights for analytics
TALK TRACK
Overview the recently announced partnership between IBM and Hortonworks that will bring open Hadoop to OpenPOWER
Encourage clients to view the videos of Scott Gnau talking about the Hortonworks perspective and James Wade talking about how Florida Blue is leveraging open source HDP and OpenPOWER to power its analytics journey.
TALK TRACK
Refer to the linked blog for the full background: http://hortonworks.com/blog/delivering-flexible-infrastructure-analytics-hortonworks-hdp-ibm-power-systems/
Summary of Florida Blue’s three new business models (expanding from one to four businesses): started with a B2B insurance model, entering retail insurance, now handling 32% of US government Medicare claims and opening provider clinics in Florida
Florida Blue has seen outstanding 3-5x performance results of running MongoDB on IBM OpenPOWER servers to support their call center initiatives.
Note that they are using both Hortonworks and IBM Power today and are planning to bring these capabilities together as one of the 1st adopters of the HDP 2.5 for Linux on Power within the early client program.
TALK TRACK
At Hortonworks, we partner with our customers and guide them on their Journey to Actionable Intelligence.
You can start your journey anywhere you want.
You can renovate your IT architecture to reduce costs and boost functionality.
Or you can innovate modern data applications that you use at your own company or sell on the open market.
You can start with the most sophisticated use cases if your team is experienced, or you can build your expertise by beginning with less complex use cases that bring quick results.
As you build your team’s expertise and comfort with Hortonworks Data Platform and Hortonworks DataFlow, you can then tackle more challenging aspects of your road map.
We will help you plan the right path to meet your objectives.
Now I’d like to tell you about some Hortonworks customers and how they are transforming their industries with the actionable intelligence that comes from Connected Data Platforms.
[NEXT SLIDE]
DISCUSSION STRATEGY
[For Business Prospects]
Focus questions
What business problems can we help you solve?
Which use case would you like to tackle first?
What type of challenge is most important: data discovery, building a single view or creating predictive analytics?
Calls to action
Recommend the Jumpstart package: http://hortonworks.com/products/subscriptions/jumpstart/
Schedule a use case workshop and plan your journey across your most important use cases.
Give them an industry-specific White Paper to read.
[For IT Prospects]
Focus questions
Where do you face the most cost pressure to store and process data?
Which use case would you like to tackle first: active archive, ETL offload or data enrichment?
Calls to Action
Recommend that they download Hortonworks Sandbox: http://hortonworks.com/products/hortonworks-sandbox/
Schedule a use case workshop
Give them the EDW Optimization White Paper to read: http://hortonworks.com/solutions/edw-optimization/
TALK TRACK
When you choose a vendor, you choose a platform that will be with you for a long time. As the originators of both the Apache Hadoop and Apache NiFi technologies as well as the Open Enterprise Hadoop category, Hortonworks is uniquely positioned to help you transform your business with actionable intelligence.
We are the only publicly traded pure-play Hadoop company
Our momentum is accelerating.
Our customers come to us for all the reasons I’ve described: our superior technology that evolves at the pace of open innovation, our proven model for partnering with our customers and our dedication to helping them succeed.
I’d like to come back for a larger meeting and a use case workshop to start the journey together.
Would next week work for you?
[NEXT SLIDE]
SOURCE: http://hortonworks.com/about-us/quick-facts/
TALK TRACK
Hortonworks has always followed a 100-percent open approach to Hadoop and Hortonworks Data Platform is the only genuinely open Hadoop distribution.
Our open approach:
Eliminates the risk of becoming locked in to one proprietary vendor
It maximizes innovation by tapping into the ingenuity of the largest number of talented developers, with the broadest perspective on the capabilities and areas for improvement
It integrates seamlessly with other datacenter technologies. Like our customers, our technology partners also want to avoid getting locked in to a proprietary approach.
[NEXT SLIDE]
SUPPORTING DETAIL
Eliminates Risk
Why this matters to our customers: Their interests in Hadoop align with Hortonworks’ incentives to deliver world-class support and constantly improve Hadoop
Proof point: Spotify was running Cloudera without support. As the importance of Hadoop grew in their business, they needed support. They chose to migrate from CDH AND to begin paying Hortonworks for support, instead of staying on CDH unsupported
Citation: "[Hortonworks'] true open source approach and the work they have done to improve the Apache Hive data warehouse system aligns well with our needs," said Wouter de Bie, team lead for data infrastructure at Spotify, in a statement. "We use Hive extensively for ad-hoc queries and for the analysis of large data sets.” | http://www.informationweek.com/software/information-management/spotify-embraces-hortonworks-dumps-cloudera/d/d-id/1111563?
Maximizes Community Innovation
Why this matters to our customers: They benefit from the fastest possible innovation that comes from the real-world needs felt by our HDP subscribers
Proof point: The Stinger Initiative organized the efforts of 144 developers at 45 companies to write 390,000 lines of java code in 13 months. This combined effort made the most important Hive queries 100X faster.
Citation: “Combining the capabilities of HARMAN services and Hortonworks, automakers and their suppliers will have access to a scalable platform for, real-time insights, new innovative service creation and predictive analytics-based solutions that can minimize the risk of costly recalls and reduce warranty expenses,” said Sanjay Dhawan, President, HARMAN Services Division. “This is a very exciting step forward in the evolution of the connected car as we speed up the time to market with powerful functionalities that benefit the OEMs and their drivers.”
Integrates seamlessly
Why this matters to our customers: Rapid, easy adoption that reinforces your existing datacenter technologies
Proof point: More than 130 technology partners are HDP-certified, including: EMC, HP, Microsoft, Red Hat, SAP, Teradata, Cisco, SAS and Google Cloud Platform
Citation: “The new features in the Hortonworks Data Platform 2.3 allow us to empower our joint customers with access to more data through Apache Hadoop,” said Randy Guard, Vice President of Product Management, SAS. “This allows them to perform deeper analysis on their data with extensive security and data governance capabilities in place. We want our customers to know that we support the platforms and environments they need to handle their data-driven workloads in a flexible, secure and efficient way.” | http://hortonworks.com/press-releases/hortonworks-accelerates-big-data-transformations-with-hortonworks-data-platform-2-3/
TALK TRACK
Hortonworks employs the largest number of project committers to Apache Hadoop, Apache NiFi and their related Apache projects.
Our committers innovate with other leaders working through the Apache Software Foundation.
We have more than 90 Apache committers, and they interact with each other every day in meetings, in our hallways or over lunch.
And our leadership in the Apache community also helps us influence the roadmap, to meet the needs that we learn about from our hundreds of enterprise subscribers.
The Apache way is democratic, but we have the largest voting bloc.
[NEXT SLIDE]
SUPPORTING DETAIL
We Employ the Committers
Why this matters to our customers: No company understands Hadoop better than Hortonworks. This correlates directly to the quality of our enterprise support.
Proof point: Specific Apache Hadoop committer numbers as of January 2016 (# of committer seats / % of total committers):
Hadoop: Hortonworks = 30/31%, Cloudera = 17/17%
Customer quote: Paul Boal, “Whenever our developers need access to some knowledge, and we say ‘Could we talk to somebody about that?’, Hortonworks says, ‘Sure, here are a few of our committers. Why don’t you talk to them?’“ | https://youtu.be/iQ2V3FuqxHg @ 3:04
Our Committers Innovate
Why this matters: Choose the team that leads Hadoop innovation, so you can benefit from a rapid flow of new features that directly benefit your business.
Proof points:
YARN – YARN is the data operating system of Hadoop 2. Arun Murthy originally conceived of YARN in the JIRA ticket MapReduce 279. The founding architects of Hortonworks developed YARN until it was ready for release as part of Apache Hadoop 2.0, which was GA in October 2013. | http://hortonworks.com/blog/introducing-apache-hadoop-yarn/
The Stinger Initiative: In 2013, our customers told us that Hive was too slow and that it didn’t include all the semantics they needed. So we spearheaded the Stinger Initiative to improve the speed, semantics and scale of Hive queries. Over 13 months and three versions of Hive (0.11, 0.12 and 0.13), 144 developers from 45 companies wrote 390,000 lines of Java code. This improved Hive’s speed by more than 50x and added the missing SQL semantics. | http://hortonworks.com/blog/announcing-apache-hive-0-13-completion-stinger-initiative/
Partner quote: “Hortonworks has partnered with Red Hat since its early days. They have been a champion of the open source model and a key enabler of the communities needed to support the big data ecosystem,” said Greg Kleiman, director of big data strategy at Red Hat. “As a strategic partner, Hortonworks brought their flavor of open source innovation to build more agile and cost effective big data solutions with Red Hat.” | http://hortonworks.com/press-releases/hortonworks-recognized-as-crn-emerging-vendor/
We Influence the Roadmap
Why this matters to our customers: They can influence the roadmap and attract top technical talent to join their teams
Proof point: Hortonworks encouraged its subscribers Schlumberger, Aetna, Target and Merck to join the Data Governance Initiative to speak for their requirements and to engage their teams to solve for those requirements. This led directly to the creation of the Apache Atlas project, and each of these customers employs at least one committer to Atlas. | http://hortonworks.com/blog/apache-atlas-project-proposed-for-hadoop-governance/
Analyst Quote: “Applying old-school dictatorial-style governance to Hadoop would be a disaster for enterprises because it would tamp down the agility of Hadoop,” he said. “The Data Governance Initiative is intriguing because it brings the Hadoop community together with real enterprises. I hope they partner in a way that create ‘minimally viable governance.’ That would keep Hadoop flexible while giving enterprises the governance they need to prevent data pandemonium.” | http://www.bigdatatechcon.com/news/data-governance-initiative-expands-the-hadoop-ecosystem
TALK TRACK
We know that the community powers the ongoing success of our technology, and so we provide a platform for that community through Hortonworks Community Connection.
As a founding member of ODPi, we help influence the larger ecosystem that is standardizing around Apache Hadoop as they develop new Big Data applications.
And we engage our partner ecosystem community through the Hortonworks Partnerworks program.
[NEXT SLIDE]
TALK TRACK
Hortonworks Support Subscriptions come with Hortonworks SmartSense to help you proactively and predictively optimize your HDP cluster.
SmartSense analyzes the data on how you use your cluster, compares that to best practices and makes automated recommendations about how you can optimize your investment in HDP.
We also offer a variety of self-service tools through our integrated customer portal.
And of course, you can call us 24x7. We’ll always pick up, and for any tough issues, our support engineers have access to our committers.
[NEXT SLIDE]
SUPPORTING DETAIL
Hortonworks SmartSense
Why this matters to our customers: Hortonworks SmartSense helps our customers make the most of their Hadoop cluster – using data.
Proof point: Just like our customers use machine data for preventative maintenance on their telco infrastructures, oil rigs or military aircraft, they can use the same type of predictive analytics, using data from their own cluster, to optimize that resource. In fact, Hortonworks subscribers benefit (through SmartSense) from the usage data from all other Hortonworks customers.
Citation: “SmartSense’s capabilities, says Cheolho Minale, vice president of technology at The Mobile Majority, will allow his Hadoop team to optimize its HDP cluster’s ad performance, ‘At The Mobile Majority, we have been using Hortonworks Data Platform to optimize ad performance on behalf of our customers. We’re excited to look into Hortonworks SmartSense as a way to continuously optimize our HDP cluster as it grows over time.’” | Source: http://hortonworks.com/blog/introducing-availability-of-hdp-2-3-part-3/
Integrated customer portal
Why this matters to our customers: Hortonworks subscribers can scale their organization’s Hadoop expertise as quickly as they want.
TALK TRACK
IBM Power Systems, with our OpenPOWER partners, offer a full range of server solutions from public cloud to distributed scale out servers to 192 core scale up servers
LC and L line servers have been developed specifically for scale out Linux applications
LC servers provide clear price/performance leadership with form factors and price points competitive with commodity Intel
Unlike commodity Intel, Power Systems supports an open ecosystem via the OpenPOWER initiative, which ensures continuous leading innovation
For example:
Intel per core performance generally stays flat with each new generation while the number of cores per server increases, driving up SW costs
Power per core performance has demonstrated significant improvement with each successive generation.
[NEXT SLIDE]
We know that IT system downtime is much more than an inconvenience or aggravation. When your systems are down or not performing optimally, people across your company can’t do their jobs effectively—if at all. As a result, revenue, profit, company reputation and customer loyalty can suffer. We know that you need one dependable, trusted provider to help manage and resolve problems regardless of the platform or vendor. In particular, you need open source and Linux infrastructure specialists.
IBM has been committed to open technologies from the onset, bringing more than 16 years of experience supporting open source environments. Through our alliances with SUSE, Ubuntu and Red Hat, we demonstrate our commitment to the success of open technologies. We bring a virtually unparalleled expertise in supporting Linux across all IBM systems and OEM x86 platforms certified for Linux.
Our well-established global support infrastructure is key. With that you can gain:
In-depth skills and experience that encompass the many products and technologies that make up your IT environments
A streamlined process that expedites problem determination and resolution
A proven track record in delivering result-producing support services
The ability to access your support provider as often as needed, around the clock and around the world
With IBM, you can gain accountability. When you report a problem, you can feel confident that it will be handled professionally and expeditiously, and that our specialists will manage it diligently from initial report through resolution.
TALK TRACK
POWER8’s leading IO and Memory capabilities power fast data access and movement across a wide range of BDA applications.
POWER8 is the first microprocessor designed for Big Data and Analytics.
When systems are designed for big data, there are a couple of key attributes that are important to create a balanced system design.
First, having the processing capability; second, having the memory space, the workspace; and third, having the bandwidth, the ability to move information in and out of the system at the rapid speeds required.
POWER8 delivers 4 times more threads per core vs. commodity infrastructure. We can easily support a growing number of users who need reports, or to perform ad hoc analytics. This is because the processor can run more concurrent queries in parallel faster, across multiple cores with more threads per core.
We’re delivering up to 4 times more memory bandwidth, with access to up to 1 TB of memory for data operations and enlarged caches in every processor. This delivers the levels of performance your teams need to make decisions in real time.
We’re delivering faster IO and high speed caching to ingest, move and access large volumes of data so that analytics results are available faster.
Moore’s law is no longer delivering the processor speedup needed to meet the demands of Big Data; the OpenPOWER Foundation is driving innovations, such as hardware accelerators, that will deliver continuous improvement in price/performance leadership.
Power Systems provide the capabilities needed to handle the varying analytics initiatives your business requires.
Broad range of data and analytics – from operational to computational to business analytics, as well as cognitive solutions leveraging IBM Watson technology, Power Systems are optimized for performance and can scale to support demanding and growing workloads.
These solutions help you capitalize on the currency of data by finding business insights faster and more efficiently.
Supporting Data:
4X threads per core vs. x86 (up to 1536 threads per system)
4X memory bandwidth vs. x86 (up to 32TB of memory)
6X more cache vs. x86 (>19MB cache per core)
The POWER8 cache numbers:
L1 – 96 KB per core
L2 – 512 KB per core
L3 – 8 MB per core (8192 KB)
L4 – 128 MB per socket; 128/12 = 10.7 MB per core (10922 KB per core)
Total cache per core: 19.26 MB (19722 KB)
The Intel side (Broadwell-EP):
L1 – 64 KB per core
L2 – 256 KB per core
L3 – 55 MB per socket = 2.5 MB per core (2560 KB per core)
Total cache per core: 2.81 MB (2880 KB)
Ratio: 19722/2880 = 6.85, rounded down to 6X
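As a sanity check, the per-core totals and the 6X ratio quoted above reproduce directly from the listed cache sizes, and the 1536-thread figure from the supporting data is 192 cores x SMT8:

```python
# Re-deriving the per-core cache comparison from the figures above.
# POWER8: L1, L2, L3 are per core; the 128 MB L4 is per 12-core socket.
power8_l4_kb = 128 * 1024 / 12                       # ~10922 KB per core
power8_total_kb = 96 + 512 + 8192 + power8_l4_kb     # ~19722 KB (~19.26 MB)

# Intel Broadwell-EP: L1, L2 per core; 55 MB L3 per socket = 2.5 MB per core.
intel_total_kb = 64 + 256 + 2560                     # 2880 KB (~2.81 MB)

cache_ratio = power8_total_kb / intel_total_kb       # ~6.85, shown as "6X"
threads_per_system = 192 * 8                         # 1536 (192 cores x SMT8)

print(round(power8_total_kb), intel_total_kb, round(cache_ratio, 2), threads_per_system)
```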
TALK TRACK
- One good example of OpenPOWER innovation that could be exploited by GPU aware applications on Hadoop is the POWER8 NVLink support
- In partnership with IBM’s OpenPOWER partner NVIDIA, we released a POWER8 Linux server in Sept 2016 with 2.8X the I/O throughput from CPU to GPU compared to what is possible on x86
- This removes the PCIe data pipe bottleneck to ensure full exploitation of the GPUs and is only available today on OpenPOWER servers
Links:
IBM S822LC for HPC with NVLink: http://www-03.ibm.com/systems/power/hardware/s822lc-hpc/
Related press:
https://www.hpcwire.com/2016/09/08/ibm-debuts-power8-chip-nvlink-3-new-systems/
http://www.pcworld.com/article/3117718/ibms-new-power8-server-packs-in-nvidias-speedy-nvlink-interconnect.html
Note: This chart compares the 16-core S822LC to the 36-core HP DL380, so the system-level performance is even more compelling when this is considered. S822LC system-level total throughput is 20%+ higher than the DL380. The price-performance shown is the outcome of this higher performance and the lower HW street price of the S822LC.
On the Trans/$ graph, note that the VM quantities shown (20/24 VMs on Power and 16/20 VMs on x86) are intentionally different, as the goal is to show virtual machines that produce a similar per-VM performance throughput level. The slight rise in x86’s second bar is not significant; it is due to the fact that the x86 system is under-allocated on the first data point and not on the second, whereas POWER8 is already fully allocated on both data points shown.
Memory: 256 GB for S822LC and 256 GB for HP DL380
HBA: 2 x 16 Gbps FCA for S822LC and HP; Eth: 2-port 10Gb
HDD: 2 x 1TB for S822LC & 2 x 1TB for HP price (to match the configuration), but 2 x 300GB SAS 10k SFF for the HP test case
Power S822LC delivers leadership performance vs Intel Xeon E5-2699 v3
2.7X more transactions/second per core
Improve your revenue potential – Power S822LC delivers 25% more VMs in the same rack space as HP DL380
25% more virtual machines per server than HP DL380
76 more revenue generating virtual machines per rack than HP DL380
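The per-server claim follows from the 20 vs 16 similar-throughput VMs in the chart notes; the per-rack figure of 76 works out if one assumes 19 usable server slots per rack (our assumption — the deck does not state rack density):

```python
# Checking the virtualization claims above.
power_vms, intel_vms = 20, 16      # similar-throughput VMs per server (chart notes)

more_vms_per_server = power_vms / intel_vms - 1    # 0.25 -> "25% more"

servers_per_rack = 19              # ASSUMPTION: usable 2U slots in a 42U rack
extra_vms_per_rack = servers_per_rack * (power_vms - intel_vms)  # 76

print(more_vms_per_server, extra_vms_per_rack)
```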
TALK TRACK
- Spark workloads are ideal to gain advantage from the core capabilities of the POWER8 processor for multithreading and large memory
- Streaming and SQL workloads can be broken down into many parallel threads and exploit the 8 threads per core unique to POWER8
- Complex, memory intensive workloads like machine learning and graph, can benefit from the large memory bandwidth and cache
- The results shown here are from the open source Spark-Bench workload suite (contributed by IBM), which drives a set of representative workloads across machine learning, SQL and graph on Spark. (https://github.com/SparkTC/spark-bench)
- You can see that overall POWER8 provides a 2X per core advantage and a 1.5X price-performance advantage in these 7-node cluster results
TALK TRACK
- To summarize, the combination of HDP and OpenPOWER offers the fastest route to exploitation through open community innovation
- ODPi and OpenPOWER foundations ensure no risk of vendor lock-in and no other Hadoop + System combination can offer this pairing
- Hortonworks and IBM both have deep industry experience and a proven history of outstanding dedication to client success and support
- Ultimately, clients need fast time to insights and OpenPOWER delivers on performance with a reduced infrastructure cost
TALK TRACK
- As mentioned earlier, the Hortonworks and IBM Power Systems partnership was announced at IBM Edge conference in Sept 2016
- Early adopter clients will gain access to technical preview builds in late 4Q2016
- General availability for HDP 2.5 for Linux on Power will follow in 1Q2017
- Future HDP releases will provide Linux on Power support at the same time as the Linux on Intel releases
TALK TRACK
- If you are not already a member, join the Hortonworks community to access valuable training resources and member insights
- Get to know more about the value of Power Systems and OpenPOWER and how it can provide a unique, open and optimized alternative for your Big Data and Analytics workloads
- To learn more about HDP and Power Systems, please join us for an upcoming webinar or catch the replay.