Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
© Copyright IBM Corporation 2015
IBM Solution for Hadoop offering
a modest experience as Techline POWER
Daniela Zuppini
Consulting IT Specialist
IBM
Session objectives
• What is Hadoop
• How to proceed
• IBM Solution for Hadoop POWER System edition
• IBM Data Engine for Analytics
• Network
• Software
Techline
Techline Mission
To be a strong technical sales partner
• To be your technical sales support supplier of choice.
• Providing pre-sales technical expertise for client opportunities,
helping you to win business.
• Local links to technical sales making us highly available and
responsive.
• Using our global structure to deliver efficient and consistent service.
Techline Products supported
• IBM Systems
• Power Systems – AIX, IBM i
• Pure Systems (Power & Hybrid)
• z System – z/OS, z/VM, z/VSE
• System Storage – Disk, Tape
• z System Software
• Linux on all IBM platforms
• Solutions
• ISV Sizing
• IT Optimisation, Virtualisation
• Cloud, Analytics, Mobile, Social, Security
• GTS Services
• Security Services (Appliances and Software)
• Security Services (MSS and PSS)
• Softek TDMF
• Tivoli Live Monitoring Services (TLMS)
• IBM Software Sizing
How to contact Techline
*Business Partners http://www.ibm.com/partnerworld/techline
IBM Raise a request: http://w3.ibm.com/support/techline/eu/trg/form.html
Quick Chat: https://ibm.biz/SASHA_EU
* Techline Access is limited to Advanced and Premier level Business Partners and Members who purchase a Valuepack
What is Hadoop
Innovation is also the capability to combine
existing practices and ideas in a new way
διαίρει καὶ βασίλευε
Better known as
DIVIDE ET IMPERA
What is Hadoop
Suppose you have a very simple algorithm:
if word then return (word, 1)
Suppose you execute this operation in a massively parallel way
on a very large amount of input data, e.g. the whole of Latin literature.
If you split the input files into blocks
and distribute them across all nodes,
you can obtain the
Roman emperor hit parade in a short time.
This parallelism is possible because each block of data is independent of the others
and the order in which the data is processed is not relevant.
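The word-count idea above can be sketched in a few lines of Python. This is only an illustration of the map, shuffle and reduce steps on one machine, not Hadoop code: the function names, the thread pool standing in for cluster nodes, and the sample blocks are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_blocks):
    """Shuffle step: group the emitted values by key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for word, one in pairs:
            groups[word].append(one)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each word."""
    return {word: sum(ones) for word, ones in groups.items()}


def word_count(blocks):
    # Each block is mapped independently; a thread pool stands in for the
    # cluster nodes here (an assumption made only for this sketch).
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


# Hypothetical input, already split into three blocks.
blocks = [
    "Augustus Tiberius Augustus",
    "Nero Augustus",
    "Tiberius Nero Nero Nero",
]
counts = word_count(blocks)
# Sort by descending count, then alphabetically, to get the "hit parade".
hit_parade = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(hit_parade)
```

Because no block depends on another, the blocks can be mapped in any order and on any node, which is the property the slide relies on.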
Hadoop framework basic components
• EXECUTE: MAP/REDUCE
Input and output in the form of (key, value) pairs
Mappers and reducers executed in parallel
No control over the order in which maps or reducers execute
Batch data processing engine
Java code
• STORAGE: HDFS
Data spread across cluster nodes
Write once / read many
Sequential read of large files
Optimized to process large files
3 data replicas across nodes
[Diagram: Input data → Map, Map, Map → shuffle & sort space → Reduce, Reduce, Reduce → Output data]
Hadoop architecture
[Diagram: a Name Node / Job Tracker and a Standby Name Node, connected over the cluster network to three "Data node & Task tracker" nodes. Labels: data blocks, 3W replication, map reduce, distributed file system]
What should I use Hadoop for?
• Log and machine data analysis
Do customers leave the page after seeing the cost of shipping?
Are certain products abandoned more than others?
• Preprocess and transform data for the DW
Explore the data to determine what can be moved to the DW
• Fraud detection
Capture deviations from standard usage
• Risk modeling
Customer churn prediction, to estimate the risk of a client moving to a competitor
• Social sentiment analysis
Analyze social activity to predict the ratings of popular events
Does Hadoop solve my problem?
Hadoop is not a good choice
• For intense calculations with little or no data
• When your processing cannot easily be made parallel
• When your data is not self-contained
• When you need interactive results
• As an RDBMS replacement
Yes, if you can rewrite your algorithms as maps and reduces.
If not, then no.
How to proceed
Do not start from a bunch of servers and storage;
approach from the perspective of a solution
• Engage the Big Data Systems Center
email to BDSC@US.IBM.COM or
submit an online request: https://ibm.biz/BdFfcV (for IBMers)
Analyze your input requirements
Obtain sizing and reference architecture
• Engage the IBM Montpellier Client Center - Power Linux Center
email to christophe.menichetti@fr.ibm.com or
BDSC - IOT Europe Leader
Big Data Power Linux Workshop / benchmark / Showcase / Architecture
• Contact Techline
IBM: http://w3.ibm.com/support/techline
BP: http://www.ibm.com/partnerworld/techline
Transform the HW architecture into a configuration proposal
What to ask the customer
• Workload definition:
– Will any of the workload be MapReduce and/or NoSQL, BigSQL, HBase?
– Does the customer consider the workload simple, medium or complex?
– Are there any response time or throughput requirements?
• For the ingestion of data:
– What is the peak ingestion rate required for the data?
– What type of data is it (structured, unstructured, semi-structured)?
• Capacity sizing:
– What raw data size (in TB) is the environment initially required to hold?
– What compression rate does the customer assume?
– What is the data growth rate, and over what period of time?
• What are the high availability requirements for the BigInsights environment?
• What are the disaster recovery requirements?
– What are the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) for the environment?
• Does the customer expect to use GPFS or HDFS?
• Do we need to include Linux licenses in the configuration (or does the customer have an enterprise license)?
BDSC SIZING
Attention to these points
Compression rate
If the compression rate is 35%, the calculations performed are:
If the raw data is 100 GB
(Raw data + replication) is 100 GB * 3 = 300 GB
(Raw data + replication + 25% shuffle/sort) = 300 GB + (300 GB * 0.25) = 375 GB
Applying the 35% compression rate = (375 GB * 0.35) = 131.25 GB
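The same arithmetic can be written as a small Python helper, which makes it easy to rerun with other inputs. This is a minimal sketch: the function and parameter names are made up for illustration, and the compression rate is treated as the fraction of the data that remains after compression, which is how the worked example above applies it.

```python
def total_disk_space_gb(raw_gb, replicas=3, shuffle_sort=0.25, compression_rate=0.35):
    """Reproduce the sizing steps from the slide.

    compression_rate is the multiplying factor applied at the end
    (0.35 means the stored data is 35% of its uncompressed size).
    """
    replicated = raw_gb * replicas                    # raw data + replication
    with_shuffle = replicated * (1 + shuffle_sort)    # + 25% shuffle/sort space
    return with_shuffle * compression_rate            # apply the compression rate


# The slide's example: 100 GB of raw data, 3 replicas, 25% shuffle/sort, 35% rate.
print(round(total_disk_space_gb(100), 2))  # 131.25
```

Note that a smaller compression rate means less disk space, so a customer who quotes "35% compression" in the sense of "35% saved" would need 0.65 as the factor instead.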
IBM Solution for Hadoop POWER System edition
[Diagram: Management Nodes, each running a Platform Symphony Job tracker, connected over the cluster network to Data Nodes, each running Map/Reduce & GPFS/FPO (Data node + Metadata)]
This architecture refers to IBM BigInsights V3
Basic components
• Hadoop management nodes
These nodes provide the Job Tracker function and a web interface that enables
users to access the cluster and run their applications.
Production cluster:
minimum of three management nodes;
for HA, minimum of six management nodes.
Non-production cluster:
one management node is acceptable.
• Data nodes
House IBM GPFS (or Hadoop HDFS as an alternative) and MapReduce.
The number of data nodes depends on the workload and the amount of data.
• Edge nodes
These nodes act as a boundary between the Big Data cluster and the
outside (client) environment.
• System management node
This node is an administrative console designed to cover cluster deployment
and management operations.
Data Node POD based design
Large POD – A
S822L w/ 1 DCS3700: 60 x 3.5” 7.2K RPM 4 TB LFF SAS +
12 x 2.5” 10K RPM 1.2 TB SFF SAS
254.4 TB
Small POD
S822L w/ Internal Drives: 12 x 2.5” 10K RPM 1.2 TB SFF SAS
14.4 TB
Medium POD
S822L w/ EXP24S: 36 x 2.5” 10K RPM 1.2 TB SFF SAS
43.2 TB
Large POD – B
S822L w/ 1 DCS3700: 60 x 3.5” 7.2K RPM 4 TB LFF SAS +
24 x 2.5” 10K RPM 1.2 TB SFF SAS
268.8 TB
Big Data clusters are built using a simple building block approach to tailor the
mix of CPU and storage to application requirements
A POD is represented by server S822L + storage EXP24S/DCS3700
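The POD capacities can be rechecked by multiplying drive counts by nominal drive sizes. The sketch below assumes the figures are raw capacity (before any RAID, file system or replication overhead); the dictionary layout and function name are illustrative only.

```python
# Drive groups per POD as (number of drives, nominal size in TB), from the slide.
PODS = {
    "Small": [(12, 1.2)],
    "Medium": [(36, 1.2)],
    "Large A": [(60, 4.0), (12, 1.2)],
    "Large B": [(60, 4.0), (24, 1.2)],
}


def raw_capacity_tb(drive_groups):
    """Sum of drive count times nominal drive size, rounded to 0.1 TB."""
    return round(sum(count * size_tb for count, size_tb in drive_groups), 1)


for name, groups in PODS.items():
    print(f"{name} POD: {raw_capacity_tb(groups)} TB raw")
```

Adding a new building block is then a matter of adding one entry to the dictionary.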
MEDIUM POD
S822L is a special edition for Linux
POWER8 dual-chip module (DCM) processors
20 cores @ 3.42 GHz or 24 cores @ 3.02 GHz, fully activated
128 GB or 256 GB and up to 1024 GB of memory
Twelve SFF-3 bays / DVD bay with 12x 2.5” 10K RPM 1.2 TB SFF SAS disks
Split feature to 6+6 SFF-3 bays: add a second SAS controller
One EXP24S expansion with 24x 2.5” 10K RPM 1.2 TB SFF SAS disks
9 hot-swap PCIe Gen 3 adapter slots
2x #EL3B PCIe3 LP RAID SAS adapter to connect EXP24S/DCS3700
1x #5260 PCIe2 LP 4-port 1GbE adapter for the cluster management network
For the data network, 3 alternatives:
2x #EL27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Adapter + 4x #EN02 cables
2x #EC3A PCIe3 LP 2-Port 40GbE NIC RoCE QSFP+ Adapter + 4x #EB2H cables
2x #EL3D PCIe3 LP 2-port 56Gb FDR IB Adapter x16 + 4x #EB4A cables
Medium POD storage connection
EXP24S modes are set by IBM manufacturing
EXP24S must be in Mode 1 (#EJR2): one set of 24 disk bays
[Diagram: two #EL3B adapters connected to the EXP24S through two #ECBU cables]
EXP24S: best price/performance per I/O bandwidth
How to calculate total cluster data space
Space for shuffle/sort data
workload dependent
rule of thumb +25% of total disk space
Number of replicas
Default 3 replicas
Compression rate
dependent on customer data type
user data and shuffle/sort space can be compressed
as in the sizing tool, compression rate = 1 - %compression
Total Disk Space =
(User raw data + 25%) * N. replicas * compression rate
POSH solution in econfig
A POD is an S822L/S812L with a direct connection to an EXP24S/DCS3700 used in RAID 0
• No preconfigured solution
• Standard POWER + storage configuration
• HA management nodes have 2 static LPARs
no PowerVM
• EXP24S is the server expansion #EL1S
2x #EL3B per node + 2x #ECBU cables
• DCS3700 is the storage product 1818-80C
available in the Peripheral section of econfig
2x #EL3B per node + 4x #ECC5 cables
• Consider 40U as the available rack space
• Consider 3-phase power
• Add TOR switches
IBM Solution for Hadoop with IDEA
[Diagram: Workload Management Nodes (Platform Symphony HA Job tracker) and Analytic Nodes (map/reduce), connected over the cluster network to GPFS/native RAID data nodes]
IBM Data Engine for Analytics
IBM Data Engine for Analytics is based on
IBM Elastic Storage
IDEA in econfig
Available in Hardware Solutions as a preconfigured set of components
5146-S22 Analytic/Workload Management node
24-core 3.34 GHz processor and 256 GB memory
Max 16 analytic nodes per rack
Workload management node: HA by default
Split bus to boot 2 LPARs
Network adapters for external connectivity
#EL27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Adapter (without Optics)
#EL2Z PCIe2 LP 2-Port 10GbE RoCE SR Adapter (Optics) - EC29
#EC3A PCIe3 LP 2-Port 40GbE NIC RoCE QSFP+ Adapter
#EL3D PCIe3 LP 2-port x16 56Gb FDR IB Adapter
ESS 5146 models GL2 or GL4
Select the model, then select 2 or 4 TB disks
MES upgrade via scale out
8247-21L management node
HMC
Ethernet switches configurable with IDEA
G8052 7120-48E
G8124E 7120-24L
G8264 7120-64C
How to calculate total IDEA cluster data space
Space for shuffle/sort data
workload dependent
rule of thumb +25% of total disk space
Compression rate
dependent on customer data type
user data and shuffle/sort space can be compressed
as in the BDSC sizing tool, compression rate = 1 - %compression
Fit the Total Disk Space to the available ESS file system capacity
Total Disk Space = (User raw data + 25%) * compression rate
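A short Python sketch of the IDEA formula and the "fit" check follows. The formula on the slide has no replica factor, so none is applied here. The function names and the 200 TB capacity in the usage lines are hypothetical; the real available file system capacity has to be taken from the ESS model being configured.

```python
def idea_total_disk_space_tb(raw_tb, compression_rate, shuffle_sort=0.25):
    """Total Disk Space = (user raw data + 25%) * compression rate."""
    return raw_tb * (1 + shuffle_sort) * compression_rate


def fits_ess(raw_tb, compression_rate, ess_available_tb):
    """True when the computed space fits the available ESS file system capacity."""
    return idea_total_disk_space_tb(raw_tb, compression_rate) <= ess_available_tb


# Hypothetical inputs: 400 TB raw data, compression rate 0.35,
# and an assumed 200 TB of available ESS file system capacity.
needed = idea_total_disk_space_tb(400, 0.35)
print(round(needed, 2), fits_ess(400, 0.35, ess_available_tb=200))
```

If the check fails, the options are a larger ESS model, larger disks, or a scale-out upgrade.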
Cluster Network
• Data network
– Dedicated to MapReduce
– Needs its own VLAN and subnet (for example, 10.1.x.x as per the predefined configuration)
– Typically it is a private cluster network; the interface to the client’s corporate data network is provided by one or more edge nodes
• Management network
– For management of all nodes: provisioning the operating system, deploying software components and applications, monitoring, and workload management
– Carries traffic not related to MapReduce
– Needs its own VLAN and subnet
• Service network
– Supports HMC operations via the FSP; Hadoop has no dependency on it
– Hardware-level management functions include power-cycling the node, hardware status monitoring, firmware configuration, and hardware console access
– Needs its own VLAN and subnet
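Since each of the three networks needs its own subnet, an addressing plan can be checked for accidental overlaps before deployment. The sketch below uses Python's standard `ipaddress` module. Only the 10.1.x.x data network comes from the slide; the management and service ranges, and the function name, are hypothetical placeholders.

```python
import ipaddress
from itertools import combinations

# Example addressing plan. Only the data network range comes from the slide;
# the other two ranges are hypothetical.
networks = {
    "data": ipaddress.ip_network("10.1.0.0/16"),
    "management": ipaddress.ip_network("10.2.0.0/16"),
    "service": ipaddress.ip_network("10.3.0.0/16"),
}


def overlapping_pairs(nets):
    """Return the names of every pair of networks whose address ranges overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


print(overlapping_pairs(networks))  # an empty list means the subnets are distinct
```

The check covers subnets only; VLAN assignment on the switches still has to be verified separately.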
IDEA Cluster Network
IBM machine type/model | Switch | Specifications | Use cases
7120-24L | G7028 | 24x 1 GbE RJ45 ports, 4x 10 GbE SFP+ ports | Service network
7120-48E | G8052 | 48x 1 GbE RJ45 ports, 4x 10 GbE SFP+ ports | Service network, OS mgmt network
7120-64C | G8264 | 48x 10 Gb SFP+ ports, 4x 40 Gb QSFP+ ports (or 16x 10 Gb SFP+ ports) | Data network, OS mgmt network
LENOVO | G8316 | 16x 40GbE QSFP+ ports (or 64x 10 Gb SFP+ ports) | Data network, core switch for small cluster
LENOVO | G8332 | 32x 40GbE QSFP+ ports (or 128x 10 Gb SFP+ ports) | Data network, core switch for large cluster
http://www-03.ibm.com/systems/networking/switches/rack.html
TOR Switch
Adapter and cabling
• 1GbE adapter
Adapter: #5260 PCIe2 LP 4-port 1GbE
Cable: CAT5E Ethernet
#1111 1m / #1112 10m / #1113 25m
• 10GbE adapter
Adapter: #EL27 PCIe2 LP 2-Port 10GbE RoCE SFP+
Cable: 10Gb E'Net Cable SFP+ Act Twinax Copper
Connects to switch with 10GbE SFP+ port
#EN02 3m (9.8 ft) / #EN03 5m (16.4 ft)
• 40GbE adapter
Adapter: #EC3A PCIe3 LP 2-Port 40GbE NIC RoCE QSFP+
Cables: DAC cable IBM Passive QSFP+ to QSFP+
Connects to switch with 40GbE QSFP+ port
#EB2H 3m (9.8 ft) / #ECBN 5m (16.4 ft) / #ECBP 7m (23.1 ft)
• InfiniBand adapter
Adapter: #EL3D PCIe3 LP 2-port 56Gb FDR IB Adapter x16
Cables: FDR IB SWITCH OPTICAL CABLE, QSFP/QSFP
#EB4A 3m / #EB4B 5m
BigInsights Enterprise Edition V3.0
IBM GPFS-FPO 3.5 and Platform Symphony MR are included with BigInsights V3
SW to be configured with econfig
Data/Analytic/Management Node
• Red Hat Linux AS v6 for POWER basic license per socket
1/3 yr support + maintenance from Red Hat
or
1 yr IBM support + 1/3 yr maintenance from Red Hat
• IBM Platform Cluster Manager Advanced Edition V4 per server
• GPFS licenses included with BigInsights Enterprise Edition
ESS Server
• Red Hat Enterprise Linux v7 for POWER basic license per socket
1/3 yr support + maintenance from Red Hat
or
1 yr IBM support + 1/3 yr maintenance from Red Hat
• IBM GPFS V4 Standard Edition per socket +
• IBM GPFS Native RAID for GPFS Server v4 per server
• IBM Platform Cluster Manager Advanced Edition V4 per server
Lab Services 6911-400
If the solution contains an ESS, you have to add Lab Services
1. For each 8247-21L (EMS): 3 units
2. For each pair of 8247-22L: add 2 units
3. The minimum is 5 units, even if the above resolves to fewer than 5 units.
Example: IDEA with GL2, 5 analytic nodes and 3 workload management nodes
GL2 + service management node = 5 units
8 S822L nodes = 8 units
TOTAL = 13 units
Lab Services are in econfig in the LAB Services tab
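The unit rules can be sketched as a small Python function. Two points are assumptions rather than statements from the slide: an odd number of 8247-22L servers is rounded up to the next full pair, and the GL2 is counted as containing one pair of 8247-22L servers, which is what makes "GL2 + service management node" come to 5 units. The function and parameter names are illustrative.

```python
import math


def lab_services_units(ems_servers, servers_22l, minimum=5):
    """Lab Services units: 3 per 8247-21L (EMS), 2 per pair of 8247-22L, minimum 5."""
    pairs = math.ceil(servers_22l / 2)  # assumption: an odd server counts as a pair
    return max(minimum, 3 * ems_servers + 2 * pairs)


# Slide example: 1 EMS, GL2 (assumed 2 x 8247-22L), 5 analytic nodes and
# 3 workload management nodes (8 x S822L).
print(lab_services_units(ems_servers=1, servers_22l=2 + 5 + 3))  # 13
```

The result should still be confirmed against the current Lab Services guidance before it goes into a configuration.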
Special notices
IBM, the IBM logo, ibm.com AIX, AIX (logo), AIX 6 (logo), AS/400, BladeCenter, Blue Gene, ClusterProven, DB2, ESCON, i5/OS, i5/OS (logo), IBM Business Partner (logo),
IntelliStation, LoadLeveler, Lotus, Lotus Notes, Notes, Operating System/400, OS/400, PartnerLink, PartnerWorld, PowerPC, pSeries, Rational, RISC System/6000,
RS/6000, THINK, Tivoli, Tivoli (logo), Tivoli Management Environment, WebSphere, xSeries, z/OS, zSeries, AIX 5L, Chiphopper, Chipkill, Cloudscape, DB2 Universal
Database, DS4000, DS6000, DS8000, EnergyScale, Enterprise Workload Manager, General Purpose File System, , GPFS, HACMP, HACMP/6000, HASM, IBM Systems
Director Active Energy Manager, iSeries, Micro-Partitioning, POWER, PowerExecutive, PowerVM, PowerVM (logo), PowerHA, Power Architecture, Power Everywhere,
Power Family, POWER Hypervisor, Power Systems, Power Systems (logo), Power Systems Software, Power Systems Software (logo), POWER2, POWER3, POWER4,
POWER4+, POWER5, POWER5+, POWER6, POWER6+, System i, System p, System p5, System Storage, System z, Tivoli Enterprise, TME 10, Workload Partitions
Manager and X-Architecture are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these
and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or
common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml
The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org.
UNIX is a registered trademark of The Open Group in the United States, other countries or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries or both.
Microsoft, Windows and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries or both.
Intel, Itanium, Pentium are registered trademarks and Xeon is a trademark of Intel Corporation or its subsidiaries in the United States, other countries or both.
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both.
TPC-C and TPC-H are trademarks of the Transaction Performance Processing Council (TPPC).
SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are
trademarks of the Standard Performance Evaluation Corp (SPEC).
NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both.
AltiVec is a trademark of Freescale Semiconductor, Inc.
Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc.
InfiniBand, InfiniBand Trade Association and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association.
Other company, product and service names may be trademarks or service marks of others.

More Related Content

What's hot

Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldDataWorks Summit
 
S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5Tony Pearson
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 
S016828 storage-tiering-nola-v1710b
S016828 storage-tiering-nola-v1710bS016828 storage-tiering-nola-v1710b
S016828 storage-tiering-nola-v1710bTony Pearson
 
S100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804aS100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804aTony Pearson
 
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014aziksa
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsEsther Vasiete
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowJulien Le Dem
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsStorage Switzerland
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...MapR Technologies
 
Performance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storagePerformance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storageDataWorks Summit
 
S100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804aS100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804aTony Pearson
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...Anand Haridass
 
S104872 spectrum nas-one-day-jburg-v1809e
S104872 spectrum nas-one-day-jburg-v1809eS104872 spectrum nas-one-day-jburg-v1809e
S104872 spectrum nas-one-day-jburg-v1809eTony Pearson
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInDataWorks Summit
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeDataWorks Summit
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5Tony Pearson
 
Overcoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLOvercoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLEDB
 
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Precisely
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureUtkarsh Pandey
 

What's hot (20)

Mutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable WorldMutable Data in Hive's Immutable World
Mutable Data in Hive's Immutable World
 
S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5S ss0885 spectrum-scale-elastic-edge2015-v5
S ss0885 spectrum-scale-elastic-edge2015-v5
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
S016828 storage-tiering-nola-v1710b
S016828 storage-tiering-nola-v1710bS016828 storage-tiering-nola-v1710b
S016828 storage-tiering-nola-v1710b
 
S100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804aS100298 pendulum-swings-orlando-v1804a
S100298 pendulum-swings-orlando-v1804a
 
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
Cost of Ownership for Hadoop Implementation - Hadoop Summit 2014
 
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
 
Webinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be BackupsWebinar: How Snapshots CAN be Backups
Webinar: How Snapshots CAN be Backups
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
 
Performance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storagePerformance tuning your Hadoop/Spark clusters to use cloud storage
Performance tuning your Hadoop/Spark clusters to use cloud storage
 
S100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804aS100294 bcdr-seven-tiers-orlando-v1804a
S100294 bcdr-seven-tiers-orlando-v1804a
 
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
2016 Sept 1st - IBM Consultants & System Integrators Interchange - Big Data -...
 
S104872 spectrum nas-one-day-jburg-v1809e
S104872 spectrum nas-one-day-jburg-v1809eS104872 spectrum nas-one-day-jburg-v1809e
S104872 spectrum nas-one-day-jburg-v1809e
 
Scaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedInScaling Deep Learning on Hadoop at LinkedIn
Scaling Deep Learning on Hadoop at LinkedIn
 
Migrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie MaeMigrating Analytics to the Cloud at Fannie Mae
Migrating Analytics to the Cloud at Fannie Mae
 
S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5S cv3179 spectrum-integration-openstack-edge2015-v5
S cv3179 spectrum-integration-openstack-edge2015-v5
 
Overcoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLOvercoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQL
 
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8Big Data Education Webcast: Introducing DMX and DMX-h Release 8
Big Data Education Webcast: Introducing DMX and DMX-h Release 8
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
 

Viewers also liked

Don bosco y sus acciones de responsabilidad social
Don bosco y sus acciones de responsabilidad socialDon bosco y sus acciones de responsabilidad social
Don bosco y sus acciones de responsabilidad socialSusy Inés Bello Knoll
 
Hq34 article how would google run an association?
Hq34 article how would google run an association?Hq34 article how would google run an association?
Hq34 article how would google run an association?Ruud Janssen, DES, CMM
 
Guide to NoSQL with MySQL
Guide to NoSQL with MySQLGuide to NoSQL with MySQL
Guide to NoSQL with MySQLSamuel Rohaut
 
Resumen del Microsoft Big Data Stack
Resumen del Microsoft Big Data StackResumen del Microsoft Big Data Stack
Resumen del Microsoft Big Data StackEduardo Castro
 
Introducción al SQL Server 2016 Query Store
Introducción al SQL Server 2016 Query StoreIntroducción al SQL Server 2016 Query Store
Introducción al SQL Server 2016 Query StoreEduardo Castro
 
The Value of a Digital Ad
The Value of a Digital AdThe Value of a Digital Ad
The Value of a Digital AdcomScore, Inc.
 
seeding-knowledge2016
 seeding-knowledge2016 seeding-knowledge2016
seeding-knowledge2016Anca Dimulescu
 
Hyper city (Gap Analysis)
Hyper city (Gap Analysis)Hyper city (Gap Analysis)
Hyper city (Gap Analysis)Anjum Patel
 
Microsoft Dynamics 365 for Financials FAQ
Microsoft Dynamics 365 for Financials FAQMicrosoft Dynamics 365 for Financials FAQ
Microsoft Dynamics 365 for Financials FAQSolution Systems, Inc.
 
Hierarchy of organization objectives
Hierarchy of organization objectivesHierarchy of organization objectives
Hierarchy of organization objectivesBharti Bhutani
 
Retail Digital Transformation
Retail Digital TransformationRetail Digital Transformation
Retail Digital TransformationSonata Software
 

Viewers also liked (17)

Neo catalogue 2015
Neo catalogue 2015Neo catalogue 2015
Neo catalogue 2015
 
Company Profile
Company ProfileCompany Profile
Company Profile
 
Don bosco y sus acciones de responsabilidad social
Don bosco y sus acciones de responsabilidad socialDon bosco y sus acciones de responsabilidad social
Don bosco y sus acciones de responsabilidad social
 
Hq34 article how would google run an association?
Hq34 article how would google run an association?Hq34 article how would google run an association?
Hq34 article how would google run an association?
 
Naturaleza
NaturalezaNaturaleza
Naturaleza
 
san Portfolio
san Portfoliosan Portfolio
san Portfolio
 
Enmiendas de UPYD Presupuestos 2016
Enmiendas de UPYD Presupuestos 2016Enmiendas de UPYD Presupuestos 2016
Enmiendas de UPYD Presupuestos 2016
 
Guide to NoSQL with MySQL
Guide to NoSQL with MySQLGuide to NoSQL with MySQL
Guide to NoSQL with MySQL
 
Resumen del Microsoft Big Data Stack
Resumen del Microsoft Big Data StackResumen del Microsoft Big Data Stack
Resumen del Microsoft Big Data Stack
 
Introducción al SQL Server 2016 Query Store
Introducción al SQL Server 2016 Query StoreIntroducción al SQL Server 2016 Query Store
Introducción al SQL Server 2016 Query Store
 
The Value of a Digital Ad
The Value of a Digital AdThe Value of a Digital Ad
The Value of a Digital Ad
 
seeding-knowledge2016
 seeding-knowledge2016 seeding-knowledge2016
seeding-knowledge2016
 
Hyper city (Gap Analysis)
Hyper city (Gap Analysis)Hyper city (Gap Analysis)
Hyper city (Gap Analysis)
 
Microsoft Dynamics 365 for Financials FAQ
Microsoft Dynamics 365 for Financials FAQMicrosoft Dynamics 365 for Financials FAQ
Microsoft Dynamics 365 for Financials FAQ
 
Hierarchy of organization objectives
Hierarchy of organization objectivesHierarchy of organization objectives
Hierarchy of organization objectives
 
6 bkr v
6 bkr v6 bkr v
6 bkr v
 
Retail Digital Transformation
Retail Digital TransformationRetail Digital Transformation
Retail Digital Transformation
 




IBMHadoopofferingTechline-Systems2015

  • 1. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. 9.0 © Copyright IBM Corporation 2015 IBM Solution for Hadoop offering a modest experience as Techline POWER Daniela Zuppini Consulting IT Specialist IBM
  • 2. © Copyright IBM Corporation 2015 Session objectives • What is Hadoop • How to proceed • IBM Solution for Hadoop POWER System edition • IBM Data Engine for Analytics • Network • Software
  • 3. © Copyright IBM Corporation 2015 Techline Techline Mission To be a strong technical sales partner • To be your technical sales support supplier of choice. • Providing pre-sales technical expertise for client opportunities, helping you to win business. • Local links to technical sales making us highly available and responsive. • Using our global structure to deliver efficient and consistent service.
  • 4. © Copyright IBM Corporation 2015 Techline Products supported • IBM Systems • Power Systems – AIX, IBM i • Pure Systems (Power & Hybrid) • z System – z/OS, z/VM, z/VSE • System Storage – Disk, Tape • z System Software • Linux on all IBM platforms • Solutions • ISV Sizing • IT Optimisation, Virtualisation • Cloud, Analytics, Mobile, Social, Security • GTS Services • Security Services (Appliances and Software) • Security Services (MSS and PSS) • Softek TDMF • Tivoli Live Monitoring Services (TLMS) • IBM Software Sizing
  • 5. © Copyright IBM Corporation 2015 How to contact Techline *Business Partners http://www.ibm.com/partnerworld/techline IBM Raise a request: http://w3.ibm.com/support/techline/eu/trg/form.html Quick Chat: https://ibm.biz/SASHA_EU * Techline Access is limited to Advanced and Premier level Business Partners and Members who purchase a Valuepack
  • 6. What is Hadoop Innovation is also the capability to combine existing practices and ideas in a new way διαίρει καὶ βασίλευε Better known as DIVIDE ET IMPERA
  • 7. What is Hadoop Suppose you have a very simple algorithm: if word then return (word,1). Suppose you execute this operation in a massively parallel way on a very large amount of input data, e.g. all of Latin literature. If you split the input files into blocks and distribute them across all nodes, you can obtain the Roman emperor hit parade in a short time. This parallelism is possible because each block of data is independent of the others and the order of data processing is not relevant.
  • 8. Hadoop framework basic components • EXECUTE: MAP/REDUCE Input and output in the form of (key,value) pairs; mappers and reducers executed in parallel; no order control for map or reduce execution; batch data processing engine; Java code • STORAGE: HDFS Data spread across cluster nodes; write once/read many; sequential read of large files; optimized to process large files; 3 data replicas across nodes Data flow: Input data → Map → shuffle & sort space → Reduce → Output data
  • 9. Hadoop architecture (diagram) A Name Node with Job Tracker and a Standby Name Node; Data nodes with Task trackers holding the data blocks with 3-way replication; all nodes connected by the cluster network. Map/reduce runs on top of the distributed file system.
  • 10. What should I use Hadoop for? • Log and machine data analysis: do visitors leave the page after seeing the cost of shipping? Are certain products abandoned more than others? • Preprocessing and transforming data for the DW: explore data to determine what can be moved to the DW • Fraud detection: capture deviations from standard usage • Risk modeling: customer churn prediction, to predict the risk of a client moving to a competitor • Social sentiment analysis: analyze social activity to predict ratings of popular events
  • 11. Does Hadoop solve my problem? Yes, if you can rewrite your algorithms as maps and reduces. If not, then no. Hadoop is not a good choice: • for intense calculations with little or no data • when your processing cannot easily be made parallel • when your data is not self-contained • when you need interactive results • as an RDBMS replacement
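The `(word,1)` algorithm from slide 7 and the map / shuffle & sort / reduce flow from slide 8 can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop code (Hadoop jobs are normally written in Java): the function names, the thread pool standing in for cluster nodes, and the sample text are all assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(mapped_blocks):
    """Group all emitted values by key, as the shuffle & sort phase does."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reducer(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def word_count(blocks):
    # Blocks are independent, so they can be mapped in parallel and in any order.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(mapper, blocks))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


# Hypothetical input: two "blocks" of a much larger text.
blocks = ["Augustus Tiberius Augustus", "Nero Augustus Nero"]
counts = word_count(blocks)

# The "hit parade": most frequent names first, ties broken alphabetically.
hit_parade = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the mapper looks at one block at a time and the reducer only sums, the result does not depend on how the input is split or on the order in which blocks are processed.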
  • 12. How to proceed Do not start from a bunch of servers and storage: approach from the perspective of a solution • Engage the Big Data Systems Center: email BDSC@US.IBM.COM or submit an online request at https://ibm.biz/BdFfcV (for IBMers). Analyze your input requirements; obtain sizing and reference architecture • Engage the IBM Montpellier Client Center - Power Linux Center: email christophe.menichetti@fr.ibm.com or BDSC - IOT Europe Leader Big Data Power Linux. Workshop / benchmark / showcase / architecture • Contact Techline: IBM: http://w3.ibm.com/support/techline BP: http://www.ibm.com/partnerworld/techline Transform the HW architecture into a configuration proposal
  • 13. What to ask the customer • Workload definition: – Will any of the workload be MapReduce and/or NoSQL, BigSQL, HBase? – Does the customer consider the workload simple, medium or complex? – Are there any response time or throughput requirements? • Ingestion of data: – What is the peak ingestion rate required for the data? – What type of data is it (structured, unstructured, semi-structured)? • Capacity sizing: – What raw data size (in TB) is initially required for the environment? – What compression rate does the customer assume? – What is the data growth rate, and over what period of time? • What are the High Availability requirements for the BigInsights environment? • What are the Disaster Recovery requirements? – What are the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) for the environment? • Does the customer expect to use GPFS or HDFS? • Do we need to include Linux licenses in the configuration, or does the customer have an enterprise license?
  • 14. BDSC SIZING Pay attention to the compression rate. If the compression rate is 35%, the calculations performed are: raw data = 100 GB; raw data + replication = 100 GB * 3 = 300 GB; raw data + replication + 25% shuffle/sort = 300 GB + (300 GB * 0.25) = 375 GB; applying the 35% compression rate = 375 GB * 0.35 = 131.25 GB
  • 15. IBM Solution for Hadoop POWER System edition (diagram) Management Nodes run the Platform Symphony Job tracker; Data Nodes run Map/Reduce and GPFS/FPO (data node + metadata); all nodes are connected by the cluster network. This architecture refers to IBM BigInsights V3
  • 16. Basic components • Hadoop management nodes: provide the Job Tracker function and a web interface that enables users to access the cluster and run their applications. A production cluster needs a minimum of three Management nodes, or a minimum of six for HA; for a non-production cluster one Management node is acceptable. • Data nodes: house IBM GPFS (or the Hadoop HDFS as an alternative) and MapReduce. The number of data nodes depends on the workload and the amount of data. • Edge nodes: act as a boundary between the Big Data cluster and the outside (client) environment. • System management node: an administrative console designed to cover cluster deployment and management operations.
  • 17. Data Node POD based design Big Data clusters are built using a simple building block approach to tailor the mix of CPU and storage to application requirements. A POD is represented by server S822L + storage EXP24S/DCS3700 • Small POD: S822L w/ internal drives: 12 x 2.5” 10K RPM 1.2 TB SFF SAS, 14.3 TB • Medium POD: S822L w/ EXP24S: 36 x 2.5” 10K RPM 1.2 TB SFF SAS, 43.2 TB • Large POD – A: S822L w/ 1 DCS3700: 60 x 3.5” 7.2K RPM 4 TB LFF SAS + 12 x 2.5” 10K RPM 1.2 TB SFF SAS, 254.4 TB • Large POD – B: S822L w/ 1 DCS3700: 60 x 3.5” 7.2K RPM 4 TB LFF SAS + 24 x 2.5” 10K RPM 1.2 TB SFF SAS, 268.8 TB
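The worked example on slide 14 can be reproduced step by step. The sketch below follows the order of operations shown on the slide (replicate, add shuffle/sort space, then apply the compression rate as a multiplier); the function and parameter names are made up for illustration and are not part of the BDSC sizing tool.

```python
def bdsc_disk_space_gb(raw_gb, replicas=3, shuffle_sort=0.25, compression_rate=0.35):
    """Disk space estimate following the steps of the slide-14 example.

    compression_rate is the multiplier applied at the end (0.35 means the
    data occupies 35% of its uncompressed size).
    """
    replicated = raw_gb * replicas                   # raw data + replication
    with_shuffle = replicated * (1 + shuffle_sort)   # + 25% shuffle/sort space
    return with_shuffle * compression_rate           # apply compression rate


# Slide example: 100 GB raw, 3 replicas, 25% shuffle/sort, 35% compression rate.
total_gb = bdsc_disk_space_gb(100)
```

With the slide's inputs this gives about 131.25 GB, matching the figure quoted above.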
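The raw capacity of each POD is simply the number of drives times the drive size, summed over the drive types. A minimal sketch, using only the drive counts and sizes quoted on slide 17 (the dictionary layout and names are illustrative):

```python
# (drive count, drive size in TB) per drive group, taken from slide 17.
PODS = {
    "Small": [(12, 1.2)],
    "Medium": [(36, 1.2)],
    "Large A": [(60, 4.0), (12, 1.2)],
    "Large B": [(60, 4.0), (24, 1.2)],
}


def raw_capacity_tb(drive_groups):
    """Sum count * size over all drive groups of one POD."""
    return sum(count * size_tb for count, size_tb in drive_groups)


capacities = {name: round(raw_capacity_tb(groups), 1) for name, groups in PODS.items()}
```

The computed values agree with the slide for the Medium and Large PODs. For the Small POD the plain product is 14.4 TB while the slide lists 14.3 TB; the difference may be rounding or a typo in the source.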
  • 18. MEDIUM POD © Copyright IBM Corporation 2015 S822L is a special edition for Linux POWER8 dual-chip module (DCM) processors 20 cores @ 3.42 GHz or 24 cores @ 3.02 GHz fully activated 128GB or 256GB and up to 1024 GB of memory Twelve SFF-3 Bays/DVD Bay with 12x disks 2.5” 10K RPM 1.2 TB SFF SAS Split feature to 6+6 SFF-3 Bays: Add a second SAS Controller One Expansion EXP24S with 24 x disks 2.5” 10K RPM 1.2 TB SFF SAS Hot-swap PCIe Gen 3 slots 9 adapter slots 2x EL3B PCIe3 LP RAID SAS ADAPTER to connect EXP24S/DCS3700 1x #5260 PCIe2 LP 4-port 1GbE Adapter for cluster management network For data network 3 alternatives: 2x #EL27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Adapter + 4x #EN02 cables 2x #EC3A PCIe3 LP 2-Port 40GbE NIC RoCE QSFP+ Adapter + 4x #EB2H cables 2x #EL3D PCIe3 LP 2-port 56Gb FDR IB Adapter x16 + 4x #EB4A cables
  • 19. Medium POD storage connection EXP24S modes are set by IBM manufacturing; the EXP24S must be in Mode 1 (#EJR2: one set of 24 disk bays). Two #EL3B adapters connect to the EXP24S through two #ECBU cables. Best price/performance per IO bandwidth
  • 20. How to calculate total cluster data space • Space for shuffle/sort data: workload dependent; rule of thumb +25% of total disk space • Number of replicas: default 3 replicas • Compression rate: dependent on customer data type; user data and shuffle/sort space can be compressed; as in the sizing tool, compression rate = 1 - %compression Total Disk Space = (User raw data + 25%) * N. replicas * compression rate
  • 21. POSH solution in econfig A POD is an S822L/S812L with direct connection to EXP24S/DCS3700 used in RAID0 • No preconfigured solution • Standard POWER + storage configuration • HA management nodes have 2 static LPARs, no PowerVM • EXP24S is the server expansion #EL1S: 2x #EL3B per node + 2x #ECBU cables • DCS3700 is the storage product 1818-80C, available in the Peripheral section of econfig: 2x #EL3B per node + 4x #ECC5 cables • Consider 40U as available rack space • Consider 3-phase power • Add TOR switches
  • 22. IBM Solution for Hadoop with IDEA © Copyright IBM Corporation 2015 GPFS/native RAID Data node map/reduce Platform Symphony HA Job tracker Platform Symphony HA Job tracker Workload Management Nodes Analytic Nodes GPFS/native RAID Data node map/reduce map/reduce Cluster network Platform Symphony Job tracker
  • 23. IBM Data Engine for Analytics © Copyright IBM Corporation 2015 IBM Data Engine for Analytics is based on IBM Elastic Storage
  • 24. IDEA in econfig Available in Hardware Solutions as a preconfigured set of components • 5146-S22 Analytic/Workload Management node: 24-core 3.34 GHz processor and 256 GB memory; max 16 analytic nodes per rack; Workload management node HA by default, split bus to boot 2 LPARs • Network adapters for external connectivity: #EL27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Adapter (without Optics); #EL2Z PCIe2 LP 2-Port 10GbE RoCE SR Adapter (Optics) - EC29; #EC3A PCIe3 LP 2-Port 40GbE NIC RoCE QSFP+ Adapter; #EL3D PCIe3 LP 2-port x16 56Gb FDR IB Adapter • ESS 5146 models GL2 or GL4: select the model, then select 2 or 4 TB disks; MES upgrade via scale out • 8247-21L management node • HMC • Ethernet switches configurable with IDEA: G8052 7120-48E; G8124E 7120-24L; G8264 7120-64C
  • 25. How to calculate total IDEA cluster data space • Space for shuffle/sort data: workload dependent; rule of thumb +25% of total disk space • Compression rate: dependent on customer data type; user data and shuffle/sort space can be compressed; as in the BDSC sizing tool, compression rate = 1 - %compression • Fit the Total Disk Space to the ESS available file system capacity Total Disk Space = (User raw data + 25%) * compression rate
  • 26. Cluster Network • Data network: dedicated to map/reduce; needs its own VLAN and subnet (for example, 10.1.x.x as per the predefined configuration); typically it is a private cluster network, and the interface to the client’s corporate data network is obtained using one or more edge nodes • Management network: for management of all nodes (provisioning the operating system, deploying software components and applications, monitoring, and workload management); carries traffic not related to map/reduce; needs its own VLAN and subnet • Service network: supports HMC operations via FSP, but Hadoop has no dependency on it; hardware-level management functions include power-cycling the node, hardware status monitoring, firmware configuration, and hardware console access; needs its own VLAN and subnet
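The IDEA sizing formula on slide 25 has no replica factor and converts a compression percentage into a multiplier. A short sketch under those assumptions follows; the function names, the 65% compression figure and the ESS capacity value are hypothetical inputs chosen for illustration, not figures from the deck.

```python
def idea_disk_space_tb(raw_tb, percent_compression, shuffle_sort=0.25):
    """Total Disk Space = (user raw data + 25%) * compression rate."""
    compression_rate = 1 - percent_compression / 100  # compression rate = 1 - %compression
    return raw_tb * (1 + shuffle_sort) * compression_rate


def fits_in_ess(required_tb, ess_capacity_tb):
    """Check the requirement against the ESS available file system capacity."""
    return required_tb <= ess_capacity_tb


# Hypothetical inputs: 100 TB raw data, 65% compression, 50 TB of ESS capacity.
required_tb = idea_disk_space_tb(100, 65)
ok = fits_in_ess(required_tb, ess_capacity_tb=50)
```

The available file system capacity of a given ESS model has to come from the ESS documentation or the sizing tool; it is treated here as a plain input.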
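Each of the three networks needs its own subnet, so a configuration plan can be checked for accidental overlaps before deployment. The sketch below uses Python's standard `ipaddress` module; only the 10.1.x.x data network comes from the slide, while the /16 prefix length and the management and service subnets are hypothetical values chosen for the example.

```python
import ipaddress
from itertools import combinations

networks = {
    "data": ipaddress.ip_network("10.1.0.0/16"),        # 10.1.x.x from the slide; /16 assumed
    "management": ipaddress.ip_network("10.2.0.0/16"),  # hypothetical
    "service": ipaddress.ip_network("10.3.0.0/16"),     # hypothetical
}


def overlapping_pairs(nets):
    """Return the names of every pair of networks whose address ranges overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


conflicts = overlapping_pairs(networks)
```

An empty `conflicts` list means the three subnets are disjoint; VLAN assignment itself is done on the switches and is outside the scope of this check.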
  • 27. © Copyright IBM Corporation 2015 IDEA Cluster Network
  • 28. TOR Switch (IBM machine type/model: specifications; use cases) • 7120-24L G7028: 24x 1 GbE RJ45 ports, 4x 10 GbE SFP+ ports; service network • 7120-48E G8052: 48x 1 GbE RJ45 ports, 4x 10 GbE SFP+ ports; service network, OS management network • 7120-64C G8264: 48x 10 Gb SFP+ ports, 4x 40 Gb QSFP+ ports (or 16x 10 Gb SFP+ ports); data network, OS management network • LENOVO G8316: 16 QSFP+ 40GbE ports (or 64x 10 Gb SFP+ ports); data network, core switch for small cluster • LENOVO G8332: 32 QSFP+ 40GbE ports (or 128x 10 Gb SFP+ ports); data network, core switch for large cluster http://www-03.ibm.com/systems/networking/switches/rack.html
  • 29. © Copyright IBM Corporation 2015 Adapter and cabling  1GbE adapter Adapter: #5260 PCIe2 LP 4-port 1GbE Cable: CAT5E Ethernet #1111 1m / #1112 10m / #1113 25m 10GbE adapter Adapter: #EL27 PCIe2 LP 2-Port 10GbE RoCE SFP+ Cable: 10Gb E'Net Cable SFP+ Act Twinax Copper Connects to switch with 10GbE SFP+ port #EN02 3m (9.8ft) / #EN03 5m (16.4-ft)  40GbE adapter Adapter: #EC3A PCIe3 LP 2-Port 40GbE NIC RoCE QSFP+ Cables: DAC cable IBM Passive QSFP+ to QSFP+ Connects to switch with 40GbE QSFP+ port #EB2H 3m (9.8ft) / #ECBN 5m (16.4-ft) /#ECBP 7m (23.1-ft)  Infiniband adapter Adapter: #EL3D PCIe3 LP 2-port 56Gb FDR IB Adapter x16 Cables: FDR IB SWITCH OPTICAL CABLE, QSFP/QSFP #EB4A 3m / #EB4B 5m
  • 30. BigInsights Enterprise Edition V3.0 IBM GPFS-FPO 3.5 and Platform Symphony MR are included with BigInsights V3
  • 31. SW to be configured with econfig Data/Analytic/Management Node • Red Hat Linux AS v6 for POWER: basic license per socket, 1/3 yr support + maintenance from Red Hat, or 1 yr IBM support + 1/3 yr maintenance from Red Hat • IBM Platform Cluster Manager Advanced Edition V4 per server • GPFS licenses included with BigInsights Enterprise Edition ESS Server • Red Hat Enterprise Linux v7 for POWER: basic license per socket, 1/3 yr support + maintenance from Red Hat, or 1 yr IBM support + 1/3 yr maintenance from Red Hat • IBM GPFS V4 Standard Edition per socket + IBM GPFS Native RAID for GPFS Server v4 per server • IBM Platform Cluster Manager Advanced Edition V4 per server
  • 32. Lab Services 6911-400 If the solution contains an ESS you have to add Lab Services: 1. For each 8247-21L (EMS): 3 units 2. For each pair of 8247-22L: 2 units 3. The minimum is 5 units, even if the above resolves to fewer than 5 units. Example: IDEA with GL2, 5 Analytic nodes and 3 Workload management nodes: GL2 + service management node = 5 units; 8 S822L nodes = 8 units; TOTAL = 13 units. Lab Services are in econfig in the LAB Services tab
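The Lab Services rules on slide 32 can be written as a small function. The sketch makes two assumptions that the slide does not state explicitly: a GL2 contributes one pair of 8247-22L servers (which is consistent with the slide's "GL2 + service management node = 5 units"), and an odd number of 8247-22L servers is rounded up to the next pair. The function name is made up for the example.

```python
import math


def lab_service_units(ems_21l, servers_22l, minimum=5):
    """Units of Lab Services 6911-400 for a solution that contains an ESS.

    3 units per 8247-21L (EMS), 2 units per pair of 8247-22L, minimum 5 units.
    Odd 8247-22L counts are rounded up to a full pair (assumption).
    """
    units = 3 * ems_21l + 2 * math.ceil(servers_22l / 2)
    return max(units, minimum)


# Slide example: GL2 (assumed 2 x 8247-22L) + 1 EMS, plus 8 S822L nodes
# (5 Analytic + 3 Workload management).
example_units = lab_service_units(ems_21l=1, servers_22l=2 + 8)
```

For the slide's example this yields 13 units: 5 for the GL2 with its management node and 8 for the eight S822L nodes.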
  • 34. Special notices © Copyright IBM Corporation 2015 IBM, the IBM logo, ibm.com AIX, AIX (logo), AIX 6 (logo), AS/400, BladeCenter, Blue Gene, ClusterProven, DB2, ESCON, i5/OS, i5/OS (logo), IBM Business Partner (logo), IntelliStation, LoadLeveler, Lotus, Lotus Notes, Notes, Operating System/400, OS/400, PartnerLink, PartnerWorld, PowerPC, pSeries, Rational, RISC System/6000, RS/6000, THINK, Tivoli, Tivoli (logo), Tivoli Management Environment, WebSphere, xSeries, z/OS, zSeries, AIX 5L, Chiphopper, Chipkill, Cloudscape, DB2 Universal Database, DS4000, DS6000, DS8000, EnergyScale, Enterprise Workload Manager, General Purpose File System, , GPFS, HACMP, HACMP/6000, HASM, IBM Systems Director Active Energy Manager, iSeries, Micro-Partitioning, POWER, PowerExecutive, PowerVM, PowerVM (logo), PowerHA, Power Architecture, Power Everywhere, Power Family, POWER Hypervisor, Power Systems, Power Systems (logo), Power Systems Software, Power Systems Software (logo), POWER2, POWER3, POWER4, POWER4+, POWER5, POWER5+, POWER6, POWER6+, System i, System p, System p5, System Storage, System z, Tivoli Enterprise, TME 10, Workload Partitions Manager and X-Architecture are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml The Power Architecture and Power.org wordmarks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org. 
UNIX is a registered trademark of The Open Group in the United States, other countries or both. Linux is a registered trademark of Linus Torvalds in the United States, other countries or both. Microsoft, Windows and the Windows logo are registered trademarks of Microsoft Corporation in the United States, other countries or both. Intel, Itanium, Pentium are registered trademarks and Xeon is a trademark of Intel Corporation or its subsidiaries in the United States, other countries or both. AMD Opteron is a trademark of Advanced Micro Devices, Inc. Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both. TPC-C and TPC-H are trademarks of the Transaction Performance Processing Council (TPPC). SPECint, SPECfp, SPECjbb, SPECweb, SPECjAppServer, SPEC OMP, SPECviewperf, SPECapc, SPEChpc, SPECjvm, SPECmail, SPECimap and SPECsfs are trademarks of the Standard Performance Evaluation Corp (SPEC). NetBench is a registered trademark of Ziff Davis Media in the United States, other countries or both. AltiVec is a trademark of Freescale Semiconductor, Inc. Cell Broadband Engine is a trademark of Sony Computer Entertainment Inc. InfiniBand, InfiniBand Trade Association and the InfiniBand design marks are trademarks and/or service marks of the InfiniBand Trade Association. Other company, product and service names may be trademarks or service marks of others.
