© 2017 IBM Corporation
A Modern Data Platform to Meet All of Your Data Type Needs
@mRcSqZd
Technical Leader
IBM Cognitive Systems
#ThinkCIO
Your customers expect innovative and personalized digital experiences, driven by data
• Personalized and Engaging
• Location Aware
• On Demand
By 2018, more than 70% of new in-house apps will be developed on an Open Source database, and 50% of existing apps will be converted.
Source: “The State of Open-Source RDBMSs”
Superior digital experiences must be built, they can’t be bought
75% of application development supporting digital business will be built, not bought, by 2020
Source: Gartner
Seizing Opportunity… Crossing The Data Chasm
• 12%: amount of data most companies estimate they are analyzing
• 80%: time analysts will spend trying to create data sets to draw insights
• 83%: of companies agree that data is their most strategic asset
Sources: E&Y and Forrester
DB-Engines Ranking
https://db-engines.com/en/ranking_trend
The Data Analysis Journey
[Figure: a journey through Data Foundation, New Data Source, Data Lake and Data Science toward New Business Models, plotted as Differentiation against Business Value. Other labels: BigData & Analytics, NextGen Data Sources, Traditional]
Data Flood
• INCREASING DATA VARIETY
[Figure: data sources grouped as Business Process, Web and Digital, with volume growing from gigabytes through terabytes, petabytes and exabytes to zettabytes. Labels: Search Marketing, Behavioral Targeting, Dynamic Funnels, User Generated Content, Mobile Web, SMS/MMS, Sentiment, HD Video, Speech To Text, Product/Service Logs, Social Network, Business Data Feeds, User Click Stream, Sensors, Infotainment Systems, Wearable Devices, Cyber Security Logs, Connected Vehicles, Machine Data, IoT Data, Dynamic Pricing, Payment Record, Purchase Detail, Purchase Record, Support Contacts, Segmentation, Offer Details, Web Logs, Offer History, A/B Testing, Streaming Video, Natural Language Processing]
Intelligence
Data	NeedFrom to
Source: NVIDIA
A Modern Data Platform is born
[Figure: new data sources (Social, Location, Sensors, Mobile) and transactional data services surround the conventional data platform, which is extended with:
• “New” relational databases: Open Source SQL (PostgreSQL compatible, MySQL compatible)
• New data models, NoSQL: Document, Column, Key Value, Graph
• New data models: Hadoop/Spark for unstructured data]
Leading Open Source Technologies
Available and Optimized for Linux on Power
*** Packages and software available for Linux on Power: https://developer.ibm.com/linuxonpower/open-source-pkgs/
New Database Paradigms
Distributed Hierarchical Databases
• A scalable data architecture
• A parallel and distributed programming model
• Open source community innovation (Apache Hadoop)
Relational Databases
• ACID compliance for transactions and reporting
• Tuning and indexing for performance
NoSQL Databases
• Analytics capability for multiple data types, often used in mobile and social workloads
• Scalability and flexibility for different data store models
In Memory Databases
• Increased performance by bringing data closer to compute
• Instant insight from real-time operational data
ACID = Atomicity, Consistency, Isolation, Durability
BASE = Basically Available, Soft State, Eventually Consistent
IBM	Internal	and	Business	Partners	Only
What is NoSQL ?
• Stands for “Not Only SQL” *
• Term reintroduced in 2009, with the name suggested by Eric Evans of Rackspace, a committer on the Cassandra project
• Used to describe the surge of new non-relational projects and products
• Uses non-relational data stores; does not require a fixed table schema (rows/columns)
• More flexible data schemas, a variety of data store models, runs well on distributed networks
• Key value, document store, wide column, graph, search …
• Popular with developers building mobile/cloud scale apps with frequently changing data
• NoSQL data stores relax one or more of the ACID properties used by relational databases
  • Atomicity, Consistency, Isolation, Durability
• NoSQL shifts data store properties to BASE to support exponential growth of data
  • Basically Available, Soft State, Eventually Consistent
* Source: A Brief History of NoSQL - http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html
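The ACID and BASE bullets above can be made concrete with a small sketch. It is illustrative only and uses Python's standard library: the first half relies on a `sqlite3` transaction to show atomicity (both updates commit, or neither does), and the second half is a toy in-memory store showing "soft state" and "eventually consistent" reads. The names `make_bank`, `transfer`, `balances` and `EventuallyConsistentStore` are hypothetical and do not come from any product named in this deck.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    with conn:  # commit the setup so a later rollback cannot undo it
        conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                         [("alice", 100), ("bob", 0)])
    return conn


def transfer(conn, src, dst, amount):
    """ACID-style transfer: both updates are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return all balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


class EventuallyConsistentStore:
    """Toy BASE-style store: writes hit the primary, the replica catches up later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # replication log not yet applied to the replica

    def put(self, key, value):
        self.primary[key] = value          # the write is accepted immediately
        self.pending.append((key, value))  # replication is deferred

    def read_replica(self, key):
        return self.replica.get(key)       # may be stale (soft state)

    def sync(self):
        for key, value in self.pending:    # replica converges once the log is applied
            self.replica[key] = value
        self.pending.clear()
```

A failed `transfer` leaves both balances untouched, whereas a read from the replica can return stale data until `sync()` runs. That is the trade-off the two acronyms describe.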
NoSQL Open Source DB Types
[Figure: store types plotted by Size against Complexity, in the order Key Value Stores, Wide Column, Document Stores, Graph; the scale runs from “Scales Easily, Basic Function” to “Hard to Scale, Rich Function”]
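As a rough illustration of the four store types above, the sketch below models the same customer purchase in each shape using plain Python structures. It makes no claim about how any particular database stores data; all keys, field names and helper functions (`find_documents`, `neighbors`) are hypothetical.

```python
import json

# Key value: an opaque value fetched by one key; the store does not look inside it.
key_value = {"customer:42": json.dumps({"name": "Ada", "city": "Lima"})}

# Document: a nested, self-describing record; fields may differ between documents.
documents = [
    {"_id": 42, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 43, "name": "Bo"},  # no "orders" field, and no schema change needed
]

# Wide column: row key -> column family -> columns; rows need not share columns.
wide_column = {
    "42": {"profile": {"name": "Ada"}, "orders": {"2018-01-05": "A1 x2"}},
    "43": {"profile": {"name": "Bo"}},
}

# Graph: nodes plus typed edges; relationships are first-class data.
nodes = {"c42": {"label": "Customer", "name": "Ada"},
         "pA1": {"label": "Product", "sku": "A1"}}
edges = [("c42", "BOUGHT", "pA1")]


def find_documents(docs, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def neighbors(edge_list, node, relation):
    """Follow edges of one relation type outward from a node."""
    return [dst for src, rel, dst in edge_list if src == node and rel == relation]
```

The key-value lookup is the simplest and needs nothing but the key, while the graph traversal offers richer queries at the cost of more structure. This matches the figure's direction from "Basic Function" to "Rich Function".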
Apache Spark
• An Apache Foundation open source project. Not a product.
• An in-memory compute engine that works with data. Not a data store.
• Enables highly iterative analysis on large volumes of data at scale
• Unified environment for data scientists, developers and data engineers
• Radically simplifies the process of developing intelligent apps fueled by data.
Source: http://spark.apache.org
Spark on POWER in the Enterprise: Workload characteristics
Traditional Workloads: Automating the World
• Transaction processing
• WebSphere
Future Workloads add a new dimension: Understanding the World
• Machine Learning
• SNAP ML
• Graph Analytics
• Security, Fraud Detection
• Social Network Analytics
• Video/Speech Analytics
• Object Recognition
• Dialog
Spark on Power – Many Options
• Hortonworks Data Platform
  • Spark comes with a managed version of Hadoop
  • Integrates seamlessly with YARN and HDFS
  • Spark is part of the larger ecosystem of Hadoop components
• Platform Conductor
  • Purpose-built Spark platform
  • Optimized for running Spark-based workloads
• Data Science Experience Local
  • Data science collaboration environment
  • Brings together Spark, Python, RStudio and notebooks in a single platform
• Open source Spark
  • Build and manage your own platform
The Modern Data Platform on IBM POWER Systems
[Figure: data flows left to right through four stages, INGEST, PERSIST, ANALYZE & ACT, THINK & AUTOMATE.
• Data Sources, conventional: Third-Party Data, Transactional Data, Application Data
• Data Sources, new: Machine & Sensor Data, Image & Video, Mobile, Location, Social Data, Internet of Things Data
• Persist: Relational / Transactional Databases, Data Warehouses (the Conventional Data Platform), NoSQL DBs, Open Source SQL, Hadoop, Spark
• Analyze & act: Data in Motion apps, Sense and Respond, Discovery and Exploration, Predictive, Machine Learning / Deep Learning, Artificial Intelligence
• New industry apps: 360 client view, fraud detection, recommendations
• Workload bands: Data Lakes; Data Management, Streaming; New Apps on Open Source DBs; Advanced Analytics & Insights; Cognitive Workloads]
Data Engineering - Basic Workflow summary
Data is Prerequisite to AI
[Figure: data sources pass through Information Integration & Governance into data zones, then on to AI outcomes.
• Sources: Transaction and application data; Machine, sensor data; Enterprise content; Image, geospatial, video; Social data; Third-party data
• Zones: Unstructured, Landing, Exploration and Archive; Operational Data; Real-time Data Processing & Analytics (Streams / IoT); Deep Analytics data zone; EDW and data mart zone
• Persist: Relational / Transactional Databases, Data Warehouses (Conventional Data Platform), NoSQL DBs, Open Source SQL
• Analyze & act: Risk, Fraud; Chat bots, personal assistants; Supply Chain Optimization; Dynamic Pricing, Recommenders; Behavior Modeling; Vision, Autonomous Systems]
Speed and Accuracy are critical desired outcomes for AI projects
Clients with Modern Databases on Power
[Figure: client logos, each labeled with the technology in use. Labels include: MongoDB, Postgres, Redis, MariaDB, MariaDB Community, EnterpriseDB (EDB), Cassandra, Neo4j, Splendid Data (Postgres), SunDB, PowerAI, Hortonworks (HWX), Ambientia, Egyptian Ministry of Interior]
Internal & BP use only
“AI LADDER”
[Figure: ladder with rungs labeled Data, Analytics, Machine Learning and AI; deployment options: Fixed Deployment, Private Cloud]
© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
EDW Optimization – Enterprise Data Lake
• Offload data and ETL to Hadoop
Hortonworks Data Platform
IBM Elastic Storage Server
EDW Modernization – New Data + Analytics
• Capture and analyze new streaming data
• Federate across EDW and Hadoop
Hortonworks DataFlow
IBM BigSQL
SAP Data Hub and Vora
Data Science – Machine Learning
• Build accurate predictive models
• Leverage Data Lake
Data Science Experience
Data Science – Deep Learning
• Use larger data sets and neural networks to unleash deep insight
PowerAI
Spectrum Conductor with Deep Learning Impact
Say “Hello” to POWER9
• 1.8x more memory bandwidth vs. x86
• 2x faster core performance vs. x86
• 2.6x more RAM supported vs. x86
• 9.5x max I/O bandwidth vs. x86
Big Data creating new Cognitive opportunities and use-cases
Industry | Data Source | Cognitive Use-Case
Manufacturing / Factory / Assembly | Audio, Video, Images | Analysis of equipment sounds to anticipate machine failures; Manufacturing and Robotics.
Human Resources | Text | Data extraction and sentiment analysis for automated best-practice HR advice.
All / Report Generation | Sales data, etc. | Generative models for automatically generated detailed and summary reports in natural language.
Retail / Commerce | Transactional Data | Find correlations between customer demographics and products; correlations between products and other products; etc.
Hyperscale Datacenter | Sensor / Logs | Reinforcement learning to optimize cooling and operating configurations for energy efficiency.
Medical | Images | Automatic detection of diabetic retinopathy.
Retail / Commerce | Behavior history | Recommendations based on behavior analysis, patterns; cross-sell and up-sell.
Telco | CDRs/switches, subscribers, billing systems, etc. | Customer retention, targeted promotions, fraud detection, service optimization, etc.
Banking | Emails, customer account histories/behaviors, sentiment, securities data, loan records, etc. | Customer insights to cross-sell products, target offers; call center automation/optimization; financial fraud and risk management.
Mobile Banking (w/ Telco) | Airtime usage, contacts, transactions, mobile wallet, personal info | Customer insights to target offers; new forms of credit risk / loan approval / credit worthiness.
Financial | Financial market data, Quandl public DB, NY Times database | Leverage market data and sentiment from earlier time zone for actionable insights in current market (“follow the sun”).
Finance Industry specific Use Cases
Industry | Data Source | Use-Case
Financial | Customer account history data, emails, call center logs, web interaction | Customer behavioral analysis, attrition prediction. (Spectrum Conductor, PowerAI)
Financial | Customer data, transactions | Fraud detection based on pattern / behavior analysis; trend analysis.
Financial | Credit card transactional data (OracleDB/AIX), Teradata warehouse, Object store, Account history data. | Credit risk and fraud analysis (OracleDB/AIX, Teradata, Puresystem, Spark, Swift/Cleversafe, PowerAI)
Financial | Mortgage processing data, Core Logic | Analyze and then predict mortgage defaults and/or pre-payments; analyze macro effect on housing market; rental rate estimation, credit analytics, etc.
Financial / Insurance | Image | Automated claims processing (image classification of pictures of damage)
Financial | KDB Timeseries Database (high frequency trading) | Pattern analysis, clustering, decision trees. (Anaconda/Python, XGBoost, K-Means, TensorFlow, Jupyter)
Financial | RDBMS, Timeseries behavioral (premium payments, call center interactions, emails, trades, calls, portfolio performance); Static data (demographic, mortality, policies, portfolio strategy, | Customer attrition prediction and product recommendation (up-sell / cross-sell).
Financial | Social media | Natural language processing interpreting customer sentiment. (TensorFlow)
Sample Industry Use Cases
Industry | Data Source | Use-Case
Retail | Images of products | Computer vision based check-out. (PowerAI, FPGA based inferencing acceleration)
Supply Chain | IoT, RFID, sensors, traffic data, weather | Dynamic Route Optimization
Government | Video | Traffic management with real-time vehicle classification (VisionBrain, Caffe, TensorFlow)
Government | Large scale HPC operational data | Optimize large scale HPC workload plan, parameters, experiments and execution.
Government | Drone, Satellite and other Images; Text | Image analysis for real time threat protection; anomaly detection, patterns, sentiment, measures of opinion/action changing; facial recognition; language translation, augment human translators.
Security | Aggregation of community cameras and webcams, images/video. | Aggregated community surveillance, learning normal community behaviors (traffic, vehicles, pedestrians, etc.) and flagging/alerting anomalies.
Telecom/Media | Language, text, video, images. | Improve accessibility, accuracy and speed of captioning; metadata for content libraries and automated media tagging. Drone/aerial data for preventative maintenance of equipment.
Media and Entertainment | | Augmented reality, intelligent game agents, precision recommendations
Auto / Aero | Images, sensor data, Voice | Autonomous and assisted driving; accident avoidance; voice controlled controls/actions.
Sample Industry Use Cases
Industry / Function | Data Source | Use-Case
Energy / Utilities / Oil and Gas | Facility/Equipment sensors; Seismic data and images. | Identify failing or failed components; anomaly detection and customer pattern predictions; re-interpret existing seismic topology.
Energy / Utilities | Drone/aerial images and video | Replace hazardous manual electrical tower inspections with remote-controlled drones, leveraging image/video classification to identify maintenance issues. (E870 in-memory RDBMS, ESS)
Healthcare | Biopsy images. | Deep learning based early cancer detection.
Healthcare / LifeSciences | MongoDB, Neo4j, Enterprise DB | Genomics sequencing with applications like BWA and GATK, with additional workflows leveraging ML/DL (Spectrum Conductor for Spark, Spectrum Scale, H2O, TensorFlow)
Healthcare / LifeSciences | Medical images, CAT scan, X-ray, MRI | Improved diagnostic capability, anomaly detection; combine image recognition with genomic data; drug discovery with improved molecular analysis

A modern data platform meets the needs of each type of data in your business