© 2014 IBM Corporation
Client Approaches to
Successfully Navigate through
the Big Data Storm
June 2014
Does Your Big Data Project Look Like This?
You need cost predictability,
together with a solution that
can quickly take you places!
• Hadoop is a fascinating, exciting engine. However, it is:
  • Ungoverned
  • All custom, all the time
  • Dependent on expensive, constantly changing skills
  • Lacking any concept of quality, governance, or lineage
And MapReduce was originally designed for fine-grained fault tolerance, which makes it slow for big data integration processing (see the sketch below)
Hadoop by itself is just not a solution for big data integration
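To see why that design is slow for integration work, here is a minimal, hypothetical Hadoop Streaming mapper implementing one cleansing step. Hadoop Streaming is a standard Hadoop facility; the tab-separated field layout and the quality rule are assumptions invented for illustration:

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper: one cleansing step of an
# integration pipeline. MapReduce gets its fine-grained fault tolerance
# by materializing each stage's complete output to disk (HDFS) before
# the next stage can start, so every extra integration step pays a full
# disk round trip -- the slowness referred to above.
import sys

def main():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: customer_id \t name \t ...
        # Drop records that fail a basic quality rule (missing id).
        if len(fields) >= 2 and fields[0].strip():
            cust_id = fields[0].strip()
            name = fields[1].strip().upper()  # crude standardization
            sys.stdout.write(f"{cust_id}\t{name}\n")

if __name__ == "__main__":
    main()
```

A realistic integration flow chains several such jobs (cleanse, then join, then aggregate), multiplying that per-stage materialization cost.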
If so, that’s because 80% of the development work for a big data
project is to address Big Data Integration challenges
“By most accounts, 80 percent of the development effort in a big data project goes
into data integration and only 20 percent goes towards data analysis.”
Intel Corporation: Extract, Transform, and Load Big Data With
Apache Hadoop (White Paper)
Most Hadoop initiatives end up achieving garbage in, garbage out faster, against larger data volumes, because:
• MapReduce was not designed to handle all of the processing logic necessary for big data integration
• Teams forget that Hadoop initiatives require collecting, moving, transforming, cleansing, integrating, exploring, and analyzing volumes of disparate data (of various types, from various sources) – in other words, Data Integration
To succeed, you need Data Integration capabilities that create consumable data by:
• Collecting, moving, transforming, cleansing, governing, integrating, exploring & analyzing volumes of disparate data (a toy end-to-end sketch follows)
• Providing simplicity, speed, scalability, and reduced risk
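As a purely illustrative sketch of the stages those bullets name, here is a tiny collect / cleanse / integrate pipeline in plain Python. The record layouts, quality rule, and join key are assumptions invented for this example, not details from the deck:

```python
# Toy collect -> cleanse -> integrate pipeline. Everything here
# (feeds, fields, rules) is an illustrative assumption.
import csv
import io

CRM_CSV = "id,name,email\n1, Ada Lovelace ,ADA@EXAMPLE.COM\n2,,bob@example.com\n"
BILLING_CSV = "id,balance\n1,100.50\n2,20.00\n"

def collect(raw):
    """Collect: parse a raw feed into records."""
    return list(csv.DictReader(io.StringIO(raw)))

def cleanse(rows):
    """Cleanse: standardize values and enforce a quality rule."""
    out = []
    for r in rows:
        if not r["name"].strip():          # drop records with no name
            continue
        r["name"] = r["name"].strip().title()
        r["email"] = r["email"].strip().lower()
        out.append(r)
    return out

def integrate(customers, balances):
    """Integrate: join two disparate sources on an assumed key."""
    by_id = {b["id"]: b["balance"] for b in balances}
    return [{**c, "balance": by_id.get(c["id"])} for c in customers]

print(integrate(cleanse(collect(CRM_CSV)), collect(BILLING_CSV)))
```

Even in this toy form, nearly all of the code is integration plumbing rather than analysis, which is exactly the 80/20 split the Intel quote describes.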
A large US Bank needed to reduce total cost of ownership …
Business Problem:
• Primary: Reduce Teradata total cost of ownership
• Secondary: Allow for new analytic exploration & asset optimization

Challenges:
• Create a Data Distribution Hub / Big Data platform to cut costs
• Move front-end processing from Teradata to the Data Distribution Hub
• Offload the ELT workload in a cost-effective, efficient way
… and successfully offloaded ELT workloads to reduce costs
Approach:
• Reduce costs by offloading ELT workloads from Teradata to a Big Data platform
• Leverage existing InfoSphere Information Server data integration skills and assets (jobs)
• Avoid hand coding: the client would not consider hand coding its data integration capabilities

Outcome:
• Client decided to deploy IBM PureData System for Hadoop
• Client uses InfoSphere Information Server as its single scalable & flexible Big Data Integration solution
• Client successfully migrated its Teradata ELT and now uses InfoSphere Information Server to exploit the lower cost of running data integration on Hadoop (the offload pattern is sketched below)
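A minimal sketch of the offload pattern these two slides describe, assuming a generic ODBC DSN named teradata_dw plus illustrative table and HDFS paths (none of these identifiers come from the source). The "T" of ELT moves off the warehouse: extract raw rows, land them on Hadoop, and run the heavy integration logic there:

```python
# ELT offload sketch: the warehouse only extracts and serves queries;
# transformation runs on the (cheaper) Hadoop cluster.
import subprocess
import pyodbc  # generic ODBC interface; DSN configuration not shown

RAW_EXPORT = "accounts_raw.tsv"           # assumed local staging file
HDFS_LANDING = "/data/landing/accounts"   # assumed HDFS landing zone

# 1. Extract only -- no transformation SQL runs inside Teradata.
conn = pyodbc.connect("DSN=teradata_dw")
cur = conn.cursor()
cur.execute("SELECT acct_id, balance, open_dt FROM accounts")  # illustrative
with open(RAW_EXPORT, "w") as out:
    for acct_id, balance, open_dt in cur:
        out.write(f"{acct_id}\t{balance}\t{open_dt}\n")
cur.close()
conn.close()

# 2. Land on HDFS; 'hdfs dfs -put' is the standard Hadoop shell command.
subprocess.run(["hdfs", "dfs", "-put", "-f", RAW_EXPORT, HDFS_LANDING],
               check=True)

# 3. Transform on the cluster (e.g., submit the integration job there),
#    which is where the claimed cost reduction comes from.
```

The design point: transformation capacity that used to be bought as expensive warehouse CPU is bought as commodity Hadoop capacity instead.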
A government entity anticipated the need to support a 10x increase in incoming data volumes over 3-5 years …
Business Problem:
• This Master Data Management (MDM) client compares frequently updated records to identify potential national security threats. They needed to:
  – Support a 10x increase in incoming data volumes over the next 3-5 years
  – Reduce high software and hardware costs

Project Challenges:
• Create a solution that could support scalable probabilistic matching for up to 10x data growth
• Modernize ETL practices and remove bottlenecks
… and replaced an expensive and failing hand-coding approach with
a massively scalable Big Data Integration solution
Approach:
• Eliminate hand coding for data integration to significantly reduce software costs
• Deploy a data integration solution that can scale fast enough to feed the MDM system
• Reduce the high costs of ELT running in their database

Outcome:
• Removed hand coding and replaced it with InfoSphere Information Server for massively scalable data integration processing
• Stopped running ELT in the database, leveraging Hadoop instead
• Client purchased an end-to-end Big Data solution from IBM spanning MDM, Hadoop, and Information Integration (a toy sketch of probabilistic matching follows)
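For context on what "scalable probabilistic matching" computes, here is a toy Fellegi-Sunter style scoring function. The field names, m/u probabilities, and sample records are invented for illustration; a production MDM engine applies this kind of scoring across enormous numbers of record pairs, which is what drove the scaling requirement:

```python
# Toy probabilistic record matching (Fellegi-Sunter style weights).
# m = P(field agrees | same entity), u = P(field agrees | different
# entities); agreement adds log2(m/u), disagreement adds
# log2((1-m)/(1-u)). All numbers below are illustrative assumptions.
import math

FIELDS = {
    # field: (m, u)
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "passport_no": (0.99, 0.0001),
}

def match_score(rec_a, rec_b):
    """Sum per-field weights; higher means more likely the same person."""
    score = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a.get(field) and rec_a.get(field) == rec_b.get(field):
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

a = {"last_name": "IVANOV", "birth_date": "1980-03-14", "passport_no": "X123"}
b = {"last_name": "IVANOV", "birth_date": "1980-03-14", "passport_no": "X123"}
print(match_score(a, b))  # well above a plausible match threshold
```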
A large European telco wants to leverage big data to increase
revenue and customer satisfaction …
Business Problem:
• Increase revenue & customer satisfaction by analyzing usage patterns of mobile devices to match user demand
• Needed a comprehensive Big Data platform that could keep up with analytics requirements
• Reduce costs by reducing inventory

Project Challenges:
• The client used Informatica for ETL generally and planned to extend that use to the Big Data effort. They asked Informatica to improve existing Netezza loading performance in support of their goals, and:
  – The ETL process broke with a small sample of jobs
  – They switched to an ELT approach and encountered technical problems
… and learned that ELT alone was not sufficient to support Big Data Integration
Approach:
• Leverage a worldwide predictive solution to anticipate customer requirements
• Add a Hadoop layer to enrich predictive models with unstructured social media data
• Expand the existing IBM Netezza footprint to keep pace with new data volumes

Outcome:
• Client requested a full-workload data integration POC with IBM
• Client realized ELT alone was not sufficient for Big Data Integration: not all data integration logic can be pushed into IBM Netezza or Hadoop (see the sketch below)
• Client found that InfoSphere Information Server can often run data integration faster than either Netezza or Hadoop
• Client selected InfoSphere Information Server over Informatica for Big Data Integration, and InfoSphere BigInsights over Cloudera
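One way to see why ELT-only pushdown falls short: some integration logic is procedural rather than relational. A hedged sketch, with invented records and thresholds, of fuzzy matching plus survivorship, two steps a SQL-generating ELT tool cannot easily push into Netezza or Hive without custom UDFs on every target:

```python
# Procedural integration logic with no direct SQL equivalent:
# fuzzy name comparison plus cross-source survivorship. Records,
# threshold, and rules are illustrative assumptions.
from difflib import SequenceMatcher

def same_customer(name_a, name_b, threshold=0.85):
    """Fuzzy string similarity -- logic a pushdown ELT tool cannot
    express as plain generated SQL."""
    return SequenceMatcher(None, name_a.lower(),
                           name_b.lower()).ratio() >= threshold

def survive(record_warehouse, record_crm):
    """Assumed survivorship rule: the most recently updated source wins
    per attribute, but empty values never overwrite populated ones.
    ISO date strings compare correctly as plain strings."""
    newer = max((record_warehouse, record_crm), key=lambda r: r["updated"])
    return {**record_warehouse, **{k: v for k, v in newer.items() if v}}

w = {"name": "Acme GmbH", "updated": "2014-01-02", "phone": ""}
c = {"name": "ACME Gmbh.", "updated": "2014-05-30", "phone": "+49 30 1234"}
if same_customer(w["name"], c["name"]):
    print(survive(w, c))  # merged golden record
```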
Plan for Success!
Successfully navigate the big data maze
• Hadoop is not a Data Integration platform: 80% of the work is around Big Data Integration, and MapReduce is slow
• To move into production successfully, you need to plan ahead and make sure you have accounted for your Big Data Integration needs: hand coding does not meet Big Data Integration scalability, flexibility, or performance requirements
• Get more information about Big Data Integration requirements and key success factors
• ELT alone is NOT sufficient to meet most Big Data Integration requirements, because you cannot push ALL the data integration logic into the data warehouse or into Hadoop