This talk will detail the HSBC Big Data journey to date, walking through the genesis of the Big Data initiative, which was triggered by continual challenges in delivering data-driven products. The global scale, diversity and legacy of an organization like HSBC present challenges for Hadoop adoption not typically faced by younger companies. Big Data technologies are by their very nature disruptive to the established enterprise IT environment. Hadoop and the peripheral toolsets in the Big Data ecosystem do not fit comfortably into an enterprise data centre or IT operational processes, and can even prove disruptive to current organization structures. Alasdair will focus on the steps HSBC has taken to mitigate concerns about Hadoop and to raise awareness of the game-changing benefits a successful adoption of the technology will bring. HSBC has taken an innovative approach to proving out the value of the technology, engaging developers with a brakes-off opportunity to use the platform and placing Hadoop in a competitive scenario with traditional technologies. The Hadoop journey in HSBC was initiated in Scotland, blessed in London and proved out in China.
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise: A S... (Revolution Analytics)
Everyone involved in high-stakes analytics wants power, speed and flexibility, regardless of the size of the data set and the complexity of the analysis. Trailblazing organizations that have deployed IBM Netezza Analytics on their IBM Netezza data warehouse appliances (TwinFin) together with Revolution R Enterprise are getting all three.
Geometric's interoperability solution offers greater efficiencies through Business Process Integration in the product realization value stream, speeding up our customers' business processes across disparate PLM systems.
Revolution R Enterprise - 100% R and More Webinar Presentation (Revolution Analytics)
R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this presentation, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
This informative presentation on the integration of PLM and ERP comes to you from Barry-Wehmiller International Resources (BWIR), global services & solutions partner to SolidWorks Enterprise PDM. It was presented at SolidWorks World 2010 in the specific context of integrating various ERP systems with Enterprise PDM. This presentation covers:
1. Role of PDM & ERP in Product Lifecycle
2. Need for integration between PDM/PLM and ERP
3. Understanding Industry-specific demands
4. SolidWorks Enterprise PDM and ERP integration
5. Case Study 1: SolidWorks EPDM – Infor XA Integration
6. Case Study 2: SolidWorks EPDM – SAP Integration
To Each Their Own: How to Solve Analytic Complexity (Inside Analysis)
The Briefing Room with Shawn Rogers and Noetix
Slides from the Live Webcast on Aug. 14, 2012
One size will never fit all in the complex world of information management. In fact, the variety of information systems in use continues to expand. That includes all kinds of systems: data-producing applications, data-processing apps, and the downstream tools used for reporting and analytics. How can data-savvy organizations stay ahead of the curve?
Check out this episode of The Briefing Room to learn from Analyst Shawn Rogers of Enterprise Management Associates, who will explain how effective use of standard data models can solve the complexity of increasingly heterogeneous information architectures. Rogers will be briefed by Daryl Orts of Noetix who will tout his company’s wide range of industry and application-specific data models which can be used to satisfy the particular needs of today’s diverse user community.
For more information, visit: http://www.insideanalysis.com
PDM: Key for Successful Management of Corporate-wide Product Data
Master data and parts management
Reuse, transparency, integrated network
Standard parts management
Atos organized the workshop "Reinventing the Media", held on 26 January 2012. We had the participation of Santiago Miralles, Director General of Inout TV, Digital Entertainment, who shared his view of the current landscape of the media sector, and together we discussed how Atos drives and supports the transformation of this sector.
We are facing a new era of media that brings new forms of personal communication, new devices and the expansion of broadband. IT services are a strategic element for the growth and transformation of organizations, making them a factor in a company's success and survival.
Atos Origin and Siemens IT Solutions & Services have joined to form Atos. With a presence in more than 42 countries and a workforce of 74,000 business technologists, we offer our clients deep market knowledge, an even more global presence and an impressive portfolio of services. We understand the challenges your business faces and offer IT solutions that meet your needs.
How do you combine comprehensive analysis running on large amounts of data with the responsiveness demanded of today's API services?
This talk illustrates one of the recipes we currently use at ING to tackle this problem. Our analytical stack combines machine learning algorithms running on a Hadoop cluster with API services executed by an Akka cluster.
Cassandra is used as a 'latency adapter' between the fast and the slow path. Our API services are executed by the Akka/Spray layer. Those services consume both live data sources and intermediate results promoted by the Hadoop layer via Cassandra. This approach allows us to provide internal API services that are both complete and responsive.
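To make the pattern concrete, here is a minimal Python sketch of the 'latency adapter' read path, assuming a hypothetical analytics.customer_scores table populated by the batch layer; the host, keyspace and schema are illustrative, not ING's:

```python
# Minimal sketch of the "latency adapter" read path: an API handler
# fetches results that the Hadoop batch layer precomputed into
# Cassandra. Keyspace, table and host names are hypothetical.
from typing import Optional
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-host"])      # assumed contact point
session = cluster.connect("analytics")     # hypothetical keyspace

def customer_score(customer_id: str) -> Optional[float]:
    """Serve a batch-computed score with low latency."""
    row = session.execute(
        "SELECT score FROM customer_scores WHERE customer_id = %s",
        (customer_id,),
    ).one()
    return row.score if row else None
```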
API Adoption Patterns in Banking & The Promise of Microservices (Akana)
Akana VP of Product Marketing, Sachin Agarwal, explains API adoption patterns that are specific to banking, and how microservices can be used to help develop financial applications.
Fundamentals of Big Data, Hadoop project design and a use case study
General planning considerations and the key requirements in the Hadoop ecosystem and Hadoop projects.
These provide the basis for choosing the right Hadoop implementation, integrating and adopting Hadoop technologies, and creating an infrastructure.
Building applications using Apache Hadoop, with a real-life Wi-Fi log analysis use case.
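As a flavour of what such a project involves, here is a minimal Hadoop Streaming sketch in Python for a hypothetical Wi-Fi access log (one line per connection: timestamp, access-point id, client MAC, bytes); the log format and paths are assumptions for illustration:

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper for a hypothetical Wi-Fi log with
# lines of the form "<timestamp> <ap_id> <client_mac> <bytes>".
# Emits "<ap_id>\t1" so connections can be counted per access point.
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) >= 2:
        print(f"{fields[1]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts emitted by the mapper for each key.
# Hadoop Streaming delivers the mapper output sorted by key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")

# Example invocation (paths are illustrative):
#   hadoop jar hadoop-streaming.jar -input /logs/wifi -output /out \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
```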
Apache Hadoop is quickly becoming the technology of choice for organizations investing in big data, powering their next-generation data architecture. With Hadoop serving as both a scalable data platform and computational engine, data science is re-emerging as a centerpiece of enterprise innovation, with applied data solutions such as online product recommendation, automated fraud detection and customer sentiment analysis. In this talk Ofer will provide an overview of data science and how to take advantage of Hadoop for large-scale data science projects:
* What is data science?
* How can techniques like classification, regression, clustering and outlier detection help your organization?
* What questions do you ask and which problems do you go after?
* How do you instrument and prepare your organization for applied data science with Hadoop?
* Who do you hire to solve these problems?
You will learn how to plan, design and implement a data science project with Hadoop.
Franciscan Alliance Blazes New Trails in Healthcare Delivery (Avaya Inc.)
Franciscan Alliance operates 13 hospitals and more than 170 medical practices across Indiana, Illinois and Michigan. Avaya Fabric Networking gave them the bandwidth they need to support future technologies and the flexibility to grow. Learn more: http://bit.ly/1ICcUww
Sales effectivity and business intelligence (marekdan)
Some information about second-generation (in-memory) business intelligence, Tibco Spotfire and InfomatiX view, and how to use BI and mobile solutions to increase sales and marketing effectiveness.
Big Data Beyond Hadoop*: Research Directions for the Future (Odinot Stanislas)
Michael Wrinn, Research Program Director, University Research Office, Intel Corporation
Jason Dai, Engineering Director and Principal Engineer, Intel Corporation
Webinar: Open Source Business Intelligence Intro (SpagoWorld)
The presentation supported the webinar delivered by Stefano Scamuzzo, SpagoBI International Manager, on 22nd December 2010 within SpagoWorld Webinar Center. http://www.spagoworld.org/
Watch full webinar here: https://bit.ly/2vN59VK
Data virtualization started to evolve as the most agile and real-time enterprise data fabric; it is now proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
Building a business intelligence architecture fit for the 21st century by Jon... (Mark Tapley)
Objectives of the presentation:
- To record some history – what has happened in the past that makes the future quite challenging.
- To provide real examples of BI at work – good and bad.
- To illustrate the nature of data and why it has become so important in driving the business forward in the 21st century.
- To outline a way to align technology with the business so that effort and budget are spent in a way that will enable the future rather than support the past.
- To propose a set of principles and ideas that can guide a company in making data available to all who have the penchant to turn it into useful and valuable information.
- To describe the new organisation unit that will be needed to realise the dream.
About ActuateOne for Utility Analytics
Water and Energy Utilities are under tremendous pressure to demonstrate progress in asset optimization, grid optimization and performance gains across traditional business drivers such as customers, revenue protection, utility regulatory compliance and financials. ActuateOne for Utility Analytics provides a comprehensive portfolio of software and utility analytics industry expertise to ensure today’s utility leaders and customers always have access to the right information, insight and collaborative capabilities for accurate and informed decisions. Delivered through a single platform, ActuateOne for Utility Analytics ignites any utility or grid Analytics initiative with integrated asset optimization dashboards, grid optimization dashboards, utility compliance reports as well as Transformer Management Scorecards, Substation & Equipment Management Scorecards and Utility KPI Dashboards which help today’s Utility enhance performance and maximize grid performance.
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel (Inside Analysis)
The Briefing Room with Colin White and Jaspersoft
Slides from the Live Webcast on June 12, 2012
As the corporate appetite for analytics and reporting grows, companies must find a way to secure a strategic view of their information architecture. End users with varying degrees of expertise need a wide range of data and reports delivered in a timely fashion. As the audience for analytics expands, that puts pressure on IT infrastructure and staff. And now with the promise of Hadoop and MapReduce, the organization's desire for business insight becomes even more significant.
In this episode of The Briefing Room, veteran Analyst Colin White of BI Research will explain the value of being strategic with enterprise reporting. White will be briefed by Karl Van den Bergh of Jaspersoft, who will tout his company's “data funnel” concept, which is designed to strategically manage an organization's information architecture. By aligning information assets along this funnel, IT can effectively address the spectrum of analytical needs – from simple reporting to complex, ad hoc analysis – without over-taxing personnel and system resources.
Microsoft Business Intelligence Vision and Strategy (Nic Smith)
Microsoft Business Intelligence slide deck: learn the Microsoft vision and strategy for business intelligence. These slides include the offering and value proposition for Microsoft BI.
David Thoumas, OpenDataSoft CTO, on data API strategy (rich API vs. multiple endpoints) for broadcasting data and doing business
At APIdays 2012, the 1st European event dedicated to API world
Karya develops mobile application services that fit the unique needs of your business. Our Mobile Application Services help users better utilize the power of mobile technology.
Introduction: This workshop will provide a hands-on introduction to Machine Learning (ML) with an overview of Deep Learning (DL).
Format: An introductory lecture on several supervised and unsupervised ML techniques, followed by a light introduction to DL and a short discussion of the current state of the art. Several Python code samples using the scikit-learn library will be introduced, which users will be able to run in the Cloudera Data Science Workbench (CDSW).
Objective: To provide a quick hands-on introduction to ML with Python's scikit-learn library. The environment in CDSW is interactive, and the step-by-step guide will walk you through setting up your environment, exploring datasets, and training and evaluating models on popular datasets. By the end of the crash course, attendees will have a high-level understanding of popular ML algorithms and the current state of DL, what problems they can solve, and will walk away with basic hands-on experience training and evaluating ML models.
Prerequisites: For the hands-on portion, registrants must bring a laptop with a Chrome or Firefox web browser. These labs will be done in the cloud, no installation needed. Everyone will be able to register and start using CDSW after the introductory lecture concludes (about 1hr in). Basic knowledge of python highly recommended.
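For a taste of the hands-on portion, the following is a minimal scikit-learn example of the train/evaluate loop on a popular dataset; the specific model and dataset are illustrative choices, not necessarily the ones used in the labs:

```python
# A minimal sketch of the kind of scikit-learn workflow the workshop
# walks through: load a popular dataset, train a supervised model,
# and evaluate it on a held-out split.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```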
Floating on a RAFT: HBase Durability with Apache Ratis (DataWorks Summit)
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirements of HBase's write-ahead log (WAL), which HDFS guarantees correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase that provides a sufficient level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library-implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi (DataWorks Summit)
Utilizing Apache NiFi, we read various open data REST APIs and camera feeds to ingest crime and related data in real time, streaming it into HBase and Phoenix tables. HBase makes an excellent storage option for our real-time, time-series data sources. We can immediately query our data with Apache Zeppelin against Phoenix tables, as well as via Hive external tables over HBase.
Apache Phoenix tables are also a great option, since we can easily put microservices on top of them for application usage. I have an example Spring Boot application that reads from our Philadelphia crime table for front-end web applications as well as RESTful APIs.
Apache NiFi makes it easy to push records with schemas to HBase and insert into Phoenix SQL tables; a query sketch follows the resource links below.
Resources:
https://community.hortonworks.com/articles/54947/reading-opendata-json-and-storing-into-phoenix-tab.html
https://community.hortonworks.com/articles/56642/creating-a-spring-boot-java-8-microservice-to-read.html
https://community.hortonworks.com/articles/64122/incrementally-streaming-rdbms-data-to-your-hadoop.html
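As a sketch of the kind of query a microservice might run against such tables, here is a hypothetical Python example using the phoenixdb client against the Phoenix Query Server; the table and column names are illustrative, not the actual crime schema:

```python
# Hypothetical sketch of querying a Phoenix crime table from Python
# through the Phoenix Query Server, in the spirit of the microservices
# above. Table and column names are illustrative, not the real schema.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT crime_type, COUNT(*) FROM crimes GROUP BY crime_type")
for crime_type, total in cursor.fetchall():
    print(crime_type, total)
conn.close()
```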
HBase Tales From the Trenches - Short stories about most common HBase operati... (DataWorks Summit)
Whilst HBase is the most logical answer for use cases requiring random, real-time read/write access to Big Data, it may not be trivial to design applications that make the most of it, nor the simplest to operate. As it depends on and integrates with other components from the Hadoop ecosystem (ZooKeeper, HDFS, Spark, Hive, etc.) or external systems (Kerberos, LDAP), and its distributed nature requires a "Swiss clockwork" infrastructure, many variables must be considered when observing anomalies or even outages. Adding to the equation, HBase is still an evolving product, with different release versions in current use, some of which carry genuine software bugs. In this presentation, we'll go through the most common HBase issues faced by different organisations, describing the identified causes and resolution actions from my last 5 years supporting HBase for our heterogeneous customer base.
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac... (DataWorks Summit)
LocationTech GeoMesa enables spatial and spatiotemporal indexing and queries for HBase and Accumulo. In this talk, after an overview of GeoMesa’s capabilities in the Cloudera ecosystem, we will dive into how GeoMesa leverages Accumulo’s Iterator interface and HBase’s Filter and Coprocessor interfaces. The goal will be to discuss both what spatial operations can be pushed down into the distributed database and also how the GeoMesa codebase is organized to allow for consistent use across the two database systems.
OCLC has been using HBase since 2012 to enable single-search-box access to over a billion items from your library and the world's library collection. This talk will provide an overview of how HBase is structured to provide this information, some of the challenges encountered in scaling to support the world catalog, and how they have been overcome.
Many individuals/organizations have a desire to utilize NoSQL technology, but often lack an understanding of how the underlying functional bits can be utilized to enable their use case. This situation can result in drastic increases in the desire to put the SQL back in NoSQL.
Since the initial commit, Apache Accumulo has provided a number of examples to help jumpstart comprehension of how some of these bits function as well as potentially help tease out an understanding of how they might be applied to a NoSQL friendly use case. One very relatable example demonstrates how Accumulo could be used to emulate a filesystem (dirlist).
In this session we will walk through the dirlist implementation. Attendees should come away with an understanding of the supporting table designs, a simple text search supporting a single wildcard (on file/directory names), and how the dirlist elements work together to accomplish its feature set. Attendees should (hopefully) also come away with a justification for sometimes keeping the SQL out of NoSQL.
HBase Global Indexing to support large-scale data ingestion at Uber (DataWorks Summit)
Data serves as the platform for decision-making at Uber. To facilitate data driven decisions, many datasets at Uber are ingested in a Hadoop Data Lake and exposed to querying via Hive. Analytical queries joining various datasets are run to better understand business data at Uber.
Data ingestion, at its most basic form, is about organizing data to balance efficient reading and writing of newer data. Data organization for efficient reading involves factoring in query patterns to partition data to ensure read amplification is low. Data organization for efficient writing involves factoring the nature of input data - whether it is append only or updatable.
At Uber we ingest terabytes of data into many critical tables, such as trips, that are updatable. These tables are a fundamental part of Uber's data-driven solutions and act as the source of truth for all the analytical use cases across the entire company. Datasets such as trips constantly receive updates apart from inserts. To ingest such datasets we need a critical component that is responsible for bookkeeping the data layout and annotates each incoming change with the location in HDFS where the data should be written. This component is called Global Indexing. Without it, all records are treated as inserts and re-written to HDFS instead of being updated, leading to duplication of data and breaking data correctness and user queries. The component is key to scaling our jobs: we now handle greater than 500 billion writes a day in our current ingestion systems. It needs strong consistency and must provide high throughput for index writes and reads.
At Uber, we have chosen HBase as the backing store for the Global Indexing component, and it is critical in allowing us to scale our jobs to greater than 500 billion writes a day in our current ingestion systems. In this talk, we will discuss data@Uber and expound on why we built the global index using Apache HBase and how this helps scale our cluster usage. We'll give details on why we chose HBase over other storage systems; how and why we came up with a creative solution to load HFiles directly to the backend, circumventing the normal write path when bootstrapping our ingestion tables to avoid QPS constraints; as well as other learnings from bringing this system into production at the scale of data that Uber encounters daily.
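A conceptual sketch of the lookup such a component performs is below, using the happybase client against a hypothetical global_index table; this illustrates the idea rather than Uber's actual implementation:

```python
# Conceptual sketch (not Uber's implementation) of a global index lookup
# backed by HBase via the Thrift gateway: map a record key to the HDFS
# file that currently holds the record, so an update can be routed there
# instead of being re-written as an insert. Names are hypothetical.
from typing import Optional
import happybase

connection = happybase.Connection("hbase-thrift-host")  # assumed gateway
index = connection.table("global_index")                # hypothetical table

def locate(record_key: bytes) -> Optional[str]:
    """Return the HDFS path annotated for this record, if indexed."""
    row = index.row(record_key, columns=[b"loc:hdfs_path"])
    path = row.get(b"loc:hdfs_path")
    return path.decode() if path else None

def annotate(record_key: bytes, hdfs_path: str) -> None:
    """Record where this key's data now lives after a write."""
    index.put(record_key, {b"loc:hdfs_path": hdfs_path.encode()})
```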
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix (DataWorks Summit)
Recently, Apache Phoenix has been integrated with the Apache Omid (incubating) transaction processing service to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions.
These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi (DataWorks Summit)
Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real time. This is a challenging endeavor considering the variety of data sources that need to be collected and analyzed: everything from application logs, network events, authentication systems, IoT devices, business events, and cloud service logs needs to be taken into consideration. In addition, multiple data formats need to be transformed and conformed so they can be understood by both humans and ML/AI algorithms.
To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.
In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”
Presto: Optimizing Performance of SQL-on-Anything Engine (DataWorks Summit)
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as geospatial analytics at scale and the project roadmap going forward.
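For reference, issuing a query against a Presto coordinator from Python takes only a few lines with the presto-python-client package; the host, catalog and table names below are placeholders, not a specific deployment:

```python
# Minimal sketch of querying Presto from Python via presto-python-client.
# Coordinator host, catalog, schema and table are illustrative.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator", port=8080,
    user="analyst", catalog="hive", schema="default",
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM web_events")  # hypothetical table
print(cursor.fetchone()[0])
```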
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl... (DataWorks Summit)
Specialized tools for machine learning development and model governance are becoming essential. MLflow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code to the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results are logged automatically as a byproduct of those lines of code, even if the party doing the training run makes no special effort to record them. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub, almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo the MLflow Tracking, Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MLflow on-prem or in the cloud.
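The "few lines of code" claim is easy to illustrate; this minimal MLflow Tracking sketch logs a parameter, a metric and a deployable model as a byproduct of a training run (the model and dataset are arbitrary examples, not from the talk):

```python
# Minimal MLflow Tracking sketch: log params, metrics and a model
# as a byproduct of an ordinary scikit-learn training run.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    alpha = 0.5
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mlflow.log_param("alpha", alpha)
    mlflow.log_metric("r2", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")  # deployable packaging
```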
Extending Twitter's Data Platform to Google Cloud (DataWorks Summit)
Twitter's Data Platform is built using multiple complex open source and in-house projects to support data analytics on hundreds of petabytes of data. Our platform supports storage, compute, data ingestion, discovery and management, plus various tools and libraries that help users with both batch and real-time analytics. Our Data Platform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use the cloud as another data center. We walk through our evaluation process and the challenges we faced supporting data analytics at Twitter scale in the cloud, and present our current solution. Extending Twitter's Data Platform to the cloud was a complex task, which we dive into deeply in this presentation.
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi (DataWorks Summit)
At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger (DataWorks Summit)
Companies are increasingly moving to the cloud to store and process data. One of the challenges they face is securing data across hybrid environments while centrally managing policies in an easy way. In this session, we will talk through how companies can use Apache Ranger to protect access to data both on-premise and in cloud environments. We will go into detail on the challenges of hybrid environments and how Ranger can solve them. We will also talk through how companies can further enhance security by leveraging Ranger to anonymize or tokenize data while moving it into the cloud, and to de-anonymize it dynamically using Apache Hive or Apache Spark, or when accessing data from cloud storage systems. We will also deep dive into Ranger's integration with AWS S3, AWS Redshift and other cloud-native systems. We will wrap up with an end-to-end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data, and track where data is flowing.
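To illustrate centralized policy management, here is a hedged sketch of creating a policy through Ranger's public REST API; the service name, resource values and admin credentials are placeholders to adapt to your deployment:

```python
# Hedged sketch of creating an access policy via Ranger's public REST
# API (POST /service/public/v2/api/policy). Service, resources and
# credentials are illustrative placeholders.
import requests

policy = {
    "service": "cm_hive",              # hypothetical Ranger service name
    "name": "analysts_read_sales",
    "resources": {
        "database": {"values": ["sales"]},
        "table": {"values": ["transactions"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [{
        "users": [],
        "groups": ["analysts"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}

resp = requests.post(
    "http://ranger-admin:6080/service/public/v2/api/policy",
    json=policy, auth=("admin", "admin"),  # placeholder credentials
)
resp.raise_for_status()
```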
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory... (DataWorks Summit)
Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.
Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as:
● Optimizing merchandising execution, in-stocks and sell-thru
● Enhancing operational efficiencies, enabling real-time customer engagement
● Enhancing loss prevention capabilities and response time
● Creating frictionless experiences for shoppers
Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry.
We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey.
Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables.
We will cover the basics of object detection, then move into the advanced processing of images, describing possible ways a retail store of the near future could operate: a deep learning system attached to a camera stream identifying various storefront situations, such as item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance.
We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing.
Finally, we will cover the various technologies powering these applications today: deep learning tools for research and development, production tools to distribute that intelligence to an entire inventory of cameras situated around a retail location, and tools for exploring and understanding the new data streams produced by the computer vision systems.
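As a generic illustration of the object-detection building block these applications rest on, the sketch below runs an off-the-shelf torchvision detector over a single frame; the image file and confidence threshold are assumptions, and a production system would run this continuously over camera streams:

```python
# Generic object-detection sketch with a pretrained torchvision model;
# a shelf/stock application would map detected classes to store logic.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = Image.open("shelf.jpg")            # hypothetical camera frame
with torch.no_grad():
    detections = model([to_tensor(frame)])[0]

for box, label, score in zip(detections["boxes"],
                             detections["labels"],
                             detections["scores"]):
    if score > 0.8:                        # assumed confidence threshold
        print(label.item(), [round(v) for v in box.tolist()], float(score))
```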
By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark (DataWorks Summit)
Whole-genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) of sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires a solution that both scales with data size and optimizes for individual genes or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short-read and long-read sequencing technologies. It achieved near-linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modification while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.
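A conceptual PySpark sketch of the underlying idea (not the SpaRC codebase itself) is shown below: reads sharing a k-mer are grouped, producing the candidate links that read clustering builds on; the input path and k value are assumptions:

```python
# Conceptual sketch of k-mer-based read grouping in PySpark: reads that
# share k-mers are linked, yielding candidate edges for clustering reads
# by molecule of origin. Input path and K are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmer-grouping-sketch").getOrCreate()
sc = spark.sparkContext
K = 31  # an assumed k-mer length, not SpaRC's actual setting

reads = sc.textFile("hdfs:///data/reads.txt")  # hypothetical: one read per line

def kmers(indexed_read):
    read_id, seq = indexed_read
    return [(seq[i:i + K], read_id) for i in range(len(seq) - K + 1)]

# k-mer -> read ids sharing it; each group of size > 1 induces edges
# between reads that downstream graph clustering would consume.
shared = (reads.zipWithIndex()
               .map(lambda t: (t[1], t[0]))
               .flatMap(kmers)
               .groupByKey()
               .mapValues(list)
               .filter(lambda kv: len(kv[1]) > 1))

print(shared.take(5))
```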
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which the participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. The constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for technology and making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
The new frontiers of AI in RPA with UiPath Autopilot™ (UiPathCommunity)
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of using Autopilot across different tools in the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
3. Business Context: HSBC (HSS), a business with a lot of data…
Global Business
Global outsourcer of investment operations
Active in 40+ countries & jurisdictions
Over 150 operational technology systems
Outsourcing is a diverse and incrementally complex business
4. Challenges in building Big Data Environments
[Slide diagram: the traditional pipeline flows from Sources (Ops, ODS, Trades, External, Market Data, Client, Exchange) through ETL/Staging and Integration into an Enterprise Warehouse and logical strategic model, out to Division Marts (Product, Function, Corp Actions, Position) and Channels (eCommerce, Reporting, Analytical Tools).]
Design: ETL is brittle, one shot at success for "one version of the truth". Tight coupling to the relational model means any significant change initiates a data migration.
Run: RDBMSs scale vertically and struggle with scale-out; multi-marts increase duplication; big-batch appliances are uneconomic; cost increases with proliferation.
Time to Market: months for any given slice, years in total.
Total Cost: any high-volume or low-latency environment requires annual spend in the millions to tens of millions.
5. Building Big Data platforms has been an unhappy experience
Time to market has increased; proliferation, not consolidation
Delivery risk is high, as witnessed in industry-wide failure rates
Ultimate customer satisfaction is low; we often end up answering yesterday's questions tomorrow
The economics of traditional technologies are against proliferation of analytical platforms:
– Costs increase with the addition of data sources
– Costs of change increase with the addition of data sources
Processing ceilings are reached quickly when adding newer sources of data to traditional platforms
6. Crisis of Supply and Demand: we need a new approach
High-level requirements…
A single data platform that can provide 360° views of clients, operations and products
Functionally, the platform should support:
– Continual development, integration and deployment
– Parallel development streams
– Integration of poly-structured datasets
– Multiple views on single data sets
– …and act as an ENABLER of change
Non-functionally, the platform should support:
– A low-cost economic model for analytical platforms
– Scaling to terabytes with high-throughput ingest and integration
– Co-existence with our current estate
– Accessibility to business and technology teams
Enter Hadoop!
7. Introducing any new technology to an enterprise
Adoption Lifecycle (Hadoop): Learn (Proof of Concept) → Plan (Business Value) → Build (Pilot Projects, Strategic Stack)
What have we done? What's left, what's next?
9. Big Data Vision: The Agile Information Lifecycle
[Slide diagram: a cycle of Ingest (data, events, blotters) → MapReduce processing → analytical discovery → application.]
Insights rarely happen on the first query or build; they are more likely to occur after several iterations on a dataset
10. Hadoop Proof of Concept Scope: Guangzhou, China
Using a vendor Hadoop package: time to install, ease of maintaining the cluster, performance comparison
Developing on Hadoop: integration of existing databases, building applications on the cluster, porting existing code to Hadoop
Advanced Analytics on Hadoop: development skills levels, enhancing an existing analytics package, building out a new modelling service
11. Proof of Concept Results
Hadoop was installed and operational in a week
18 RDBMS warehouse and mart databases were ported to Hadoop in 4 weeks
An existing batch that takes 3 hours today was re-engineered on Hadoop: run time 10 minutes
A current Java-based analytics routine was ported onto Hadoop, increasing data coverage and reducing execution time
We lost the namenode and had to rebuild the cluster…
12. Hadoop Code Day: Guangzhou, China
We sponsored a 24-hour code competition to let the off-shore teams show their stuff
We had over 50 volunteers for the event
The volunteers were split into teams of 3 and given 24 hours to develop an application using the Proof of Concept cluster
One week's training was offered to the participants on a casual basis
All the teams delivered…
13. Next Step: Planning
Adoption Lifecycle
Learn (Proof of Concept) → Plan (Business Value) → Build (Pilot Projects, Strategic Stack)
14. Big Data Plan: Big Data Economics (names removed to protect the innocent)
15. Hadoop Economics: Technology for Austerity
[Chart: REVENUE, MARGIN and COST over time]
Hadoop speaks to the economics of today
Growing product and capacity at the same time as increasing margin
16. Generic HSBC Big Data Use Cases
Volume File Processing
– Characteristics: high-volume, high-throughput processing of legacy flat files, XML or other structured and semi-structured data
– Current challenges:
  – Cost: high-volume processing predominantly still resides on the mainframe, making low-complexity processing expensive
  – Scale: the ability to grow out mainframe capacity quickly is limited, and the ability to scale on distributed platforms is limited

Big Warehouse
– Characteristics: a multi-source warehouse analytics environment providing a single data platform across multiple business lines; integration of poly-structured data
– Current challenges:
  – Time to market: data warehouse / MI projects have proved extremely challenging to implement, in HSBC and in the finance industry in general
  – Complexity: data integration of even group-standard systems has proved difficult due to the variety of data structures and content
  – Latency: real-time MI is still only available via reporting from source directly

Advanced Analytics
– Characteristics: statistical modelling and what-if analysis on group-wide data across multiple business lines; production of data-derived products
– Current challenges:
  – Scale: traditional analytic data platforms have only been able to scale vertically
  – Cost: the amount of compute power required to perform volume statistical operations is cost-prohibitive
  – Fidelity: analytical calculations are typically run on aggregate totals, leading to a disconnect between events and the derived conclusions or decisions

The three use cases run left to right from Day 1 Value to Strategic Value
20. Remaining Challenges: Big Data Operations
Big Data Operations
– Is Hadoop anti-virtualisation?
– High availability / disaster recovery needs to improve
– Security and data privacy concerns
– Data federation

Big Data Organisation
– Segregation of duties
– Big Data doesn't want a separate app, database, OS & storage team; the platform demands skilled generalists

Hype / Cynicism
– USE IT AS A POSITIVE!!!
– Place Big Data into a competitive situation against your existing Information Management technologies; if you can't get the job done better/faster/cheaper then alter your decision tree
21. The art of the possible in 24 hours…..
Hadoop excites……
Hadoop on iPad & Android (and tires)
The Winners….
Hadoop on HTML5 & Flex
Hadoop & R for Portfolio Optimisation
Editor's Notes
In essence: we are a processor of other people's data. Challenges: nobody does data the same way, even within the same systems. Differences are in definitions, formats and content.
Dedicated ETL is an expensive way of doing things. Big RDBMS or dedicated appliances are expensive. Marts, marts everywhere. CONCLUSION: high volume and/or low latency is very expensive to run. RESULT: people are becoming reluctant to invest in these platforms and are looking for a service that can start small and grow.
The road to Damascus… the vision is HSS-only at this point in time. The search for an alternate way of doing things has led us to Hadoop. Hadoop lowers the barrier to entry for compute-style solutions to data problems. CONCLUSION: we view Hadoop as THE future technology for data platforms. RESULT: we have begun the tech adoption process in the bank.
Today's biggest business challenge: information management, which currently demands agility in delivering data integration and flexibility to present multiple views of data. Biggest business opportunity: analytics, in scenario modelling and portfolio efficiency measurement. These all require big compute.
…here's what it looks like. Walk left to right. Explain Map Reduce. Contrast with the old way and our vision of the new way. The EDW will be around for some time to come but will be gradually superseded. Map Reduce will be implemented via high-level languages. A single warehouse becomes achievable. Marts are demised in favour of views onto the base data. The value add will come via data discovery… iterative ETL… hypothesis testing. CONCLUSION: Hadoop brings massive compute levels to bear on these problems, affordably.
This is the next-generation ETL. ETL processes become truly iterative. Accept that you will get it wrong the first time round; Hadoop makes the penalty for failure minimal. The value add will come via data discovery… iterative ETL… hypothesis testing. CONCLUSION: ETL moves from brittle to bend-don't-break. RESULT: in building your Big Warehouse, adding additional data/systems/perspectives is a low-tax operation.
Where we've got to: go through the key challenges. CONCLUSION: it's a journey, and we're walking through it just now. RESULT: the first two have been addressed; challenges remain.
…our experience was: a vendor Hadoop package makes sense to an organisation like us. Data loads took days, not months. We were quickly able to automate the loads. Used Apache tools only. BONUS: Calypso data… new for HSS. HACKATHON: open invite to all markets staff. Objective: to use Hadoop against the business use case. Set judging criteria. Straight 24 hours over a weekend. Competition prizes. Attended by nearly 60 staff, equal to 20% of our China office. 18 teams, 17 delivered. The winning application was stunning. CONCLUSION: Hadoop is a great functional fit for our business demand. RESULT: high level of confidence around the technology.