TechConnex Big Data Series - Big Data in Banking

TechConnex is an industry forum for Canadian IT executives. This presentation from the fall of 2015 provides a survey of Hadoop adoption in the Canadian banking industry. Most adoption is driven by BCBS-239 implementation projects. The talk provides a broader risk systems perspective on Hadoop and discusses challenges and opportunities around the technology.

1. Big Data in Banking – Risk Systems Perspective
   Andre Langevin, langevin@utilis.ca, www.swi.com
2. Agenda
   Ø Big Data at the Big 6
   Ø RDARR Data Hubs
   Ø Lessons Learned (so far)
   Ø Technology Themes in 2016
   An important note about this presentation: in order to respect the commercial interests and privacy of my clients, I have refrained from using specific company names unless the information is publicly available.
3. Big Data at the Big 6
4. RDARR Drives Big 6 Adoption
   Ø RDARR is a mandatory regulatory project:
      v Regulatory response to the 2008 credit crisis
      v Requires a re-build of data gathering and regulatory reporting to implement measurable data quality, operational metadata and auditable data lineage
      v Regulatory enforcement starts in 2017
   Ø Big 6 IT spend of ~$800MM over three years on RDARR:
      v Combined Big 6 IT spend on all Risk Systems projects is ~$400MM per year
      v RDARR spend has largely been incremental – other regulatory initiatives have continued to drive project spend separate from RDARR
   Ø A Hadoop data hub is a typical RDARR solution element
   "The investment spend by G-SIBs on RDARR is very significant, averaging US$230MM per bank. These investment costs are likely to increase." (Oliver Wyman, "BCBS 239: Learning from the Prime Movers")
   All of Canada's Big 6 banks were designated as Domestic Systemically Important Banks (D-SIBs) by OSFI, meaning they must fully comply with BCBS-239.
5. Big 6 Hadoop Risk Applications
   Ø Many projects are underway, but relatively few are in production:
      v Plans for enhanced model building and analytics for retail banking following the 2016 RDARR deadline
      v Capital Markets has been the leading driver of Hadoop adoption for compute applications
   Ø Risk Systems teams have started building Hadoop-based applications:
      v Volcker Rule Compliance Metrics (e.g. RENTD)
      v Portfolio Stress Testing
      v Market Risk VaR History
      v On-Demand Risk
   Ø Trading floor risk managers have installed stand-alone Hadoop instances:
      v Often cloud-based, used for specialized analysis of derivative sensitivities or historical market data
6. Importing US Risk Applications
   Ø Expect to see more risk applications pioneered by leading US banks:
      v Trading Strategy Back Testing
      v Granular Capital, CVA and Market Risk Trending
      v Capital Markets Dealer Compliance
      v Credit Adjudication Models
      v Behavioral Models (often for Collections)
      v Fast-time Transactional Fraud Detection
      v AML
      v Commercial Credit Network Analysis
7. Big 6 Vendor Alignments
   Ø Banks have each chosen a strategic Hadoop vendor:
      v TD, CIBC and NB use Cloudera
      v RBC and BNS use Hortonworks
      v BMO uses Pivotal (Hortonworks)
   Ø "Land grab" among vendors:
      v Multi-year subscription deals at large discounts to lock in customers
   Ø IBM struggling for share despite an entrenched starting position:
      v Lack of SAS support was a show stopper
   (Source: Forrester Wave, Q1 2014)
8. Deployment Patterns
   Ø Mix of virtual and physical server deployments:
      v Cisco UCS and VMware vSphere are the leading infrastructure choices
   Ø Many banks report using multiple grids aligned to business units*:
      v Tools to manage multi-tenancy on Hadoop are still nascent
      v Organizational issues (cost allocation, support team alignments) inhibit shared deployments
   Ø The vendor community has invested heavily in cloud deployment tools:
      v One-click deployments of all major Hadoop distributions are available on public clouds
   Ø Banks are looking at "hub and sandbox" deployments on private clouds:
      v A popular pattern in established US deployments
      v The Big 6 have all built an internal private cloud or have access to one through a major infrastructure provider
      v A notable S3/AWS deployment by US regulator FINRA sets the standard
   * Hortonworks CAB
9. RDARR Data Hubs
10. Typical RDARR Data Hub
   Ø RDARR focus drives Data Hub solution characteristics:
      v The RDARR objective is auditable batch reporting, tied in to central lineage and metadata solutions
      v Little consideration of unstructured or real-time data sources
      v Often characterized as a raw-data landing zone for otherwise inaccessible mainframe data
      v Resistance to fully adopting Hadoop as a data hub – often paired with legacy database hubs
   Ø Retail data focus drives an emphasis on security:
      v PIPEDA/GLB compliance deemed critical despite little to no use of PII/PCI data in reports
      v SOX compliance is mandatory
   Ø Architecture teams are the dominant voice in data hub projects:
      v The business sponsor is often a newly established Data Management Office
      v Focus on cost and process optimization of data flows to downstream reporting solutions
   Ø Internal build – low to no adoption of commercial hub solutions
11. RDARR Data Hub Challenges
   Ø Hadoop data governance is early stage and poorly integrated:
      v No good Hadoop solution to data governance (yet)
      v Data lineage is at the file level in Hadoop – not suitable for RDARR critical data element traceability
      v Policy-based data access solutions are still in development (e.g. Navigator, Atlas)
   Ø Enterprise ETL tools are not Hadoop-enabled:
      v Many tools are unable to push transformation work down to Hadoop (or do so only as rudimentary Hive SQL)
      v Performance of established ETL tools is often poor on Hadoop
   Ø Early mover penalty: Hadoop 2.x included solutions to many early security and operational problems "in the box":
      v Projects with 2013 start dates were based on Hadoop 1.x – and so are usually Cloudera-based
      v Established US banking shops are usually on Cloudera or MapR implementations for the same reason
12. Leaving Business Value on the Table
   Ø Rudimentary governance and security tools produce a bias against self-serve access to data:
      v This transfers modelling and analytics users' frustrations with existing data warehouse solutions to the new platform
      v PII/PCI data control solutions can prevent deployment of analytical tools
   Ø Designing for static regulatory reporting objectives ignores high-value interactive exploration and discovery uses:
      v Standardized reporting schemas (such as IBM BDW) have limited value to risk modelers and analysts
   Ø Focus on meeting operational SLAs over sharing of grids
   "Banks are struggling to understand the concrete business impact associated with BCBS 239; nearly 70 per cent of domestic systemically important banks (D-SIBs) and half of G-SIBs have not quantified the benefits." (Oliver Wyman, "BCBS 239: Learning from the Prime Movers")
13. Lessons Learned (so far)
14. Choosing a Hadoop Distribution
   Ø Maximize your exposure to change:
      v Hadoop moves at a very fast pace: expect to deploy a meaningful update every 3-6 months
      v Avoid designs and products that try to encapsulate Hadoop – they fall behind faster than you can recover your investment
   Ø Legacy tool compatibility is important:
      v SAS compatibility is critical (even though SAS doesn't integrate well with Hadoop)
      v Does your organization have DB2 or PL/SQL skills to preserve?
   Ø It's not as easy to switch distributions as you might think
   Ø Wait for the features you like to become free:
      v There is a strong history of the open-source distribution incorporating features that were previously proprietary – newer vendors attack incumbents by producing open-source replacements for proprietary extensions
15. Data Engineering
   Ø Risk modelling is often very inefficient:
      v A quantitative modeler typically spends 80% of their time gathering and preparing data
      v Specialized data preparation is often difficult to repeat in production environments
   Ø Data Engineering accelerates quantitative modelling:
      v Advanced research labs hire data engineers to support their quantitative modelers
      v Data engineers are a hybrid of computer programmer and mathematician: they use IT-friendly tools to source and package data into forms that are tailored to the modeler's tool set (e.g. smoothing a time series – see the sketch after this slide)
      v Marketing teams use a 1:5 ratio of modelers to data engineers – but 10:1 is common on the "buy side" and so is a better staffing target for a bank
   Ø Data hubs should target data engineers as users:
      v Build sophisticated tools for expert consumers, rather than rudimentary tools for casual users
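
As an illustration of the time-series example above, here is a minimal pandas sketch of the kind of preparation a data engineer might package once for a risk modeler, rather than each modeler repeating it by hand. The file and column names are hypothetical.

    import pandas as pd

    # Load a daily P&L history; file and column names are hypothetical.
    pnl = pd.read_csv("daily_pnl.csv", parse_dates=["date"], index_col="date")

    # Smooth with a 20-day rolling mean, a typical preparation step
    # packaged by a data engineer for downstream modelling work.
    pnl["pnl_smoothed"] = pnl["pnl"].rolling(window=20, min_periods=1).mean()

    # Persist in a columnar format that Hadoop-based modelling tools
    # read efficiently (requires the pyarrow or fastparquet package).
    pnl.to_parquet("daily_pnl_smoothed.parquet")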
16. Developer Lessons Learned
   Ø Productivity and performance improve with native Hadoop tools:
      v The "Hadoop edition" of most legacy ETL packages performs slowly and is poorly integrated with Hadoop – you are usually just buying an HDFS adapter
   Ø Learn the native tools – it's easier than you think:
      v A Java programmer can learn Map/Reduce in a week
      v Most end-users already know how to use SQL and Python (see the streaming sketch after this slide)
   Ø Use Pig to tune your SQL queries:
      v The best optimization for Hive SQL is often to structure data on ingestion in a Hadoop-friendly way
   Ø You will find lots of small bugs in Hadoop:
      v Your Hadoop vendor's support team is a critical resource for your success
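
To make the "native tools are approachable" point concrete, here is a minimal Hadoop Streaming job in Python that sums exposure by counterparty. It is a sketch, not a production job, and the input layout (CSV with counterparty in the first column, exposure in the second) is hypothetical.

    #!/usr/bin/env python
    # mapper.py: emit (counterparty, exposure) pairs from CSV on stdin.
    import sys

    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) >= 2:
            print(f"{fields[0]}\t{fields[1]}")

    #!/usr/bin/env python
    # reducer.py: sum exposures per counterparty; Hadoop delivers the
    # mapper output sorted by key, so a simple running total works.
    import sys

    current, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current and current is not None:
            print(f"{current}\t{total}")
            total = 0.0
        current = key
        total += float(value)
    if current is not None:
        print(f"{current}\t{total}")

A job like this is submitted with the streaming jar that ships with every distribution, along the lines of: hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/trades -output /data/exposure (the jar path varies by distribution; the HDFS paths here are hypothetical).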
17. Risk Architecture Insights
   Ø Hadoop is a compute grid:
      v YARN is functionally equivalent to DataSynapse or Platform Symphony
   Ø You can wrap most computations using map/reduce:
      v Writing a map/reduce wrapper to feed data to your C#, Java, C++ or Python applications is surprisingly easy – a hundred lines of code usually does it (see the sketch after this slide)
   Ø Use Hadoop to bring the computation to the data:
      v Re-process your data files into computationally efficient HDFS blocks
      v Eliminating movement of data in a compute-centric risk application improves performance dramatically
      v You still need caching of intermediate valuation products (e.g. zero curves)
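
Here is one way the wrapper pattern above can look, again as a Hadoop Streaming mapper: each input record is handed to an existing valuation executable, so the legacy pricer runs unchanged while Hadoop supplies distribution, scheduling and data locality around it. The binary name and its line-oriented interface are hypothetical.

    #!/usr/bin/env python
    # Streaming mapper that wraps a legacy pricing engine.
    import subprocess
    import sys

    for line in sys.stdin:
        trade = line.strip()
        if not trade:
            continue
        # Delegate the actual computation to the existing pricer;
        # the wrapper is glue code, not a rewrite of the model.
        result = subprocess.run(
            ["./price_trade"], input=trade, capture_output=True, text=True
        )
        # Emit trade id (assumed first CSV field) and the pricer's output.
        print(f"{trade.split(',')[0]}\t{result.stdout.strip()}")

The pricer binary would be shipped to the worker nodes with the job (e.g. via the streaming -files option), which is what keeps the wrapper to a hundred lines or so.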
18. Infrastructure Lessons Learned
   Ø Pay attention to the network:
      v Hadoop needs a fast network backbone between nodes
      v Applications and databases that draw data from Hadoop (e.g. Tableau) should be co-located
   Ø Hadoop grids should cost less than $1,000/TB:
      v Including hardware and support subscription for a major Hadoop distribution
      v Hadoop reference configurations are based on mid-price commodity hardware, so use that
      v Virtualization will provide cheaper infrastructure, but higher node counts offset savings by driving up support subscription costs
   Storage costs per TB (InformationWeek, 07/27/2012):
      Hadoop     $1,000
      SAN        $5,000
      Database   $12,000
19. Infrastructure Lessons Learned
   Ø Don't try to prevent infrastructure failure:
      v Hadoop is very fault tolerant – it is designed to handle an annual equipment failure rate of 8%
      v Do not use fault-tolerant hardware – use JBOD instead of RAID arrays
      v A well-designed Hadoop grid will keep running for the 24 hours it takes your hardware vendor to replace a broken machine under a normal support contract
   Ø The best back-up for Hadoop is Hadoop:
      v Hadoop is the cheapest form of on-line storage available, and it is cost-competitive with and more reliable than tape
      v Replicate your Hadoop grid to a second grid at a different site for a high-grade disaster recovery solution (see the sketch after this slide)
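
The cross-site replication idea above is typically implemented with Hadoop's built-in distcp tool. A minimal sketch, with hypothetical host names and paths; in practice this would be driven by a scheduler rather than run by hand:

    #!/usr/bin/env python
    # Replicate a risk data directory to a grid at a second site
    # using hadoop distcp, which copies between clusters in parallel.
    import subprocess

    subprocess.run(
        [
            "hadoop", "distcp",
            "-update",                           # copy only new or changed files
            "hdfs://primary-nn:8020/data/risk",  # source grid
            "hdfs://dr-nn:8020/data/risk",       # disaster-recovery grid
        ],
        check=True,
    )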
20. Technology Themes in 2016
21. Technology Themes for 2016
   Ø Mix-and-match SQL engines:
      v Native Hadoop SQL engines lack many advanced features in database SQL engines
      v Oracle and IBM are unbundling their Hadoop implementations of PL/SQL and DB2
      v Oracle's PL/SQL engine for Hadoop runs on Cloudera and could be available on Hortonworks
      v IBM is releasing BigSQL (DB2) for ODP – meaning it won't be available on Cloudera
   Ø Open Data Platform: FUD or fantastic?
      v Pivotal has used ODP to partner with Hortonworks and focus on their other tools
      v IBM has promised to release all of their data science tools for ODP, but has been slow to deliver
   Ø IBM "all in" on Spark:
      v IBM's data science tools (e.g. BigR) complement typical Spark use cases (e.g. clustering)
   Ø Tableau displacing Cognos & BOBJ
22. Data Governance Themes for 2016
   Ø Native Hadoop Data Governance:
      v Hortonworks has partnered with JP Morgan, Merck and Aetna to build an advanced Hadoop data governance solution in the Apache Atlas project
      v Atlas is intended to govern Hadoop data in a federated governance model – partner adoption will drive success
   Ø Federated Data Governance:
      v The Big 6 have all adopted IBM IGC as their enterprise RDARR lineage and metadata solution
      v IBM provides REST APIs to integrate IGC with non-IBM products
      v Will ODP partners Hortonworks and IBM manage to establish Atlas on IGC as the definitive Hadoop solution in a distributed governance model?
23. Risk Technology Themes for 2016
   Ø Model development on Hadoop:
      v As RDARR data hubs hit critical mass, risk model development will gravitate to Hadoop-based tools
   Ø Notebook workspaces:
      v Increased use of Hadoop modelling environments will drive demand for notebook environments based on Jupyter and Apache Zeppelin (e.g. IBM Knowledge Anyhow)
   Ø On-Demand Risk on Hadoop:
      v Next-generation on-demand risk applications will converge the stand-alone compute grid, data cache and persistence onto the Hadoop stack to eliminate data movement – better performance and lower costs
24. Questions?
