Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Big Data in Financial Services
We Do Hadoop
Global banking trends
Source: E&Y
Key focus areas within the financial services industry
Internal domains..
Please note that this is not a comprehensive list of deployed use cases
across the major domains, just the major areas in which industry shifts are
occurring and where customers are looking to deploy enterprise Big Data.
External Customer facing domains..
Risk Mgmt
Cyber Security
Fraud Detection
Data Compliance
Digital Banking
360 degree view
Capital Markets
Retail Banking and Lending
Credit Cards; Payment Networks
Wealth Mgmt
Corporate Banking
Asset Mgmt
(Brokerage, MF and Stock exchanges)
My Experience in Risk & Compliance
Mike DeSanti
November 4th, 2015
Issues Affecting Most Banks
• The move from proprietary to flow-based trading in Capital Markets is affecting revenue
• Fintech companies are taking over traditional revenue sources in the lending and payments spaces
• Millennials' reliance on mobile technology and instant gratification
• Increased regulatory spending is strangling discretionary spending
• Having to deal with very dated and expensive IT infrastructure
What I Have Seen at Banks
• Fragmented Book of Record Transaction systems
  – Lending systems along geographic and business lines
  – Trading systems along desk and geographic lines
• Fragmented enterprise systems
  – Multiple general ledgers
  – Multiple risk systems by risk function
    • Credit limit management, traded credit, Basel capital systems, CVA, Market Risk VaR, Stress VaR, Market Risk reporting…
  – Multiple compliance systems by business line by compliance initiative
    • AML for Retail, AML for Commercial Lending, AML for Capital Markets…
• They are full of proprietary vendor and in-house built solutions that have been acquired over the years
Leads to Current State Complexity
• Thousands of point-to-point feeds to each enterprise system from each transaction system
• Data is independently sourced, leading to timing and data lineage issues
• Close processes are complicated and error prone
• Reconciliation requires a large effort and has significant gaps
Book of Record Transaction Systems
Enterprise Risk, Compliance and Finance Systems
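The feed-count problem on this slide comes down to simple arithmetic. A minimal sketch follows, with hypothetical system counts that are purely illustrative and not taken from any bank: with point-to-point integration every transaction system feeds every enterprise system, whereas a shared hub needs only one feed in per transaction system and one feed out per enterprise system.

```python
def point_to_point_feeds(transaction_systems: int, enterprise_systems: int) -> int:
    """Every transaction system feeds every enterprise system directly."""
    return transaction_systems * enterprise_systems


def hub_feeds(transaction_systems: int, enterprise_systems: int) -> int:
    """Each system has exactly one feed into or out of a shared hub."""
    return transaction_systems + enterprise_systems


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the growth rates.
    t, e = 100, 30
    print("point-to-point feeds:", point_to_point_feeds(t, e))
    print("hub feeds:", hub_feeds(t, e))
```

The product grows with both counts at once, which is one way to read "thousands of feeds"; the sum grows only with the number of systems added.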
How Do We Change?
• Centralize business and operations functions
• Incentivize technology to shrink, not grow
• Populate a data lake with a set of canonical feeds from the transaction systems
• Create a linearly scalable platform to host enterprise applications on top of this data lake, including hot, warm and cold computing zones
• Develop a partnership with Hortonworks to evolve the platform to meet the Banking Industry’s needs
Why Will This Work Now?
• A free, open source, linearly scalable platform has only become available within the last few years
• Due to the amount of regulation over the last 15 years, all bank enterprise compliance, risk and finance systems now function essentially the same way
• Banks partnering with an open source partner is very different from partnering with a vendor who develops proprietary software
• Proprietary software vendors will adopt the new standards since it is in their self-interest to do so
• Regulators can now streamline their regulatory practices by adopting this platform
Open Platform for Risk & Compliance
David Lattimore-Gay (Eikos Partners)
What is OPRC?
• Leveraging Open Source
  – Large Development Community
  – Innovation
  – Reuse, not build
• Commodity hardware
  – Data Storage
  – In-Memory Cache
  – Computing Power
• Lightweight way of ingesting and storing data
  – Single Source Data
  – Multiple views on the same data with Schema on Read capability
  – Built-in data lineage
• Unified development environment for analytics
  – Partnership between quants and IT developers that will allow IT to package analytics for deployment rather than recoding them
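The "multiple views on the same data with Schema on Read" point can be illustrated with a small sketch. The record layout and the two view functions are hypothetical and are not part of OPRC. The raw lines are stored once, untyped, and each consumer applies its own schema only when it reads them.

```python
import json

# Raw records stored once, exactly as received (hypothetical layout).
RAW_EVENTS = [
    '{"trade_id": "T1", "desk": "rates", "notional": "1000000", "ccy": "USD"}',
    '{"trade_id": "T2", "desk": "fx", "notional": "250000", "ccy": "EUR"}',
]


def risk_view(raw_line: str) -> dict:
    """Schema applied at read time by a risk consumer: typed notional only."""
    record = json.loads(raw_line)
    return {"trade_id": record["trade_id"], "notional": float(record["notional"])}


def compliance_view(raw_line: str) -> dict:
    """A different schema over the same raw line: desk and currency only."""
    record = json.loads(raw_line)
    return {"trade_id": record["trade_id"], "desk": record["desk"], "ccy": record["ccy"]}


risk_rows = [risk_view(line) for line in RAW_EVENTS]
compliance_rows = [compliance_view(line) for line in RAW_EVENTS]
```

Neither view changes the stored data, so adding a third view later does not require reloading anything.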
Data Lake
OPRC Originating Source System
Source System
Canonical
L1 Mapped to Schema
Cache
Reference / Transaction Data / Market Data / Static Data / Meta Data
Reference Data
Validation
Normalization
PK/AK
What has changed?
Load
L0 Raw Data
Reporting
Derived Data
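The L0-to-L1 step named on this slide (load raw data, validate and normalize it against cached reference data, map it to a schema) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the comma-separated raw format, the `CURRENCY_CACHE` contents, the `L1Record` fields and the source system name are not defined anywhere in the deck.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reference data held in an in-memory cache.
CURRENCY_CACHE = {"usd": "USD", "us dollar": "USD", "eur": "EUR", "euro": "EUR"}


@dataclass(frozen=True)
class L1Record:
    trade_id: str
    currency: str
    amount: float
    source_system: str
    l0_line_number: int  # pointer back to the raw L0 line, kept for lineage


def standardize(raw_line: str, line_number: int, source_system: str) -> Optional[L1Record]:
    """Validate and normalize one raw L0 line; return None if it is rejected."""
    fields = [f.strip() for f in raw_line.split(",")]
    if len(fields) != 3:
        return None
    trade_id, currency, amount = fields
    normalized_currency = CURRENCY_CACHE.get(currency.lower())
    if not trade_id or normalized_currency is None:
        return None
    try:
        value = float(amount)
    except ValueError:
        return None
    return L1Record(trade_id, normalized_currency, value, source_system, line_number)


# Hypothetical raw feed from one source system.
L0_RAW = ["T1, us dollar, 100.5", "T2, XXX, 7", "T3, eur, abc", "T4, Euro, 42"]

l1_records = []
rejected_lines = []
for number, line in enumerate(L0_RAW):
    record = standardize(line, number, "lending_emea")
    if record is None:
        rejected_lines.append(number)
    else:
        l1_records.append(record)
```

Keeping the L0 line number on every L1 record is one simple way to preserve the link back to the raw data; a real platform would likely use a richer lineage identifier.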
OPRC High Level Architecture
Desktop, Mobile
UI Framework
Web Services
Compute Service Grid
In Memory Data Fabric
API
Data Access API
Calculator Framework (Job Create/Monitor/Control)
Compute Service Strategy Engine
Task Dependency Scheduler
Task Dependency Scheduler
Task (repeated)
Aggregation & Reporting Engine
Data Lake
ETL Framework
Standardized Book of Record Transaction (BORT) & Other Systems Feeds
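The "Task Dependency Scheduler" box in this architecture suggests tasks being released only once their prerequisites have finished. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9 or later). The task names and the `run_in_dependency_order` helper are hypothetical, and the deck does not say how OPRC actually schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASK_DEPENDENCIES = {
    "load_positions": set(),
    "load_market_data": set(),
    "price_positions": {"load_positions", "load_market_data"},
    "compute_var": {"price_positions"},
    "aggregate_report": {"compute_var"},
}


def run_in_dependency_order(dependencies, run_task):
    """Run each task only after all of its prerequisites have completed."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    completed = []
    while sorter.is_active():
        # Every task in this batch is independent of the others, so a real
        # grid could run them in parallel; here they run one by one.
        for task in sorted(sorter.get_ready()):
            run_task(task)
            completed.append(task)
            sorter.done(task)
    return completed


execution_order = run_in_dependency_order(TASK_DEPENDENCIES, lambda task: None)
```

Sorting each ready batch only makes the sequential order deterministic; it has no effect on correctness.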
Technology Architecture
Nadeem Asghar
OPRC Functional Requirements
LANDING DATA ZONE L0
STANDARDIZED DATA ZONE L1
CANONICAL DATA ZONE L2
REPORTING/ANALYTICS ZONE L3
Regulatory Reports
Internal Reports
External Reports
Search
Golden Source & Feeds
Master Data
Contracts
Balances
Transaction
Positions
Factors/Scenarios
Market Data
Unstructured Data
Original Data
RAW Data
Standardized Data
Materialized View
Extract & Load
Extract & Load
Connectors/Adaptors
Standardized Data
DQ & Validation
Normalization
Mapping & Schemas
Data Cleansing
Standardized Data
Data Join/merging
Data Transformation
Business Rules
Model Rep & Validation
Canonical Position Data
Canonical Transaction Data
Data Aggregation
Calculators
Scenarios Run
Revision History
Scenarios Results
Data Aggregations
Analytics/Reports
Revision History
Common Repositories/Meta Data Management: Connectors, Schema Mapping, Cataloging, End to End Lineage Tracing
Security: Authentication, RBAC, Authorization, Encryption, Audit
Proposed Solution based on Hortonworks Stack
Hortonworks Data Platform
LANDING DATA ZONE L0
STANDARDIZED DATA ZONE L1
CANONICAL DATA ZONE L2
REPORTING/ANALYTICS ZONE L3
Regulatory Reports
Internal Reports
External Reports
Search
Golden Source & Feeds
Master Data
Contracts
Balances
Transaction
Positions
Factors/Scenarios
Market Data
Unstructured Data (HDFS)
Original Data (HDFS)
RAW Data (HDFS)
Standardized Data (Hive/ORC)
Materialized View (Hive/ORC)
Sqoop/hadoop fs/NFS
Kafka/Storm
Java/Scala
Standardized Data (Hive/ORC)
Hive/Spark/Scala
Hive/Spark/Scala
Hive/Spark/Scala
Hive/Spark/Scala
Standardized Data (Hive/ORC)
Hive/Spark/Scala
Hive/Spark/Scala
Hive/Spark/Scala
TBD??
Canonical Position Data (Hive/ORC)
Canonical Transaction Data (Hive/ORC)
Hive/Spark
Scala/Python/R etc
Scala/Java
Hive/Spark
Scenarios Results (Hive/ORC)
Data Aggregations (Hive/ORC/HBase)
Analytics/Reports (Hive/ORC/HBase)
Revision History (Hive/ORC)
Common Repositories/Meta Data Management: Apache Atlas/Falcon/Custom Solution
Security: Apache Ranger/Atlas and Custom/Partner Solution
OPRC Game Plan:
Solution Based on IP (Above the Line) / Open Source (Below the Line)
Wrap up..
Hadoop is a Platform Decision
Adoption follows a consistent journey
Data architecture efficiencies, new analytic apps, and
ultimately to a “data lake”.
HDP: A centralized architecture built on YARN
Any application, any data, anywhere.
HDP: A completely open data platform
Platforms are ultimately defined by open communities.
HDP subscription supports entire lifecycle
World class experience to ensure success from architecture to
production to expansion.
Cautionary Statement Regarding Forward-Looking Statements
This presentation contains forward-looking statements involving risks and uncertainties. Such
forward-looking statements in this presentation generally relate to future events, our ability to
increase the number of support subscription customers, the growth in usage of the Hadoop
framework, our ability to innovate and develop the various open source projects that will enhance
the capabilities of the Hortonworks Data Platform, anticipated customer benefits and general
business outlook. In some cases, you can identify forward-looking statements because they
contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,” “could,” “intends,”
“target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or
similar terms or expressions that concern our expectations, strategy, plans or intentions. You
should not rely upon forward-looking statements as predictions of future events. We have based
the forward-looking statements contained in this presentation primarily on our current expectations
and projections about future events and trends that we believe may affect our business, financial
condition and prospects. We cannot assure you that the results, events and circumstances
reflected in the forward-looking statements will be achieved or occur, and actual results, events, or
circumstances could differ materially from those described in the forward-looking statements.
The forward-looking statements made in this prospectus relate only to events as of the date on
which the statements are made and we undertake no obligation to update any of the information in
this presentation.
Trademarks
Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other
names used herein may be trademarks of their respective owners.


Editor's Notes

  • #12
    • Provides a single consolidated group of generic processors, running an open source data ingestion, persistence and parallel computing stack
      – Eliminate computational grid licensing costs associated with Grid Computing technology
      – Eliminate the data caching grid licensing costs associated with IMDGs
      – Eliminate Relational Database and Data Appliance costs that are associated with Risk and Compliance applications
      – Eliminate the costs associated with ETL tools such as DataStage and Ab Initio
    • Provides a unified development environment for analytics
      – Use open source languages for analytic development such as R, as well as supporting SAS and MATLAB interfaces
      – Provide open source source code management and QA tools
      – Create a partnership between quants and IT developers that will allow IT to package analytics for deployment rather than recoding them
      – Sharply decrease analytic development and test times by providing a calculator framework that provides data as a service
    • Provides a lightweight way of ingesting and storing data
      – Store and record each risk and compliance group's transformations to the base data, providing BCBS 239 compliance
      – Support multiple views on the same data with Schema on Read capability
      – Support multiple language access to the data via SQL, Pig, Scala and Java interfaces
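The idea in note #12 of storing and recording each group's transformations to the base data can be sketched as follows. This is only an illustration of keeping a transformation log next to derived data; the `TrackedDataset` class, the group name and the transformations are hypothetical, and nothing here is a BCBS 239 implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedDataset:
    """Rows plus an ordered log of the transformations that produced them."""
    rows: List[dict]
    lineage: List[str] = field(default_factory=list)

    def apply(
        self,
        group: str,
        description: str,
        transform: Callable[[List[dict]], List[dict]],
    ) -> "TrackedDataset":
        # Return a new dataset so the base data is never modified in place.
        return TrackedDataset(
            rows=transform(self.rows),
            lineage=self.lineage + [f"{group}: {description}"],
        )


# Hypothetical base data shared by all groups.
base = TrackedDataset(rows=[
    {"trade_id": "T1", "notional": 100.0},
    {"trade_id": "T2", "notional": -5.0},
])

filtered = base.apply(
    "market_risk",
    "drop non-positive notionals",
    lambda rows: [r for r in rows if r["notional"] > 0],
)
scaled = filtered.apply(
    "market_risk",
    "convert notional to thousands",
    lambda rows: [{**r, "notional": r["notional"] / 1000} for r in rows],
)
```

Because every step returns a new dataset, other groups can start from the same `base` and build their own lineage without interfering with each other.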
  • #13
    • We use a cache to keep frequently accessed reference data for normalization and validation of data
    • Green represents plugins to the flow, both out of the box and custom by business
    • Orange lines represent connectivity back to the previous steps (data lineage)
    • Is it possible that data loading and validation could be done in a MapReduce fashion?
    • Need to be able to plug and play with the source systems, i.e. a new source system can be added with no impact on existing feeds being consumed. Pluggable infrastructure
    • L0 is specific to the originating source system (OSS); L1 defines the mandatory/optional columns needed for each entity type (trade, position, security) and could be extended depending on the OSS; L2 is the final form, including any additional attributes, in a key-value pair structure: Entity, AttributeName, Value, DataType (+ bitemporal fields)
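The L2 key-value structure described in note #13 (Entity, AttributeName, Value, DataType, plus bitemporal fields) could be represented roughly as below. The field names `valid_from` and `recorded_at`, the entity key format and the `to_l2` helper are assumptions made for this sketch; the note names the columns but not their exact form.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import List


@dataclass(frozen=True)
class L2Attribute:
    entity: str
    attribute_name: str
    value: str
    data_type: str
    valid_from: date       # business time: when the fact holds (assumed name)
    recorded_at: datetime  # system time: when it was stored (assumed name)


def to_l2(
    entity_type: str,
    entity_id: str,
    l1_row: dict,
    valid_from: date,
    recorded_at: datetime,
) -> List[L2Attribute]:
    """Flatten one L1 row into one key-value record per attribute."""
    return [
        L2Attribute(
            entity=f"{entity_type}:{entity_id}",
            attribute_name=name,
            value=str(value),
            data_type=type(value).__name__,
            valid_from=valid_from,
            recorded_at=recorded_at,
        )
        for name, value in l1_row.items()
    ]


# Hypothetical L1 trade row.
l1_trade = {"currency": "USD", "notional": 1000000.0, "quantity": 5}
l2_rows = to_l2(
    "trade",
    "T1",
    l1_trade,
    valid_from=date(2015, 11, 4),
    recorded_at=datetime(2015, 11, 4, 12, 0, tzinfo=timezone.utc),
)
```

Storing values as strings alongside a data type name lets new attributes be added without a schema change, at the cost of casting on read.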
  • #14 High Level architecture, highlighting the components that are needed
  • #16 DLG: Diagram is too busy, there is a lot going on, so for me it just says busy
  • #17 DLG: again Diagram is too busy, there is a lot going on, so for me it just says busy
  • #20 Hundreds of organizations have turned to Hortonworks because Hadoop is ultimately a platform decision. It is typically the first step towards re-architecting your back end data systems. These organizations that have already been successful with Hadoop have required not just a stable, reliable and complete Hadoop solution, but more importantly a connection with the architects, builders and operators of this open source technology. They saw this in Hortonworks. And as with any platform decision, it is imperative that Hadoop integrates with the tools and systems that are already resident in your data center. We forge deep relationships with our hundreds of partners so that you can not only ensure integration but also effectively reapply existing systems and skillsets toward your big data challenges. At Hortonworks, we hold true to these foundational beliefs and have partnered with hundreds of organizations from some of the largest and earliest big data adopters to the most conservative and data rich companies on the planet. We ensure that your Hadoop journey is successful and more companies are turning to Hortonworks today than any other offering on the marketplace. We invite you to join our community.