DMT 3260
Citizens Bank Data Lake Implementation: Selecting BigInsights ViON Spark/Hadoop Appliance
Dana Rafiee, Destiny Corporation
John DiFranco, Citizens Bank
Order of Presentation
Destiny Background
The Data Scientist
Client Infrastructure Challenges
Tools Used at Clients
Client Architecture Case Studies
Citizens Bank
Financial Processing Organization
Abstract
Citizens Bank, formerly part of the Royal Bank of Scotland, is implementing
a BigInsights Hadoop data lake with PureData System for Analytics
(Netezza) to support all of its internal data initiatives. The goal is to provide
an improved experience for customers and to grow market share. Along
the bank's ETL journey, we have used Netezza SQL, Hadoop, and finally IBM
BigIntegrate and BigInsights. Testing BigIntegrate on BigInsights yielded the
productivity, maintainability, and performance that Citizens was looking for,
and all of this came prepackaged in the ViON Hadoop Appliance that was
rolled into its data centers, greatly simplifying entry into the Hadoop world.
Destiny Background
• Business and Technology Consulting Firm
• Advising Fortune 500 Corporations for 30 years
• Build Data Lakes, Warehouses, Reporting and
Analytics environments for large corporations and
government
• Business Consultants
• Data Warehouse/Modeling Specialists
• Advanced Analytic Practitioners
• SAS and IBM Business Partner
• Objective Opinions
Copyright © 2016 Destiny Corporation – Business and Technology Consulting - www.destinycorp.com
Who is the Data Scientist?
• Data science is an interdisciplinary field about
processes and systems to extract knowledge or
insights from data in various forms, either structured
or unstructured.
• Statistics
• Machine learning
• Data mining
• Predictive analytics
• “Data Scientist is the new title for the Analyst”
• Paul Kent, VP of Big Data at SAS Institute
Requirements of the Data Scientist Community
• Immediate access to data no matter where it exists
• Simple access to systems
• Legacy and Open Community Tools
• Ample resources to do their work
• Ability to store analytical results
• Fast Execution
• Access to In-House Data and External Data
• Nimble IT shop or I will find another option (Cloud)
Why is the Playing Field Different Today?
• Legacy Data and Systems
• OLTP Systems of Record
• Mainframes
• Data Warehouses and Marts
• Dark Data (Archived)
• New Data Sources
• Social Media
• Internet of Things
• Streaming Data
• Data Brokers – Search Yourself?
Some Big Data Use Cases
• Macy’s Inc. – Real-time pricing on 73 million items based on demand and inventory.
• Tipp24 AG – Betting on European lotteries with predictive analytics, building models in less than 10% of the time previously required.
• Walmart – Text analytics, machine learning, and synonym mining produce relevant website search results, increasing
conversions by 10–15%.
• Fast Food and Digital Menus – When drive-through lines are long, digital menus display items that can be delivered quickly; when lines are short, they display higher-margin
items that take longer to prepare.
• Morton’s Steak House – As a publicity stunt, analyzed tweets about Morton’s, matched the data to a frequent Morton’s diner, and
then delivered dinner to him as he landed at the airport.
• PredPol Inc. – Los Angeles and Santa Cruz police use a model adapted from earthquake aftershock prediction, applied to historical crime data, to predict where crimes
are likely to happen, with up to a 33% reduction in crime.
• Tesco PLC – Tracks 70 million refrigerator data points to be more proactive with maintenance and cut energy costs.
• American Express – Predicts and reduces customer churn through analysis of historical buying patterns.
• Express Scripts Holding Co. – Through analysis, determined that people were forgetting to take their medications. Introduced beeping
medicine bottle caps and implemented automated phone calls.
• Infinity Property and Casualty Corp. – Re-analyzing dark data on claims now allows the company to recover $12M in subrogation claims.
IT’s Challenges in Supporting the Data Scientists
• Building Proper Infrastructure to Support the Business
– Timely Access to data and systems
– Simple to use
– Open to new technologies and capabilities
– Accurate data
– Current data to support business needs
– Powerful enough to crunch all the data
– Fast or Cheap
– Robust and Reliable in an Open Environment
– On-Premises, Cloud, or Hybrid
– Support Mandated Regulations
The Traditional IT Architecture
[Diagram components: Data Input, Mainframe, Data Warehouse, Information, Analyst]
Why is it Not Enough?
• Inflexible
• Cannot capture new forms of data
• Cannot easily analyze new forms of data
• Cannot economically handle large data
volumes
• Cannot easily integrate with the Open
Community
• Long Lead Times for IT Projects
Designing the New Infrastructure
• New Non-Standard Data Sources
• Structured
• Unstructured
• Streaming
• NoSQL forms
• External Sources
• Ability to Land All Data Economically
• Let the business decide what data is required
• New Analytics Requirements
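Landing all data economically and letting the business decide what it needs is commonly achieved with schema-on-read: records are stored as they arrive, and each consumer applies only the fields and types it needs when reading. The Python sketch below is a minimal illustration under that assumption; the JSON layout and the field names (`acct_id`, `channel`, `amount`) are hypothetical and not drawn from the Citizens Bank environment.

```python
import json

# Hypothetical raw records, landed exactly as received; no schema is enforced at write time.
RAW_LANDING_ZONE = [
    '{"acct_id": "A1", "channel": "mobile", "amount": "25.50"}',
    '{"acct_id": "A2", "channel": "branch", "amount": "100", "teller": "T9"}',
    '{"acct_id": "A3", "channel": "web"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: keep only the requested fields and cast them.

    `schema` maps field name -> conversion function. Missing fields become None,
    so new or sparse sources never block the landing step.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }


# Two consumers read the same landed data with different schemas.
marketing_view = list(read_with_schema(RAW_LANDING_ZONE, {"acct_id": str, "channel": str}))
finance_view = list(read_with_schema(RAW_LANDING_ZONE, {"acct_id": str, "amount": float}))
```

The landing step never rejects a record for having extra or missing fields (the `teller` field is simply ignored, and a missing `amount` becomes `None`), which keeps landing cheap; the trade-off is that every reader must handle incomplete data.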
Some IT Infrastructure Considerations
• Limited Budgets and Resources
• Master Data Management
• Hadoop
– Bronze, Silver, Gold
– Single copy of the Data
– Spectrum Scale/GPFS
– Other Options
• Storage Mechanisms
– Elastic Storage Server
– DS8800, XIV
– Flash
• Types of Queries
• Historical Information
• Speed of Processing
– Fast, Expensive
– Slow, Cheap
• Location
– On-Premises
– Cloud
• Mobile Device Requirements
• Virtual Desktop
• Keeping Data In-Sync – Production and DR
– Update Strategies
– Replication Strategies
– Database
– SAN Store Utilities
• Data In-Flight
• Data Lineage
• Appliances
– PDA/Netezza
– SAP HANA on Power
– DB2 BLU – On-Premises
– DataAdapt Spark Hadoop Appliance
(BigInsights)
• Grid Processing
• Regulatory Compliance
• Data Governance
• In-house maintenance or Managed Service
• IEEE 802.3ba 40GbE, Direct Attached SAN,
NAS
• Politics
Data Classifications
[Chart: Volume plotted against the Bronze, Silver, and Gold classifications and the Data Scientist, Power User, and BI End User groups]
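The Bronze, Silver, and Gold classifications can be read as successive refinement stages, where each promotion reduces volume and adds trust. The deck does not define the promotion rules, so the Python sketch below assumes a common interpretation (Bronze = raw landed records, Silver = validated and de-duplicated, Gold = aggregated for business reporting); the rules, records, and field names are all hypothetical.

```python
from collections import defaultdict

# Bronze (assumed): raw records as landed, including a duplicate and an invalid row.
bronze = [
    {"txn_id": 1, "branch": "BOS", "amount": 120.0},
    {"txn_id": 2, "branch": "PVD", "amount": 75.5},
    {"txn_id": 2, "branch": "PVD", "amount": 75.5},  # duplicate delivery
    {"txn_id": 3, "branch": None, "amount": 40.0},   # fails validation
    {"txn_id": 4, "branch": "BOS", "amount": 30.0},
]


def to_silver(rows):
    """Validate and de-duplicate (hypothetical Silver rules)."""
    seen = set()
    clean = []
    for row in rows:
        if row["branch"] is None or row["txn_id"] in seen:
            continue
        seen.add(row["txn_id"])
        clean.append(row)
    return clean


def to_gold(rows):
    """Aggregate to a business-ready summary per branch (hypothetical Gold rule)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["branch"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Row counts shrink at each step (5, then 3, then 2), illustrating why the raw tier carries the most volume while the curated tier is the smallest and most consumable.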
Discovery and Transformation of Data
• Tools to Analyze and Transform Data
– DataStage
– Podium
– Trillium
– DataFlux
– Informatica
– Talend
• User Tools to Gain Insight into the Data
– Watson Explorer
– Attivio
• In-Database
• In Memory and Machine Learning
– Apache Spark – Micro Batches
– Apache Flink – Streaming Data Flow Engine and Memory
Management
• Other
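Spark's streaming model treats a continuous feed as a series of small batches. The plain-Python sketch below shows only the micro-batch idea; it does not use Spark's API, and the one-second batch interval and the event tuples are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import groupby

# Hypothetical events: (timestamp_in_seconds, event_type), assumed to arrive in time order.
events = [
    (0.1, "login"), (0.4, "transfer"), (0.9, "login"),
    (1.2, "transfer"), (1.8, "transfer"),
    (2.5, "logout"),
]

BATCH_INTERVAL = 1.0  # seconds per micro-batch (hypothetical setting)


def micro_batches(stream, interval):
    """Group a time-ordered stream into consecutive fixed-interval batches."""
    for batch_id, batch in groupby(stream, key=lambda e: int(e[0] // interval)):
        yield batch_id, [event_type for _, event_type in batch]


def process(stream, interval):
    """Run the same small computation (a count by event type) on every batch."""
    return {batch_id: Counter(types) for batch_id, types in micro_batches(stream, interval)}


results = process(events, BATCH_INTERVAL)
```

A record-at-a-time streaming engine such as Flink instead processes each event as it arrives; micro-batching trades some latency for the ability to reuse batch-style logic on every interval.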
Building Analytics Processes and the Challenges
• Three Categories
– Ad Hoc
– Standard Analysis and Reporting
– Statistical Models
• Challenges for IT
– Skill Sets of the Data Scientist and Power Users
– Playing Nicely Together
– Structure of the Data – Data Modeling vs. SQL Tools
– Location and Movement of the Data
Case Studies
• Citizens Bank BigInsights Deployment
• Global Financial Advisors Deployment
• Financial Processing Organization Design
Citizens Bank Original Environment
• Teradata Data Warehouse
• Raw Data and History (Staging from systems of record)
• Data Conformed to a Data Model (Mapped to an industry-standard model)
• Data Marts (Fit-for-purpose, business-specific)
Challenges with the Teradata Environment
• Processing on Teradata was slow due to:
– Traditional Teradata Data Warehouse Framework
– Reference Model
• Slow Time to Market
• Extremely Expensive in Labor Costs
• Extremely Expensive to Add Computing Capacity
• System and SAS Costs Increasing
Looking for Alternatives
• Execution of an information Proof of Concept
• IBM
• Oracle
• Cloudera
• Hortonworks
Conclusions and Choices Made
• The IBM BigInsights Appliance is the most cost-effective option
• Requires minimal engagement from the internal infrastructure organization
• Delivered fully assembled with hardware and software
• Appliance model value proposition similar to that of a Netezza appliance
Standard Tools at Citizens
• IBM Big SQL
– Assurance that standard tools would work well with it (DB2 LUW V10.5)
– All products support this platform
• Oracle OBIEE – Operational Reporting
• SAS for Statistical Modeling
• Tableau for Visual Reporting
• DataStage for ETL – centralized application development model
• Spectrum Scale (GPFS) vs. Hadoop for better management of the data
and less raw storage
• Fluid Query for connections to BigInsights
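IBM Fluid Query and Big SQL both aim to let SQL users reach data held on another platform without first copying it by hand. As a rough stand-in for that federation pattern, the Python sketch below uses SQLite's `ATTACH DATABASE` so that a single SQL statement joins a "warehouse" table (in the `main` schema) with a "data lake" table (in an attached `lake` schema). SQLite, the schema names, and the tables are assumptions for illustration; this is not the Fluid Query or Big SQL interface.

```python
import sqlite3

# The main in-memory database plays the role of the warehouse;
# a second attached in-memory database plays the role of the data lake.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS lake")

conn.execute("CREATE TABLE main.customers (cust_id INTEGER PRIMARY KEY, segment TEXT)")
conn.execute("CREATE TABLE lake.web_clicks (cust_id INTEGER, page TEXT)")

conn.executemany("INSERT INTO main.customers VALUES (?, ?)",
                 [(1, "retail"), (2, "commercial")])
conn.executemany("INSERT INTO lake.web_clicks VALUES (?, ?)",
                 [(1, "mortgage"), (1, "rates"), (2, "loans")])

# One query spans both stores; no manual export/import step is needed.
rows = conn.execute("""
    SELECT c.segment, COUNT(*) AS clicks
    FROM main.customers AS c
    JOIN lake.web_clicks AS w ON w.cust_id = c.cust_id
    GROUP BY c.segment
    ORDER BY c.segment
""").fetchall()
```

In a real federated setup the engine decides how much of the join and aggregation to push down to each source; here both "stores" live in one process, so the sketch shows only the single-query experience, not the network behavior.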
POC on BigInsights Appliance
• DataStage processing that ran on Teradata was moved to BigInsights
• Client Connectivity, queries, testing and validation
• Proved that the platform could be used as the server and storage to run
enterprise DataStage processing
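When ETL jobs move from one platform to another, one common way to validate the result is to reconcile the outputs: compare row counts and an order-independent checksum of the rows produced on each side. The Python sketch below illustrates that check; the sample rows, the `|` serialization, and the summed SHA-256 fingerprint are assumptions, not the validation procedure used in the Citizens POC.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) that does not depend on row order.

    Each row is serialized with '|' (assumes values never contain '|') and hashed
    with SHA-256; per-row digests are summed modulo 2**256, so any row ordering
    yields the same checksum while duplicates still count.
    """
    total = 0
    for row in rows:
        digest = hashlib.sha256("|".join(map(str, row)).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % (1 << 256)
    return len(rows), total


def reconcile(source_rows, target_rows):
    """True when both platforms produced the same multiset of rows."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)


# Hypothetical job outputs from the legacy and migrated platforms.
legacy_output = [(1, "BOS", 120.0), (2, "PVD", 75.5)]
migrated_output = [(2, "PVD", 75.5), (1, "BOS", 120.0)]  # same rows, different order
broken_output = [(1, "BOS", 120.0), (2, "PVD", 75.0)]    # one value differs
```

Because only the count and a single checksum per table need to be compared, the check can run on each platform separately and exchange two small values instead of shipping full tables between systems.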
Results
• Moved Analytics processing from Teradata to Netezza
(cost/performance)
• Increase in SAS performance by running in the Netezza database
• Repurposed some SAS costs
• Reduced data warehouse admin support costs (Teradata DBAs
reallocated)
• Implemented BigInsights Hadoop for a data lake (staging and
conformity)
• Avoided large capital outlays for additional Teradata capacity
• Reduction in Labor Effort to use the new platforms
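The idea behind running SAS in the Netezza database is in-database processing: the computation is pushed to the database engine, so only the small result leaves the database instead of every row. The Python sketch below contrasts the two patterns, using SQLite as a stand-in for Netezza; the `balances` table and its columns are hypothetical, and the sketch does not involve SAS itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (acct_id INTEGER, region TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO balances VALUES (?, ?, ?)",
    [(i, "east" if i % 2 else "west", float(i)) for i in range(1, 1001)],
)


def average_client_side(connection):
    """Pull every row to the client, then aggregate locally (data moves to the code)."""
    rows = connection.execute("SELECT region, balance FROM balances").fetchall()
    sums, counts = {}, {}
    for region, balance in rows:
        sums[region] = sums.get(region, 0.0) + balance
        counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}, len(rows)


def average_in_database(connection):
    """Push the aggregation into the database (code moves to the data)."""
    rows = connection.execute(
        "SELECT region, AVG(balance) FROM balances GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


client_result, client_rows_moved = average_client_side(conn)
db_result, db_rows_moved = average_in_database(conn)
```

Both approaches return the same averages, but the client-side version transfers 1,000 rows while the in-database version transfers 2; on a warehouse-scale table that difference in data movement is where the performance gain comes from.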
Future Plans
• Evaluating and planning an implementation of dashDB (Bridge to Cloud) to
move some workloads to the cloud
• Instead of paying for another year of S&S (subscription and support), using the funds for Bridge to
Cloud
– Attractive price point
• Adding new applications (Risk) to Netezza and the Data Lake
Complimentary Consultation
• Contact Us at: info@destinycorp.com
• Discovery Session
• Analysis of Architecture
• Business Process
• Governance
• High Level Recommendations
Questions and Answers
Contact Information
Dana Rafiee
Managing Director
Destiny Corporation
860-721-1684 x2007
drafiee@destinycorp.com
www.destinycorp.com
John DiFranco
SVP - Director of Enterprise Data Management
Citizens Bank
John.difranco@citizensbank.com
www.citizensbank.com
781-655-4489
Thank you for your time
