Modern data analytics platforms that fuel enterprise-wide data hubs are critical for decision making and information sharing. The problem? Integrating legacy data stores into these hubs is just plain hard, and there is no magic bullet. However, the best data hubs include ALL enterprise data.
So how can you ensure that you are building the best modern data analytics platform possible?
Watch this webinar to learn about:
- Best practices for integrating legacy data sources, such as mainframe and IBM i, into modern data analytics platforms such as Cloudera, Databricks, and Snowflake
- How Precisely Connect customers are incorporating legacy data sources into enterprise data hubs to inform strategic use cases such as claims, banking, and shipping experiences
26. What Can You Do with Modern Analytics Platforms?
• Centralized BI and analytics
• Data discovery
• Data democratization with governance
• Next-gen projects – AI and ML
27. What are the benefits of a modern analytics platform?
• Visibility into all data
• Sets course for real-time pipelines
• Limits skills gaps
• Removes data silos
28. Reality is not so simple
• Silos of multi-structured data
• Legacy IT infrastructure
• Employees
• Data archives
29. Value that Data from Legacy Systems Brings
• Holds important transaction data
• Most core business applications running on legacy systems
• High volumes of data
33. Shipping Company requires real-time delivery status
Top-level mandate driven by customer demands to:
1. Integrate customer and shipment information that resides on multiple systems of record
2. Improve integration of mainframe systems with the analytics platform
3. Replicate changes to mainframe data to the larger business in real time
Challenge: Mainframe data not readable by downstream tracking dashboards
34. Precisely makes mainframe data readable in Snowflake for real-time tracking
Solution
• Connect (ETL + CDC)
• Snowflake
Results
• Power business user and customer dashboards with the latest shipment information
• Report shipment information in ways that give the business a competitive edge
• Integrate and replicate hundreds of z/OS Db2 tables to Snowflake
• All data is integrated and readable across platforms
36. Creating an enterprise claims hub required quickly adding new targets
Strategic decision to use data to:
1. Improve the claims experience for end customers
2. Identify patterns in claims to alert the business to unexpected severe claims
3. Automate the fast-tracking of low-dollar claims without the need for an adjuster
Challenge: Current methods of integrating mainframe data required a separate, siloed ingestion process
37. Precisely and Databricks help to create a high-performance data hub
Solution
• Connect (ETL)
• Databricks
Results
• No downtime or rework for implementing a new approach to legacy source integration
• Ability to meet the high-volume processing requirements of the data hub
• Faster time to close claims and improved customer experiences
39. Financial Services Company needs to build a real-time AML process
Top-level mandate driven by regulatory demands to:
1. Have consolidated, clean, verified data for all analytics and reporting
2. Provide alerts to any suspicious activity in real time
3. Integrate mainframe data into analytics while also maintaining an unmodified copy of the mainframe data in storage
Challenge: Disparate systems and slow updates to mainframe data caused major process delays in meeting AML monitoring requirements
40. Precisely and Cloudera enable AML with timely delivery
Solution
• Connect (ETL + CDC)
• Trillium
• Cloudera
Results
• High-performance AML results
• Faster time to value
• Data lake is the trusted source of data feeding critical machine learning-based fraud detection
Looking forward…
• Expanding to additional Customer Engagement solutions and applications
42. Credit Union looks to enable a data hub for all lines of business
Top-level mandate to open up data across the organization:
1. Improve customer banking experiences
2. Provide transparency of data to lines of business for analytics and BI
3. Enable AI/ML use cases with richer legacy data sets
Challenge: Core banking functions run on the mainframe, but a lack of in-house skills incurred high development costs and made it difficult to scale
43. By the numbers… the cost of legacy data
$95 per hour × 40-hour work week = $3,800 cost per week
$3,800 per week × 2 programmers = $7,600 cost per week
$7,600 per week × 26 weeks (6-month average project time) = $197,600 cost per project
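A quick back-of-the-envelope check of the figures above, as a small Python sketch; the $95 rate, 40-hour week, 2 programmers, and 26-week duration come straight from the slide, and nothing else is assumed:

```python
# Cost model from the slide: rate x hours x programmers x weeks
hourly_rate = 95        # dollars per programmer-hour
hours_per_week = 40     # full-time work week
programmers = 2
weeks = 26              # roughly a 6-month project

weekly_cost = hourly_rate * hours_per_week * programmers   # 7,600
project_cost = weekly_cost * weeks                          # 197,600
print(f"${weekly_cost:,} per week, ${project_cost:,} per project")
```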
44. Connect’s ETL helps to lower costs and solve the skills gap
Solution
• Connect (ETL)
Results
• Reduced development costs
• Leverage existing skills in house
• Delivers all enterprise data for distribution across a proprietary analytics platform
45. What you can do in the next 90 days…
• Assess how you are currently using mainframe and IBM i data
• Look at ways in which you can leverage data from legacy systems to maximize impact
• Keep both best practices and lessons learned in mind when developing your approach
• Remember: Precisely is here to be your partner in innovation!
Centralized business insights – central management of business insights helps to shift insights from one-offs produced in isolation to insights shared across the enterprise.
Data discovery - business end-users can work with large data sets and get answers to questions they are asking. Data Discovery is helping the enterprise lose some of the bulk when it comes to running analytics.
Data democratization – enables more users to have autonomy with data but without the risk of exposing sensitive data in a way that could violate regulations or internal best practices
Visibility into all data – it provides views that make data look simpler and more unified than it actually is in today's complex, multiplatform data environments
Sets course for real-time pipelines - the modern hub regularly instantiates data sets quickly, on the fly. It may also handle terabyte-scale bulk data movement. Either way, a modern data hub requires modern pipelining for speed, scale, and on-demand processing.
Limits skills gaps - the IT world is full of old-fashioned data hubs that are homegrown or consultant-built. Modern platforms instead support advanced forms of orchestration, pipelining, governance, and semantics, all integrated in a unified toolset.
Removes data silos - again, this is accomplished without consolidating silos. Think of the data views, semantic layers, orchestration, and data pipelines just discussed. All of these create threads that weave together into a data fabric, which is a logical data architecture for all enterprise data that can impose functional structure over hybrid chaos.
When it comes to building unified analytics platforms, there is a level of complexity that exists across an enterprise.
We have silos of multi-structured data that are difficult to integrate (ERP, CRM, mainframes, RDBMS, files, logs, cloud data sources),
heterogeneous legacy IT infrastructure (EDWs, data lakes, marts, servers, storage, archives, and more),
and thousands – maybe more – of employees with lots of inaccessible information.
Your traditional systems – including mainframes, IBM i servers & data warehouses – adapt and deliver increasing value with each new technology wave
Even with the growth of next-gen technologies, legacy systems (i.e. mainframes and IBM i) still play an important role within many businesses. More than 70% of Fortune 500 enterprises continue to use mainframes for their most crucial business functions. Mainframes often hold critical information – from credit card transactions to internal reports.
Most large enterprises have made major investments in mainframe data environments over a period of many years and will not be leaving these investments anytime soon. It is estimated that 2.5 billion transactions are run per day, per mainframe across the world.
This high volume of data is one that organizations cannot choose to ignore or neglect. Additionally, mainframes often have no peer when it comes to the volume of transactions they can handle and cost-effectiveness.
As a result, these environments contain the data that organizations run on, and in turn, power the strategic big data initiatives driving the business forward – machine learning, AI and predictive analytics.
Business insights, artificial intelligence and machine learning efforts are only as good as the data that is being fed in and out of them. Leaving mainframe data out of the equation when building strategic initiatives risks omitting critical information that could greatly influence business outcomes.
Specifically, neglecting mainframe data from strategic initiatives results in:
• The value of an organization’s big data investments being diminished
• Analytics that are not accurate or complete
• Large, rich enterprise datasets that never even get analyzed
So how do we get around these obstacles and build a true enterprise data hub? Let’s take a look.
Break down legacy data silos – removing the barriers that come with accessing and integrating data from legacy data stores, mainframe, IBM i and more
Rethink – sometimes you are already doing something with legacy data and have the access, but the needs of the organization may be changing, prompting you to consider how a new solution could work alongside or replace what exists today
Real-time – data is only as good as how quickly it is delivered; to achieve this you need a way to deliver changes made in legacy systems to the analytics platform in real time
One of the biggest hindrances to unified analytics hubs can be the lack of skills, or the costs, associated with accessing legacy data
This company wants to vastly improve its tracking and package visibility. They feel that they need to offer customers more visibility into the movement of goods. Pushing the status of goods to customer dashboards will give them the ability to provide more real-time location and updates to transit time and delivery.
This concept is familiar from consumer shipping – we know when and where our package is in real time – but not so much when it comes to freight. To accomplish this, they needed data from disparate data sources, including Db2 for z/OS and SQL Server, and they connected their legacy sources to their Snowflake target.
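The webinar does not show the actual Connect configuration, so the following is only a minimal Python sketch of the end state it describes: captured shipment changes staged in Snowflake and merged into the tracking table that feeds the dashboards. The connection details and the SHIPMENT/SHIPMENT_STAGE table and column names are hypothetical.

```python
# Hypothetical sketch: apply staged change records to a Snowflake target table
# so tracking dashboards always see the latest shipment status.
# Connection details and table/column names are placeholders, not the
# customer's real schema, and Connect's own CDC delivery is not shown.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="etl_user",
    password="...",
    warehouse="ANALYTICS_WH",
    database="LOGISTICS",
    schema="PUBLIC",
)

merge_sql = """
MERGE INTO SHIPMENT tgt
USING SHIPMENT_STAGE src
  ON tgt.SHIPMENT_ID = src.SHIPMENT_ID
WHEN MATCHED THEN UPDATE SET
  tgt.STATUS = src.STATUS,
  tgt.LAST_LOCATION = src.LAST_LOCATION,
  tgt.UPDATED_AT = src.UPDATED_AT
WHEN NOT MATCHED THEN INSERT (SHIPMENT_ID, STATUS, LAST_LOCATION, UPDATED_AT)
  VALUES (src.SHIPMENT_ID, src.STATUS, src.LAST_LOCATION, src.UPDATED_AT)
"""

cur = conn.cursor()
try:
    cur.execute(merge_sql)   # keep SHIPMENT in sync with the latest changes
finally:
    cur.close()
    conn.close()
```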
An American insurance company wanted to take a variety of data from across its organization to build an enterprise-wide claims data lake. The purpose of the claims data lake was to receive data from across the lines of business and improve analysis of customer activity, historical data, and richer analytics. In its ideal scenario, the claims data would help identify patterns in claims to alert the business to unexpected severe claims, or automate the fast-tracking of low-dollar claims without the need for an adjuster.
Data funneling into the hub would include information from core systems in different departments, such as actuary, call center, claims, and billing. Most of this data existed on mainframes. Mainframe data file formats included EBCDIC-encoded VSAM data with binary and packed data types mapped by multiple complex copybooks. When it came time to integrate all these data sources, the insurance company struggled to get data from the mainframe to its data lake. Getting mainframe data into the data lake meant spinning up an entirely separate process for data ingestion. As a result, the insurance company had a siloed process that caused lost time, delayed delivery, and incomplete claims analytics.
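To make that ingestion hurdle concrete, here is a minimal, hypothetical Python sketch of what "unreadable" mainframe data looks like: an EBCDIC text field plus a COMP-3 (packed decimal) amount decoded from raw bytes. The record layout, offsets, and field names are invented for illustration; real VSAM files are described by the complex copybooks mentioned above.

```python
# Hypothetical sketch: decode one fixed-length mainframe record containing an
# EBCDIC character field and a COMP-3 (packed decimal) amount.
# Offsets and lengths are invented; real layouts come from COBOL copybooks.

def decode_ebcdic(raw: bytes) -> str:
    # cp037 is a common US EBCDIC code page
    return raw.decode("cp037").rstrip()

def decode_comp3(raw: bytes, scale: int = 2) -> float:
    # Packed decimal: two digits per byte, with the final nibble holding the
    # sign (0xD = negative, 0xC or 0xF = positive).
    digits = []
    for b in raw[:-1]:
        digits.append(str(b >> 4))
        digits.append(str(b & 0x0F))
    digits.append(str(raw[-1] >> 4))
    sign = -1 if (raw[-1] & 0x0F) == 0x0D else 1
    return sign * int("".join(digits)) / (10 ** scale)

# A made-up 9-byte record: 5 EBCDIC characters followed by a 4-byte COMP-3 amount
record = bytes.fromhex("C1C2C3F1F2") + bytes.fromhex("1234567C")
claim_id = decode_ebcdic(record[0:5])         # -> "ABC12"
amount = decode_comp3(record[5:9], scale=2)   # -> 12345.67
print(claim_id, amount)
```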
Once mainframe data ingest was complete, the insurance company then needed to modernize its ETL processes to scale within Databricks. The insurance company had been using Precisely Connect with Spark on Azure HDInsights for ETL transformation on its claims data hub data and determined a need to move these existing workflows into Databricks. However, the insurance company did not want to perform any rework to their data integration workflows, especially as many had complex data transformations upon the mainframe data.
Using Precisely Connect, the insurance company built ETL processes that took a design-once, deploy anywhere approach, and as a result, had no rework or redesigns required to migrate the Azure HDInsights pipelines to run on Databricks. Data migration from Hive on HDInsights to Delta Lake was achieved via JDBC connectivity and the Precisely Connect high-performance integration engine to sufficiently parallelize the data load. Furthermore, Precisely Connect was able to produce the high-performance, self-tuning sorts, joins, aggregation, merges, and look-ups required for the organization to get the data they needed in the right way. Precisely Connect’s ability to run natively in the Databricks run-time also ensured they were able to optimize the data integration workflow for the high-volume requirements of the claims data hub.
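The source does not reproduce the actual Connect pipelines, but as a rough sketch of the migration pattern just described (reading the existing Hive tables and landing them as Delta tables in Databricks), here is what a plain PySpark version might look like. The JDBC endpoint, driver, partitioning column, and table names are all assumptions for illustration; Connect's own engine handles the parallelization and tuning described above.

```python
# Hypothetical PySpark sketch of the Hive-on-HDInsight -> Delta Lake pattern:
# read a Hive table over JDBC in parallel, then write it as a Delta table in
# Databricks. URL and table names are placeholders; this is not the customer's
# Connect pipeline.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-delta-migration").getOrCreate()

claims = (
    spark.read.format("jdbc")
    .option("url", "jdbc:hive2://hdinsight-cluster:10001/claims_db;transportMode=http")
    .option("driver", "org.apache.hive.jdbc.HiveDriver")
    .option("dbtable", "claims_db.claims_history")
    # Partitioned read so the load is spread across executors
    .option("partitionColumn", "claim_year")
    .option("lowerBound", "2000")
    .option("upperBound", "2022")
    .option("numPartitions", "8")
    .load()
)

# Land the data as a managed Delta table for the claims hub
(
    claims.write.format("delta")
    .mode("overwrite")
    .saveAsTable("claims_hub.claims_history")
)
```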
Meet AML transaction monitoring and Financial Conduct Authority (FCA) compliance
Challenges
Data volume too large, diversely scattered to analyze
Disparate data sources – Mainframe, RDBMS, Cloud, etc.
Maximize the value/ROI of the data lake
Requirements:
Consolidated, clean, verified data for all analytics and reporting.
MUST have complete, detailed data lineage from origin to end point
MUST be secure: Kerberos and LDAP integration required
Need unmodified copy of mainframe data stored on Hadoop for backup, archive
Connect to create “Golden Record” on Hadoop for compliance archiving
Trillium for cluster-native data verification, enrichment, and demanding multi-field entity resolution on Spark framework
Cloudera provides full end-to-end lineage from all sources, through transformations, to data landing
Benefits:
Ensure Anti-Money Laundering regulatory compliance is met through financial crimes data lake – high performance results at massive scale.
Achieve fast time to value with flexible deployment and ease of use
Ensure the data lake is a trusted source of data feeding critical machine learning-based fraud detection
Expanding use to additional Customer Engagement solutions and applications.
Db2 and VSAM files needed to be accessed for AI/ML use cases
The current solution they had for data integration was complex and not dynamic
Connect helped to extract data from COBOL programs on the mainframe, making it scalable for big data platforms
They had decided to attempt doing the work in house with contractors…
Prior to Connect, each project required 1–2 programmers at $95 per hour, hired for 6–8 months; the savings is roughly $104K per project (the overhead related to systems could not be quantified)
Assuming 26 weeks in a 6-month period