It is now a well-documented realization among Fortune 500 companies and
high-tech start-ups that Big Data analytics can transform the enterprise, and
organizations which lead the way will drive the most value.
But where does that value come from and how is it sustained? Is it just from
the data itself? No.
The real value of Big Data does not come from the data in its raw form, but
from its analysis - the insights derived, the products created, and the services
that emerge.
Big Data allows for dramatic shifts in enterprise-level decision making and
product and service innovation, but to reap its real rewards, organizations
must keep pace at every level, from management approaches to technology and
infrastructure.
As your business demands more and more from its data, chances are strong
that your existing data warehouse is also near capacity. In
fact, according to Gartner, 70% of all data warehouses are straining the limits
of their capacity and performance levels. If this is true for you, it is time to
modernize your data warehouse environment.
This paper addresses the need to modernize today’s data warehouse
environment and outlines best practices and approaches.
Motivation for Modernization
Enterprise data warehouses were originally created for exploration and
analysis, but with the arrival of Big Data, they have frequently become archival
data repositories. What's worse, for many organizations, getting data into
them requires expensive, time-consuming extract, transform, and load (ETL)
work.
Make a Move to Modern Data Architecture
The standard analytics environment at the majority of enterprise-level
companies includes the operational systems that serve as the sources for
data; a data warehouse or group of associated data marts which house and
sometimes integrate the data for a range of analysis functions; and a set of
business intelligence and analytics tools that enable insight discovery and
decision making through queries, visualization, dashboards, and data mining.
Most big companies have invested millions of dollars in their analytics
ecosystems. This includes hardware platforms, database systems, ETL
software, analytics tools, BI dashboards, and middleware, as well as storage
systems, all with their attendant maintenance contracts and software
upgrades.
Ideally, these environments have given enterprises the power to understand
their customers and, as a result, also helped them streamline their business
and even optimize their products and enhance their brands.
However, in the worst case, current data warehouse infrastructure cannot
affordably scale to deliver on the full promise and value of Big Data.
Enterprises today have data warehouse modernization programs in place to
find a way to combine the best of their legacy data warehouse with the new
power of Big Data technology to create a best-of-both-worlds environment.
Impetus can help
Our experienced team of experts delivers a repeatable methodology and a
customizable range of services, including assessment and planning,
implementation, and data quality validation, to support your data warehouse
modernization program.
If you need to modernize your data architecture, your foundation will no doubt
begin with Hadoop. It is as much a must-have as it is a game-changer from an
IT and a business perspective.
Hadoop is a cost-effective, scale-out storage system with parallel computing
and analytical capability. It simplifies the acquisition and storage of diverse
data sources, whether structured, semi-structured (e.g., sensor feeds,
machine data), or unstructured (e.g., web logs, social media, image, video,
audio).
Hadoop has become the framework of choice to accelerate time-to-insight and
reduce the overall cost of managing data. It will play a positive and
profound role in your long-term data storage, management, and analysis
capabilities, and in realizing the critical value of your data to sustain
competitiveness.
While the Hadoop ecosystem offers powerful capabilities and virtually unlimited
horizontal scalability, it does not provide the complete set of functionality you
need for enterprise-level, Big Data analysis.
These gaps must instead be filled through complex manual coding by large
teams of engineers, analysts, and support staff. This slows Hadoop adoption
and can frustrate management teams who are eager to derive and deliver
results.
Implementing the Data Lake
Impetus can help
Impetus offers a comprehensive, end-to-end Data Warehouse Workload
Migration (WM) solution that allows you to identify and safely migrate data,
ETL processing, and large-scale analytics from the enterprise data warehouse
(EDW) to a Hadoop-based Big Data warehouse.
Furthermore, WM not only seamlessly moves schemas, data, and views, but also
transforms procedural language scripts and migrates complete Role-Based
Access Control (RBAC) definitions and reports. This ensures that you reap the
benefits of modern Big Data warehousing while protecting and leveraging your
investments in existing traditional RDBMS and other information
infrastructure.
Adopting Hadoop involves introducing a Data Lake into your analytics
ecosystem.
The Data Lake can serve as your organization’s central data repository. What
makes the Data Lake a unique and differentiated repository framework is its
ability to unify and connect your data. It lets you access your entire body
of data simultaneously, unleashing the true power of Big Data: correlated,
collaborative analysis that yields superior insights. And because the
repository does not impose a single rigid structure on the data, it supports
whatever need-based analysis the business requires.
While there are many purposes it can serve, such as feeding both your
production and sandbox environments, the first step and most immediate
opportunity is often the off-loading of the ETL (extract, transform, and load)
routines from the traditional data warehouse.
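As a hedged illustration of what this off-loading can look like, the sketch below runs a transformation that might previously have executed inside the warehouse as a Spark job in the lake instead, persisting only the curated result. The table and column names (lake_raw.orders, order_ts, amount, region, customer_id) are assumptions made for the example.

    # Illustrative ETL off-load: the heavy transformation runs in the lake;
    # only the curated aggregate is produced for downstream reporting.
    # Table and column names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("etl-offload")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS lake_curated")

    orders = spark.table("lake_raw.orders")

    daily_revenue = (orders
        .withColumn("order_date", F.to_date("order_ts"))
        .groupBy("order_date", "region")
        .agg(F.sum("amount").alias("revenue"),
             F.countDistinct("customer_id").alias("customers")))

    # Persist the curated result in the lake; it could also be pushed back
    # to the EDW over JDBC if reports still run there.
    daily_revenue.write.mode("overwrite").saveAsTable("lake_curated.daily_revenue")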
Building a robust Data Lake is a gradual process. With the right tools, a
clearly planned platform, and a strong, uniform vision that includes
innovation around advanced analytics, your organization can architect an
integrated, rationalized, and rigorous Data Lake repository.
Impetus can help
We specialize in modernizing the data warehouse and implementing data
lakes. We have experience with every stage of the Big Data transformation
curve. We enable you to:
• Work with unstructured data.
• Facilitate democratized data access.
• Apply Machine Learning algorithms to enrich data quality.
• Contain costs while continuing to do more with the data.
• Ensure that you do not end up in a data swamp.
Four Steps to Building a Data Lake
Step 1: Acquire & Transform Data at Scale
This first stage involves putting the architecture together and learning to
handle and ingest data at scale. At this stage, the analytics consist of simple
transformations; however, it’s an important step in discovering how to make
Hadoop work for your organization.
Step 2: Focus on Analysis
Now you’re ready to focus on enhancing data analysis and interpretation. To
fully leverage the Data Lake, you will need to use various tools and
frameworks to begin combining and integrating the EDW and the Data Lake.
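One hedged sketch of such integration: reading a dimension that still lives in the EDW over JDBC and joining it with fact data already in the lake. The JDBC URL, credentials, and table and column names below are placeholders, and the appropriate JDBC driver must be on the Spark classpath.

    # Illustrative EDW + Data Lake integration: join a warehouse dimension
    # (read over JDBC) with fact data stored in the lake.
    # Connection details and table/column names are placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("edw-lake-join")
             .enableHiveSupport()
             .getOrCreate())

    # Dimension still maintained in the EDW (substitute your EDW's JDBC URL and driver)
    customers = (spark.read.format("jdbc")
                 .option("url", "jdbc:oracle:thin:@//edw-host:1521/DW")
                 .option("dbtable", "DW.CUSTOMER_DIM")
                 .option("user", "etl_user")
                 .option("password", "****")
                 .load())

    # Fact data already off-loaded to the lake
    orders = spark.table("lake_raw.orders")

    # Combined analysis across both environments
    (orders.join(customers, "customer_id", "left")
           .groupBy("segment")
           .agg(F.sum("amount").alias("revenue"))
           .show())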
Step 3: Collaborate
This is where you will start to witness a seamless synergy between the EDW
and the Hadoop-based Data Lake. The strengths of each architecture will
begin to make themselves visible in your organization as this porous, all-
encompassing data pool allows analytics and intelligence to flow freely
across your enterprise.
Step 4: Unify
In this last stage, you reach maturity, tying together enterprise capabilities
and large-scale unification from information governance, compliance,
security, and auditing to the management of metadata and information
lifecycle capabilities.
Impetus can help
Workload Migration includes an auto-recommendation engine that enables
intelligent migration by suggesting which workloads can be off-loaded and
how. Its recommendations span clustering, partitioning, and splitting of
schema and data, as well as off-loadable tables and queries, optimization
parameters, and the choice of query engine, helping you optimize the schema
and form the data lake effectively.
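To make the idea concrete, below is a hypothetical sketch, not the actual Workload Migration engine, of the kind of heuristic such a recommendation step might apply: ranking warehouse tables as off-load candidates from their size, scan frequency, and update rate. All metrics and table names are invented for illustration.

    # Hypothetical sketch (not the Workload Migration engine): rank EDW
    # tables as off-load candidates from simple size and usage metrics.
    from dataclasses import dataclass

    @dataclass
    class TableStats:
        name: str
        size_gb: float          # storage footprint in the EDW
        scans_per_day: float    # how often the table is read at scale
        updates_per_day: float  # how "hot" the table is for writes

    def offload_score(t: TableStats) -> float:
        # Large, scan-heavy, rarely updated tables are the best candidates.
        return t.size_gb * (1 + t.scans_per_day) / (1 + t.updates_per_day)

    tables = [
        TableStats("clickstream_history", 5200, 14, 0.1),
        TableStats("customer_dim", 12, 40, 25),
        TableStats("order_archive", 900, 2, 0.0),
    ]

    for t in sorted(tables, key=offload_score, reverse=True):
        print(f"{t.name:22s} score={offload_score(t):10.1f}")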
Challenges in Migrating to the Data Lake
Setting up a Hadoop-based Data Lake can be challenging for organizations
that do not have experience migrating Big Data. Organizations often
encounter some of the following challenges:
• Identifying which data sources to offload
• Data validation and quality checks
• Issues with SQL compatibility
• Lack of available user-defined functions in Hadoop libraries (a common
  workaround is sketched after this list)
• Lack of procedural support
• Workflows locked in proprietary data integration tools
• The high costs and effort of migration
• Exception handling
• Lack of a unified view and dashboard for offloading data
• Governance controls on the migration system and data
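The user-defined function gap, in particular, often means re-implementing warehouse-specific functions by hand. A minimal PySpark sketch of that kind of workaround is shown below; the function and column names are hypothetical.

    # Hypothetical workaround for a missing warehouse function:
    # re-implement a simple account-number normalizer as a Spark UDF.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-example").getOrCreate()

    @F.udf(returnType=StringType())
    def normalize_account(acct):
        if acct is None:
            return None
        return acct.strip().upper().zfill(12)

    df = spark.createDataFrame([(" ab123 ",), (None,)], ["account_id"])
    df.select(normalize_account("account_id").alias("account_id")).show()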
The Impetus Data Warehouse Workload Migration Tool
What it does
The Impetus Data Warehouse Workload Migration tool does the following:
• Ingests data rapidly via our fast, fault-tolerant, parallel data ingestion
  component.
• Transforms SQL and procedural SQL from RDBMS, MPP, and other databases
  into compatible HQL and Spark SQL queries, using our foundational,
  intelligent transformation engine (an open-source illustration of this
  kind of translation follows this list).
• Provides a smart user interface that allows you to effortlessly orchestrate
  migration pipelines in just a few clicks.
• Integrates with your firm's LDAP to allow single sign-on for your users.
• Delivers rapid response times and performance you can count on through our
  integrated cache.
• Tracks all metadata in source and target data stores.
• Provides strict governance controls, including access, roles, and security,
  that can be built into the migration process to keep your data safe.
• Caters to a multitude of data sources to bring data in seamlessly and
  safely after data validation and quality checks.
• Runs checks and balances on data migration using our library of data
  quality and data validation algorithms, available as operators.
• Offloads Teradata, SQL Server, and DB2 views easily.
• Executes migration pipelines, monitors them for various metrics and health
  checks, and lets the administrator stop or resume any pipeline at any point
  using our job processing engine.
• Deploys and monitors components in real time using our automated cluster
  management and monitoring utility.
• Shows comprehensive stage-wise reports for migration, transformation,
  registration, and execution.
• Intelligent migration: assesses workloads automatically, including
  recommendations on a number of parameters for offloading.
• Provides seamless connectivity to BI tools like Tableau, QlikView, etc.,
  allowing you to easily run Teradata or Oracle reports while migrating your
  data.
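To illustrate what SQL dialect translation involves, here is a small, open-source approximation (this uses the sqlglot library and is not the Impetus transformation engine): rewriting a Teradata-flavored query, with its QUALIFY clause, into HiveQL. The query, schema, and column names are hypothetical.

    # Open-source illustration of SQL dialect translation (not the
    # Workload Migration engine): rewrite Teradata SQL as HiveQL.
    import sqlglot

    teradata_sql = """
        SELECT customer_id, order_ts, amount
        FROM dw.orders
        QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
                                   ORDER BY order_ts DESC) = 1
    """

    # QUALIFY has no direct HiveQL equivalent; the transpiler rewrites it
    # into an equivalent subquery form.
    hive_sql = sqlglot.transpile(teradata_sql, read="teradata", write="hive")[0]
    print(hive_sql)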
[Screenshot: a Workload Migration pipeline with Migration, Validation, and Execution stages]
Impetus can help
Impetus Workload Migration provides an automated migration toolset
consisting of utilities that our team of experts or your in-house staff can use
to automate the migration and conversion of data for execution in the
Hadoop environment.
It also allows you to run data quality functions to standardize, cleanse, and
de-duplicate data. You can upload the processed data back to the source EDW
for reporting purposes if required.
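A brief sketch of what such a cleansing pass might look like in PySpark follows; the table and column names (lake_raw.customers, email, country, customer_id) are assumptions made for the example.

    # Illustrative cleansing pass: standardize, de-duplicate, and run a
    # simple validation before publishing. Names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("dq-example")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS lake_clean")

    customers = spark.table("lake_raw.customers")

    cleansed = (customers
        .withColumn("email", F.lower(F.trim("email")))
        .withColumn("country", F.upper(F.trim("country")))
        .dropDuplicates(["customer_id"]))

    # Simple checks: row counts in/out and null business keys
    rows_in, rows_out = customers.count(), cleansed.count()
    null_keys = cleansed.filter(F.col("customer_id").isNull()).count()
    print(f"rows in: {rows_in}, rows out: {rows_out}, null keys: {null_keys}")

    cleansed.write.mode("overwrite").saveAsTable("lake_clean.customers")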
We provide pre-built conversion logic for Teradata, Netezza, Oracle,
Microsoft SQL Server and IBM DB2 source data stores. Additionally,
Workload Migration includes a library of advanced machine learning
algorithms for solving difficult data quality challenges.