Scope of Data Integration
Data integration¹ involves combining data residing in different sources and providing
users with a unified view of data. This process becomes significant in a variety of
situations, which include both commercial (when two similar companies need to merge
their databases) and scientific (combining research results from
different bioinformatics repositories) domains. The importance of data integration has
grown as data volumes and the need to share existing data have exploded. It has become
the focus of extensive theoretical work, yet numerous problems remain unsolved.
Traditionally, data integration has meant compromise. No matter how rapidly data
architects and developers could complete a project before its deadline, speed would
always come at the expense of quality. On the other hand, if they focused on delivering a
quality project, it would generally drag on for months thus exceeding its deadline.
Finally, if the teams concentrated on both quality and rapid delivery, the costs would
invariably exceed the budget. Regardless of which path was chosen, the end result would
be less than desirable. This has led some experts to revisit the scope of data integration,
which is the focus of this write-up.
¹ Source: https://en.wikipedia.org/wiki/Data_integration
Why is data integration so difficult?
Even after years of creating data warehouses and data infrastructures, IT teams have
continued to struggle with high costs, delays and suboptimal results of traditional data
integration. With the introduction of new data sources and data types, data management
professionals struggle to make concrete progress, and rising business demands compound
the problem. The main causes of this inefficiency include:
- Data integration is time consuming, owing to the lack of reusable data patterns.
- Integration costs are tremendous.
- Low quality of data results in untrustworthy data.
- Scalability is the biggest issue in traditional data systems, as data volumes keep
  increasing every day.
- The lack of real-time technologies means data is not updated regularly.
These problems and challenges are all related to the reality that data has become more
fragmented while data integration has grown more complex, costly and inflexible.
Depending on the data integration requirements, the data must first be extracted from the
various sources; it is then generally filtered, aggregated, summarized or transformed in
some way, and finally delivered to the destination user or system.
Dramatic changes in data volume, variety and velocity make the traditional approach to
data integration inadequate and require organizations to evolve toward next-generation
techniques in order to unlock the potential of their data.
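The extract, filter/aggregate/transform, and deliver flow described above can be sketched as a minimal pipeline. All names here (`extract`, `transform`, `load`, the record fields) are illustrative, not part of any particular product; a real pipeline would read from databases, files or APIs:

```python
# Minimal sketch of the extract -> transform -> deliver flow.
def extract(sources):
    """Pull raw records from each source into one combined list."""
    return [record for source in sources for record in source]

def transform(records):
    """Filter out incomplete records, then aggregate amounts per key."""
    totals = {}
    for rec in records:
        if rec.get("amount") is not None:  # filter step
            totals[rec["key"]] = totals.get(rec["key"], 0) + rec["amount"]
    return totals

def load(totals, destination):
    """Deliver the summarized view to the destination store."""
    destination.update(totals)

source_a = [{"key": "acct1", "amount": 100}, {"key": "acct2", "amount": None}]
source_b = [{"key": "acct1", "amount": 50}]
warehouse = {}
load(transform(extract([source_a, source_b])), warehouse)
print(warehouse)  # {'acct1': 150}
```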
For any data integration project, one must have a good understanding of the following:
- The data models within each data source
- The mappings between the source and destination data models
- The data contracts mandated by the destination system
- The data transformations required
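One lightweight way to capture these four items for a single feed is as explicit metadata plus a transformation function. The field names, formats and the `transform_row` helper below are hypothetical, chosen purely for illustration:

```python
from datetime import datetime

# Data model within the source and within the destination (hypothetical fields)
source_model = {"cust_id": "int", "dob": "str (DD/MM/YYYY)"}
destination_model = {"customer_id": "int", "birth_date": "str (ISO 8601)"}

# Mapping between the source and destination data models
mapping = {"cust_id": "customer_id", "dob": "birth_date"}

# Data contract mandated by the destination system
contract = {"required": ["customer_id", "birth_date"]}

def transform_row(row):
    """Apply the mapping, the required date transformation, and the contract check."""
    out = {mapping[field]: value for field, value in row.items()}
    out["birth_date"] = datetime.strptime(out["birth_date"], "%d/%m/%Y").date().isoformat()
    assert all(f in out for f in contract["required"])  # enforce the contract
    return out

print(transform_row({"cust_id": 7, "dob": "31/12/1980"}))
# {'customer_id': 7, 'birth_date': '1980-12-31'}
```

Keeping the mapping and contract as data rather than burying them in code is one way to address the reusability problem mentioned earlier.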
Next Generation Data Integration
Next-generation data integration technologies are designed mostly to support an extended
team that includes data, integration and enterprise architects, as well as data analysts and
data stewards, while better aligning with business users.
Most companies use a mixture of in-house developed systems, third-party “off the shelf”
products and cloud-hosted systems. To get the most value from the data within these
systems, it must be consolidated in some way. For example, the data within a cloud-hosted
system could be combined with the data in an in-house system to deliver a new product
or service. Mergers and acquisitions are also big drivers of data integration: data from
multiple systems must be consolidated so that the companies involved in the
merger can work together effectively.
Some rules for next-generation data integration are as follows²:
Data Integration ("DI") is a family of techniques. Some data management professionals
still think of DI as merely ETL tools for data warehousing or data replication utilities for
database administration. Those use cases are still prominent, as TDWI survey data
shows. Yet, DI practices and tools have broadened into a dozen or more techniques and
use cases.
DI techniques may be hand coded, based on a vendor’s tool, or both. TDWI survey data
shows that migrating from hand coding to using a vendor DI tool is one of the strongest
trends as organizations move into the next generation. A common best practice is to use a
DI tool for most solutions, but augment it with hand coding for functions missing from
the tool.
DI practices reach across both analytics and operations. DI is not just for Data
Warehousing ("DW"). Nor is it just for operational Database Administration ("DBA"). It
now has many use cases spanning analytic and operational contexts, and
expanding beyond DW and DBA work is one of the most prominent generational
changes for DI.
DI is an autonomous discipline. Nowadays, there’s so much DI work to be done that DI
teams with 13 or more specialists are the norm; some teams have more than 100. The
diversity of DI work has broadened, too. Due to this growth, a prominent generational
decision is whether to staff and fund DI as is, or to set up an independent team or
competency center for DI.
DI is absorbing other data management disciplines. The obvious example is DI and Data
Quality ("DQ"), which many users staff with one team and implement on one unified
vendor platform. A generational decision is whether the same team and platform should
also support master data management, replication, data sync, event processing and data
federation.
DI has become broadly collaborative. The larger number of DI specialists requires local
collaboration among DI team members, as well as global collaboration with other data
management disciplines, including those mentioned in the previous rule, plus teams for
message/service buses, database administration and operational applications.
DI needs diverse development methodologies. A number of pressures are driving
generational changes in DI development strategies, including increased team size,
operational versus analytic DI projects, greater interoperability with other data
management technologies and the need to produce solutions in a more lean and agile
manner.

² Source: ftp://public.dhe.ibm.com/software/data/.../TDWI_NextGenerationDataIntegration.pdf
DI requires a wide range of interfaces. That’s because DI can access a wide range of
source and target IT systems in a variety of information delivery speeds and frequencies.
This includes traditional interfaces (native database connectors, ODBC, JDBC, FTP,
APIs, bulk loaders) and newer ones (Web services, SOA and data services). The new
ones are critical to next generation requirements for real time and services. Furthermore,
as many organizations extend their DI infrastructure, DI interfaces need to access data
on-premises, in public and private clouds and at partner and customer sites.
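In Python, for instance, the traditional database interfaces mentioned above mostly share the DB-API shape, so a DI job can treat different connectors uniformly. The sketch below uses the standard-library `sqlite3` module as a stand-in for any ODBC/JDBC-style driver (note that `conn.execute` is a sqlite3 convenience; most DB-API drivers require an explicit cursor):

```python
import sqlite3

def fetch_all(connect, query):
    """Run a query through a DB-API-style connector and return all rows."""
    conn = connect()
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()

# sqlite3 stands in here for any traditional database interface
rows = fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1, 2")
print(rows)  # [(1, 2)]
```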
DI must scale. Architectures designed by users and servers built by vendors need to scale
up and scale out to both burgeoning data volumes and increasingly complex processing,
while still providing high performance at scale. With volume and complexity exploding,
scalability is a critical success factor for future generations. Make it a top priority in your
plans.
DI requires architecture. It’s true that some DI tools impose an architecture
(usually hub and spoke), but DI developers still need to take control and design the
details. DI architecture is important because it strongly enables or inhibits other next
generation requirements for scalability, real time, high availability, server interoperability
and data services.
Some of the methods employed for the next generation of data integration are as follows³:
Extract, Transform & Load ("ETL") Design:
ETL involves extracting data from a source, transforming it into the desired data
model and finally loading it into the destination data store. In short:
The extraction process involves extracting the source data into a staging data store, which
is usually a separate schema within the destination database or a separate staging database.
The transform process generally involves re-modelling the data, normalizing its structure,
unit conversion, formatting and the resolution of primary keys.
The load process is quite simple in the case of a one-off data migration, but can be quite
complex in the case of continuous data integration, where the data must be merged
(inserts, updates and deletes), possibly over a defined date range; there may also be a
requirement to maintain a history of changes.
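The continuous-integration load described above usually reduces to a merge (upsert) from the staging area into the destination. A minimal sketch using SQLite's `INSERT ... ON CONFLICT` (the `accounts` table and its columns are illustrative; SQLite 3.24+ is assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 200.0)")

# Incoming batch from staging: one update (id 1) and one insert (id 3)
staged = [(1, 150.0), (3, 300.0)]
conn.executemany(
    """INSERT INTO accounts (id, balance) VALUES (?, ?)
       ON CONFLICT(id) DO UPDATE SET balance = excluded.balance""",
    staged,
)
print(conn.execute("SELECT * FROM accounts ORDER BY id").fetchall())
# [(1, 150.0), (2, 200.0), (3, 300.0)]
```

Handling deletes and change history would require additional machinery (e.g. soft-delete flags or a history table), which is what makes continuous loads harder than one-off migrations.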
³ Source: http://iasaglobal.org/itabok/capability-descriptions/data-integration/
Enterprise Application Integration Design: While the ETL process is about consolidating
data from many sources into one, Enterprise Application Integration, or EAI, is about
distributing data between two or more systems. Data exchanged using EAI is often
transactional and related to an event in a business process or the distribution of master
data. EAI includes message broker and Enterprise Service Bus
("ESB") technologies.
Data Virtualization & Data Federation: Data virtualization is the process of offering data
consumers a data access interface which hides the technical aspects of stored data, such
as location, storage structure, access language and storage technology. Data federation is
a form of data virtualization where the data stored in a heterogeneous set of autonomous
data stores is made accessible to data consumers as one integrated data store by using on-
demand data integration.
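Data federation can be pictured as a facade that answers queries by combining autonomous stores on demand, hiding from the consumer where each record actually lives. A hypothetical sketch (the `crm` and `erp` stores and their fields are invented for illustration):

```python
class FederatedView:
    """Facade over autonomous data stores; consumers see one integrated data store."""
    def __init__(self, stores):
        self.stores = stores  # each store is a callable returning rows

    def query(self, predicate):
        # On-demand integration: fetch from every store at query time
        return [row for store in self.stores for row in store() if predicate(row)]

crm = lambda: [{"id": 1, "region": "EU"}]
erp = lambda: [{"id": 2, "region": "US"}, {"id": 3, "region": "EU"}]
view = FederatedView([crm, erp])
print(view.query(lambda r: r["region"] == "EU"))
# [{'id': 1, 'region': 'EU'}, {'id': 3, 'region': 'EU'}]
```

The consumer never learns which store held which row, which is the essential property of virtualization.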
Hexanika: Implementation of Data Integration in the form of ETL
Process
Hexanika is a FinTech Big Data software company that has developed revolutionary
software, Hexanika Solutions, to help financial institutions address data sourcing and
reporting challenges for regulatory compliance. The process essentially involves
uploading the data, sanitizing it and then validating it. Hexanika Solutions can join data
from any number (“N”) of different tables and data sources, and data checks are applied
to the standardized data as per customer needs. Scalability is not an issue, as the platform
is built on Hadoop.
[Image: The ELT Process. Image credits: IBM]
Hexanika leverages the power of ELT using distributed parallel processing, Big
Data/Hadoop technology with a secure data cloud (IBM Cloud). Understanding the high
implementation costs of new systems and the complexities involved in redesigning
existing solutions, Hexanika offers a unique build that adapts to existing architectures.
This makes our solution cost-effective, efficient, simple and smart!
Read more about our solution and architecture at: http://hexanika.com/big-data-
solution-architecture/
Also, do visit our website: www.hexanika.com
Feature image link: https://i.ytimg.com/vi/7TpG9w46i_A/maxresdefault.jpg
Author: Prakash Jalihal
Contributor: Akash Marathe
Contact Us
USA
249 East 48 Street,
New York, NY 10017
Tel: +1 646.733.6636
INDIA
Krupa Bungalow 1187/10,
Shivaji Nagar, Pune 411005
Tel: +91 9850686861
Email: info@hexanika.com
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Scope of Data Integration

Data integration¹ involves combining data residing in different sources and providing users with a unified view of that data. This process becomes significant in a variety of situations, both commercial (when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories). Interest in data integration has grown as data volumes and the need to share existing data have exploded. It has become the focus of extensive theoretical work, yet numerous problems remain unsolved.

Traditionally, data integration has meant compromise. No matter how rapidly data architects and developers completed a project, speed came at the expense of quality. If they focused instead on delivering a quality project, it would drag on for months and exceed its deadline. And if teams concentrated on both quality and rapid delivery, the costs would invariably exceed the budget. Whichever path you chose, the result was less than desirable. This has led some experts to revisit the scope of data integration, which is the focus of this write-up.

¹ Source Link: https://en.wikipedia.org/wiki/Data_integration
Why is data integration so difficult?

Even after years of building data warehouses and data infrastructures, IT teams continue to struggle with the high costs, delays and suboptimal results of traditional data integration. With the introduction of new data sources and data types, data management professionals are not making concrete progress, and rising business demands add further pressure. The main causes of data inefficiency are the following:

Data integration is a time-consuming process owing to the lack of reusability of data patterns.
Integration costs are tremendous.
Low-quality data results in untrustworthy data.
Scalability is the biggest issue in traditional data systems, as data volumes keep increasing each day.
The lack of real-time technologies means data is not updated regularly.

These problems all stem from the reality that data has become more fragmented while data integration has grown more complex, costly and inflexible. Depending on the requirements, the data must first be extracted from the various sources; it is then generally filtered, aggregated, summarized or transformed in some way, and finally delivered to the destination user or system. Dramatic changes in data volume, variety and velocity make the traditional approach to data integration inadequate and require evolving to next-generation techniques in order to unlock the potential of data.

For any data integration project, one needs a good understanding of the following:

The data models within each data source
The mappings between the source and destination data models
The data contracts mandated by the destination system
The data transformations required

Next Generation Data Integration

Next-generation data integration technologies are designed to support an extended team that includes data, integration and enterprise architects, as well as data analysts and data stewards, while aligning better with business users.
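The mappings, contracts and transformations a project must understand can be sketched as a small toy example. Every field name, date format and validation rule below is hypothetical, chosen only to illustrate the three concerns:

```python
# A minimal sketch of source-to-destination mapping, transformation and
# contract validation; all field names and rules here are hypothetical.
from datetime import datetime

# Mapping between the source and destination data models, with a
# transformation function applied to each field.
FIELD_MAP = {
    "cust_nm": ("customer_name", str.strip),
    "dob":     ("date_of_birth",
                lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat()),
    "bal":     ("balance_usd", lambda s: round(float(s), 2)),
}

def transform(record: dict) -> dict:
    """Apply the field mapping and transformations to one source record."""
    out = {}
    for src_field, (dst_field, fn) in FIELD_MAP.items():
        out[dst_field] = fn(record[src_field])
    return out

def validate_contract(record: dict) -> None:
    """Enforce a simple data contract mandated by the destination system."""
    required = {"customer_name", "date_of_birth", "balance_usd"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"contract violation, missing fields: {missing}")

row = {"cust_nm": " Jane Doe ", "dob": "01/02/1990", "bal": "1234.567"}
clean = transform(row)
validate_contract(clean)
print(clean)  # {'customer_name': 'Jane Doe', 'date_of_birth': '1990-02-01', 'balance_usd': 1234.57}
```

In a real project the mapping table would be far larger and usually externalized as metadata, but the separation of concerns — model mapping, transformation, contract — stays the same.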
Most companies use a mixture of in-house developed systems, third-party "off the shelf" products and cloud-hosted systems. To get the most value from the data within these systems, it must be consolidated in some way. For example, the data within a cloud-hosted system could be combined with the data in an in-house system to deliver a new product
or service. Mergers and acquisitions are also big drivers of data integration: data from the multiple systems involved must be consolidated so that the merging companies can work together effectively.

Some rules for next-generation data integration are as follows²:

1. Data Integration ("DI") is a family of techniques. Some data management professionals still think of DI as merely ETL tools for data warehousing or data replication utilities for database administration. Those use cases are still prominent, but DI practices and tools have broadened into a dozen or more techniques and use cases.

2. DI techniques may be hand coded, based on a vendor's tool, or both. TDWI survey data shows that migrating from hand coding to a vendor DI tool is one of the strongest trends as organizations move into the next generation. A common best practice is to use a DI tool for most solutions but augment it with hand coding for functions the tool lacks.

3. DI practices reach across both analytics and operations. DI is not just for Data Warehousing ("DW"), nor just for operational Database Administration ("DBA"). It now has use cases spanning many analytic and operational contexts, and expanding beyond DW and DBA work is one of the most prominent generational changes for DI.

4. DI is an autonomous discipline. Nowadays there is so much DI work to be done that teams with 13 or more specialists are the norm, and some teams have more than 100. The diversity of DI work has broadened too. Given this growth, a prominent generational decision is whether to staff and fund DI as is, or to set up an independent team or competency center for it.

5. DI is absorbing other data management disciplines. The obvious example is DI and Data Quality ("DQ"), which many users staff with one team and implement on one unified vendor platform.
A generational decision is whether the same team and platform should also support master data management, replication, data sync, event processing and data federation.

6. DI has become broadly collaborative. The larger number of DI specialists requires local collaboration among DI team members, as well as global collaboration with other data management disciplines, including those mentioned in the previous rule, plus the teams for message/service buses, database administration and operational applications.

² Source Link: ftp://public.dhe.ibm.com/software/data/.../TDWI_NextGenerationDataIntegration.pdf

7. DI needs diverse development methodologies. A number of pressures are driving generational changes in DI development strategies, including increased team size,
operational versus analytic DI projects, greater interoperability with other data management technologies, and the need to produce solutions in a more lean and agile manner.

8. DI requires a wide range of interfaces. DI must access a wide range of source and target IT systems at a variety of information delivery speeds and frequencies. This includes traditional interfaces (native database connectors, ODBC, JDBC, FTP, APIs, bulk loaders) and newer ones (Web services, SOA and data services). The newer interfaces are critical to next-generation requirements for real time and services. Furthermore, as organizations extend their DI infrastructure, DI interfaces need to access data on premises, in public and private clouds, and at partner and customer sites.

9. DI must scale. Architectures designed by users and servers built by vendors need to scale up and scale out to handle both burgeoning data volumes and increasingly complex processing, while still providing high performance at scale. With volume and complexity exploding, scalability is a critical success factor for future generations; make it a top priority in your plans.

10. DI requires architecture. Some DI tools impose an architecture (usually hub and spoke), but DI developers still need to take control and design the details. DI architecture matters because it strongly enables or inhibits other next-generation requirements such as scalability, real time, high availability, server interoperability and data services.

Some of the methods employed for next-generation data integration are as follows³:

Extract, Transform & Load ("ETL") Design: ETL involves extracting data from a source, transforming it into the desired data model, and finally loading it into the destination data store. In short:

The extraction process extracts the source data into a staging data store, usually a separate schema within the destination database or a separate staging database.
The transform process generally involves re-modelling the data, normalizing its structure, unit conversion, formatting and the resolution of primary keys.

The load process is quite simple for a one-off data migration, but can be quite complex for continuous data integration, where the data must be merged (inserts, updates and deletes), possibly over a defined date range, and there may also be a requirement to maintain a history of changes.

³ Source Link: http://iasaglobal.org/itabok/capability-descriptions/data-integration/
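The extract, transform and load steps described above can be sketched end to end in a few lines. The SQLite tables, column names and merge rules below are purely illustrative, not drawn from any real system:

```python
# A minimal ETL sketch: extract raw rows into a staging table, then
# transform (type casting, rejecting bad rows) and load with a merge
# (upsert) into the destination table. All names are illustrative.
import sqlite3

def extract(source_rows, staging):
    """Extract: copy raw source rows into a staging table unchanged."""
    staging.execute("CREATE TABLE IF NOT EXISTS staging_orders (id TEXT, amount TEXT)")
    staging.executemany("INSERT INTO staging_orders VALUES (?, ?)", source_rows)

def transform_load(staging, dest):
    """Transform and load: cast types, drop bad rows, merge into the destination."""
    dest.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)")
    for id_, amount in staging.execute("SELECT id, amount FROM staging_orders"):
        try:
            dest.execute(
                "INSERT INTO orders VALUES (?, ?) "
                "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount",
                (int(id_), float(amount)),
            )
        except ValueError:
            pass  # reject rows that fail type conversion

db = sqlite3.connect(":memory:")  # staging and destination share one database here
extract([("1", "10.5"), ("2", "bad"), ("1", "12.0")], db)
transform_load(db, db)
print(db.execute("SELECT * FROM orders ORDER BY id").fetchall())  # [(1, 12.0)]
```

Note how the second row is rejected during transformation and the third row merges (updates) the first, which is exactly the insert/update behavior continuous integration has to handle.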
Enterprise Application Integration Design: While the ETL process consolidates data from many sources into one, Enterprise Application Integration ("EAI") distributes data between two or more systems. Data exchanged using EAI is often transactional, tied to an event in a business process or to the distribution of master data. EAI includes message broker and Enterprise Service Bus ("ESB") technologies.

Data Virtualization & Data Federation: Data virtualization offers data consumers a data access interface that hides the technical aspects of stored data, such as its location, storage structure, access language and storage technology. Data federation is a form of data virtualization in which data stored in a heterogeneous set of autonomous data stores is made accessible to data consumers as one integrated data store by using on-demand data integration.

Hexanika: Implementation of Data Integration in the form of ETL Process

Hexanika is a FinTech Big Data software company that has developed a revolutionary software product, Hexanika Solutions, for financial institutions to address data sourcing and reporting challenges for regulatory compliance. The process essentially involves uploading the data, sanitizing it and then validating it. Hexanika Solutions can join 'N' different data sources, and data checks are applied to the standardized data as per customer needs. Scalability is not an issue, as the platform is built on Hadoop.

The ELT Process (Image credits: IBM)
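Data federation can be illustrated with a toy example: two autonomous stores (a local SQLite table standing in for a warehouse, and a dict standing in for a cloud CRM API) exposed through one on-demand lookup. All names here are invented for illustration:

```python
# A toy sketch of data federation: heterogeneous, autonomous stores
# presented to the consumer as one logical data store, resolved on demand.
import sqlite3

# First store: a relational table (stand-in for an in-house warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
warehouse.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")

# Second store: a dict standing in for a cloud-hosted CRM API response.
crm_api = {2: {"name": "Globex Inc"}}

def federated_lookup(customer_id: int) -> dict:
    """Resolve a customer from whichever store holds it, hiding the
    storage technology and location from the consumer."""
    row = warehouse.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row:
        return {"id": row[0], "name": row[1], "source": "warehouse"}
    if customer_id in crm_api:
        return {"id": customer_id, **crm_api[customer_id], "source": "crm"}
    raise KeyError(customer_id)

print(federated_lookup(1))  # {'id': 1, 'name': 'Acme Corp', 'source': 'warehouse'}
print(federated_lookup(2))  # {'id': 2, 'name': 'Globex Inc', 'source': 'crm'}
```

A production federation layer would also push filters and joins down to each store; the point here is only that the consumer sees one interface regardless of where the data lives.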
Hexanika leverages the power of ELT using distributed parallel processing and Big Data/Hadoop technology with a secure data cloud (IBM Cloud). Understanding the high implementation costs of new systems and the complexities involved in redesigning existing solutions, Hexanika offers a unique build that adapts to existing architectures. This makes our solution cost-effective, efficient, simple and smart!

Read more about our solution and architecture at: http://hexanika.com/big-data-solution-architecture/

Also, do visit our website: www.hexanika.com

Feature image link: https://i.ytimg.com/vi/7TpG9w46i_A/maxresdefault.jpg

Author: Prakash Jalihal
Contributor: Akash Marathe

Contact Us

USA
249 East 48 Street, New York, NY 10017
Tel: +1 646.733.6636

INDIA
Krupa Bungalow 1187/10, Shivaji Nagar, Pune 411005
Tel: +91 9850686861

Email: info@hexanika.com