Data is one of the most important assets an organization has
because it defines each organization’s uniqueness.
Being a data-driven organization is not the final objective,
but it represents a crucial process in the innovation challenge.
Data integration will remain a pressing issue for complex, fast-growing companies that share datasets with vendors, partners and increasingly connected customers. The need to integrate systems is not new, but thanks to advances in computational power and technology it can now be done in real time.
Orchestrate data with agility and responsiveness. Learn how to manage a common data integration project
1. Orchestrate data with agility and
responsiveness.
Learn how to manage a common data
integration project
by SKENDER KOLLCAKU
Milan, 07/2017
keywords:
iPaaS, data integration, Talend, Salesforce, data-driven, use case, migration, cloud computing, SaaS, CRM, database,
real-time, open-source, java, professional services, on-premise, mainframe, data quality, hybrid, repository, metadata,
reusable job, data validation, bi-directional sync, design pattern, agile, business, ETL, project management, customer
2. The scenario: Manage a typical data
integration project
Consider the following business requirements:
Manage the project successfully and keep it on track with respect to budget, cost,
time and stakeholders' concerns.
(1) Migrate customer data from a mainframe to a Cloud SaaS
CRM (Salesforce: https://www.salesforce.com), keeping the Address
format/values compliant with the business requirements
(2) Set up a bi-directional integration between two systems
(3) Identify the added value that the integration and a data-driven culture
make available
"Orchestrate data with agility and
responsiveness" - by Skender Kollcaku
3. Agenda
Data availability, iPaaS and why data-driven culture is the new norm for
organizations
Data asset requires Governance, but also Agility and Responsiveness
Define a roadmap to manage and successfully close the project (business
case)
How to identify business-related data and valuable Customer records
Talend (https://www.talend.com/) as the leading unified platform for the
solution
Data validation and initial load (migration as the design pattern)
Bi-directional synchronization to automate jobs in real-time
Added values and future implementations
4. The importance of data availability
Data is one of the most important assets an organization
has because it defines each organization’s uniqueness.
Being a data-driven organization is not the final objective,
but it represents a crucial process
in the innovation challenge.
5. Data requires Governance, but also Agility and
Responsiveness
Collaborate in an open manner
BE AGILE AND ADAPT TO CHANGES
Agility
Start with business-related data
FAST TIME TO MARKET
Share process to engage
Inspire through Talend
SHARE, DEMOCRATIZE AND INSPIRE FOR THE FUTURE
Short and fast deliveries
6. 3-step Project plan: start by communicating with
the stakeholders
(1) Communicate with decision-making players
(2) Identify candidate data for business-related value
(3) Model and implement the design pattern for the specific process
7. Talend is the leading open source integration
software provider to data-driven enterprises
Open-Source leader
Eclipse-like IDE, unified platform
Java-based code generator
Visual job design
Graphical business process modeller (100% graphical)
Smart product subscription
Big Data native in real-time
Reusable metadata elements
+1000 built-in drag & drop connectors and components
8. Determine customers containing business-related value
Prospects/Leads (potential Customers)
Filter by fastest closed deals
Particular industry (life science, manufacturing or finance...)
Recent closed deals (filter by time range)
Largest revenue-generating streams
Geographical area of interest
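The selection criteria above can be sketched as a simple filter-and-rank step. This is only an illustration: the Customer fields, the thresholds and the sample records are hypothetical and do not come from the project.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Customer:
    # All field names are hypothetical, chosen only for this sketch.
    name: str
    industry: str
    region: str
    revenue: float          # revenue generated by the customer
    last_deal_closed: date  # closing date of the most recent deal
    days_to_close: int      # how long that deal took to close


def select_candidates(customers, industries, regions, closed_since, top_n):
    """Keep customers in the target industries/regions with a recent deal,
    then rank by revenue (largest first) and deal speed (fastest first)."""
    matching = [
        c for c in customers
        if c.industry in industries
        and c.region in regions
        and c.last_deal_closed >= closed_since
    ]
    matching.sort(key=lambda c: (-c.revenue, c.days_to_close))
    return matching[:top_n]


today = date(2017, 7, 1)
customers = [
    Customer("Acme Pharma", "life science", "EMEA", 900_000.0, today - timedelta(days=30), 45),
    Customer("Bolt Works", "manufacturing", "EMEA", 400_000.0, today - timedelta(days=400), 20),
    Customer("Cash Bank", "finance", "APAC", 1_200_000.0, today - timedelta(days=10), 60),
    Customer("Delta Labs", "life science", "EMEA", 900_000.0, today - timedelta(days=5), 30),
]

selected = select_candidates(
    customers,
    industries={"life science", "manufacturing", "finance"},
    regions={"EMEA"},
    closed_since=today - timedelta(days=365),
    top_n=2,
)
print([c.name for c in selected])
```

Starting from a small, ranked subset like this keeps the first delivery short while still targeting the records with the most business value.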
9. (1) Project phase: initial load, preceded by ETL
operations
Once the input flat files from the mainframe are available, the ETL (Extract, Transform and Load)
operations to execute could be the following:
Cleanse
Validate
Format
Unify
Standardize
pull data from MF
cleanse, validate, format
unify or standardize
provision DB schema compatibility
upload into SaaS CRM
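As a rough illustration of the flow above, the stages can be chained as plain functions over the rows of a flat file. This is a sketch, not the Talend job: the semicolon-delimited layout, the column names and the COUNTRY_CODES mapping are assumptions made for the example.

```python
import csv
import io

# Hypothetical mapping used to standardize country values.
COUNTRY_CODES = {"italy": "IT", "italia": "IT", "germany": "DE"}


def cleanse(row):
    """Trim whitespace and collapse repeated spaces in every field."""
    return {k: " ".join((v or "").split()) for k, v in row.items()}


def validate(row):
    """A record is kept only if the mandatory fields are filled in."""
    return bool(row["name"]) and bool(row["city"])


def format_fields(row):
    """Apply a consistent presentation to city and postal code."""
    row = dict(row)
    row["city"] = row["city"].title()
    row["zip"] = row["zip"].zfill(5)
    return row


def standardize(row):
    """Map free-text country values to one code; unknown values are left as-is."""
    row = dict(row)
    row["country"] = COUNTRY_CODES.get(row["country"].lower(), row["country"])
    return row


def run_pipeline(flat_file_text):
    """Pull rows from the flat file and return the ones ready for upload."""
    reader = csv.DictReader(io.StringIO(flat_file_text), delimiter=";")
    ready = []
    for raw in reader:
        row = cleanse(raw)
        if not validate(row):
            continue  # skipped here; a real job would route it to a reject flow
        ready.append(standardize(format_fields(row)))
    return ready


sample = (
    "name;city;zip;country\n"
    "  ACME   SpA ; milano ;20121; Italia\n"
    ";roma;00100;italy\n"
    "Bolt GmbH;berlin;10115;Germany\n"
)
records = run_pipeline(sample)
print(records)
```

Keeping each stage as a separate function mirrors the idea of reusable job components: a stage can be tested or replaced without touching the others.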
10. Data quality includes data validation
DATA VALIDATION
NULL HANDLING
STRING HANDLING
DATE HANDLING
THIRD-PARTY VALIDATION LIBRARIES
Talend Data Preparation self-service free tool
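The three handling categories named on this slide could look like the following helpers. This is a minimal sketch under stated assumptions: the null markers, the accepted date formats and the function names are hypothetical and are not taken from Talend or from the project.

```python
from datetime import datetime

# Hypothetical conventions for this sketch.
NULL_MARKERS = {"", "null", "n/a", "none"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")  # tried in this order


def handle_null(value, default=None):
    """Treat None, empty strings and common placeholders as missing values."""
    if value is None or value.strip().lower() in NULL_MARKERS:
        return default
    return value


def handle_string(value, max_length):
    """Trim, collapse inner whitespace and enforce the target column length."""
    cleaned = " ".join(value.split())
    return cleaned[:max_length]


def handle_date(value):
    """Return an ISO date string, or None when no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(handle_null(" N/A "))
print(handle_string("  Via   Roma  1 ", 40))
print(handle_date("31/07/2017"))
```

Returning None for an unparseable date, rather than raising, lets the caller decide whether the record is rejected or loaded with an empty field.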
11. Business process model definition before we
start implementing the job
The Talend DI canvas is used to model the business process. The data flow satisfies the
following business requirement: only Customer records with matched/validated addresses
are loaded into the SaaS CRM.
12. Use Talend to set up the data migration between
On-Premise input files and target SaaS CRM object
(Account in Salesforce)
prior to Address validation
13. Simplified job which uses tMap “magical”
component to validate Address
Simplified job which uses the tMap component to validate the Customer address.
The outputs are (1) valid records loaded into the Salesforce Account object and (2) rejected
Customers with invalid addresses, written to an Excel spreadsheet for future analysis
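The two-output routing described above (valid records one way, rejects another) can be sketched outside Talend as follows. This does not reproduce tMap itself: the address rule (non-empty street and city, 5-digit postal code) and the field names are hypothetical, and rejects are written as CSV instead of Excel to stay within the standard library.

```python
import csv
import io
import re

# Hypothetical rule: a postal code is exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")
REJECT_COLUMNS = ["name", "street", "city", "zip", "reject_reason"]


def address_errors(customer):
    """Return the list of problems found in the address (empty if valid)."""
    errors = []
    if not customer.get("street", "").strip():
        errors.append("missing street")
    if not customer.get("city", "").strip():
        errors.append("missing city")
    if not ZIP_PATTERN.fullmatch(customer.get("zip", "")):
        errors.append("invalid zip")
    return errors


def route(customers):
    """Split records into a main (accepted) flow and a reject flow."""
    accepted, rejected = [], []
    for customer in customers:
        errors = address_errors(customer)
        if errors:
            rejected.append({**customer, "reject_reason": "; ".join(errors)})
        else:
            accepted.append(customer)
    return accepted, rejected


def rejects_to_csv(rejected):
    """Serialize the reject flow so it can be analysed later."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REJECT_COLUMNS)
    writer.writeheader()
    writer.writerows(rejected)
    return buffer.getvalue()


customers = [
    {"name": "Acme Spa", "street": "Via Roma 1", "city": "Milano", "zip": "20121"},
    {"name": "Bolt Srl", "street": "", "city": "Torino", "zip": "1012"},
]
accepted, rejected = route(customers)
print(len(accepted), "accepted")
print(rejects_to_csv(rejected))
```

Recording the reason next to each rejected record makes the later analysis of the reject file much quicker than re-running the validation.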
14. (2) Project phase: bi-directional
synchronization between mainframe and SaaS
The Talend built-in component tSalesforceGetUpdated_1 is used to track changes (update,
insert, upsert) in the Salesforce Account object and propagate them in real time to a DB2
table on the mainframe. The component can run in the background, given a past Start and End
time range.
Another mechanism is CDC (Change Data Capture).
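The time-range idea behind this kind of change tracking can be sketched with in-memory dictionaries standing in for the CRM object and the mainframe table. The function names, the record layout and the 15-minute window are assumptions for illustration; this is not the Talend component's API.

```python
from datetime import datetime, timedelta


def get_updated(source, start, end):
    """Stand-in for a 'get updated' call: ids of records modified in [start, end)."""
    return [rid for rid, rec in source.items() if start <= rec["last_modified"] < end]


def sync_window(source, target, start, end):
    """Upsert into the target every source record changed in the window.

    Returns the end of the window, to be used as the start of the next run."""
    for rid in get_updated(source, start, end):
        target[rid] = dict(source[rid])  # insert if new, overwrite if existing
    return end


t0 = datetime(2017, 7, 1, 12, 0)
crm_accounts = {
    "A1": {"name": "Acme Spa", "last_modified": t0 + timedelta(minutes=5)},
    "A2": {"name": "Bolt Srl", "last_modified": t0 - timedelta(hours=1)},
}
mainframe_table = {
    "A2": {"name": "Bolt", "last_modified": t0 - timedelta(days=2)},
}

watermark = sync_window(crm_accounts, mainframe_table, t0, t0 + timedelta(minutes=15))
print(sorted(mainframe_table), watermark)
```

Only A1 falls inside the window, so A2 is left untouched on the target. Persisting the returned watermark between runs is what prevents changes from being missed or applied twice.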
15. Bi-directional integration means real-time
synchronization between the two databases
There are some key issues to consider:
How similar are the schemas of the databases to be kept in sync (this helps with
any JOIN operations)?
How often do the databases need to be synced (query performance…)?
How will we resolve situations in which the same data has been modified in both
databases since the last sync session (conflict resolution based on the “record owner” or
“last modified” rule, to be described)?
How much effort and/or money are we willing to invest in developing our sync
system (“keep project budget on track”)?
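The two conflict-resolution rules mentioned above, "last modified" and "record owner", can be compared on a single conflicting record. This is a sketch under assumptions: the record layout, the "owner" field and the tie-breaking choice are hypothetical.

```python
from datetime import datetime


def resolve_last_modified(mainframe_rec, crm_rec):
    """The most recently modified version wins.

    Ties go to the mainframe version, an arbitrary choice for this sketch."""
    if crm_rec["last_modified"] > mainframe_rec["last_modified"]:
        return crm_rec
    return mainframe_rec


def resolve_record_owner(mainframe_rec, crm_rec):
    """The system registered as owner of the record wins, regardless of timestamps."""
    return crm_rec if mainframe_rec["owner"] == "crm" else mainframe_rec


# The same customer was edited on both sides since the last sync session.
mainframe_rec = {
    "id": "A1", "city": "Milano", "owner": "mainframe",
    "last_modified": datetime(2017, 7, 1, 9, 0),
}
crm_rec = {
    "id": "A1", "city": "Milan", "owner": "mainframe",
    "last_modified": datetime(2017, 7, 1, 10, 30),
}

print(resolve_last_modified(mainframe_rec, crm_rec)["city"])
print(resolve_record_owner(mainframe_rec, crm_rec)["city"])
```

The two rules can disagree on the same record, as they do here, which is why the choice has to be agreed with the stakeholders before the sync job is built.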
16. (3) Added values: technical perspective
External lookup with any other data sources (supply chain, e-commerce, BI
(analysis of ROIs, deals/opportunities), DW, Marketing, social networks
activity/engagement, distributed and cross-platform applications… )
Reusable jobs, thanks to repository metadata
Versioning of the generated Java code (GitHub, Maven…)
Statistical reports about job execution (performance)
Other applications can trigger the job (example: collecting data for reports
and dashboards…)
Unified and scalable integration platform (Data Preparation, DI, Cloud
integration, ESB, MDM, Big Data, Fabric…)
17. (3) Added values: business perspective
Give real value to the data asset (“enable data-driven organizations”)
Support decisions (“how to use the information obtained?”) and make them
available in advance (apply rules automatically and review them regularly)
Remove data management risk when modernizing systems
Consolidate applications
Smooth subscription model (start with the free open-source tool and then
upgrade in a predictable fashion depending on business needs, paying only for
the number of developers…)
Optimize processes by keeping comprehensive, relevant and consistent data
everywhere.
Real-time deliveries and predictive analytics!
Big Data native suite of products