These notes describe a generalised data integration architecture framework and set of capabilities.
In many organisations, data integration has evolved over time through many solution-specific, tactical approaches. The consequence is that there is frequently a mixed, inconsistent data integration topography. Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance.
Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability.
Data integration has multiple meanings and is used in multiple ways, such as:
- Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies
- Integration in terms of migrating data from a source to a target system and/or loading data into a target system
- Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics
- Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target
- Integration in terms of service orientation and API management to provide access to raw data or the results of processing
There are two aspects to data integration:
1. Operational Integration – allowing data to move from one operational system and its data store to another
2. Analytic Integration – moving data from operational systems and their data stores into a common structure for analysis
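As an illustration of the operational integration case above, the following is a minimal sketch of extracting changed rows from a source system and upserting them into a target. It uses sqlite3 only so the example is self-contained; the customer table, its columns and the timestamp convention are invented for illustration, not taken from these notes.

```python
# Minimal sketch of one operational integration pattern: extract rows changed
# in a source system since a timestamp and upsert them into a target.
# Table and column names (customer, updated_at, ...) are hypothetical.
import sqlite3

def sync_changed_rows(source: sqlite3.Connection,
                      target: sqlite3.Connection,
                      since: str) -> int:
    """Copy rows changed in the source since a given timestamp into the target."""
    rows = source.execute(
        "SELECT id, name, email, updated_at FROM customer WHERE updated_at > ?",
        (since,),
    ).fetchall()
    target.executemany(
        "INSERT INTO customer (id, name, email, updated_at) VALUES (?, ?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
        "email = excluded.email, updated_at = excluded.updated_at",
        rows,
    )
    target.commit()
    return len(rows)

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    for conn in (src, tgt):
        conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, "
                     "email TEXT, updated_at TEXT)")
    src.execute("INSERT INTO customer VALUES (1, 'A. Byrne', 'a@example.com', '2024-01-02')")
    src.commit()
    print(sync_changed_rows(src, tgt, since="2024-01-01"))  # -> 1
```

A real implementation would add change-data-capture, error handling and scheduling around the same basic pattern, but the extract-and-upsert core is the operational integration described above.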
Data Profiling, Data Catalogs and Metadata Harmonisation – Alan McSweeney
These notes discuss the related topics of Data Profiling, Data Catalogs and Metadata Harmonisation. They describe a detailed structure for data profiling activities and identify various open source and commercial tools and data profiling algorithms. Data profiling is a necessary prerequisite for constructing a data catalog. A data catalog makes an organisation’s data more discoverable. The data collected during data profiling forms the metadata contained in the data catalog. This assists with ensuring data quality. It is also a necessary activity for Master Data Management initiatives. These notes describe a metadata structure and provide details on metadata standards and sources.
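To give a rough sense of what the data profiling activity described above produces, the following is a minimal sketch, using pandas, that computes basic column-level metrics (inferred type, null percentage, distinct count, range). These are the kind of profile results that could seed a data catalog entry; the sample columns are invented.

```python
# Minimal column-level data profiling sketch producing metadata that could
# seed a data catalog entry. Sample column names are hypothetical.
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return basic profile metrics per column: type, nulls, distinct values, range."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "inferred_type": str(s.dtype),
            "null_pct": round(100 * s.isna().mean(), 2),
            "distinct_count": s.nunique(dropna=True),
            "min": s.min(skipna=True) if s.notna().any() else None,
            "max": s.max(skipna=True) if s.notna().any() else None,
        })
    return pd.DataFrame(rows)

if __name__ == "__main__":
    sample = pd.DataFrame({
        "customer_id": [1, 2, 3, 4],
        "country": ["IE", "IE", None, "GB"],
        "balance": [120.5, 98.0, 45.2, None],
    })
    print(profile(sample).to_string(index=False))
```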
This describes a conceptual model approach to designing an enterprise data fabric: the set of hardware and software infrastructure, tools and facilities used to implement, administer, manage and operate data operations across the entire span of the data within the enterprise. It covers all data activities, including data acquisition, transformation, storage, distribution, integration, replication, availability, security, protection, disaster recovery, presentation, analytics, preservation, retention, backup, retrieval, archival, recall, deletion, monitoring and capacity planning, across all data storage platforms, enabling use by applications to meet the data needs of the enterprise.
The conceptual data fabric model represents a rich picture of the enterprise’s data context. It embodies an idealised and target data view.
Designing a data fabric enables the enterprise to respond to and take advantage of key related data trends:
• Internal and External Digital Expectations
• Cloud Offerings and Services
• Data Regulations
• Analytics Capabilities
It enables the IT function to demonstrate positive data leadership. It shows the IT function is able and willing to respond to business data needs. It allows the enterprise to meet data challenges:
• More and more data of many different types
• Increasingly distributed platform landscape
• Compliance and regulation
• Newer data technologies
• Shadow IT where the IT function cannot deliver IT change and new data facilities quickly
It is concerned with the design of an open and flexible data fabric that improves the responsiveness of the IT function and reduces shadow IT.
Data Architecture, Solution Architecture, Platform Architecture — What’s the ... – DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
Enterprise Architecture vs. Data Architecture – DATAVERSITY
Enterprise Architecture (EA) provides a visual blueprint of the organization, and shows key interrelationships between data, process, applications, and more. By abstracting these assets in a graphical view, it’s possible to see key interrelationships, particularly as they relate to data and its business impact across the organization. Join us for a discussion on how Data Architecture is a key component of an overall Enterprise Architecture for enhanced business value and success.
Presentation on Data Mesh: the paradigm shift to a new type of ecosystem architecture – a shift left towards a modern distributed architecture that allows for domain-specific data and views “data-as-a-product,” enabling each domain to handle its own data pipelines.
The data architecture of solutions is frequently not given the attention it deserves or needs. Too little attention is paid to designing and specifying the data architecture within individual solutions and their constituent components. This is due to the behaviours of both solution architects and data architects.
Solution architecture tends to concern itself with the functional, technology and software components of the solution, and frequently omits the detail of the data aspects of solutions, leading to a solution data architecture gap.
Data architecture, in turn, tends not to get involved with the data aspects of individual technology solutions, leaving a data architecture gap. Together, these gaps result in a data blind spot for the organisation.
Data architecture tends to concern itself with data only after individual solutions have been delivered. Data architecture needs to shift left into the domain of solutions and their data and engage more actively with the data dimensions of individual solutions. It can provide the lead in closing these data gaps through a shift-left of its scope and activities, as well as by providing standards and common data tooling for solution data architecture.
The objective of data design for solutions is the same as that for overall solution design:
• To capture sufficient information to enable the solution design to be implemented
• To unambiguously define the data requirements of the solution and to confirm and agree those requirements with the target solution consumers
• To ensure that the implemented solution meets the requirements of the solution consumers and that no deviations have taken place during the solution implementation journey
Good solution data architecture helps avoid problems with solution operation and use:
• Poor and inconsistent data quality
• Poor performance, throughput, response times and scalability
• Poorly designed data structures that cause long data update and response times, affecting solution usability and leading to lost productivity and transaction abandonment
• Poor reporting and analysis
• Poor data integration
• Poor solution serviceability and maintainability
• Manual workarounds for data integration, data extract for reporting and analysis
Data-design-related solution problems frequently manifest themselves only after the solution goes live, so the benefits of solution data architecture are not always evident initially.
Data Catalog for Better Data Discovery and Governance – Denodo
Watch full webinar here: https://buff.ly/2Vq9FR0
Data catalogs are in vogue, answering critical data governance questions such as “Where does all my data reside?”, “What other entities are associated with my data?”, “What are the definitions of the data fields?” and “Who accesses the data?” Data catalogs maintain the necessary business metadata to answer these questions and many more. But that is not enough: to be useful, data catalogs need to deliver these answers to business users right within the applications they use.
In this session, you will learn:
*How data catalogs enable enterprise-wide data governance regimes
*What key capability requirements you should expect in data catalogs
*How data virtualization combines dynamic data catalogs with delivery
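To make the idea of catalog business metadata concrete, the following is a minimal, hypothetical sketch of the fields a single catalog entry might hold to answer the questions raised above (where the data resides, who owns it, what the fields mean, who consumes it). It is not the schema of any particular catalog product; every field name and value is an illustrative assumption.

```python
# Minimal sketch of business metadata a data catalog entry might hold for one
# dataset. Fields and example values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                      # dataset or table name
    location: str                  # where the data resides
    owner: str                     # who is accountable for the data
    description: str               # business definition
    field_definitions: dict = field(default_factory=dict)   # column -> definition
    related_entities: list = field(default_factory=list)    # associated datasets
    consumers: list = field(default_factory=list)           # who accesses the data

entry = CatalogEntry(
    name="customer_master",
    location="warehouse.sales.customer_master",
    owner="Customer Data Steward",
    description="Golden record of active customers.",
    field_definitions={"customer_id": "Unique customer identifier"},
    related_entities=["order_history"],
    consumers=["Billing", "Marketing Analytics"],
)
print(entry.name, "->", entry.location)
```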
Incorporating A DesignOps Approach Into Solution Architecture – Alan McSweeney
Solution architecture and design is concerned with designing new (IT) solutions to resolve problems or address opportunities. To solve a problem, you need sufficient information to understand the problem. If you do not understand the scope of the required solution, you cannot understand the risks associated with the implementation approach.
Getting the solution wrong can be very expensive. The DesignOps approach is a unified end-to-end view of solution delivery from initial concept to steady state operations. It is a design-to-operations approach identifying all the solution design elements needed to ensure the delivery of a complete solution.
Solution architecture and design teams are becoming larger, so more co-ordination, standardisation and management are required. The increasing focus on digital transformation increases the need for improved design as business applications are exposed outside the organisation. Solution complexity is increasing. The aim of the DesignOps approach is to improve solution design outcomes.
Improving Data Literacy Around Data Architecture – DATAVERSITY
Data Literacy is an increasing concern, as organizations look to become more data-driven. As the rise of the citizen data scientist and self-service data analytics becomes increasingly common, the need for business users to understand core Data Management fundamentals is more important than ever. At the same time, technical roles need a strong foundation in Data Architecture principles and best practices. Join this webinar to understand the key components of Data Literacy, and practical ways to implement a Data Literacy program in your organization.
Enterprise Business Analysis Capability - Strategic Asset for Business Alignm... – Alan McSweeney
Introducing the concept of Enterprise Business Analysis as a strategic resource to achieve business and IT alignment. Alignment means being able to draw a straight line from business strategy through to the delivered and operational solutions implemented to respond to business needs. Business and IT alignment requires more than just relationship management – it requires actual engagement by IT with the needs of the business.
Data Architecture - The Foundation for Enterprise Architecture and Governance – DATAVERSITY
Organizations are faced with an increasingly complex data landscape, finding themselves unable to cope with exponentially increasing data volumes, compounded by additional regulatory requirements with increased fines for non-compliance. Enterprise architecture and data governance are often discussed at length, but often with different stakeholder audiences. This can result in complementary and sometimes conflicting initiatives rather than a focused, integrated approach. Data governance requires a solid data architecture foundation in order to support the pillars of enterprise architecture. In this session, IDERA’s Ron Huizenga will discuss a practical, integrated approach to effectively understand, define and implement a cohesive enterprise architecture and data governance discipline with integrated modeling and metadata management.
Data Catalogs Are the Answer – What is the Question? – DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
A Workflow is an automated series of actions that produces a specified outcome. At certain points, it requires actions from users in the form of tasks.
Workflows help people collaborate on assets, automate processes, ...
A workflow is useful in cases such as:
- Asset approval
- Asset intake
- Issue management
- Escalation by default
- User on-boarding
The design of a workflow starts with defining a process definition. Collibra Data Governance Center uses the Activiti Workflow engine to manage its process definitions.
In this first lesson, I’ll show you how to set up the Activiti Workbench.
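As a generic illustration of the kind of asset-approval workflow described above (automated steps with user tasks at defined points), here is a minimal state-machine sketch. It is illustrative only and does not use Collibra’s or Activiti’s actual APIs; in Activiti, such a workflow would instead be expressed as a BPMN process definition.

```python
# Minimal, generic asset-approval workflow sketch: automated state transitions
# with a user task (steward review) at a defined point. Illustrative only.
from enum import Enum, auto

class State(Enum):
    SUBMITTED = auto()
    UNDER_REVIEW = auto()
    APPROVED = auto()
    REJECTED = auto()

class AssetApprovalWorkflow:
    def __init__(self, asset: str, steward: str):
        self.asset = asset
        self.steward = steward          # user task: review is assigned to a steward
        self.state = State.SUBMITTED

    def start_review(self):
        assert self.state is State.SUBMITTED
        self.state = State.UNDER_REVIEW
        print(f"Task created for {self.steward}: review asset '{self.asset}'")

    def complete_review(self, approved: bool):
        assert self.state is State.UNDER_REVIEW
        self.state = State.APPROVED if approved else State.REJECTED
        print(f"Asset '{self.asset}' is now {self.state.name}")

wf = AssetApprovalWorkflow("Customer Master definition", steward="jane.doe")
wf.start_review()
wf.complete_review(approved=True)
```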
Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
Enterprise Data Management Framework Overview – John Bao Vuu
A solid data management foundation to support big data analytics and, more importantly, a data-driven culture is necessary for today’s organizations.
A mature Data Management Program can reduce operational costs and enable rapid business growth and development. The Data Management program must evolve to monetize data assets, deliver breakthrough innovation and help drive business strategies in new markets.
Solution Architecture and Solution Acquisition – Alan McSweeney
This describes a systematised and structured approach to solution acquisition or procurement that involves solution architecture from the start. This ensures the true scope of both the required and the subsequently acquired solution is fully understood. Using such an approach avoids poor solution acquisition outcomes.
Solution architecture provides the structured approach to capturing all the cost contributors and knowing the true solution scope.
There is more packaged/product/service-based solution acquisition activity and an increasing trend towards solutions hosted outside the organisation. Meanwhile, solution acquisition outcomes are poor and getting worse.
Poor solution acquisition has long-term consequences and costs.
The to-be-acquired solution needs to operate in and co-exist with an existing solution topography and the solution acquisition process needs to be aware of and take account of this wider solution topography. Cloud-based or externally hosted and provided solutions do not eliminate the need for the solution to exist within the organisation solution topography.
Strategic misrepresentation in solution acquisition is the deliberate distortion or falsification of information relating to solution acquisition costs, complexity, required functionality, solution availability, resource availability, time to implement in order to get solution acquisition approval. Strategic misrepresentation is very real and its consequences can be very damaging.
Solution architecture has the skills and experience to define the real scope of the solution being acquired. An effective structured solution acquisition process, well-implemented and consistently applied, means dependable and repeatable solution acquisition and successful outcomes.
Glossaries, Dictionaries, and Catalogs Result in Data Governance – DATAVERSITY
Data catalogs, business glossaries, and data dictionaries house metadata that is important to your organization’s governance of data. People in your organization need to be engaged in leveraging the tools, understanding the data that is available, who is responsible for the data, and knowing how to get their hands on the data to perform their job function. The metadata will not govern itself.
Join Bob Seiner for the webinar where he will discuss how glossaries, dictionaries, and catalogs can result in effective Data Governance. People must have confidence in the metadata associated with the data that you need them to trust. Therefore, the metadata in your data catalog, business glossary, and data dictionary must result in governed data. Learn how glossaries, dictionaries, and catalogs can result in Data Governance in this webinar.
Bob will discuss the following subjects in this webinar:
- Successful Data Governance relies on value from very important tools
- What it means to govern your data catalog, business glossary, and data dictionary
- Why governing the metadata in these tools is important
- The roles necessary to govern these tools
- Governance expected from metadata in catalogs, glossaries, and dictionaries
This introduction to data governance covers the inter-related data management (DM) foundational disciplines (Data Integration / DWH, Business Intelligence and Data Governance), along with some of the pitfalls and success factors for data governance. It addresses:
• IM Foundational Disciplines
• Cross-functional Workflow Exchange
• Key Objectives of the Data Governance Framework
• Components of a Data Governance Framework
• Key Roles in Data Governance
• Data Governance Committee (DGC)
• 4 Data Governance Policy Areas
• 3 Challenges to Implementing Data Governance
• Data Governance Success Factors
Data Architecture Strategies: Data Architecture for Digital Transformation – DATAVERSITY
Foundational data management approaches – MDM, data quality, data architecture, and more – combined with other innovative techniques can help drive organizational change as well as technological transformation. This webinar will provide practical steps for creating a data foundation for effective digital transformation.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most importantly - how can Snowflake help?
Given in Montreal on 14-Dec-2021
A work of Zhamak Dehghani, Principal Consultant, ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor, the data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
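To make “treating data as a product” more tangible, the following is a minimal sketch of a data product descriptor a domain team might publish: an explicit owner, output port, schema and freshness expectation. The fields and values are illustrative assumptions, not part of the data mesh definition itself.

```python
# Minimal "data as a product" sketch: a domain-owned dataset published behind
# a small, explicit contract. Names and fields are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    domain: str             # owning business domain (first-class concern)
    name: str
    owner: str              # domain team accountable for the pipeline
    output_port: str        # where consumers read it (table, topic, API, ...)
    schema: dict            # column -> type, the published interface
    freshness_sla_hours: int

orders = DataProduct(
    domain="sales",
    name="orders_daily",
    owner="sales-data-team@example.com",
    output_port="warehouse.sales.orders_daily",
    schema={"order_id": "string", "order_date": "date", "amount": "decimal"},
    freshness_sla_hours=24,
)
print(f"{orders.domain}/{orders.name} served at {orders.output_port}")
```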
Activate Data Governance Using the Data Catalog – DATAVERSITY
Data Governance programs depend on the activation of data stewards who are held formally accountable for how they manage data. The data catalog is a critical tool to enable your stewards to contribute and interact with an inventory of metadata about the data definition, production, and usage. This interaction is active Data Governance in the truest sense of the word.
In this RWDG webinar, Bob Seiner will share tips and techniques focused on activating your data stewards through a data catalog. Data Governance programs that involve stewards in daily activities are more likely to demonstrate value from their data-intensive investments.
Bob will address the following in this webinar:
- A comparison of active and passive Data Governance
- What it means to have an active Data Governance program
- How a data catalog tool can be used to activate data stewards
- The role a data catalog plays in Data Governance
- The metadata in the data catalog will not govern itself
Describes what Enterprise Data Architecture in a Software Development Organization should cover, by listing over 200 data architecture-related deliverables an Enterprise Data Architect should remember to evangelize.
Accelerate Cloud Migrations and Architecture with Data Virtualization – Denodo
Watch full webinar here: https://bit.ly/3N46zxX
Cloud migration brings scalability and flexibility, and often reduced cost, to organizations. But even after moving to the cloud, more often than not, organizational data is siloed, hard to access and lacking centralized governance. That leads to delays and often missed opportunities in value creation from enterprise data. Join Amit Mody, Senior Manager at Accenture, in this keynote session to learn why current physical data architectures are a hindrance to value creation from data, what a logical data fabric powered by data virtualization is, and how a logical data fabric can unlock the value creation potential for enterprises.
Leveraging AI and ML for Efficient Data Integration – ChristopherTHyatt
Unlock unparalleled efficiency with AI and ML-powered data integration. Seamlessly fuse diverse datasets using advanced algorithms, automating processes for optimal operational performance. Harness insights, enhance decision-making, and propel your business into the future. Embrace the transformative synergy of AI and ML, redefining how organizations integrate, analyze, and leverage data for unparalleled success.
Data Collection, Data Integration, Data Management, Data Modeling – Sourabhkumar729579
This presentation covers data collection, data integration, data management and data modeling. It was made by Sourabh Kumar, an MCA student at the Central University of Haryana.
Unlock the Power of Mainframe Data for Democratized Cloud Analytics – Precisely
The rapid rate at which new technologies for cloud-based computing are being developed and deployed – like big data analytics, the Internet of Things, machine learning and artificial intelligence - is truly astonishing. Looking further out into the future, the accelerating scale and speed of such advancements is compelling organizations to migrate the majority of their data and infrastructure to the cloud. Even the Mainframe.
It’s not that mainframe computing systems are going away any time soon. In fact, mainframe platforms are just as essential today for running global, enterprise-scale businesses as they have ever been. And despite years of pundits advocating otherwise, actually migrating off mainframe and onto cloud platforms has very rarely proven to be worth the cost and business risk involved.
But for today’s IT leaders tasked with moving enterprise IT operations forward to that integrated, future-proof, and democratized future, there is no avoiding making these changes while still running an efficient business. Organizations are turning to Precisely to help them maintain current operations while bringing their mainframe data to the cloud.
During this session we discuss:
- Why it's imperative for IT leaders to unlock access to mainframe data in a cloud-first world
- The challenges and complexities of democratizing data with mainframe systems
- How to successfully integrate mainframe data into cloud-based environments
ADV Slides: Data Pipelines in the Enterprise and Comparison – DATAVERSITY
Despite the many, varied, and legitimate data platforms that exist today, data seldom lands once in its perfect spot for the long haul of usage. Data is continually on the move in an enterprise into new platforms, new applications, new algorithms, and new users. The need for data integration in the enterprise is at an all-time high.
Solutions that meet these criteria are often called data pipelines. These are designed to be used by business users, in addition to technology specialists, for rapid turnaround and agile needs. The field is often referred to as self-service data integration.
Although the stepwise Extraction-Transformation-Loading (ETL) remains a valid approach to integration, ELT, which uses the power of the database processes for transformation, is usually the preferred approach. The approach can often be schema-less and is frequently supported by the fast Apache Spark back-end engine, or something similar.
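The following is a minimal sketch of the ELT pattern described above: the data is landed in the target as-is, and the transformation (casting, de-duplication) is pushed down to the database engine. sqlite3 stands in for the target purely to keep the example self-contained; the table names are invented, and real pipelines would typically use a warehouse engine or Spark instead.

```python
# Minimal ELT sketch: load raw data first, then transform inside the database.
import csv, io, sqlite3

raw_csv = "order_id,amount\n1,10.50\n2,24.00\n2,24.00\n"   # note the duplicate row

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")

# Extract + Load: land the data as-is, with no transformation in the pipeline
rows = list(csv.DictReader(io.StringIO(raw_csv)))
db.executemany("INSERT INTO raw_orders VALUES (:order_id, :amount)", rows)

# Transform: push the work (casting, de-duplication) down to the database
db.execute("""
    CREATE TABLE orders AS
    SELECT DISTINCT CAST(order_id AS INTEGER) AS order_id,
                    CAST(amount AS REAL)      AS amount
    FROM raw_orders
""")
print(db.execute("SELECT COUNT(*) FROM orders").fetchone()[0])   # -> 2
```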
In this session, we look at the major data pipeline platforms. Data pipelines are well worth exploring for any enterprise data integration need, especially where your source and target are supported, and transformations are not required in the pipeline.
Data Virtualization for Compliance – Creating a Controlled Data Environment – Denodo
CIT modernized its data architecture in response to intense regulatory scrutiny. In this presentation, they present how data virtualization is being used to drive standardization, enable cross-company data integration, and serve as a common provisioning point from which to access all authoritative sources of data.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/CCqUeT.
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) – Denodo
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud…the potential benefits are huge - flexibility, agility, cost savings, scaling on-demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
A Logical Architecture is Always a Flexible Architecture (ASEAN) – Denodo
Watch full webinar here: https://bit.ly/3joZa0a
The current data landscape is fragmented, not just in location but also in terms of processing paradigms: data lakes, IoT architectures, NoSQL, and graph data stores, SaaS applications, etc. are found coexisting with relational databases to fuel the needs of modern analytics, ML, and AI. The physical consolidation of enterprise data into a central repository, although possible, is both expensive and time-consuming. A logical data warehouse is a modern data architecture that allows organizations to leverage all of their data irrespective of where the data is stored, what format it is stored in, and what technologies or protocols are used to store and access the data.
Watch this session to understand:
- What is a logical data warehouse and how to architect one
- The benefits of logical data warehouse – speed with agility
- Customer use case depicting logical architecture implementation
Solution Architecture and Solution Estimation – Alan McSweeney
Solution architects and the solution architecture function are ideally placed to create solution delivery estimates.
Solution architects have the knowledge and understanding of the solution’s constituent components and structure needed to create solution estimates:
• Knowledge of solution options
• Knowledge of solution component structure to define a solution breakdown structure
• Knowledge of available components and the options for reuse
• Knowledge of specific solution delivery constraints and standards that both control and restrain solution options
Accurate solution delivery estimates are needed to understand the likely cost/resources/time/options needed to implement a new solution within the context of a range of solutions and solution options. These estimates are a key input to investment management and making effective decisions on the portfolio of solutions to implement. They enable informed decision-making as part of IT investment management.
An estimate is not a single value. It is a range of values depending on a number of conditional factors such as level of knowledge, certainty, complexity and risk. The range will narrow as the level of knowledge increases and uncertainty decreases.
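One common way to express an estimate as a range is a three-point (PERT-style) calculation. The notes do not prescribe this particular formula, so the sketch below is only an illustration, and the figures in it are invented.

```python
# Minimal sketch of expressing an estimate as a range rather than a single
# value, using a simple three-point (PERT-style) calculation.
def three_point_estimate(optimistic: float, most_likely: float, pessimistic: float):
    """Return (expected value, standard deviation) for a three-point estimate."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Example: estimating effort in person-days for one solution component
expected, spread = three_point_estimate(optimistic=20, most_likely=35, pessimistic=80)
print(f"Expected effort: {expected:.1f} person-days, range +/- {spread:.1f}")
# As knowledge improves, optimistic and pessimistic converge and the range narrows.
```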
There is no easy or magic way to create solution estimates. You have to engage with the complexity of the solution and its components. The more effort that is expended, the more accurate the results of the estimation process will be. But there is always a need to create estimates (reasonably) quickly, so a balance is needed between effort and quality of results.
The notes describe a structured solution estimation process and an associated template. They also describe the wider context of solution estimates in terms of IT investment and value management and control.
Validating COVID-19 Mortality Data and Deaths for Ireland March 2020 – March ... – Alan McSweeney
This analysis seeks to validate published COVID-19 mortality statistics using mortality data derived from general mortality statistics, mortality estimated from population size and mortality rates, and death notice data.
Analysis of the Numbers of Catholic Clergy and Members of Religious in Irelan... – Alan McSweeney
This analysis looks at the changes in the numbers of priests and nuns in Ireland for the years 1926 to 2016. It combines data from a range of sources to show the decline in the numbers of priests and nuns and their increasing age profile.
This analysis consists of the following sections:
• Summary - this highlights some of the salient points in the analysis.
• Overview of Analysis - this describes the approach taken in this analysis.
• Context – this provides background information on the number of Catholics in Ireland as a context to this analysis.
• Analysis of Census Data 1926 – 2016 - this analyses occupation age profile data for priests and nuns. It also includes sample projections on the numbers of priests and nuns.
• Analysis of Catholic Religious Mortality 2014-2021 - this analyses death notice data from RIP.ie to show the numbers of priests and nuns that have died in the years 2014 to 2021. It also looks at deaths of Irish priests and nuns outside Ireland and at the numbers of countries where Irish priests and nuns have worked.
• Analysis of Data on Catholic Clergy From Other Sources - this analyses data on priests and nuns from other sources.
• Notes on Data Sources and Data Processing - this lists the data sources used in this analysis.
IT Architecture’s Role In Solving Technical Debt – Alan McSweeney
Technical debt is an overworked term without an effective and commonly agreed understanding of what exactly it is, what causes it, what its consequences are, how to assess it and what to do about it.
Technical debt is the sum of additional direct and indirect implementation and operational costs incurred and risks and vulnerabilities created because of sub-optimal solution design and delivery decisions.
Technical debt is the sum of all the consequences of circumventions, budget reductions, time pressure, lack of knowledge, manual workarounds, short-cuts, avoidance, poor design and delivery quality, decisions to remove elements from solution scope, and failures to provide foundational and backbone solution infrastructure.
Technical debt leads to a negative feedback cycle with short solution lifespan, earlier solution replacement and short-term tactical remedial actions.
All the disciplines within IT architecture have a role to play in promoting an understanding of and in the identification of how to resolve technical debt. IT architecture can provide the leadership in both remediating existing technical debt and preventing future debt.
Failing to take a complete view of the technical debt within the organisation means problems and risks remain unrecognised and unaddressed. The real scope of the problem is substantially underestimated. Technical debt is always much more than poorly written software.
Technical debt can introduce security risks and vulnerabilities into the organisation’s solution landscape. Failure to address technical debt leaves exploitable security risks and vulnerabilities in place.
Shadow IT or ghost IT is a largely unrecognised source of technical debt including security risks and vulnerabilities. Shadow IT is the consequence of a set of reactions by business functions to an actual or perceived inability or unwillingness of the IT function to respond to business needs for IT solutions. Shadow IT is frequently needed to make up for gaps in core business solutions, supplementing incomplete solutions and providing omitted functionality.
Solution Architecture And Solution Security – Alan McSweeney
This describes an approach to embedding security within the technology solution landscape. It describes a security model that encompasses the range of individual solution components up to the entire solution landscape. The solution security model allows the security status of a solution and its constituent delivery and operational components to be tracked wherever those components are located. This provides an integrated approach to solution security across all solution components and across the entire organisation topology of solutions. It allows the solution architect to validate the security of an individual solution and enables the security status of the entire solution landscape to be assessed and recorded.
Solution security is a wicked problem because there is no certainty about when the problem has been resolved and a state of security has been achieved. The security state of a solution can only be expressed along a subjective spectrum of better or worse rather than as a binary true or false.
Solution security can have negative consequences: it prevents types of access, limits availability in different ways, restricts the functionality provided, makes the solution harder to use, lengthens solution delivery times, increases costs along the entire solution lifecycle, and leads to loss of usability, utility and rate of use.
Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differentia... – Alan McSweeney
This paper describes how technologies such as data pseudonymisation and differential privacy technology enables access to sensitive data and unlocks data opportunities and value while ensuring compliance with data privacy legislation and regulations.
Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differentia... – Alan McSweeney
Your data has value to your organisation and to relevant data sharing partners. It has been expensively obtained. It represents a valuable asset on which a return must be generated. To achieve the value inherent in the data you need to be able to make it appropriately available to others, both within and outside the organisation.
Organisations are frequently data rich and information poor, lacking the skills, experience and resources to convert raw data into value.
These notes outline technology approaches to achieving compliance with data privacy regulations and legislation while providing access to data.
There are different routes to making data accessible and shareable within and outside the organisation without compromising compliance with data protection legislation and regulations, and while removing the risk associated with allowing access to personal data:
• Differential Privacy – source data is summarised and individual personal references are removed. The one-to-one correspondence between original and transformed data has been removed
• Anonymisation – identifying data is destroyed and cannot be recovered so individual cannot be identified. There is still a one-to-one correspondence between original and transformed data
• Pseudonymisation – identifying data is encrypted and recovery data/token is stored securely elsewhere. There is still a one-to-one correspondence between original and transformed data
These technologies and approaches are not mutually exclusive – each is appropriate to differing data sharing and data access use cases
The data privacy regulatory and legislative landscape is complex and getting even more complex so an approach to data access and sharing that embeds compliance as a matter of course is required.
Appropriate technology appropriately implemented and operated is a means of managing and reducing risks of re-identification by making the time, skills, resources and money necessary to achieve this unrealistic.
Technology is part of a risk management approach to data privacy. There is a wider operational data sharing and data privacy framework that includes technology aspects, among other key areas. Using these technologies will embed such compliance by design into your data sharing and access facilities. This will allow you to realise value from your data successfully.
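As a hedged illustration of the pseudonymisation route listed above, the sketch below replaces a direct identifier with a keyed token while preserving the one-to-one correspondence between original and transformed records. The field names, key handling and values are assumptions for illustration only; the key is assumed to be held in a separately secured store.

```python
# Minimal pseudonymisation sketch. Assumptions: records are dicts with a
# "customer_id" identifier, and SECRET_KEY is retrieved from a separately
# secured key store (hard-coded here purely for illustration).
import hmac
import hashlib

SECRET_KEY = b"stored-elsewhere-in-a-key-vault"  # assumption: held securely outside the data set

def pseudonymise(record: dict) -> dict:
    """Replace the direct identifier with a keyed, repeatable token."""
    token = hmac.new(SECRET_KEY, record["customer_id"].encode(), hashlib.sha256).hexdigest()
    out = dict(record)
    out["customer_id"] = token  # one-to-one correspondence is preserved
    return out

if __name__ == "__main__":
    original = {"customer_id": "C-10042", "spend": 125.50}
    print(pseudonymise(original))
```

Because the same key always yields the same token, the one-to-one correspondence noted above is retained; anonymisation, by contrast, would discard the identifier entirely.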
Solution architects must be aware of the need for solution security and of the need to have enterprise-level controls that solutions can adopt.
The sets of components that comprise the extended solution landscape, including those components that provide common or shared functionality, are located in different zones, each with different security characteristics.
The functional and operational design of any solution and therefore its security will include many of these components, including those inherited by the solution or common components used by the solution.
The complete solution security view should refer explicitly to the components and their controls.
While each individual solution should be able to inherit the security controls provided by these components, the solution design should include explicit reference to them for completeness and to avoid unvalidated assumptions.
There is a common and generalised set of components, many of which are shared, within the wider solution topology that should be considered when assessing overall solution architecture and solution security.
Individual solutions must be able to inherit security controls, facilities and standards from common enterprise-level controls, standards, toolsets and frameworks.
Individual solutions must not be forced to implement individual infrastructural security facilities and controls. This is wasteful of solution implementation resources, results in multiple non-standard approaches to security and represents a security risk to the organisation.
The extended solution landscape potentially consists of a large number of interacting components and entities located in different zones, each with different security profiles, requirements and concerns. Different security concerns and therefore controls apply to each of these components.
Solution security is not covered by a single control. It involves multiple overlapping sets of controls providing layers of security.
Solution Architecture And (Robotic) Process Automation SolutionsAlan McSweeney
Automation is a technology trend that IT architects should be aware of; they should know how to respond to business requests for automation and recommend automation technologies and solutions where appropriate. Automation is a bigger topic than just RPA (Robotic Process Automation).
Automation solutions, like all other technology solutions, should be subject to an architecture and design process. There are many approaches to and options for the automation of business activities. Too often automation solutions are tactical applications layered over existing business systems.
The objective of all IT solutions is to automate manual business processes and their activities to a certain extent. The requirement for RPA-type applications arises in part because of automation failures within existing applications or the need to automate the interactions with or integrations between separate, possibly legacy, applications.
One of the roles of IT architecture is to always seek to take the wider architectural view and to ensure that solutions are designed and delivered within a strategic framework to avoid, as much as is practical and realistic, short-term tactical solutions and approaches that lead to an accumulation of design, operations and support debt. Tactical solutions will always play a part in the organisation’s solution landscape.
The objective of these notes is to put automation into its wider and larger IT architecture context while accepting the need for tactical approaches in some instances.
These notes cover the following topics:
• Solution And Process Automation – The Wider Technology And Approach Landscape
• Business Processes, Business Solutions And Automation
• Organisation Process Model
• Strategic And Tactical Automation
• Deciding On The Scope Of Automation
• Digital Strategy, Digital Transformation And Automation
• Specifying The Automation Solution
• Business Process Model and Notation (BPMN)
• Sample Business Process – Order To Cash
• RPA (Robotic Process Automation)
Comparison of COVID-19 Mortality Data and Deaths for Ireland March 2020 – Mar...Alan McSweeney
This document compares published COVID-19 mortality statistics for Ireland with publicly available mortality data extracted from informal public data sources. This mortality data is taken from published death notices on the web site www.rip.ie. This is used as a substitute for poor quality and long-delayed officially published mortality statistics.
Death notice information on the web site www.rip.ie is available immediately and contains information at a greater level of detail than published statistics. There is a substantial lag in officially published mortality data and the level of detail is very low. However, the extraction of death notice data and its conversion into a usable and accurate format requires a great deal of processing.
The objective of this analysis is to assess the accuracy of published COVID-19 mortality statistics by comparing trends in mortality over the years 2014 to 2020 with both numbers of deaths recorded from 2020 to 2021 and the COVID-19 statistics. It compares number of deaths for the seven 13-month intervals:
1. Mar 2014 - Mar 2015
2. Mar 2015 - Mar 2016
3. Mar 2016 - Mar 2017
4. Mar 2017 - Mar 2018
5. Mar 2018 - Mar 2019
6. Mar 2019 - Mar 2020
7. Mar 2020 - Mar 2021
It focuses on the seventh interval which is when COVID-19 deaths have occurred. It combines an analysis of mortality trends with details on COVID-19 deaths. This is a fairly simplistic analysis that looks to cross-check COVID-19 death statistics using data from other sources.
The subject of what constitutes a death from COVID-19 is controversial. This analysis is not concerned with addressing this controversy. It is concerned with comparing mortality data from a number of sources to identify potential discrepancies. It may be the case that while the total apparent excess number of deaths over an interval is less than the published number of COVID-19 deaths, the consequence of COVID-19 is to accelerate deaths that might have occurred later in the measurement interval.
Accurate data is needed to make informed decisions. Clearly there are issues with Irish COVID-19 mortality data. Accurate data is also needed to ensure public confidence in decision-making. Where published data is inaccurate, this can lead to a loss of that confidence, which can be exploited.
Analysis of Decentralised, Distributed Decision-Making For Optimising Domesti...Alan McSweeney
This analysis looks at the potential impact that large numbers of electric vehicles could have on electricity demand, electricity generation capacity and on the electricity transmission and distribution grid in Ireland. It combines data from a number of sources – electricity usage patterns, vehicle usage patterns, electric vehicle current and possible future market share – to assess the potential impact of electric vehicles.
It then analyses a possible approach to electric vehicle charging where the domestic charging unit has some degree of decentralised intelligence and decision-making capability in deciding when to start vehicle charging to minimise electricity usage impact and optimise electricity generation usage.
The potential problem to be addressed is that if large numbers of electric cars are plugged-in and charging starts immediately when the drivers of those cars arrive home, the impact on demand for electricity will be substantial.
Operational Risk Management Data Validation ArchitectureAlan McSweeney
This describes a structured approach to validating data used to construct and use an operational risk model. It details an integrated approach to operational risk data involving three components:
1. Using the Open Group FAIR (Factor Analysis of Information Risk) risk taxonomy to create a risk data model that reflects the required data needed to assess operational risk
2. Using the DMBOK model to define a risk data capability framework to assess the quality and accuracy of risk data
3. Applying standard fault analysis approaches - Fault Tree Analysis (FTA) and Failure Mode and Effect Analysis (FMEA) - to the risk data capability framework to understand the possible causes of risk data failures within the risk model definition, operation and use
Ireland 2019 and 2020 Compared - Individual ChartsAlan McSweeney
This analysis compares some data areas - Economy, Crime, Aviation, Energy, Transport, Health, Mortality, Housing and Construction - for Ireland for the years 2019 and 2020, illustrating the changes that have occurred between the two years. It shows some of the impacts of COVID-19 and of actions taken in response to it, such as the various lockdowns and other restrictions.
The first lockdown clearly caused major changes to many aspects of Irish society. The third lockdown, which began at the end of the period analysed, will have as great an impact as the first lockdown.
The consequences of the events and actions that have caused these impacts could be felt for some time into the future.
Analysis of Irish Mortality Using Public Data Sources 2014-2020Alan McSweeney
This describes the use of published death notices on the web site www.rip.ie as a substitute for officially published mortality statistics. This analysis uses data from RIP.ie for the years 2014 to 2020.
Death notice information is available immediately and contains information at a greater level of detail than published statistics. There is a substantial lag in officially published mortality data.
Review of Information Technology Function Critical Capability ModelsAlan McSweeney
IT Function critical capabilities are key areas where the IT function needs to maintain significant levels of competence, skill, experience and practice in order to operate and deliver a service. There are several different IT capability frameworks. The objective of these notes is to assess the suitability and applicability of these frameworks. These models can be used to identify what is important for your IT function based on your current and desired/necessary activity profile.
Capabilities vary across organisations – not all capabilities have the same importance for all organisations. These frameworks do not readily accommodate variability in the relative importance of capabilities.
The assessment approach taken is to identify a generalised set of capabilities needed across the span of IT function operations, from strategy to operations and delivery. This generic model is then used to assess individual frameworks to determine their scope and coverage and to identify gaps.
The generic IT function capability model proposed here consists of five groups or domains of major capabilities that can be organised across the span of the IT function:
1. Information Technology Strategy, Management and Governance
2. Technology and Platforms Standards Development and Management
3. Technology and Solution Consulting and Delivery
4. Operational Run The Business/Business as Usual/Service Provision
5. Change The Business/Development and Introduction of New Services
In the context of trends and initiatives such as outsourcing, transition to cloud services and greater platform-based offerings, should the IT function develop and enhance its meta-capabilities – the management of the delivery of capabilities? Is capability identification and delivery management the most important capability? Outsourced service delivery in all its forms is not a fire-and-forget activity. You can outsource the provision of any service except the management of the supply of that service.
The following IT capability models have been evaluated:
• IT4IT Reference Architecture https://www.opengroup.org/it4it contains 32 functional components
• European e-Competence Framework (ECF) http://www.ecompetences.eu/ contains 40 competencies
• ITIL V4 https://www.axelos.com/best-practice-solutions/itil has 34 management practices
• COBIT 2019 https://www.isaca.org/resources/cobit has 40 management and control processes
• APQC Process Classification Framework - https://www.apqc.org/process-performance-management/process-frameworks version 7.2.1 has 44 major IT management processes
• IT Capability Maturity Framework (IT-CMF) https://ivi.ie/critical-capabilities/ contains 37 critical capabilities
The following model has not been evaluated:
• Skills Framework for the Information Age (SFIA) - http://www.sfia-online.org/ lists over 100 skills
Critical Review of Open Group IT4IT Reference ArchitectureAlan McSweeney
This reviews the Open Group’s IT4IT Reference Architecture (https://www.opengroup.org/it4it) with respect to other operational frameworks to determine its suitability and applicability to the IT operating function.
IT4IT is intended to be a reference architecture for the management of the IT function. It aims to take a value chain approach to create a model of the functions that IT performs and the services it provides to assist organisations in the identification of the activities that contribute to business competitiveness. It is intended to be an integrated framework for the management of IT that emphasises IT service lifecycles.
This paper reviews what is meant by a value chain, with special reference to the Supply Chain Operations Reference (SCOR) model (https://www.apics.org/apics-for-business/frameworks/scor), the most widely used and most comprehensive such model.
The SCOR model is part of wider set of operations reference models that describe a view of the critical elements in a value chain:
• Product Life Cycle Operations Reference model (PLCOR) - Manages the activities for product innovation and product and portfolio management
• Customer Chain Operations Reference model (CCOR) - Manages the customer interaction processes
• Design Chain Operations Reference model (DCOR) - Manages the product and service development processes
• Managing for Supply Chain Performance (M4SC) - Translates business strategies into supply chain execution plans and policies
It also compares the IT4IT Reference Architecture and its 32 functional components to other frameworks that purport to identify the critical capabilities of the IT function:
• IT Capability Maturity Framework (IT-CMF) https://ivi.ie/critical-capabilities/ contains 37 critical capabilities
• Skills Framework for the Information Age (SFIA) - http://www.sfia-online.org/ lists over 100 skills
• European e-Competence Framework (ECF) http://www.ecompetences.eu/ contains 40 competencies
• ITIL IT Service Management https://www.axelos.com/best-practice-solutions/itil
• COBIT 2019 https://www.isaca.org/resources/cobit has 40 management and control processes
Analysis of Possible Excess COVID-19 Deaths in Ireland From Jan 2020 to Jun 2020Alan McSweeney
This analysis seeks to determine if there are excess deaths that occurred in Ireland in the interval Jan – Jun 2020 that can be attributed to COVID-19. Excess deaths means deaths in excess of the number of expected deaths plus the number of deaths directly attributed to COVID-19. On the other hand, a deficiency of deaths would occur when the number of expected deaths plus the number of deaths directly attributed to COVID-19 is greater than the actual number of deaths.
This analysis uses number of deaths taken from the web site RIP.ie to generate an estimate of the number of deaths in Jan – Jun 2020 in the absence of any other official source. The last data extract from the RIP.ie web site was taken on 3 Jul 2020.
The analysis uses historical data from RIP.ie from 2018 and 2019 to assess its accuracy as a data source.
The analysis then uses the following three estimation approaches to assess the excess or deficiency of deaths (a minimal arithmetic sketch of the underlying calculation follows the list):
1. The pattern of deaths in 2020 can be compared to a previous comparable year or years. The additional COVID-19 deaths can be added to the comparable year and the difference between the expected deaths, the actual deaths from RIP.ie and the actual COVID-19 deaths can be analysed to generate an estimate of any excess or deficiency.
2. The age-specific mortality rates described on page 16 can be applied to estimates of population numbers to generate an estimate of expected deaths. This can be compared to the actual RIP.ie and actual COVID-19 deaths to generate an estimate of any excess or deficiency.
3. The range of death rates per 1,000 of population as described in Figure 10 on page 16 can be applied to estimates of population numbers to generate an estimate of expected deaths. This can be compared to the actual RIP.ie and actual COVID-19 deaths to generate an estimate of any excess or deficiency.
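All three approaches reduce to the same comparison of actual deaths against expected deaths plus attributed COVID-19 deaths. The sketch below shows only that arithmetic; the figures are placeholders, not values from the analysis.

```python
# Minimal sketch of the excess/deficiency calculation common to all three
# estimation approaches. The numbers are illustrative placeholders only.
def excess_or_deficiency(actual_deaths: int, expected_deaths: int, covid_deaths: int) -> int:
    """Positive result = apparent excess deaths; negative = apparent deficiency."""
    return actual_deaths - (expected_deaths + covid_deaths)

if __name__ == "__main__":
    actual = 16500    # placeholder: deaths counted from RIP.ie for the interval
    expected = 15200  # placeholder: expected deaths from trend or mortality rates
    covid = 1700      # placeholder: officially attributed COVID-19 deaths
    print(excess_or_deficiency(actual, expected, covid))  # -400 => apparent deficiency
```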
This presentation describes a systematic, repeatable and co-ordinated approach to agile solution architecture and design. It is intended to describe a set of practical steps and activities embedded within a framework to allow an agile method to be adopted and used for solution design and delivery. This approach ensures consistency in the assessment of solution design options and in subsequent solution design and solution delivery activities. This process leads to the rapid design and delivery of realistic and achievable solutions that meet real solution consumer needs. The approach provides for effective solution decision-making. It generates options and results quickly and consistently. Implementing a framework such as this provides for the creation of a knowledgebase of previous solution design and delivery exercises that leads to an accumulated body of knowledge within the organisation.
Creating A Business Focussed Information Technology StrategyAlan McSweeney
This presentation describes a structured approach to creating a business-focussed information technology strategy.
An effective business-oriented IT strategy is an opportunity to resolve any disconnection between the business and the IT function and to ensure the IT function is able to and does respond to business needs and is trusted by the business to provide IT solutions.
The IT strategy will consist of static structural elements relating to the organisation of the IT function:
• Capabilities – skills and abilities the IT function should possess and be able to use effectively and efficiently
• IT Function Structure – the organisation and arrangement of the sub-functions and their responsibilities and relationships
• Operating Model – how the IT function works and delivers value and the processes it implements and operates
• Staffing And Roles – the numbers of people, their roles, responsibilities, expected skills, experience and abilities, workload, reporting structures and expected ways of operating
It will also include dynamic elements relating to initiatives, both enabling initiatives within the IT function and specific business initiatives required to achieve the business strategy.
Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture
1. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture
Alan McSweeney
http://ie.linkedin.com/in/alanmcsweeney
https://www.amazon.com/dp/1797567616
2. Data Integration, Access, Flow, Exchange, Transfer, Load, Share And Extract
• Set of data movements between data entities - data sources and data targets - across the organisation’s data landscape
• Data integration is more than just extracting data from operational systems to populate data warehouses and long-term data stores
• The movement, creation, transfer and exchange of data breathes life into the set of organisation solutions
• Data integration is the combination of all these data flows, transfers, exchanges, loads and extracts that occur across the data landscape and the tools, methods and approaches to facilitating and achieving them
• Data integration is an enterprise-level capability that should be available to all applications and solutions
• The organisation’s data fabric should include infrastructural components and tools that deliver these data integration facilities
• Individual solutions and applications and their implementation projects should not have to create (additional) point-to-point custom integrations
• Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability
3. Evolution Of Data Integration
• With many organisations, data integration tends to have evolved over time with many solution-specific tactical approaches implemented
• The consequence is that there is frequently a mixed, inconsistent data integration topography
• Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance
5. Data Integration
• Data integration has multiple meanings and multiple ways of being used such as:
− Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies
− Integration in terms of migrating data from a source to a target system and/or loading data into a target system
− Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics
− Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target
− Integration in terms of service orientation and API management to provide access to raw data or the results of processing
6. Two Aspects Of Data Integration
• Overall data integration architecture needs to handle both types:
− Operational Integration – allow data to move from one operational system and its data store to another
− Analytic Integration – move data from operational systems and their data stores into a common structure for retrieval, reporting and analysis
(Diagram: operational systems exchanging data with each other and feeding an analytic data store used for data retrieval)
7. Data Integration And Organisation Data Plumbing
(Diagram: the organisation technology solutions landscape and the data plumbing required to support the solutions landscape and solution interoperability)
8. Data Fabric, Data Landscape And Data Entities
• The data landscape is an integrated view of all data entities within (core) and outside (extended) the organisation with which the organisation obtains, shares and provides data
• The data fabric is the aggregation of the data entities and their data flows across the core and extended organisation
• Data entities are data assets that are involved in the provisioning, storage, processing and transfer of organisation data
− Data entities perform data-related activities across the spectrum of data actions and events
− A data entity is a hardware or software technology component involved in any form of data processing
9. Importance Of Data Integration In IT Architecture
• Enterprise Architecture – defines overall IT architecture for the organisation
• Data Architecture – defines the data architecture for the organisation, of which data integration and interoperability is one element
• Solution Architecture – designs solutions in the context of overall enterprise and data architectures and the need for solutions to access, integrate, exchange, transfer and extract data
− Effective data integration is key to solution interoperability
• Data Integration Architecture – defines a common approach to and set of enabling and implementing technologies in the areas of data integration, access, flow, exchange, transfer, load and extract that can be used by all IT solutions
10. Business And Information Technology Architecture
(Diagram: Business Strategy, Business Architecture and Business Governance; Information Technology Strategy, Information Technology Architecture and Information Technology Governance; within the Information Technology Architecture, Data Architecture, Information Technology Security Architecture and Application, Solution, Infrastructure and Service Architecture)
11. Overall Data Architecture And Capabilities
• Data Infrastructure and Storage
• Data Security, Protection, Access Control, Authentication, Authorisation
• Data Management, Governance, Architecture, Operations, Supporting Processes
• Data Reporting and Analytics, Visualisation Tools and Facilities
• Data Design, Modelling, Operational Data Stores
• Master and Reference Data Management
• Metadata Management
• Data Integration, Access, Flow, Exchange, Transfer, Transformation, Load And Extract
• Data Warehouse, Data Marts, Data Lakes
• Unstructured Data and Document Management
• External Data Sources and Interacting Parties
12. Data Integration Architecture
• Data Sources
• Data Channels
• Data Integration Security, Authentication, Authorisation
• Data Integration Operations Management, Administration
• Data Integration Development, Testing and Deployment
• External Data Sources and Targets
• Internal Data Sources and Targets
• Data Integration Technologies
• Data Integration Scheduler and Rules Engine
13. Data Integration As Part Of Overall Information Technology Architecture
(Diagram: data integration architecture components within the data architecture components, set in the overall business and IT architecture context)
14. Organisation Data Zones
• Data zones are containers for data entities with similar access and location characteristics
(Diagram of data zones: Central Data Entities and Infrastructure Zone; Business Unit/Location Entities and Infrastructure Zone(s); Organisation Data Zone; Secure External Organisation Access Zone; Secure External Organisation Participation and Collaboration Zone; Insecure External Organisation Presentation And Access Zone)
15. Sample Organisation Data Zones
• Central Data Infrastructure – this contains the central data applications and their associated data
• Business Unit/Location Data Infrastructure – this is an individual organisation business unit or location and the data entities it contains
• Organisation – this data zone represents the entire organisation and it contains all the locations and business units or functions within the organisation
• Secure External Organisation Access – this zone contains data entities that enable secure access from outside the organisation
• Secure External Organisation Participation and Collaboration – this is a location outside the physical organisation boundary where data entities that are provided by or to trusted external parties reside, including cloud platforms
• Insecure External Organisation Presentation And Access – this represents a location where publicly accessible data entities reside. These entities are regarded as insecure and/or untrusted
• Integration can occur within and between data zones
16. Internal And External Data
• Data can be defined as internal or external
− Internal data is (logically) held within a source data entity
− External data is data brought into or sent out of a source data entity to a target data entity
(Diagram: a data entity holding internal data and performing data load, data processing and new data generation, with external data flowing between source and target data entities)
17. Internal And External Data
• At its core, data integration is concerned with enabling the transition of data from internal to external states
• The internal and external state of data is separate from the internal or external location of the source or target data entity
− Internal – within the organisation data zones
− External – outside the organisation data zones
18. Data Integration Issues And Trends
The data landscape has been broadened and there are more data entities that form part of the extended organisation data landscape as more applications are moved to the cloud and as cloud platforms are used for providing additional facilities not currently present in organisations such as data analytics and machine learning
Initiatives and projects that are part of digital transformation programmes involve integrating data between internal and external parties
Need to reduce the latency of data integration as response time requirements are reduced
Performance, resilience and availability integration requirements are increasing
Need to deploy operational integrations more quickly to respond to business needs
There is a wider range of data entities as the data landscape increases in complexity
Process automation initiatives require an operational data integration platform
Greater volume and complexity of data integrations represent a potential data loss risk unless actively monitored and managed
There are more data demands within the organisation especially in the areas of analytics and the associated data integrations from operational data sources
19. Data Trends Affecting Data Integration
Greater volumes of operational data from increasing numbers of different sources and providers
Greater volumes of derived data
More data sources both internal and external to the organisation
Data in larger numbers of different formats
Data with wider range of contents
Data being generated at different rates
Data being generated at different times
Data being generated with varying degrees of accuracy and reliability and greater fuzziness
Data that changes constantly
Data that is of different utility and value
20. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes
(Diagram: applications, data sources, data stores and locations connected by data movement processes – Data Load, Data Transfer, Data Exchange, Data Access, Data Extraction, Data Flow, Data Migration, Data Replication, Data Publication, Data Presentation and Data Retrieval)
21. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes
(Same diagram as the previous slide, with the full set of data movement processes labelled collectively as Data Integration)
22. Data Integration, Access, Flow, Exchange, Transfer, Load And Extraction Processes
• Within any organisation, there will be many different data movements being performed in different ways using different technologies and approaches:
− API/Web Service
− SOAP
− RPC
− SOA/ESB
− FTP
− ETL/ELT
− EDI
− AS1/2/3
− SMTP
− Database replication
− Change data capture
− IPaaS
− Stream processing
− Message queueing (MQSeries, MQTT, AMQP, Active MQ, JMS, Azure Queues, …)
− DB link
− Batch
− DDS
− OPC-UA/IEC 62541
− IEC 60870
− Proprietary technologies (such as SWIFT)
− … And many others
• The proliferation of integration technologies and approaches indicates the long-standing and pervasive nature of data integration with information technology
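As a small, hedged illustration of just one of the technologies listed above, the sketch below performs a plain FTP file transfer using Python's standard ftplib; the host, credentials and file names are placeholders, not taken from these notes.

```python
# Illustrative FTP-based data transfer (one of the many listed technologies).
# Host, credentials and file names are placeholders for illustration only.
from ftplib import FTP

def push_extract_via_ftp(local_path: str) -> None:
    """Source PUSH of an extract file to a target FTP location."""
    with FTP("ftp.example.internal") as ftp:      # hypothetical target host
        ftp.login("integration_user", "secret")   # placeholder credentials
        with open(local_path, "rb") as extract:
            ftp.storbinary("STOR daily_extract.csv", extract)
```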
23. Wider Data Integration Concerns
(Diagram: on-premises data applications and data stores, an on-premises data warehouse and on-premises reporting and analysis applications, SaaS applications and data stores, a cloud data store (lake, warehouse), a cloud reporting and analysis application, IaaS hosted applications and data stores, external collaborating parties and an external DMZ, all potentially exchanging data)
24. Wider Data Integration Scenarios And Concerns
• The data integration landscape is becoming more heterogeneous, leading to data integration across data zones
− Between on-premises entities
− Between on-premises entities and external collaborating parties
− Between external collaborating parties and cloud-based entities
− Between on-premises and cloud SaaS solutions
− Between on-premises and cloud infrastructure IaaS solutions
− Within the same cloud provider
− Between different cloud providers
• The approach to data integration and the technologies to use have changed from a purely internal use only solution to one encompassing a range of inter-zonal data movements
25. Data Integration Scenarios
(Same landscape diagram as slide 23, annotated with example integration scenarios such as between on-premises entities and between on-premises entities and external collaborating parties)
26. Data Integration Logical Components
• On Premises Data Integration
− Performs integration within and between on-premises data entities
• Data Integration Gateway
− Enables data integration between internal and external data entities
• External Data Integration
− Enables data integration between internal and external data entities
− This includes between on-premises and cloud
27. Data Integration Components
(Same landscape diagram as slide 23, showing the On Premises Data Integration, Data Integration Gateway and External Data Integration components positioned across the on-premises, DMZ and cloud/external zones)
28. Data Integration Platform
(Diagram: data integration as a logical plugboard extending across the entire data span)
29. Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architecture – Options
• Options
− Implement full data integration architecture
− Implement a logical meta integration architecture combining multiple tools and technologies
− Implement multiple separate (technology or application specific) integration platforms, with or without overall management
• Irrespective of the approach, creating and maintaining an inventory of data integrations is an essential activity
30. Data Integration Mediation/Wrapper/Meta Tool
• Rather than seek to have one big data integration solution, consider the option of using multiple tools that are (logically) integrated into a common integration architecture
(Diagram: individual data integration tools/applications sitting under a meta data integration platform)
31. Tool Or Meta Tool
• Meta data integration tool approach can increase complexity without increasing flexibility or reducing cost
• Overhead of managing multiple individual integration tools and integrating these with the meta tool can be complex
32. Core And Extended Dimensions Of Data Integration
(Diagram: the eighteen core, platform and service dimensions of data integration, listed in full on the next slide)
33. Dimensions Of Data Integration
• Three dimensions of data integration
− Core – operational components – the core functionality of the data integration platform
• Data Sources and Data Ingestion, Data Ingestion Rules
• Data Targets and Data Mapping/Transfer, Data Integration Rules
• Data Transport Technologies
• Interim Data Storage/Data Staging
• Data Structures, Formats and Types
• Data Transformations and Data Processing Rules
− Platform – management aspects – the operational elements of the data integration platform
• Speed, Volume, Throughput, Capacity, Scalability
• Security and Access Control
• Development, Validation, Deployment and Maintenance
• Monitoring, Administration and Management
• Scheduling and Triggering
• Logging, Analysis, Reporting, Event and Alert Management
− Service – key supporting processes and enabling components that need to be part of any usable data integration platform
• Service Level Management
• Capacity Management
• Availability and Continuity Management
• Platform Architecture Management
• Governance and Knowledge Management, Data Semantics
• Operations Management
34. Data Integration Core Operational Characteristics
• Data Sources and Data Ingestion, Data Ingestion Rules – the sources of data for data integration and the rules and technologies for processing
• Data Targets and Data Mapping/Transfer, Data Integration Rules – the targets of data for data integration and the rules and technologies for processing
• Data Transport Technologies – support for the range of data integration technologies
• Interim Data Storage/Data Staging – provision of a data staging area for asynchronous data retrieval
• Data Structures, Formats and Types – support for a range of input and output data formats and types and the ability to convert from one to another
• Data Transformations and Data Processing Rules – facility for transforming source data
35. Data Integration Platform Management Characteristics
• Speed, Volume, Throughput, Capacity, Scalability – ability of the platform to handle the volume of data integration activity within agreed times
• Security and Access Control – provision of facilities to authenticate and authorise data access requests and to interact with data source security layer
• Development, Validation, Deployment and Maintenance – capability to develop, test, deploy and manage new data integrations and changes to existing data integrations
• Monitoring, Administration and Management – facilities to monitor the operation of the data integration platform and manage and administer it
• Scheduling and Triggering – capacity to manage data integration schedules and events that trigger integrations
• Logging, Analysis, Reporting, Event and Alert Management – provision of event and activity logging, the ability to define and receive alerts and the ability to report on and analyse event data
36. Data Integration Platform Service Characteristics
• Service Level Management – ensuring that the platform complies with agreed data integration performance and throughput service levels
• Capacity Management – monitoring the resources used by the integration platform and ensuring that the platform has sufficient resources
• Availability and Continuity Management – guaranteeing that the platform meets availability needs and ensuring its continuity of operations
• Platform Architecture Management – managing the overall platform architecture, its upgrades, the addition of new facilities and the support for new integration technologies
• Governance and Knowledge Management, Data Semantics – managing knowledge about data integration and providing information about data read from sources and transferred to targets
• Operations Management – managing the provision of operational support services for all aspects of the data integration platform
37. Logical Unified Data Integration Architecture
(Diagram: the core integration platform – scheduler and rules engine, operational data integrations, data integration execution, deployed integration operation, security, interim data store, data knowledge store, internal and external access layers, external to internal translation and data integration gateway – together with integration design and development with version management and control, integration templates and template library, integration component/product/tool library, integration publication/deployment, operational process usage log, alerting/event management, dashboard/analytics/reporting and a management and administration interface, connecting internal and external data sources and targets)
38. Logical Unified Data Integration Architecture – Components – 1/2
• Core Integration Platform – this orchestrates and manages the operation of data integrations
• Deployed Integration Operation – these are specific data integrations that have been developed, tested and are deployed to the Core Integration Platform
• Scheduler, Rules Engine – this component manages the definition and operation of integration schedules and the actioning of integrations based on triggering events
• Operational Data Integrations – these are data integrations that are deployed to operation
• Data Integration Execution – this is the component of the Core Integration Platform that executes data integrations
• Data Integration Gateway – gateway components provide communications channels to external data sources and targets
• External Access Layer/Connectors – this allows external data sources and targets connect to the Core Integration Platform
• Internal Access Layer/Connectors – this allows internal data sources and targets connect to the Core Integration Platform
• Security – this provides support for source and target authorisation and authentication and integration with their security layers
• Internal Data Sources and Targets – these are the data sources and targets that are local to the platform
• External Data Sources and Targets – these are the data sources and targets that are remote from the platform
• External to Internal Translation – this is intended to represent a facility that translates external requests to internal addresses to provide an additional level of security
39. Logical Unified Data Integration Architecture – Components – 2/2
• Data Knowledge Store – this stores information about the data being integrated to enable its retrieval by subject and content
• Interim Data Store – this is a staging area for data being stored between transfer from source to target
• Operational Process Usage Log – this contains a log of integration usage and activities
• Alerting/Event Management – this allows for the definition, maintenance and handling of events and alerts
• Dashboard/Analytics/Reporting – this provides facilities to report on platform activity and usage
• Management and Administration Interface – this allows the platform to be managed and administered
• Deployed Data Integrations – this represents the set of active deployed integrations
• Integration Design and Development, Version Management and Control – this enables data integrations to be developed, tested, deployed to production and subsequently updated
• Integration Templates and Template Library – this contains a library of data integration templates that can be used and reused during development
• Integration Component/Product/Tool Library – this represents a library of integration technology tools that can be incorporated into and used in integration run times
• Integration Publication/Deployment – this supports the process for deploying data integrations into production
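To make the component descriptions above more concrete, the sketch below models what a single deployed data integration definition on such a platform might capture. All field names, enumeration values and the example instance are illustrative assumptions, not part of the architecture itself.

```python
# Illustrative sketch of a deployed data integration definition.
# Field names, values and the example instance are assumptions used to show
# the kind of metadata the platform components described above would manage.
from dataclasses import dataclass, field

@dataclass
class IntegrationDefinition:
    name: str                      # identifies the integration to the scheduler and logs
    source: str                    # internal or external data source
    target: str                    # internal or external data target
    source_mode: str               # "PUSH" or "PULL"
    target_mode: str               # "PUSH" or "PULL"
    transport: str                 # e.g. "API", "FTP", "message queue"
    schedule: str                  # trigger or schedule handled by the scheduler/rules engine
    transformations: list[str] = field(default_factory=list)
    retention_hours: int = 24      # how long staged data is held in the interim data store

# Example (hypothetical) definition that the publication/deployment process
# would move into the set of operational data integrations
orders_to_warehouse = IntegrationDefinition(
    name="orders-to-warehouse",
    source="on-premises order application",
    target="cloud data warehouse",
    source_mode="PULL",
    target_mode="PUSH",
    transport="API",
    schedule="daily 02:00",
    transformations=["mask customer identifiers", "map order status codes"],
)
```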
40. Generalised Data Integration Approach
• Every data integration consists of a minimum of two (logical) components
1. A source extract/provision half
2. A target delivery half
• The source must make the data available in some form and either allow (enable PULL) or initiate (PUSH) the data movement to the target
• The target then receives (PUSH) or retrieves (PULL) the data
• Direct source to target data integration involves individual point-to-point connections, bypassing any data integration hub
• There may be an interim transformation stage where the format and content of the provided data is changed to suit the needs of the target
• Some Source/Target PUSH/PULL combinations imply the need for a staging area where extracted/provided data from the source resides before being passed to the target
− Asynchronous data integration
• Classification can be extended by allowing for multiple sources and targets
(Diagram: 2×2 matrix of Source PUSH/PULL against Target PUSH/PULL)
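The following sketch shows how an integration hub might treat the four Source/Target PUSH/PULL combinations, including a staging area for the asynchronous Source PUSH/Target PULL case. All class, method and parameter names are illustrative assumptions, not part of the original notes.

```python
# Illustrative hub sketch for the Source/Target PUSH/PULL combinations.
# fetch_from_source / deliver_to_target stand in for whatever transport is
# used; the staging dictionary plays the role of the interim data store.

class IntegrationHub:
    def __init__(self, fetch_from_source, deliver_to_target):
        self.fetch_from_source = fetch_from_source    # used for Source PULL
        self.deliver_to_target = deliver_to_target    # used for Target PUSH
        self.staging = {}                             # interim store for Target PULL

    # Source PULL / Target PUSH: hub controls both halves (synchronous)
    def run_pull_push(self, integration_name):
        data = self.fetch_from_source(integration_name)
        self.deliver_to_target(integration_name, data)

    # Source PUSH: source initiates; hub forwards or stages the data
    def on_source_push(self, integration_name, data, target_pulls=False):
        if target_pulls:
            self.staging[integration_name] = data     # held until the target asks
        else:
            self.deliver_to_target(integration_name, data)

    # Target PULL: target initiates; hub pulls fresh data or serves staged data
    def on_target_pull(self, integration_name, source_pullable=False):
        if source_pullable:
            return self.fetch_from_source(integration_name)
        return self.staging.pop(integration_name, None)  # None if nothing staged yet
```

Which method applies to a given integration corresponds directly to the PUSH/PULL classification in the matrix above.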
41. Logical Data Integration Scenarios
(Diagram: data sources on the incoming half and data targets on the outgoing half connected through a data integration hub, with each half operating in either PUSH or PULL mode)
42. Integration Combinations
• There are many different integration modes/patterns depending on factors such as:
− Number of sources for a single integration
− Number of targets for a single integration
− Push or pull by source and target
− Initiator of the integration – source, target or hub
• Single Source, Single Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Multiple Source, Single Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Single Source, Multiple Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
• Multiple Source, Multiple Target
− Source Push Target Push
− Source Push Target Pull
− Source Pull Target Push
− Source Pull Target Pull
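The combinations listed above are simply the cross product of source multiplicity, target multiplicity and the push/pull mode of each half, as the short sketch below generates (the label strings are illustrative only):

```python
# Enumerate the sixteen integration combinations listed above as the cross
# product of source multiplicity, target multiplicity and push/pull modes.
from itertools import product

multiplicities = ["Single", "Multiple"]
modes = ["Push", "Pull"]

for src_n, tgt_n, src_mode, tgt_mode in product(multiplicities, multiplicities, modes, modes):
    print(f"{src_n} Source {src_mode}, {tgt_n} Target {tgt_mode}")
```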
43. Single Source PUSH Single Target PUSH
• Single data source pushes data to integration hub
• Hub pushes data to target
44. Single Source PUSH Single Target PULL
• Single data source pushes data to integration hub
• Hub allows the target to pull data
45. Single Source PULL Single Target PUSH
• Data pulled from single data source
• Hub pushes data to target
46. Single Source PULL Single Target PULL
• Data pulled from single data source
• Hub allows the target to pull data
47. Multiple Source PUSH Single Target PUSH
• Multiple data sources push data to integration hub where it is aggregated
• Hub pushes data to target
48. Multiple Source PUSH Single Target PULL
• Data pushed from multiple data sources and aggregated
• Hub allows the target to pull data
49. Multiple Source PULL Single Target PUSH
• Data pulled from multiple data sources and aggregated
• Hub pushes data to target
50. Multiple Source PULL Single Target PULL
• Data pulled from multiple data sources and aggregated
• Hub allows the target to pull the aggregated data
51. Single Source PUSH Multiple Target PUSH
• Single data source pushes data to integration hub
• Hub pushes data to multiple targets
52. Single Source PUSH Multiple Target PULL
• Single data source pushes data to integration hub
• Hub allows multiple targets to pull data
53. Single Source PULL Multiple Target PUSH
• Data pulled from single data source
• Hub pushes data to multiple targets
54. Single Source PULL Multiple Target PULL
• Data pulled from single data source
• Hub allows multiple targets to pull data
55. Multiple Source PUSH Multiple Target PUSH
• Multiple data sources push data to integration hub where it is aggregated
• Hub pushes aggregated data to multiple targets
56. Multiple Source PUSH Multiple Target PULL
• Multiple data sources push data to integration hub where it is aggregated
• Hub allows multiple targets to pull the aggregated data
57. Multiple Source PULL Multiple Target PUSH
• Data pulled from multiple data sources and aggregated
• Hub pushes aggregated data to multiple targets
58. Multiple Source PULL Multiple Target PULL
• Data pulled from multiple data sources and aggregated
• Hub allows multiple targets to pull aggregated data
59. Data Integration Initiation And Notification
• For source PULL/target PUSH integrations, the integration hub is
always in direct control and can synchronise the two halves of the
integration – its can initiate the data PULL and then PUSH the
resulting data
• For other combinations, the hub has less control of synchronisation
− Source PUSH/Target PUSH – integration hub can PUSH the data to the target
after it has been PUSHed by the source
− Source PULL/Target PULL – integration hub can PULL the data from the source
when the target requests it
− Source PUSH/Target PULL – integration hub must wait for source to PUSH data
before it can respond to PULL request from target
March 22, 2021 59
• Synchronisation by source/target combination:
− Source PULL / Target PUSH – Fully Synchronised
− Source PUSH / Target PUSH – Partially Synchronised
− Source PULL / Target PULL – Partially Synchronised
− Source PUSH / Target PULL – Unsynchronised
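The following is a minimal sketch (in Python, not part of the original notes) of how an integration hub could derive the synchronisation level of an integration from the PUSH/PULL modes of its source and target, matching the combinations above; the function name and return labels are illustrative.

def synchronisation_level(source_mode: str, target_mode: str) -> str:
    """Classify hub synchronisation from source/target transfer modes ('PUSH' or 'PULL')."""
    if source_mode == "PULL" and target_mode == "PUSH":
        return "Fully Synchronised"        # hub controls both halves of the integration
    if source_mode == "PUSH" and target_mode == "PULL":
        return "Unsynchronised"            # hub controls neither half
    return "Partially Synchronised"        # hub controls one half

# Example: a Source PUSH / Target PUSH integration is partially synchronised
print(synchronisation_level("PUSH", "PUSH"))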
60. Synchronous And Asynchronous Data Integration
• Synchronous integration occurs where the hub initiates both the PULLing of the source data and the PUSHing of the resulting data to the target
• Asynchronous integration is where the source supply and the target provision of data do not occur in sequence or where the triggering of the source supply or target provision events is not controlled
• This includes subscription-type integration where the data is
retained by the hub and retrieved by subscribers
61. Data Integration Hub Data Retention
• How long should the integration hub retain data?
• The integration hub should not become yet another organisational data store where data is retained indefinitely
• Target PULL integrations are a potential source of accumulated, undelivered data retained on the hub
• The integration hub needs to include a facility to purge unretrieved data and/or the data retention interval needs to be specified as a data integration attribute (see the sketch below)
• Where a target makes a PULL request for data that is no longer available, the integration hub needs to handle this, for example by returning a defined response
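As an illustration, the following is a minimal sketch (Python, an assumed design rather than anything prescribed in these notes) of a retention purge for Target PULL integrations: staged data that has not been retrieved within the integration's retention interval is deleted, and a later PULL request for purged data receives a defined "no longer available" response. The identifiers and retention values are hypothetical.

from datetime import datetime, timedelta

# Hypothetical staging area: integration id -> (payload, staged_at, retention interval)
staged = {
    "INT-0042": ("...payload...", datetime(2021, 3, 1), timedelta(days=14)),
}

def purge_unretrieved(now: datetime) -> None:
    # Remove any staged data whose retention interval has expired
    for integration_id, (payload, staged_at, retention) in list(staged.items()):
        if now - staged_at > retention:
            del staged[integration_id]

def handle_pull(integration_id: str):
    # Respond to a target PULL request; purged data gets a defined response
    if integration_id not in staged:
        return {"status": "NO_LONGER_AVAILABLE", "integration": integration_id}
    payload, _, _ = staged[integration_id]
    return {"status": "OK", "data": payload}

purge_unretrieved(datetime(2021, 3, 22))
print(handle_pull("INT-0042"))   # retention exceeded -> NO_LONGER_AVAILABLE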
62. Data Integration Initiation – Source PULL/Target PUSH
• Hub requests data from the source and sends it to the target
63. Data Integration Initiation – Source PUSH/Target PUSH
• Hub receives data pushed by the source
• Hub pushes the data to the target
64. Data Integration Initiation – Source PULL/Target PULL
• Target requests data from the hub
• Hub pulls the data from the source
• Hub responds to the pull request from the target
65. Data Integration Initiation – Source PUSH/Target PULL
• Target requests data; hub responds that the data is not available
• Source pushes data to the hub; hub receives the data from the source
• Hub notifies the target that the data is available
• Target requests the data; hub responds to the pull request from the target
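The following is a minimal sketch (Python, illustrative only) of the Source PUSH/Target PULL behaviour shown above: the hub rejects a PULL until the source has pushed data, then notifies subscribed targets that the data is available. The class and method names are assumptions, not part of any specific integration product.

class IntegrationHub:
    def __init__(self):
        self._data = None
        self._subscribers = []

    def subscribe(self, notify):
        # Target registers a callback used for the availability notification
        self._subscribers.append(notify)

    def receive_push(self, data):
        # Source PUSHes data to the hub; hub notifies subscribed targets
        self._data = data
        for notify in self._subscribers:
            notify("data available")

    def handle_pull(self):
        # Hub responds to a PULL request from the target
        if self._data is None:
            return "data not available"
        return self._data

hub = IntegrationHub()
print(hub.handle_pull())                               # -> "data not available"
hub.subscribe(lambda msg: print("target notified:", msg))
hub.receive_push({"rows": 3})
print(hub.handle_pull())                               # -> {'rows': 3}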
66. Data Integration Security
• Data integration security arises in four areas
− Source
• PUSH – source may need to authenticate with the integration hub
• PULL – integration hub may need to authenticate with data source
− Target
• PUSH – integration hub may need to authenticate with data target
• PULL – target may need to authenticate with the integration hub
• The integration hub needs to support a range of authentication and authorisation protocols; a sketch of how these security settings could be recorded as integration metadata follows below
• The integration hub also needs to support security operations and administration
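As a sketch of what this metadata might look like, the following (Python, illustrative only) records the authentication arrangement for each side of an integration; the protocol names and credential references are assumptions, not recommendations.

# Hypothetical per-integration security metadata: who authenticates with whom
# depends on the PUSH/PULL direction on each side
integration_security = {
    "source": {
        "mode": "PUSH",                                    # source authenticates with the hub
        "authentication": "OAuth2 client credentials",     # assumed protocol
        "credential_ref": "vault://integrations/INT-0042/source",   # hypothetical reference
    },
    "target": {
        "mode": "PULL",                                    # target authenticates with the hub
        "authentication": "API key",                       # assumed protocol
        "credential_ref": "vault://integrations/INT-0042/target",   # hypothetical reference
    },
}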
67. Data Integration Security – Source PUSH
• Hub authenticates the source and transmits authorisation and access details
• Source authenticates with the hub, identifying the integration name
• Source PUSHes data
68. Data Integration Security – Source PULL
• Source authenticates the hub and transmits authorisation and access details
• Hub authenticates with the source, identifying the integration name
• Hub PULLs data
69. Data Integration Security – Target PUSH
• Target authenticates the hub and transmits authorisation and access details
• Hub authenticates with the target, identifying the integration name
• Hub PUSHes data
70. Data Integration Security – Target PULL
• Hub authenticates the target and transmits authorisation and access details
• Target authenticates with the hub, identifying the integration name
• Target PULLs data
71. Data Integration Metadata
• Data that provides information about the data integration that enables the
integration to be defined, implemented, operated, managed and monitored
• Classifications of metadata types
− Descriptive – information about the data integration
− Business – what the data is, its sources, targets, meaning and relationships with other data
− Structural – how the data integration is organised and operated and how versions are maintained
− Administrative/Process – how the data integration should be managed and administered through its lifecycle stages and who can perform what operations on the metadata
− Statistical – information on actual data integration operation, usage and other volumetrics
− Reference – sets of values for structured metadata fields
72. Attributes Of A Data Integration
• Each data integration has a number of attributes or sets of metadata that define its operation and use in detail
• This information is needed to define and operate the integration
• The information must be collected, stored, made available and maintained in a metadata store
− Identifier – Defines a unique integration identifier
− Related Integrations – Lists related integrations and identifies the nature of the relationships, including any dependencies
− Source(s) – Defines the source systems or locations from which the source data will be obtained
− Target(s) – Defines the target systems or locations to which the data will be delivered or made available
− Push/Pull from Source – Identifies whether the data is pulled or pushed from the source
− Push/Pull to Target – Identifies whether the data is pulled or pushed to the target
− Source Data Format – Defines the format of the source data
− Target Data Format – Defines the format of the target data
− Source Protocol – Defines the interface protocol used to obtain the source data and any protocol-specific information
− Target Protocol – Defines the interface protocol used to deliver the target data and any protocol-specific information
− Validation – Lists any validations to be performed on the source data, defining whether they are blocking or non-blocking and any exception processing to be performed
− Transformation – Defines any transformation to be performed on the source data, including transformation steps and any splits or aggregations performed
− Data Size – Contains an estimate of the size of the source and (transformed) target data
− Trigger – Defines the event(s) that trigger the integration, if relevant
− Frequency – Defines the expected frequency of the data integration, if relevant
− Data Retention – Defines how long the data should be retained between source and target
− Monitoring and Alerting – Lists how the integration will be monitored and how alerts will be generated based on events
− Source Access Security – Defines any security associated with accessing the data source
− Target Access Security – Defines any security associated with accessing the data target
− Audit Log – Identifies where audit information relating to the operation and use of the integration is stored
− Restart After Failure – Details how the integration should be recovered and restarted after failure
− Data Sensitivity – Lists the sensitivity of the data being handled by the integration
− Ownership – Identifies the business and technical owners of the integration
− Priority – Defines any priority assigned to the integration
− Supporting Documentation – Identifies where documentation relating to the integration is available
− User Interface to View/Maintain Transferred Data – Identifies the user interface that is available to view and maintain the transferred data
− Version – Details of the current integration version and any previous versions
− Active/Inactive Flag – Indicates whether the integration is active or inactive
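The following is a minimal sketch (Python, not part of the original notes) of how a subset of these attributes could be held as a typed metadata record in the metadata store; the field names and example values are illustrative.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DataIntegrationMetadata:
    identifier: str                           # unique integration identifier
    sources: List[str]                        # source systems or locations
    targets: List[str]                        # target systems or locations
    source_direction: str                     # "PUSH" or "PULL" from the source
    target_direction: str                     # "PUSH" or "PULL" to the target
    source_format: str
    target_format: str
    frequency: Optional[str] = None           # e.g. "daily", if relevant
    data_retention: Optional[str] = None      # e.g. "14 days"
    data_sensitivity: Optional[str] = None
    business_owner: Optional[str] = None
    technical_owner: Optional[str] = None
    active: bool = True                       # active/inactive flag
    related_integrations: List[str] = field(default_factory=list)

# Hypothetical example record
record = DataIntegrationMetadata(
    identifier="INT-0042",
    sources=["CRM"], targets=["Data Warehouse"],
    source_direction="PULL", target_direction="PUSH",
    source_format="CSV", target_format="Parquet",
    frequency="daily", data_retention="14 days",
)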
73. Data Integration Specification
• Data integration can be logically specified as follows
{Integration {Name, Attributes}
  {Sources
    {Source1, TechnologyType, Direction, Attributes}
    {Source2, TechnologyType, Direction, Attributes}
    {…}
  }
  {Transformation {Name, Attributes}
    {Steps
      {Step1, <Processing>}
      {Step2, <Processing>}
      {…}
    }
  }
  {Targets
    {Target1, TechnologyType, Direction, Attributes}
    {Target2, TechnologyType, Direction, Attributes}
    {…}
  }
}
• Integration – the overall integration identifier and attributes
• Sources – the set of data sources, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
• Transformation – the transformation performed on the source data to create the data sent to or made available to the target
• Targets – the set of data targets, the mechanisms by which data is transferred, the transfer direction (PUSH/PULL) and the extended integration attributes
74. Data Integration Specification
• Attributes can be defined at the overall data integration
level or at the individual data source and target definition
level
• Technology type could be one of:
− FT – transfer a file using a file transfer protocol
− API – information is requested using an API made available by the
application
− MSG – information is exchanged using a message queueing
protocol
− ETL – data is exchanged using an ETL process
− HTTP – data is exchanged using HTTP GET/PUT
• This describes a common approach to defining data integrations; a sketch of a specification expressed in this form follows below
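The following is a minimal sketch (Python, illustrative only) of a data integration specification expressed in this form, using the technology types listed above (FT, API, MSG, ETL, HTTP); the system names, steps and attribute values are assumptions.

# Hypothetical specification instance following the logical structure above
integration_spec = {
    "Integration": {"Name": "INT-0042", "Attributes": {"Frequency": "daily"}},
    "Sources": [
        {"Source": "CRM",     "TechnologyType": "API", "Direction": "PULL",
         "Attributes": {"Format": "JSON"}},
        {"Source": "Billing", "TechnologyType": "FT",  "Direction": "PUSH",
         "Attributes": {"Format": "CSV"}},
    ],
    "Transformation": {
        "Name": "Merge and cleanse",
        "Attributes": {},
        "Steps": [
            {"Step": 1, "Processing": "validate source records"},
            {"Step": 2, "Processing": "aggregate by customer"},
        ],
    },
    "Targets": [
        {"Target": "Data Warehouse", "TechnologyType": "ETL", "Direction": "PUSH",
         "Attributes": {"Format": "Parquet"}},
    ],
}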
75. Data Integration Transformation Specification
• A transformation is a set of data processing activities, requiring one or more inputs and performed in a structured order or sequence that can depend on interim outcomes, to generate one or more outputs and cause one or more outcomes
• A transformation is a self-contained unit that completes a given task
• A transformation can consist of sub-processes and/or activities
• A transformation and its constituent activities, stages and steps can be decomposed into a number of levels of detail, down to the individual atomic level
• A transformation is primarily concerned with its outcomes and outputs
76. Data Integration Transformation
• A transformation can be represented at different levels of detail
• At the highest level, a transformation has trigger(s), required input(s), output(s) and outcome(s)
77. Data Integration Transformation
• Activities within a transformation can be linked by routers that direct flow and maintain order based on the values of output(s) and the status of outcome(s), as in the sketch below
[Diagram: data processing activities, each with trigger(s), required input(s), output(s) and outcome(s), linked by routers that direct flow to the next activity]
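The following is a minimal sketch (Python, illustrative only) of activities linked by a router: each activity consumes its required inputs and produces outputs plus an outcome, and the router selects the next activity based on that outcome. The activity names and record structure are assumptions.

def validate(records):
    # Activity: keep records with an id; outcome reflects whether any were rejected
    valid = [r for r in records if r.get("id") is not None]
    outcome = "ok" if len(valid) == len(records) else "has_rejects"
    return valid, outcome

def aggregate(records):
    # Activity: aggregate validated records
    return {"total": sum(r["amount"] for r in records)}, "ok"

def quarantine(records):
    # Activity: set aside records when validation reported rejects
    return {"quarantined": len(records)}, "ok"

def router(outcome):
    # Router: directs flow to the next activity based on the previous outcome
    return aggregate if outcome == "ok" else quarantine

records = [{"id": 1, "amount": 10}, {"id": 2, "amount": 5}]
output, outcome = validate(records)
next_activity = router(outcome)
result, _ = next_activity(output)
print(result)                              # -> {'total': 15}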
78. Standardised Deployed Operational Data Integrations
[Diagram: standardised deployed operational data integrations, comprising a core integration platform and a data integration gateway with the following components]
• Integration Design and Development, Version Management and Control
• Integration Templates and Template Library
• Integration Component/Product/Tool Library
• Integration Publication/Deployment
• Deployed Data Integrations
• Operational Data Integrations
• Scheduler and Rules Engine
• Operational Process Usage Log
• Deployed Integration Operation
• Data Integration Execution
• Interim Data Store
• Data Knowledge Store
• Security
• Alerting/Event Management
• Management and Administration Interface
• Dashboard/Analytics/Reporting
• Internal Access Layer and External Access Layer
• External to Internal Translation
• Internal Data Sources and Targets
• External Data Sources and Targets
79. Next Steps
• Understand the Scope of the Current Data Integration
State
− Create an inventory of data integration technologies
− Create an inventory of existing data integrations
• Create a Future State Data Integration Architecture
− Create a data integration reference architecture
− Translate reference architecture into an implementation design
− Map implementation design to integration technologies and
products
− Map existing integrations to implementation design