This document provides guidelines for data modeling and mapping for an enterprise data warehouse (EDW). It covers:
1. Definitions of key concepts like entities, attributes, relationships, and normal forms.
2. Standards for upstream and downstream data models, including adopting Teradata's FSLDM as the base data model, using third normal form, and defining primary keys and indexes.
3. Guidelines for logical and physical data modeling including identifying entities and attributes, applying naming conventions, and optimizing for performance.
4. Recommendations for data mapping documents including providing transformation rules and extraction criteria for source to target mapping.
5. A self-review checklist for data models and mappings to ensure all standards and conventions are followed.
DataModeling.pptx
1. Section 1 – Introduction and Basic Overview
What is a Data Model?
• A graphical representation of the data elements of an organization or business.
• It identifies the things about which it is important to track information (entities), the details about those things (attributes), and the relationships between those things (relationships).
• An Entity-Relationship Diagram plus entity and attribute data definitions.
• Subject-oriented, designed in Third Normal Form.
• Facilitates communication between the business users and the IT analysts.
Data Modeling Terminology:
• Entity: A thing that is recognized as being capable of an independent existence and that can be uniquely identified. Examples: a Customer, an Employee, an Account, a Campaign, a Branch. EDW examples: Party, Agreement, Organization, Individual.
• Weak Entity: An entity that cannot be uniquely identified by its attributes alone. A weak entity can be an associative entity or a subtype entity. Examples of associative entities: Account Party, Party Address. Examples of subtype entities: Organization, Individual.
• Relationship: A relationship captures how two or more entities are related to one another. Example: "employee of" is a relationship between an organization and an employee.
• Attributes: An attribute describes an entity or a relationship; entities and relationships can both have attributes. Examples: Party Start Date, Party Name, Account Number.
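To make these terms concrete, here is a minimal sketch in Teradata-style SQL of a strong entity and an associative (weak) entity, with names that echo the examples above. The database, table, and column names are illustrative assumptions, not actual OCBC FSLDM definitions.

```sql
-- Illustrative sketch only: Party is a strong entity (uniquely identified by
-- its own key); Account_Party is an associative (weak) entity identified
-- only by the keys of the entities it links.
CREATE MULTISET TABLE EDW_Core.Party
(
    Party_Id        DECIMAL(18,0) NOT NULL,  -- unique identifier of the entity
    Party_Nm        VARCHAR(100),            -- attribute of the entity
    Party_Start_Dt  DATE                     -- attribute of the entity
)
UNIQUE PRIMARY INDEX ( Party_Id );

CREATE MULTISET TABLE EDW_Core.Account_Party
(
    Party_Id        DECIMAL(18,0) NOT NULL,  -- identifying key borrowed from Party
    Account_Num     VARCHAR(30)   NOT NULL,  -- identifying key borrowed from Account
    Party_Role_Cd   CHAR(2)                  -- attribute of the relationship itself
)
PRIMARY INDEX ( Party_Id, Account_Num );
```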
2. Section 2a – Data Modeling & Data Mapping Guidelines: Upstream
EDW Upstream Data Model – To EDW, upstream is a combination of layers (processes, databases, jobs, and de-coupling views) through which extracted source system files are loaded from the Extraction Layer into the Staging (or Loading) Layer and then further transformed via the Transform Layer into the Core EDW Layer.
Upstream model standards:
• To be modeled and extended keeping Teradata FSLDM as the base.
• Tables to be in Third Normal Form, as per Teradata FSLDM.
• Jobs to follow the primary key defined by the OCBC Data Modeler.
Prerequisites:
• User requirement or functional spec document, to determine the data elements to bring into EDW.
• Early Data Inventory (EDI), to understand the source system that contains the required data elements. The EDI has to explain:
  • the business attributes,
  • the primary keys of each table,
  • the relationships between the tables,
  • whether the source system has files that relate to other systems.
• Data profiling documents, to understand the data demographics and the integrity of the new data elements.
• Understanding of the existing EDW model or Teradata FSLDM.
3. Section 2a – Data Modeling & Data Mapping Guidelines: Upstream (contd.)
Upstream Logical Data Modeling Process: Once the data elements to be brought into EDW are finalized and the EDI for the corresponding source system is available, follow the process below to prepare the data model.
1. Identify the data entity: Based on the EDI and data profiling, identify the data entity corresponding to each data element that has to be brought into EDW, e.g. customer, account, branch, employee, transaction, campaign, product, collateral, etc.
2. Group the attributes related to each entity in which EDW is interested.
• Attributes describe data entities, e.g. Account (entity): status, account number, account open date, etc.
3. Refer to the FSLDM guidelines to map the identified entities to the OCBC FSLDM.
• As per FSLDM: Customer, Branch, and Employee are Party; Transaction is Event; Collateral is Party Asset.
4. Extend the OCBC FSLDM model if a required entity does not fit the existing model, subject to the approval of the OCBC Data Modeler.
4. Section 2a – Data Modeling & Data Mapping Guidelines: Upstream (contd.)
Upstream Physical Data Model Creation: Physical data models are derived from the logical model's entity definitions and relationships, and are constructed to ensure that data is stored and managed in physical structures that operate effectively on the Teradata database platform. PDM creation guidelines (see the DDL sketch after this list):
• Primary Index: Target tables to use the best candidate eligible for the Primary Index (unique or non-unique), considering the access path, distribution, join criteria, etc. Note: the Primary Index and the primary key are different concepts.
• Partitioned Primary Index: To enhance the performance of large tables, a PPI can be implemented. Refer to the Teradata technical documentation for detailed PPI guidelines.
• Default Values: Can be used to supply a default when no value is passed for a table attribute.
• ETL Control Framework Attributes: All EDW target tables (except b-key, b-map, and reference tables) will have seven control framework attributes:
1. Start_Dt
2. End_Dt
3. Data_Source_Cd
4. Record_Deleted_Flag
5. BusinessDate
6. Ins_Txf_BatchID
7. Upd_Txf_BatchID
• Data Type Assignment: Data types should be assigned as part of physicalisation.
• Null / Not Null Handling: NULL or NOT NULL should be assigned for each attribute.
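As a minimal illustration of these guidelines, here is a hypothetical Teradata DDL sketch combining the seven control framework attributes with a non-unique primary index and a partitioned primary index. The database, table, and non-control column names are assumptions for the example; only the control column names come from the list above.

```sql
-- Hypothetical target table sketch (names other than the control columns are
-- illustrative). NUPI on Party_Id for distribution and joins; PPI on
-- BusinessDate to speed up date-ranged access on a large table.
CREATE MULTISET TABLE EDW_Core.Party_Account
(
    Party_Id             DECIMAL(18,0) NOT NULL,
    Account_Num          VARCHAR(30)   NOT NULL,
    Account_Open_Dt      DATE,
    Account_Status_Cd    CHAR(2) DEFAULT 'NA',        -- default when no value is passed
    -- the seven ETL control framework attributes
    Start_Dt             DATE          NOT NULL,
    End_Dt               DATE          NOT NULL,
    Data_Source_Cd       VARCHAR(10)   NOT NULL,
    Record_Deleted_Flag  CHAR(1) DEFAULT 'N' NOT NULL,
    BusinessDate         DATE          NOT NULL,
    Ins_Txf_BatchID      INTEGER,
    Upd_Txf_BatchID      INTEGER
)
PRIMARY INDEX ( Party_Id )  -- non-unique PI: best distribution/join candidate
PARTITION BY RANGE_N ( BusinessDate BETWEEN DATE '2020-01-01'
                       AND DATE '2030-12-31' EACH INTERVAL '1' MONTH );
```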
5. Section 2a – Data Modeling & Data Mapping Guidelines: Upstream (contd.)
There are two data model deliverables in Upstream:
• T - EDW Platform Upstream Data Architecture: An SDLC-standard document that records the data model, describing all the business data entities and attributes and how the data elements are modeled in the OCBC EDW. It should also include all model design decisions and any deviations. The data architecture document should contain the logical and physical model diagrams with relationships, primary keys, primary indexes, and other physical design considerations.
• T - EDW Platform Upstream Source-Target Mapping: The data architecture document is used as the input for the source-to-target mapping document, which captures the detailed business rules for how the source business attributes are transformed into EDW data elements. It is used as the reference document for the subsequent ETL development.
6. Section 2b – Data Modeling & Data Mapping Guidelines: Downstream
EDW Downstream Data Model: To EDW, downstream is a combination of layers (processes, databases, tables, and views) through which user requirements are modeled and met after applying the business and technical transformation rules to data read via the customized OCBC FSLDM.
Downstream model standards (a skew-check sketch follows this list):
• To be modeled to support user requirements; must be scalable.
• To be modeled so that it is easy to maintain and tuned to meet the SLA with the best performance.
• To be modeled keeping in mind enterprise-level reporting (including geographies).
• To be modeled with the various security attributes at the most granular level.
• Tables must not be skewed.
• Jobs to follow the primary key defined by the OCBC Data Modeler.
• Target tables to use the best candidate eligible for the Primary Index (unique or non-unique), considering the access path, distribution, join criteria, etc.
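One way to verify the "tables must not be skewed" standard is to measure per-AMP space distribution from the Teradata data dictionary. A minimal sketch, assuming access to the standard DBC.TableSizeV view; the data mart name is an illustrative placeholder.

```sql
-- Skew factor per table: near 0 means even distribution across AMPs;
-- high values suggest a poorly chosen primary index.
SELECT  DatabaseName,
        TableName,
        CAST(100 * (1 - AVG(CAST(CurrentPerm AS FLOAT))
                        / NULLIF(MAX(CurrentPerm), 0)) AS DECIMAL(5,2)) AS Skew_Pct
FROM    DBC.TableSizeV
WHERE   DatabaseName = 'EDW_Mart'   -- hypothetical data mart name
GROUP BY DatabaseName, TableName
ORDER BY Skew_Pct DESC;
```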
Prerequisite: User requirement or functional spec document, to determine the data elements needed to bring the reports to users, which:
• should explain the business rules to derive the reporting data elements;
• should specify the reporting level of each data element;
• should have details of the reporting hierarchies.
7. Section 2b – Data Modeling & Data Mapping Guidelines: Downstream (contd.)
There are two data model deliverables in Downstream:
• T - EDW Platform Downstream Data Architecture: An SDLC-standard document that records the data model, describing all the business data entities and attributes and how the data elements are modeled in the OCBC EDW downstream data mart to support the user reporting requirements. It should also include all model design decisions and any deviations. The data architecture document should contain the logical and physical model diagrams with relationships, primary keys, primary indexes, and other physical design considerations.
• T - EDW Platform Downstream Source-Target Mapping: The data architecture document is used as the input for the source-to-target mapping document, which captures the detailed business rules for how the source business attributes are transformed into EDW data elements. It is used as the reference document for the subsequent ETL development.
8. Section 3a – Naming Standards and Conventions (contd.)
Column Name Convention: Column naming follows the rules below.
• Keep the column name the same as its parent column in Core EDW.
• Remove the vowels (a, e, i, o, u), but do not remove a vowel that is the first letter of a word, e.g. the "A" in Amount. If the column name is still longer than 30 characters, contact the architecture or standards-and-guidelines team.
• Keep the column name in mixed case (first letter of each word capitalized), e.g. Data_Source_Cd instead of DATA_SOURCE_CD or Data_SOURCE_Cd.
• Control columns should be brought over as-is from Core EDW; the rules above do not apply to them.
Example: the measure Count of Transactions is not part of Core EDW, so:
• Starting column name: Count_Of_Transactions / Transaction_Count
• After removing the vowels: Cnt_Of_Trnsctn / Trnsctn_Cnt
• After applying mixed case: Cnt_Of_Trnsctn / Trnsctn_Cnt (already mixed case)
• The control-column rule does not apply, since this is not a control column; a control column such as Data_Source_Cd would be brought over as-is.
9. Section 4b – Best Practices (contd.)
Data Model Creation: The key steps in data modeling are listed below; a sketch of the normalize/denormalize trade-off follows the list.
• Identify entities
• Identify attributes of entities
• Identify relationships
• Apply naming conventions
• Assign keys and indexes
• Normalize to reduce data redundancy
• Denormalize to improve performance
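To illustrate the last two steps, here is a minimal hypothetical sketch: a normalized (3NF) core table that stores each fact once and references descriptive entities by code, and a denormalized downstream reporting table that repeats descriptive columns to avoid joins at query time. All object names are assumptions for the example.

```sql
-- Normalized (3NF) core table: branch and product details live in their own
-- entities and are referenced by code, so each fact is stored only once.
CREATE MULTISET TABLE EDW_Core.Txn_Event
(
    Txn_Id     DECIMAL(18,0) NOT NULL,
    Txn_Dt     DATE          NOT NULL,
    Txn_Amt    DECIMAL(18,2),
    Brnch_Cd   CHAR(4),                 -- reference to a Branch entity
    Prdct_Cd   CHAR(4)                  -- reference to a Product entity
)
PRIMARY INDEX ( Txn_Id );

-- Denormalized reporting table: branch and product names are repeated on
-- every row, trading redundancy for faster, join-free reporting queries.
CREATE MULTISET TABLE EDW_Mart.Txn_Rpt
(
    Txn_Id     DECIMAL(18,0) NOT NULL,
    Txn_Dt     DATE          NOT NULL,
    Txn_Amt    DECIMAL(18,2),
    Brnch_Nm   VARCHAR(60),             -- denormalized from Branch
    Prdct_Nm   VARCHAR(60)              -- denormalized from Product
)
PRIMARY INDEX ( Txn_Id );
```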
10. Section 5 – Checklist / Self-Review Process
Data Mapping:
• Start the self-review after the data mapping document is complete. Each and every column in the document should provide useful and meaningful information. Provide the information in the transformation rules as expected by the OCBC reviewer.
• Once the self-review is done, have an internal review by the Data Mapper before proceeding to the OCBC BADM team review, to avoid rework and escalations.
Data Modeling:
• Check closely that all names follow the correct naming conventions.
• Create domains to provide the flexibility of changing a data type.
• Create the indexes as per the OCBC standards.
• Assign data types to all the columns.
• Provide table and column definitions for all the tables.
• Assign NULL or NOT NULL to all the columns.
• Create the default values, if any.
• Attach the abbreviation file with all the meaningful abbreviated table and column names.
• Generate the DDLs from Erwin and add the database name for the data mart; send them to the OCBC Data Modeler for review. Once he/she approves, the DBA will be asked to deploy them in the respective environment.
• Make the data model presentable to the client / business users by coloring the entities and the relationships / columns.
11. Section 6 – DOs and DON'Ts
Do a self-review after completing the data model and data mapping. All standards should be met as per the client's expectations.
• For data modeling: check all the data types, the NULL / NOT NULL fields, and the indexes; generate the DDLs and pass them to the OCBC DM for approval, and he/she will pass them to the DBA to create the table structures.
• For data mapping: all the columns should be filled in with proper formatting, coloring (if any), font, size, uppercase/lowercase, etc.
• Every page of the mapping document should be updated with the mapping name.
• The extraction criteria should be clearly mentioned.