1. IT6702 - DATA WAREHOUSING AND DATA MINING
UNIT – I DATA WAREHOUSING
Data Warehousing Components – Building a Data Warehouse – Mapping the Data Warehouse to a Multiprocessor Architecture – DBMS Schemas for Decision Support – Data Extraction, Cleanup, and Transformation Tools – Metadata.
3. What is Data Warehousing
Data warehousing is an architectural construct of information systems that provides users with current and historical decision support information that is hard to access or present in traditional operational data stores.
The need for data warehousing
• Business perspective: in order to survive and succeed in today's highly competitive global environment
• Decisions need to be made quickly and correctly
• The amount of data doubles every 18 months, which affects response time and the sheer ability to comprehend its content
• Rapid changes
4. Business Problem Definition
Providing organizations with a sustainable competitive advantage:
• Customer retention
• Sales and customer service
• Marketing
• Risk assessment and fraud detection
5. Business Problems and Data Warehousing
Business problems are classified into:
Retrospective analysis: focuses on issues of past and present events.
Predictive analysis: focuses on predicting certain events or behavior based on historical information. It is further classified into:
Classification: used to classify database records into a number of predefined classes based on certain criteria.
Clustering: used to segment a database into subsets, or clusters, based on a set of attributes.
6. Association
Identifies affinities among the collection as reflected in the examined records.
Sequencing
This technique helps identify patterns over time, allowing, for example, an analysis of customers' purchases during separate visits.
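As a toy illustration of the association technique above, the following sketch counts how often pairs of items occur together across transactions; the sample data and support threshold are hypothetical.

    from itertools import combinations
    from collections import Counter

    # Hypothetical purchase transactions (one set of items per visit).
    transactions = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"butter", "milk"},
        {"bread", "butter"},
    ]

    # Count how often each pair of items appears in the same transaction.
    pair_counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1

    # Report pairs whose co-occurrence meets a chosen support threshold.
    min_support = 2
    for pair, count in pair_counts.items():
        if count >= min_support:
            print(pair, count)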
Operational and Informational Data Stores
Operational data
Focuses on transactional functions such as bank card withdrawals and deposits. Operational data is:
• Detailed
• Updateable
• Reflective of current values
7. Informational Data
Informational data is organized around subjects such as customer, vendor, and product, and focuses on providing answers to problems posed by decision makers, e.g., "What are the total sales today?". Informational data is:
• Summarized
• Nonupdateable
Operational data store
An operational data store (ODS) is an architectural concept to support day-to-day operational decision support; it contains current-valued data propagated from operational applications.
8. A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. [W. H. Inmon]
Subject Oriented
Data warehouses are designed to help analyze the data.
Integrated
The data in the data warehouse is loaded from different sources that store the data in different formats and focus on different aspects of the subject.
9. Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change.
Time Variant
Provides information from a historical perspective.
10. Seven Data Warehouse Components
• Data sourcing, cleanup, transformation, and migration tools
• Metadata repository
• Warehouse/database technology
• Data marts
• Data query, reporting, analysis, and mining tools
• Data warehouse administration and management
• Information delivery system
Data Warehouse Architecture (diagram)
13. Data Warehousing Components
Operational data and processing are completely separate from data warehouse processing.
Data Warehouse Database
The data warehouse database is a central component (marked as 2 in the architecture diagram) of the warehouse environment. Beyond routine transaction operations, it must support ad hoc query processing and flexible user view creation, including aggregation, multiple joins, and drill-down. Approaches include:
• Parallel relational database designs that require a parallel computing platform.
• New index structures to speed up a traditional RDBMS.
• Multidimensional databases (MDDBs) that are based on proprietary database technology or implemented using an already familiar RDBMS.
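To make that query workload concrete, here is a minimal pandas sketch of aggregation, a join, and a drill-down; the fact and dimension tables are hypothetical.

    import pandas as pd

    # Hypothetical fact table and product dimension.
    sales = pd.DataFrame({
        "product_id": [1, 1, 2, 2],
        "region": ["North", "South", "North", "South"],
        "amount": [100.0, 150.0, 200.0, 50.0],
    })
    products = pd.DataFrame({
        "product_id": [1, 2],
        "category": ["Dairy", "Bakery"],
    })

    # Join the fact table to the dimension, then aggregate by category.
    joined = sales.merge(products, on="product_id")
    summary = joined.groupby("category")["amount"].sum()

    # Drill down from category totals to (category, region) detail.
    detail = joined.groupby(["category", "region"])["amount"].sum()
    print(summary, detail, sep="\n")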
14. Sourcing, Acquisition, Cleanup, and Transformation Tools
These tools perform tasks such as:
• Removing unwanted data from operational databases
• Converting to common data names and definitions
• Calculating summaries and derived data
• Establishing defaults for missing data
• Accommodating source data definition changes
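A minimal sketch of these cleanup and transformation steps, assuming a hypothetical customer extract and tax rate:

    import pandas as pd

    # Hypothetical operational extract with inconsistent names and gaps.
    raw = pd.DataFrame({
        "CUST_NM": ["Acme", "Beta", None],
        "SalesAmt": [120.0, None, 75.0],
    })

    # Convert to common data names and definitions.
    clean = raw.rename(columns={"CUST_NM": "customer_name", "SalesAmt": "sales_amount"})

    # Remove unwanted data: rows with no customer identity.
    clean = clean.dropna(subset=["customer_name"])

    # Establish a default for missing data.
    clean["sales_amount"] = clean["sales_amount"].fillna(0.0)

    # Calculate summaries and derived data.
    clean["sales_with_tax"] = clean["sales_amount"] * 1.10  # hypothetical tax rate
    print(clean, clean["sales_amount"].sum(), sep="\n")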
15. Metadata
Metadata is data about data. It is used for building, maintaining, and using the data warehouse, and is classified into technical metadata and business metadata.
Technical metadata
• Information about data sources
• Transformation descriptions, i.e., the mapping methods from operational databases into the warehouse and the algorithms used to convert, enhance, or transform data
• Warehouse objects and data structure definitions for data targets
• The rules used to perform data cleanup and data enhancement
16. • Data mapping operations when capturing data from source systems and applying it to the target warehouse database
• Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.
Business metadata
Business metadata gives a business perspective of the information stored in the data warehouse:
• Subject areas and information object types, including queries, reports, images, video, and/or audio clips
• Internet home pages
• Other information to support all data warehouse components
• Data warehouse operational information, e.g., data history, ownership, extract, audit trail, usage data
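As a toy illustration, technical metadata such as source-to-target mappings and cleanup rules could be recorded like this; the repository structure and field names are hypothetical.

    # Hypothetical technical-metadata entries for one warehouse target table.
    metadata_repository = {
        "sales_fact": {
            "source": "orders (operational OLTP database)",
            "mappings": {"ORD_AMT": "sales_amount", "ORD_DT": "sale_date"},
            "cleanup_rules": ["drop rows with null ORD_AMT", "default region to 'UNKNOWN'"],
            "load_history": ["2024-01-01 full load", "2024-01-02 incremental load"],
        },
    }

    def describe(target):
        """Print the recorded lineage for a warehouse table."""
        entry = metadata_repository[target]
        print(target, "<-", entry["source"])
        for src, dst in entry["mappings"].items():
            print(" ", src, "->", dst)

    describe("sales_fact")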
17. Access Tools
The tools are divided into five main groups:
• Data query and reporting tools
• Application development tools
• Executive information system (EIS) tools
• On-line analytical processing (OLAP) tools
• Data mining tools
18. Query and Reporting Tools
This category can be further divided into two groups:
• Reporting tools
• Managed query tools
Managed query tools shield end users from the complexities of SQL and database structures by inserting a metalayer between the users and the database.
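The metalayer idea can be sketched as a mapping from business terms to physical columns, so end users never see the underlying structures; all names below are hypothetical.

    # Hypothetical metalayer: business terms mapped to physical columns.
    METALAYER = {
        "Customer Name": "crm.cust_master.cust_nm",
        "Total Sales": "dw.sales_fact.sales_amount",
    }

    def resolve(business_terms):
        """Translate user-facing terms into physical column references."""
        return [METALAYER[term] for term in business_terms]

    # A managed query tool would generate the actual query from these references.
    print(resolve(["Customer Name", "Total Sales"]))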
Applications
Custom applications developed for end users using a programming language or development environment.
19. OLAP
OLAP tools are based on the concepts of multidimensional databases.
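As a rough illustration, a multidimensional view can be emulated with a pivot table; the dimensions and measure below are hypothetical.

    import pandas as pd

    # Hypothetical measure (sales) with two dimensions: product and quarter.
    df = pd.DataFrame({
        "product": ["A", "A", "B", "B"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "sales": [10, 20, 30, 40],
    })

    # Pivot into a small two-dimensional slice of a cube.
    cube = df.pivot_table(values="sales", index="product", columns="quarter", aggfunc="sum")
    print(cube)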
Data mining
Data mining tools discover meaningful new correlations, patterns, and trends by digging into (mining) large amounts of data stored in the warehouse, using artificial intelligence (AI) and statistical and mathematical techniques.
Discover knowledge. The goal of knowledge discovery is to determine the following:
• Segmentation
• Classification
• Association
• Preferencing
20. Visualize data. Prior to any analysis, the goal is to "humanize" the mass of data users must deal with and find a clever way to display it.
Correct data. While consolidating massive databases, an enterprise may find that the data is incomplete and invariably contains erroneous and contradictory information. Data mining techniques can help identify and correct problems in the most consistent way possible.
Data visualization
Presents the output of all the previously mentioned tools using
colors, shapes, 3-D images, sound, and virtual reality.
Data Marts
A data mart is a data store that is subsidiary to the data
warehouse: a partition of data created for the use of a dedicated
group of users.
It may be placed on the data warehouse database rather than
kept as a physically separate store of data.
In most instances, however, the data mart is a physically
separate store of data and is normally resident on a separate
database server.
Data Warehouse Administration and Management
Managing a data warehouse includes:
• Security and priority management
• Monitoring updates from multiple sources
• Data quality checks
• Managing and updating metadata
• Auditing and reporting data warehouse usage and status
• Replicating, subsetting, and distributing data
• Backup and recovery
• Data warehouse storage management
Information delivery system
The information delivery system distributes warehouse-
stored data and other information objects to other data
warehouses and end-user products such as spreadsheets and
local databases.
Delivery of information may be based on time of day or on
the completion of an external event.
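A toy sketch of the two delivery triggers (plain Python; the extract
names and destinations are made up):

import sched, time

def deliver(extract, destination):
    # Stand-in for pushing warehouse data to an end-user product.
    print("delivering", extract, "to", destination)

# Time-based delivery: run 2 seconds from now
# (a stand-in for "every day at 6 a.m.").
timer = sched.scheduler(time.time, time.sleep)
timer.enter(2, 1, deliver, ("weekly sales extract", "spreadsheet"))
timer.run()

# Event-based delivery: trigger when an external event completes.
def on_nightly_load_complete():
    deliver("refreshed summary tables", "local database")

on_nightly_load_complete()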
2. Building a Data Warehouse
Business considerations
Return on Investment
Approach
The information scope of the data warehouse varies with the
business requirements, business priorities, and magnitude of the
problem. A company might, for example, start with two subject-
area warehouses: Marketing and Personnel.
• The top-down approach
Builds an enterprise data warehouse with subset data marts.
• The bottom-up approach
Develops individual data marts, which are then integrated into
the enterprise data warehouse.
Organizational issues
A data warehouse implementation is not truly a technological
issue; rather, it is more concerned with identifying and
establishing the information requirements, the data sources that
fulfill these requirements, and the timeliness of the data.
Design considerations
A data warehouse's design point is to consolidate data from
multiple, often heterogeneous, sources into a query database. The
main factors include:
• Heterogeneity of data sources, which affects data conversion,
quality, and timeliness.
• Use of historical data, which implies that data may be "old".
• The tendency of databases to grow very large.
Data content
A data warehouse may contain detailed data, but the data is
cleaned up and transformed to fit the warehouse model, and
certain transactional attributes of the data are filtered out.
Metadata
A data warehouse design should ensure that there is a
mechanism that populates and maintains the metadata
repository, and that all access paths to the data warehouse
have metadata as an entry point.
Data distribution
One of the challenges when designing a data warehouse is
to know how the data should be divided across multiple
servers and which users should get access to which types of
data.
The data placement and distribution design should consider
several options, including data distribution by subject area,
location, or time.
Tools
Each tool takes a slightly different approach to data
warehousing and often maintains its own version of the
metadata, which is placed in a tool-specific, proprietary
metadata repository.
The warehouse designers have to make sure that all selected
tools are compatible with the given data warehouse
environment and with each other.
Performance considerations
Rapid query processing is a highly desired feature that should
be designed into the data warehouse.
Design the warehouse database to avoid the majority of the
most expensive operations, such as multitable searches and joins.
Nine decisions in the design of a data warehouse
1. Choosing the subject matter.
2. Deciding what a fact table represents.
3. Identifying and confirming the dimensions.
4. Choosing the facts.
5. Storing precalculations in the fact table.
6. Rounding out the dimension tables.
7. Choosing the duration of the database.
8. The need to track slowly changing dimensions.
9. Deciding the query priorities and the query modes
Technical Considerations
• The hardware platform that would house the data warehouse
• The database management system that supports the warehouse
database.
• The communications infrastructure that connects the
warehouse, data marts, operational systems, and end users.
• The hardware platform and software to support the metadata
repository.
• The systems management framework that enables centralized
management and administration of the entire environment.
Hardware platforms
An important consideration for a data warehouse server is its
capacity for handling the volumes of data required by decision
support applications, some of which may require a significant
amount of historical data.
This capacity requirement can be quite large.
A data warehouse residing on the mainframe is best suited
for situations in which large amounts of data must be managed.
The data warehouse server has to be able to support large data
volumes and complex query processing.
Balanced approach
An important design point when selecting a scalable
computing platform is the right balance between all computing
components.
Data warehouse and DBMS specialization
The requirements for the data warehouse DBMS are
performance, throughput, and scalability, because the database is
large in size and complex ad hoc queries must be processed in a
relatively short time.
Databases are available that have been optimized specifically
for data warehousing.
Communications infrastructure
Bringing access to corporate data directly to the desktop
involves cost and effort: communications networks may have to
be expanded, and new hardware and software may have to be
purchased.
Implementation Considerations
A data warehouse implementation requires the integration of
many products within the data warehouse environment.
The steps needed to build a data warehouse are as follows:
• Collect and analyze business requirements.
• Create a data model and a physical design for the data
warehouse.
• Define the data sources.
• Choose the database technology and platform for the warehouse.
• Extract the data from the operational databases, transform it,
clean it up, and load it into the database.
• Choose the database access and reporting tools.
• Choose database connectivity software.
• Choose data analysis and presentation software.
• Update the data warehouse.
Access tools
A suite of tools is needed to handle all possible data
warehouse access needs; tool selection is based on the different
types of access to the data:
• Simple tabular form reporting.
• Ranking.
• Multivariable analysis.
• Time series analysis.
• Data visualization, graphing, charting and pivoting.
• Complex textual search.
• Statistical analysis.
• Artificial intelligence techniques for testing of hypotheses,
trend discovery, and definition and validation of data clusters
and segments.
• Information mapping
• Ad hoc user-specified queries
• Predefined repeatable queries
• Interactive drill-down reporting and analysis.
• Complex queries with multitable joins, multilevel sub queries,
and sophisticated search criteria.
Data extraction, cleanup, transformation and migration
When selecting an extraction tool, the ability to transform,
consolidate, integrate, and repair the data should be considered.
• A field-level data examination for the transformation of data
into information is needed.
• The ability to perform data-type and character-set translation is
a requirement when moving data between incompatible
systems.
• The capability to create summarization, aggregation, and
derivation records and fields is very important.
• The data warehouse database management system should be able
to perform the load directly from the tool, using the native API
available with the RDBMS.
• Vendor stability and support for the product are items that must
be carefully evaluated.
Data placement strategies
As a data warehouse grows, there are at least two options for
data placement. One is to put some of the data in the data
warehouse onto another storage medium, e.g., WORM, RAID, or
photo-optical technology.
The second option is to distribute the data in the data warehouse
across multiple servers.
Data replication
Keeping the data that is relevant to a particular workgroup in a
localized database can be a more affordable solution than full
data warehousing.
Replication technology creates copies of databases on a periodic
basis, so that data entry and data analysis can be performed
separately.
Metadata
Metadata is the roadmap to the information stored in the
warehouse
The metadata has to be available to all warehouse users in order
to guide them as they use the warehouse.
User sophistication levels
Casual users
Power users
Experts
Integrated Solutions
A number of vendors participate in data warehousing by
providing a suite of services and products that go beyond one
particular component of the data warehouse.
Digital Equipment Corp. Digital combined the data
modeling, extraction, and cleansing capabilities of Prism
Warehouse Manager with the copy management and data
replication capabilities of Digital's ACCESSWORKS family
of database access servers to provide users with the ability to
build and use information warehouses.
Hewlett-Packard. Hewlett-Packard's client/server-based HP
Open Warehouse comprises multiple components, including a
data management architecture, the HP-UX operating system,
HP 9000 computers, warehouse management tools, and the HP
Information Access query tool.
• IBM. The IBM Information Warehouse framework consists of an
architecture; data management tools; OS/2, AIX, and MVS
operating systems; hardware platforms, including mainframes
and servers; and a relational DBMS (DB2).
• Sequent. Sequent Computer Systems Inc.'s DecisionPoint
Program is a decision support program for the delivery of data
warehouses dedicated to on-line complex query processing
(OLCP). Using graphical interfaces, users query the data
warehouse by pointing and clicking on the warehouse data items
they want to analyze. Query results are placed on the program's
clipboard for pasting into a variety of desktop applications, or
they can be saved to disk.
Benefits of Data Warehousing
Data warehouse usage includes
• Locating the right information
• Presentation of information (reports, graphs)
• Testing of hypotheses
• Sharing the analysis
Tangible benefits
• Product inventory turnover is improved.
• Costs of product introduction are decreased with improved
selection of target markets.
• More cost-effective decision making is enabled by increased
quality and flexibility of market analysis available through
multilevel data structures, which may range from detailed to
highly summarized.
• Enhanced asset and liability management means that a data
warehouse can provide a "big picture" of enterprise-wide
purchasing and inventory patterns.
Intangible benefits
The intangible benefits include:
• Improved productivity, by keeping all required data in a single
location and eliminating redundant processing.
• Reduced redundant processing.
• Enhanced customer relations through improved knowledge of
individual requirements and trends.
• Enabling business process reengineering.
3. Mapping the Warehouse to a Multiprocessor Architecture
Relational Database Technology for Data Warehouse
The data warehouse environment needs
• Speed-up
• Scale-up
Parallel hardware architectures, parallel operating systems,
and parallel database management systems provide what the
warehouse environment requires.
Types of parallelism
Interquery parallelism
Threads (or processes) handle multiple requests at the same time.
Intraquery parallelism
Scan, join, sort, and aggregation operations within a single
query are executed concurrently in parallel.
Intraquery parallelism can be done in either of two ways.
Horizontal parallelism
The database is partitioned across multiple disks, and parallel
processing occurs within a specific task that is performed
concurrently on different sets of data.
Vertical parallelism
The output from one task (e.g., scan) becomes the input into
another task (e.g., join) as soon as records become available.
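A toy contrast of the two forms, with Python threads and generators
standing in for a parallel DBMS (partition contents and the
predicate are made up):

from concurrent.futures import ThreadPoolExecutor

partitions = [range(0, 100), range(100, 200), range(200, 300)]

# Horizontal parallelism: the same scan task runs concurrently,
# each worker on a different partition of the data.
def scan(part):
    return [r for r in part if r % 7 == 0]

with ThreadPoolExecutor(max_workers=3) as pool:
    scanned = [row for rows in pool.map(scan, partitions) for row in rows]

# Vertical parallelism: the output of one task (scan) feeds the next
# task (aggregation) as soon as each record becomes available.
def scan_stream(parts):
    for part in parts:
        for r in part:
            if r % 7 == 0:
                yield r

total = sum(scan_stream(partitions))  # consumes records as they arrive
print(len(scanned), total)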
Data Partitioning
Spreads data from database tables across multiple disks so
that I/O operations such as read and write can be performed in
parallel.
Random partitioning
This includes data striping across multiple disks on a
single server. Another option for random partitioning is
round-robin partitioning, in which each new record is placed
on the next disk assigned to the database.
[Figure: response-time comparison of four cases. Case 1: serial
RDBMS; Case 2: horizontal parallelism (data partitioning);
Case 3: vertical parallelism (query decomposition); Case 4: both
combined.]
Intelligent partitioning
The DBMS knows where a specific record is located and does
not waste time searching for it across all disks.
Hash partitioning. A hash algorithm is used to calculate the
partition number (hash value) based on the value of the
partitioning key for each row.
Key range partitioning. Rows are placed and located in the
partitions according to the value of the partitioning key (all rows
with key values from A to K are in partition 1, L to T in
partition 2, etc.).
Schema partitioning. An entire table is placed on one disk, another
table is placed on a different disk, etc. This is useful for small
reference tables that are more effectively used when replicated in
each partition rather than spread across partitions.
User-defined partitioning. This is a partitioning method that allows
a table to be partitioned on the basis of a user-defined expression.
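Toy versions of three of these placement rules (round-robin, hash,
and key range), assuming four partitions and made-up keys:

NUM_PARTITIONS = 4

def round_robin(row_number):
    # Each new record goes to the next partition in turn.
    return row_number % NUM_PARTITIONS

def hash_partition(key):
    # Partition number (hash value) computed from the partitioning key.
    return hash(key) % NUM_PARTITIONS

def key_range(name):
    # Keys A-K go to partition 0, L-T to partition 1, the rest to 2.
    first = name[0].upper()
    if "A" <= first <= "K":
        return 0
    if "L" <= first <= "T":
        return 1
    return 2

print(round_robin(5), hash_partition("cust-42"), key_range("Miller"))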
Database Architectures for Parallel Processing
Shared-memory architecture: multiple processors share the main
memory space, as well as mass storage (e.g., hard disk drives).
Shared-disk architecture: each node has its own main memory,
but all nodes share mass storage, usually a storage area network.
Shared-nothing architecture: each node has its own mass storage
as well as main memory.
Shared-memory architecture - SMP (Symmetric Multiprocessors)
Multiple database components executing SQL statements
communicate with each other by exchanging messages and data
via the shared memory.
Scalability can be achieved through process-based multitasking
or thread-based multitasking.
[Figure: shared-memory architecture. Multiple processor units
(PUs) are connected through an interconnection network to a
global shared memory.]
Shared-disk architecture
The entire database is shared between RDBMS servers, each
of which runs on a node of a distributed-memory system.
Each RDBMS server can read, write, update, and delete
records from the same shared database.
It is implemented by using a distributed lock manager (DLM).
Disadvantages
Since all nodes read and update the same data, the RDBMS
and its DLM have to spend a lot of resources synchronizing
multiple buffer pools.
It may have to handle significant message traffic in a highly
utilized RDBMS environment.
Advantages
It reduces performance bottlenecks resulting from data skew
(an uneven distribution of data) and can significantly increase
system availability.
It eliminates the memory access bottleneck typical of large SMP
systems and helps reduce DBMS dependency on data partitioning.
[Figure 4.3 Distributed-memory shared-disk architecture:
processor units (PUs), each with its own local memory, connected
through an interconnection network to a global shared disk
subsystem.]
Shared-nothing architecture
Each processor has its own memory and disk, and communicates
with other processors by exchanging messages and data over the
interconnection network.
[Figure: shared-nothing architecture. Processor units (PUs), each
with its own local memory and disk, connected only by the
interconnection network.]
Disadvantages
It is the most difficult to implement.
It requires a new programming paradigm.
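A rough sketch of the shared-nothing idea using Python processes:
each "node" owns its partition and exchanges only messages
(partial results) with the coordinator, never sharing memory or
disk. The data is made up.

from multiprocessing import Pool

# Each node's privately owned partition of a numeric column.
local_partitions = [[3, 14, 15], [92, 65, 35], [89, 79, 32]]

def local_aggregate(partition):
    # Runs in a separate process with its own memory.
    return sum(partition)

if __name__ == "__main__":
    with Pool(processes=3) as nodes:
        partials = nodes.map(local_aggregate, local_partitions)
    # The coordinator combines the exchanged partial results.
    print("global sum:", sum(partials))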
Combined architecture
A combined hardware architecture could be a cluster of SMP
nodes.
A combined parallel DBMS architecture should support
interserver parallelism of distributed-memory MPPs and
intraserver parallelism of SMP nodes.
Parallel RDBMS features
Scope and techniques of parallel DBMS operations
Optimizer implementation
Application transparency
The parallel environment
DBMS management tools
4. DBMS Schemas for Decision Support
Data Layout for Best Access
Multidimensional Data Model
Star Schema
Two groups: facts and dimensions.
Facts are the core data elements being analyzed,
e.g., items sold.
Dimensions are attributes about the facts,
e.g., date of purchase.
The star schema is designed to overcome the limitations of
the two-dimensional relational model.
DBA Viewpoint
The fact table contains raw facts. The facts are typically
additive and are accessed via dimensions.
The dimension tables contain a non-compound primary key
and are heavily indexed.
Dimension tables appear in constraints and GROUP BY
clauses, and are joined to the fact tables using foreign key
references.
Once the star schema database is defined and loaded, queries
that answer simple and complex business questions can be
formulated.
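A minimal star schema sketch using SQLite from Python (a
hypothetical sales subject area; table and column names are made
up): additive facts in one fact table, accessed by joining to the
dimensions on foreign keys.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (              -- raw, additive facts
    date_key INTEGER REFERENCES date_dim,
    product_key INTEGER REFERENCES product_dim,
    units_sold INTEGER, amount REAL);
INSERT INTO date_dim VALUES (1, 'Jan'), (2, 'Feb');
INSERT INTO product_dim VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO sales_fact VALUES (1, 10, 5, 50.0), (2, 10, 3, 30.0),
                              (2, 11, 1, 99.0);
""")

# Constrain and group by dimension attributes; sum the additive fact.
for row in con.execute("""
    SELECT d.month, p.name, SUM(f.amount)
    FROM sales_fact f
    JOIN date_dim d ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY d.month, p.name"""):
    print(row)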
Potential Performance Problems with Star Schemas
The star schema suffers from the following performance problems.
Indexing
A multipart key presents problems in the star schema model
(day -> week -> month -> quarter -> year):
• It requires multiple metadata definitions (one for each
component) to define a single table.
• Since the fact table must carry all key components as part of its
primary key, addition or deletion of levels in the hierarchy will
require physical modification of the affected table, which is a
time-consuming process that limits flexibility.
Level Indicator
The dimension table design includes a level-of-hierarchy
indicator for every record.
If the user is not aware of the level indicator, or if its
values are incorrect, an otherwise valid query may return a
totally invalid answer.
An alternative to using the level indicator is the snowflake schema.
Aggregate fact tables are created separately from detail tables.
A snowflake schema contains separate fact tables for each level
of aggregation.
Other problems with the star schema design
Pairwise Join Problem
Joining 5 tables requires joining the first two tables, then
joining the result with the third table, and so on.
The intermediate result of every join operation is used to
join with the next table.
Selecting the best order of pairwise joins rarely can be solved
in a reasonable amount of time.
A five-table query has 5! = 120 possible join-order combinations.
This problem is so serious that some databases will not run a
query that tries to join too many tables.
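The combinatorics behind the 5! figure can be checked directly;
each permutation of the five tables is one candidate pairwise join
order (table names here are illustrative):

import math
from itertools import permutations

tables = ["fact", "date", "product", "store", "customer"]
orders = list(permutations(tables))
print(len(orders), math.factorial(5))  # 120 120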
STARjoin and STARindex
A STARjoin is a high-speed, single-pass, parallelizable
multitable join, introduced by Red Brick's RDBMS.
STARindexes accelerate join performance.
STARindexes are created on one or more foreign key columns
of a fact table.
A traditional multicolumn index references a single table,
whereas a STARindex can reference multiple tables.
With multicolumn indexes, if a query's WHERE clause does
not contain all the columns in the composite index, the index
cannot be fully used unless the specified columns are a leading
subset.
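A small sketch of that leading-subset rule (hypothetical column
names; real optimizers are more elaborate):

def usable_prefix(index_columns, where_columns):
    # A composite index serves the query only up to the longest
    # leading run of index columns constrained by the WHERE clause.
    prefix = []
    for col in index_columns:
        if col not in where_columns:
            break
        prefix.append(col)
    return prefix

idx = ["region", "store", "product"]
print(usable_prefix(idx, {"region", "store"}))   # ['region', 'store']
print(usable_prefix(idx, {"store", "product"}))  # [] - not usable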
A STARjoin using a STARindex can efficiently join the
dimension tables to the fact table without the penalty of
generating the full Cartesian product.
The STARjoin algorithm is able to generate a Cartesian
product in regions where there are rows of interest and to bypass
generating Cartesian products over regions where there are no
rows.
Bitmapped Indexing
SYBASE IQ
Overview
As data is loaded into SYBASE IQ, it converts all data into a
series of bitmaps, which are then highly compressed and
stored on disk.
SYBASE IQ indexes do not point to data stored elsewhere;
all data is contained in the index structure.
Data Cardinality
Bitmap indexes are used for queries against low-cardinality
data, that is, data in which the total number of potential values
is relatively low.
For low-cardinality data, each distinct value has its own
bitmap index consisting of a bit for every row in the table.
SYBASE IQ's high-cardinality indexing starts at 1,000 distinct
values.
[Figure: an example employee table and its bitmaps, one bit
vector per distinct column value, one bit per record
(Record 1 ... Record N).]
Emp-Id  Gender  Last Name  First Name  Address
104345  M       Karthik    Ramasamy    10, North street
104567  M       Visu       Pandian     12, Pallavan street
104788  F       Mala       Prathap     123, Koil street
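A sketch of the idea in plain Python (not SYBASE IQ's actual
storage format): one bit vector per distinct value of a
low-cardinality column, one bit per row.

rows = ["M", "M", "F", "F", "M"]  # a low-cardinality Gender column

# Build one bitmap per distinct value: bit i is 1 iff row i has it.
bitmaps = {v: [1 if r == v else 0 for r in rows] for v in set(rows)}
print(bitmaps)  # e.g. {'M': [1, 1, 0, 0, 1], 'F': [0, 0, 1, 1, 0]}

# "Gender = 'F'" becomes a scan of one compact bit vector, and
# compound predicates reduce to bitwise AND/OR of bitmaps.
matching = [i for i, bit in enumerate(bitmaps["F"]) if bit]
print(matching)  # [2, 3]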
Index Types
SYBASE IQ provides five index techniques. One is a default
index called the Fast Projection index, and the other is either a
low- or high-cardinality index.
Performance
SYBASE IQ technology achieves very good performance on
ad hoc queries for several reasons.
• Bitwise technology. This allows rapid response to queries
containing various data types and supports data aggregation
and grouping.
• Compression. SYBASE IQ uses sophisticated algorithms to
compress data into bitmaps, so SYBASE IQ can hold more data
in memory, minimizing expensive I/O operations.
• Optimized memory-based processing.
• Columnwise processing.
• Low Overhead.
• Large Block I/O.
• Operating-system-level parallelism.
• Prejoin and ad hoc join Capabilities.
Shortcomings of Indexing
Some of the tradeoffs of SYBASE IQ are as follows:
• No updates.
• Lack of core RDBMS features.
• Less advantage for planned queries.
• High memory usage.
Column Local Storage
Performance in the data warehouse environment can be
improved by storing data column-wise, instead of storing one
row at a time where each row is viewed and accessed as a
single record.
For a table with columns Emp-id, Emp-Name, Dept, and Salary:
Row-wise storage:
  1004  Suresh  CSE    15000
  1005  Mani    MECH   25000
  1006  Sara    CIVIL  23000
Column-wise storage:
  1004    1005   1006
  Suresh  Mani   Sara
  CSE     MECH   CIVIL
  15000   25000  23000
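A toy contrast of the two layouts (illustrative only): a query such
as "average salary" touches one column in the columnar layout
instead of every field of every row.

# Row-wise: each record stored (and fetched) as a unit.
row_store = [
    (1004, "Suresh", "CSE", 15000),
    (1005, "Mani", "MECH", 25000),
    (1006, "Sara", "CIVIL", 23000),
]

# Column-wise: each column stored contiguously.
column_store = {
    "Emp-id":   [1004, 1005, 1006],
    "Emp-Name": ["Suresh", "Mani", "Sara"],
    "Dept":     ["CSE", "MECH", "CIVIL"],
    "Salary":   [15000, 25000, 23000],
}

# "Average salary" reads only the Salary column.
print(sum(column_store["Salary"]) / len(column_store["Salary"]))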
Complex Data Types
The warehouse environment must support complex data
types such as text, image, full-motion video, sound, and large
objects called binary large objects (BLOBs), in addition to
simple alphanumeric data.
5. Data Extraction, Cleanup, and Transformation Tools
Tools Requirements
The tools that enable sourcing of the proper data contents
and formats from operational and external data stores into the
data warehouse have to perform a number of important tasks,
including:
• Data transformation from one format to another on the basis of
possible differences between the source and the target platforms.
• Data transformation and calculation based on the application of
the business rules that force certain transformations.
Vendor Approaches
The integrated solutions fall into one of the categories
described below:
Code generators
Database data replication tools
Rule-driven dynamic transformation engines, which capture data
from source systems at user-defined intervals, transform the data,
and then send and load the results into a target environment,
typically a data mart.
Access to Legacy Data
Many organizations develop middleware solutions to manage
the interaction between new applications and growing data
warehouses on the one hand, and back-end legacy systems on
the other.
A three-layered architecture defines how applications are
partitioned to meet both near-term integration and long-term
migration objectives.
• The data layer provides data access and transaction services for
management of corporate data assets.
• The process layer provides services to manage automation and
support for current business processes.
• The user layer manages user interaction with process and/or
data layer services.
Vendor Solutions
Prism Solutions
Provides a comprehensive solution for data warehousing by
mapping source data to a target database management system to
be used as the warehouse.
Warehouse Manager generates code to extract and integrate
data, create and manage metadata, and build a subject-oriented,
historical base.
SAS Institute
SAS offers tools to serve all data warehousing functions.
Its data repository function can act to build the
informational database.
SAS Data Access Engines serve as extraction tools to
combine common variables, transform data representation forms
for consistency, consolidate redundant data, and use business
rules to produce computed values in the warehouse.
SAS engines can work with hierarchical and relational
databases and sequential files.
Carleton Corporation's PASSPORT and MetaCenter
PASSPORT
PASSPORT is a sophisticated metadata-driven, data-mapping,
and data-migration facility.
The PASSPORT Workbench runs as a client on various PC
platforms in the three-tiered environment, including OS/2 and
Windows.
The product consists of two components.
The first, which is mainframe-based, collects the file,
record, or table layouts for the required inputs and outputs and
converts them to the Passport Data Language (PDL).
Overall, PASSPORT offers
• A metadata dictionary at the core of the process.
• Robust data conversion, migration, analysis, and auditing
facilities.
• The PASSPORT Workbench, which enables project development
on workstations, with uploading of the generated application to
the source data platform.
• Native interfaces to existing data files and RDBMSs, helping
users to leverage existing legacy applications and data.
• A comprehensive fourth-generation specification language and
the full power of COBOL.
The MetaCenter
The MetaCenter, developed by Carleton Corporation in
partnership with Intellidex Systems, Inc., is an integrated tool
suite that is designed to put users in control of the data
warehouse.
It is used to manage
• Data extraction
• Data transformation
• Metadata capture
• Metadata browsing
• Data mart subscription
• Warehouse control center functionality
• Event control and notification
Vality Corporation
Vality Corporation’s Integrity data reengineering tool is
used to investigate, standardize, transform, and integrate data
from multiple operational systems and external sources.
• Data audits
• Data warehouse and decision support systems
• Customer information files and householding applications
• Client/server business applications such as SAP, Oracle, and
Hogan
• System consolidations
• Rewrites of existing operational systems
Transformation Engines
Informatica
Informatica's product, the PowerMart suite, captures
technical and business metadata on the back end that can be
integrated with the metadata in front-end partners' products.
PowerMart creates and maintains the metadata repository
automatically.
It consists of the following components:
PowerMart Designer is made up of three integrated
modules: Source Analyzer, Warehouse Designer, and
Transformation Designer.
PowerMart Server runs on a UNIX or Windows NT
platform.
The Informatica Server Manager is responsible for
configuring, scheduling, and monitoring the Informatica Server.
The Informatica Repository is the metadata integration hub
of the Informatica PowerMart suite.
Constellar
The Constellar Hub is designed to handle the movement
and transformation of data for both data migration and data
distribution in an operational system, and for capturing
operational data for loading a data warehouse.
Constellar employs a hub-and-spoke architecture to manage
the flow of data between source and target systems.
Hubs perform data transformation based on rules
defined and developed using the Migration Manager.
Each spoke represents a data path between a
transformation hub and a data source or target.
A hub and its associated sources and targets can be installed
on the same machine, or may run on separate networked
computers.
Metadata
The metadata contains
• The location and description of the warehouse system and data
components.
• Names, definitions, structure, and content of the warehouse and
end-user views.
• Identification of authoritative data sources.
• Integration and transformation rules used to populate the data
warehouse; these include the mapping method from operational
databases into the warehouse, and the algorithms used to convert,
enhance, or transform data.
• Integration and transformation rules used to deliver data to
end-user analytical tools.
Metadata Interchange Initiative
A metadata standard developed to define a metadata
interchange format and its support mechanism.
The goals of the standard include
• Creating a vendor-independent, industry-defined application
programming interface (API) for metadata.
• Allowing users to build tool configurations that meet their needs
and to incrementally adjust those configurations as necessary to
add or subtract tools without impact on the interchange standards
environment.
Metadata Interchange Standard framework.
The components of the Metadata Interchange Standard
Framework are
• The Standard Metadata Model, which refers to the ASCII file
format used to represent the metadata that is being exchanged.
• The Standard Access Framework, which describes the minimum
number of API functions a vendor must support.
• Tool Profile, which is provided by each tool vendor. The Tool
Profile is a file that describes what aspects of the interchange
standard metamodel a particular tool supports.
[Figure: Metadata Interchange Standard framework. Under a user
configuration, each tool (TOOL 1 ... TOOL 4) publishes a Tool
Profile and accesses the Standard Metadata Model through the
Standard API of the Standard Access Framework.]
Metadata Repository
The metadata itself is housed in and managed by the
metadata repository.
Metadata repository management software can be used to
map the source data to the target database, generate code for
data transformations, integrate and transform the data, and
control moving data to the warehouse.
Metadata Management
Metadata defines all data elements and their attributes, data
sources and timing, and the rules that govern data use and data
transformations.
The metadata also has to be available to all warehouse users
in order to guide them as they use the warehouse.
A well-thought-through strategy for collecting, maintaining,
and distributing metadata is needed for a successful data
warehouse implementation.
Metadata Trends
The process of integrating external and internal data into
the warehouse faces a number of challenges:
• Inconsistent data formats
• Missing or invalid data
• Different levels of aggregation
• Semantic inconsistency (e.g., the same codes may mean
different things when they come from different suppliers of data)
• Unknown or questionable data quality and timeliness