International Journal of Database Management Systems (IJDMS) Vol.6, No.3, June 2014
DOI: 10.5121/ijdms.2014.6303
AN ONTOLOGICAL APPROACH TO HANDLE
MULTIDIMENSIONAL SCHEMA EVOLUTION
FOR DATA WAREHOUSE
M.Thenmozhi and K.Vivekanandan
Department of Computer Science and Engineering,
Pondicherry Engineering College, Puducherry, India
ABSTRACT
In recent years, the number of digital information storage and retrieval systems has increased immensely. Data warehousing has been found to be an extremely useful technology for integrating such heterogeneous and autonomous information sources. Data within the data warehouse is modelled in the form of a star or snowflake schema, which facilitates business analysis from a multidimensional perspective. As user requirements capture the interesting measures of business processes, the data warehouse schema is derived from both the information sources and the business requirements. Due to the changing business scenario, the information sources change not only their data but also their schema structure. In addition to the source changes, the business requirements for the data warehouse may also change. Both kinds of changes result in data warehouse schema evolution. These changes can be handled either by updating the DW model in place or by developing a new version of the DW structure. Existing approaches deal with either source changes or requirements changes in a manual way, and changes to the data warehouse schema are carried out at the physical level. This may induce high maintenance costs and complex OLAP server administration. As ontology seems to be a promising solution for data warehouse research, in this paper an ontological approach to automate the evolution of a data warehouse schema is proposed. The method assists the data warehouse designer in handling evolution at the ontological level, based on which a decision can be made to carry out the changes at the physical level. We evaluate the proposed ontological approach against the existing method of manual adaptation of the data warehouse schema.
KEYWORDS
Data Warehouse Evolution, Multidimensional Schema Evolution, Data Warehouse Schema Changes
1. INTRODUCTION
Business analysts increasingly rely on the data warehouse for making strategic decisions for their organization. The data warehouse is populated from several operational sources using extraction, transformation and loading (ETL) operations [8]. In order to discover trends and critical factors in business, the data warehouse provides a multidimensional view of the data, which is used by Online Analytical Processing (OLAP) and other reporting tools [3]. A retail organization, for example, analyzes purchases by its customers to come up with products of likely interest to the customer. The analytical results generated by these tools need to be accurate and reliable. Hence the management of such a data warehouse system is rather difficult when compared to traditional operational systems.
The common assumption is that, once created, the DW remains static. However, due to changes in business needs, the DW needs to evolve. Evolution in a DW can be triggered by two different causes: (i) a change in the source schema or (ii) a change in the DW requirements [19]. An inherent feature of source schemas is that they evolve over time with respect not only to their contents (data) but also to their structures (schemas). In addition, the business processes in which the analyst is involved are subject to frequent changes, and these changes are reflected in the analysis requirements: new types of queries that require different data become necessary. The new query requirements lead to changes in the DW schema [9]. These changes need to be propagated to the data warehouse system in order to provide an accurate insight into the business data.
Considering the importance of the evolution problem in the data warehouse domain, some existing works have addressed this issue. In the literature, DW evolution is handled by two different approaches, namely schema evolution [1][2] and schema versioning [14][15][17]. Schema evolution consists in replacing the old schema with a new schema and migrating the data from the old schema into the new one. Schema versioning consists in keeping the history of all versions, either through temporal extensions or by physically storing the different versions.
Existing works on the data warehouse evolution problem have mainly concentrated on handling DW schema changes at the physical level. As a consequence, the DW administrator is responsible for complex maintenance tasks under either the schema evolution approach or the versioning approach. The proposed work takes a different perspective on data warehouse evolution: it gives the data warehouse designer an insight into how requirement or source changes can be propagated to the data warehouse schema using ontology. Hence, the designer can make an informed choice about propagating the changes to the physical level.
2. LITERATURE SURVEY
In [15] the authors proposed an approach based on versioning, called MVTDW (Multiversion Trajectory Data Warehouse), in order to handle structural changes of the DW. They defined a set of constraints that need to be guaranteed by the real versions and alternative versions maintained by the MVTDW. Moreover, they proposed algorithms that can be applied in case of schema and instance changes on the TDW versions. For the MVTDW to answer queries, the user needs to define the correct version; if data required by a query exists in different versions, it is necessary to provide a data mapping between them. In [17] the authors proposed a metadata model for a multi-version data warehouse. The metadata is used to describe the data warehouse schema versions at the logical level, their storage in the relational database at the physical level, information about reports defined by users on schema versions, and the semantics of data stored in the data warehouse. The proposed data warehouse evolution framework enables the user to construct reports based on desirable terms and term versions from the semantic metadata. In [1] the authors formally defined a data warehouse model supporting extended hierarchies. Different features such as multiple hierarchies, non-covering hierarchies, non-onto hierarchies, and non-strict hierarchies are explained. They use the Uni-level Description Language (ULD) and multilevel dictionary definition (MDD) in order to model the constructs. In [4] the authors developed the multi-version data warehouse (MVDW) approach to manage DW evolution. Since in the MVDW fact and dimension data are physically distributed among multiple DW versions, the authors proposed a multi-version join index (MVJI) technique. The MVJI has an upper level for indexing attributes and a lower level for indexing DW versions. The paper also presents a theoretical upper-bound analysis of the MVJI performance characteristics with respect to I/O operations. In [18] the authors proposed a framework for DW evolution based on requirements. The framework consists of a Requirement level, which keeps track of changes in requirements; a Change Management level, which keeps track of changes in the sources; a Design level, which performs the required changes in the data warehouse schema; and a View level, which defines and generates customized reports and views. In [2] the authors proposed a Rule-based Data Warehouse (R-DW) model to support data warehouse evolution. They follow a user-driven approach in order to update the data warehouse schema. Based on users' knowledge, the aggregated data is represented in the form of "if-then" rules, and using these aggregation rules the granularity of the dimensions is defined dynamically.
3. BACKGROUND
The following sections describe the multidimensional model for the data warehouse and the basics of ontology and its applicability in the data warehouse domain.
3.1. Multidimensional Model
Data within the data warehouse is represented using a multidimensional model. This "cube" structure provides a logical data representation well suited to On-Line Analytical Processing (OLAP) and data management. The advantages of dimensional modelling are: (i) to produce database structures that are easy for end-users to understand and write queries against, and (ii) to maximize the efficiency of queries. The basic concepts of dimensional modelling are facts, dimensions and measures [10]. A fact is a collection of related data items, consisting of measures and context data; in general it represents business items or business transactions of a particular domain. A dimension is a collection of data that describes one business dimension. It determines the contextual background for the facts, and dimensions are the parameters over which we want to perform OLAP operations. A measure is a numeric attribute of a fact which reflects the performance or behaviour of the business relative to the dimensions. There are two basic models used in dimensional modelling: the star model and the snowflake model [3]. The basic structure for a dimensional model is the star model. It has one large central table (the fact table) and a set of smaller tables (the dimensions) arranged in a radial pattern around the central table, as shown in Figure 1, which represents the star schema for the sales domain. The snowflake model is the result of decomposing one or more of the dimensions: many-to-one relationships among sets of attributes of a dimension can be separated into new dimension tables, forming a hierarchy as shown in Figure 2. The decomposed snowflake structure visualizes the hierarchical structure of dimensions very well.
Figure 1. Star Schema
Figure 2. Snowflake Schema
3.2. Ontology Basics
A relational database is a structured, formal representation of a set of data; in the same way, an ontology is a structured, formal representation of an area of knowledge. It defines and restricts what can be said about that area of knowledge. Ontology is most commonly defined as "a formal, explicit specification of a shared conceptualization" [7]. An ontology describes the knowledge in a domain in terms of classes and properties, as well as relationships between them [7]. The basic building blocks of ontology design include classes or concepts, properties of each concept describing its various features and attributes, and restrictions on slots (facets). Classes (concepts) are abstract groups, sets, or collections of objects. Concepts in the ontology should be close to objects (physical or logical) and relationships in the domain of interest. The decision to use an ontology-based approach for the data warehouse, instead of another technology such as a UML-based approach, lies in the fact that ontology provides an elegant way to represent concepts using the Web Ontology Language (OWL). OWL is an international standard for encoding and exchanging ontologies [16]. The reason for choosing OWL is that it provides the system with the means not only of representing information but also of automatically processing that information.
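To make the OWL encoding concrete, the following minimal sketch uses the Python rdflib library (an assumption of this illustration, not a tool mentioned in the paper) to declare a class, a datatype property and an object property; the namespace and names are hypothetical.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

# Hypothetical namespace for the illustration.
EX = Namespace("http://example.org/dw#")

g = Graph()
g.bind("ex", EX)

# A concept (OWL class) for a business entity.
g.add((EX.Customer, RDF.type, OWL.Class))

# A datatype property describing an attribute of the concept.
g.add((EX.customerName, RDF.type, OWL.DatatypeProperty))
g.add((EX.customerName, RDFS.domain, EX.Customer))
g.add((EX.customerName, RDFS.range, XSD.string))

# An object property relating two concepts.
g.add((EX.Nation, RDF.type, OWL.Class))
g.add((EX.belongsToNation, RDF.type, OWL.ObjectProperty))
g.add((EX.belongsToNation, RDFS.domain, EX.Customer))
g.add((EX.belongsToNation, RDFS.range, EX.Nation))

print(g.serialize(format="turtle"))
```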
4. PROPOSED APPROACH
The objective of the work described in this paper is to propose a methodology that supports an automatic adaptation of the multidimensional schema when the requirements and sources evolve. Our main idea for data warehouse evolution is that the whole system is specified at the conceptual level. Any changes to the sources or requirements are propagated to the data warehouse schema at the ontological level. Hence, the proposed approach analyses the impact of a given change at the logical level before it is propagated to the data warehouse schema at the physical level.

In our work we represent the data source, the requirements and the data warehouse using ontologies. The ontological representation of these entities helps us to automate the evolution task. As the data source schema changes, the data warehouse schema needs to evolve. Any changes to the original sources or requirements are captured and updated in the source and requirement ontologies. Depending on the type of change, the changes are propagated over the data warehouse ontology. Based on the evolved data warehouse ontology, the DW administrator can decide to carry out the changes at the physical level. The phases of our methodology are the following:
1. Formalizations of inputs
2. Defining Evolution operators
3. Update change in requirements or source ontology
4. Extract Data Warehouse schema change
5. Propagate change to Data Warehouse schema
6. Check consistency of Data Warehouse schema
4.1. Formalization of Inputs
In this section, we formalize the proposed method in order to standardize it and to ensure the correctness of the DW evolution process. We use OWL ontologies to describe the semantics of the different entities involved in our methodology. The reason for using OWL, instead of XML, UML or other notations, is that OWL supports semantic reasoning and is better suited to future extensions of this work, such as reasoning-based DW schema evolution. We describe the data source, the requirements and the DW schema by ontologies.
We illustrate our approach using TPC-H [5] and Star Schema Benchmark (SSB) [12] schemas.
TPC-H is a decision support benchmark which represents our source. The Star Schema
Benchmark is a variation of the TPC-H benchmark, which models the data warehouse for the
TPC-H schema. The following sections describe our approach in detail.
4.1.1. Ontology for data sources
Data sources may contain different types of structures. Hence these sources are mapped to a local ontology which expresses the semantics of the data sources. The following rules are used to map a database to an ontology:
• A database table is mapped to an ontology class.
• If a database table is related to another, the two tables are mapped to classes with a parent-child relationship.
• If a database table is related to two tables, then the table is divided into two transferred classes.
• The primary key is mapped to a datatype property of the ontology, and the foreign key is mapped to an object property of the ontology.
• The attributes of a table are mapped to properties of the equivalent class.
In the case that source data is extracted from distributed data sources, more than one initial ontology may be generated. Ontology merging [11] is used to merge the different initial ontologies into a single integrated source ontology. Figure 3 represents the ontology for the TPC-H schema. Formally, the data source ontology (DSO) used by our approach is defined as:
DSO = {C, DP, OP}
- C is a set of OWL classes representing the tables
- DP is a set of datatype properties representing the table attributes
- OP is a set of object properties representing the relationships between tables
Figure 3. Data Source Ontology
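As an illustration of the mapping rules above, the following plain-Python sketch (the helper name, table descriptions and tuple layout are assumptions of this illustration, not part of the paper) builds the DSO sets {C, DP, OP} from a simple description of relational tables.

```python
# Minimal sketch of the table-to-ontology mapping rules.

def build_dso(tables):
    """tables: dict of table name -> {"attributes": [...], "foreign_keys": {attr: referenced table}}."""
    dso = {"C": set(), "DP": set(), "OP": set()}
    for name, spec in tables.items():
        dso["C"].add(name)                           # rule: table -> ontology class
        for attr in spec["attributes"]:
            dso["DP"].add((name, attr))              # rule: attribute / primary key -> datatype property
        for fk, referenced in spec["foreign_keys"].items():
            dso["OP"].add((name, fk, referenced))    # rule: foreign key -> object property
    return dso

# A tiny TPC-H-like fragment (illustrative, not the full benchmark schema).
tables = {
    "CUSTOMER": {"attributes": ["c_custkey", "c_name"], "foreign_keys": {"c_nationkey": "NATION"}},
    "NATION":   {"attributes": ["n_nationkey", "n_name"], "foreign_keys": {}},
}

dso = build_dso(tables)
print(dso["C"], dso["OP"])
```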
4.1.2. Ontology for Requirements
Requirements play a major role in DW design. We assume that a requirement analysis has been
carried out earlier and the corresponding requirement specification is available for the business
domain. Here we explain the process for constructing the DW requirements ontology (DWRO)
for semantically describing the requirement glossaries. The DWRO is based on the i* framework [6] for the modelling of goals and information requirements for DWs. The DWRO should be capable of modelling the following types of information: i) strategic goals, the main objectives of the business process (for example, "increase sales"); ii) decision goals, which attempt to answer the question "how can a strategic goal be achieved?" (for example, "determine some kind of promotion"); iii) information goals, which attempt to answer the question "how can decision goals be achieved in terms of the information required?" (for example, "analyze customer purchases" or "examine stocks"). Information requirements for decision makers are derived from information goals. The multidimensional elements are the process measures under analysis (Measure stereotype) and the contexts of analysis (Context stereotype). Figure 4 represents the requirement ontology based on the TPC-H benchmark schema. Formally, the DWRO can be defined as:
DWRO = {S, I, D, IR, BP,M,C}, where:
- S is a set of OWL classes representing the strategic goals
- I is a set of OWL classes representing the information goals
- D is a set of OWL classes representing the decision goals
- IR is a set of OWL classes representing the information requirements
- BP is a set of object properties representing business process
- M is a set of data type properties representing measures
- C is a set of OWL classes representing the contexts
Figure 4. Data Warehouse Requirement Ontology
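To make this formalization concrete, the following plain-Python sketch instantiates one DWRO fragment. It is an illustration rather than part of the paper's formal development; the goal, measure and context names are borrowed from the examples used in this paper, and the dictionary layout is an assumption.

```python
# Illustrative DWRO fragment in the {S, I, D, IR, BP, M, C} form above.
dwro = {
    "S":  {"Increase Sales"},                              # strategic goals
    "D":  {"Determine some kind of promotion"},            # decision goals
    "I":  {"Analyze customer purchases"},                  # information goals
    "IR": {"Analyse revenue by customer and promotion"},   # information requirements
    "BP": {"Sales"},                                       # business process
    "M":  {"ExtendedPrice", "Discount"},                   # measures
    "C":  {"Customer", "Promotions"},                      # contexts of analysis
}
```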
4.1.3. Ontology for Data Warehouse
The SSB data warehouse schema for the TPC-H source schema is given in Figure 5. The DW schema can be formally defined using the data warehouse ontology (DWO) as given below:
DWO = {F, FP, M, D, DP, RP}
- F is a set of OWL classes representing the fact
- FP is a set of OWL classes representing the fact properties
- M is a set of data type properties representing the measures of the fact
- D is a set of OWL classes representing the dimensions
- DP is a set of data type properties representing the dimension properties
- RP is a set of object properties representing the relationship between facts and
dimensions
Figure 5. Data Warehouse Ontology
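The DWO can be written down in the same style. The sketch below (plain Python; the fact, dimension and measure names are an illustrative SSB-style fragment rather than the complete benchmark schema) represents a small star schema in the DWO form defined above.

```python
# Illustrative DWO instance for a small star-schema fragment.
dwo = {
    "F":  {"Lineorder"},                                 # fact classes
    "FP": {("Lineorder", "lo_orderkey")},                # fact properties
    "M":  {("Lineorder", "lo_extendedprice"),
           ("Lineorder", "lo_discount")},                # measures of the fact
    "D":  {"Customer", "Part", "Supplier", "Date"},      # dimension classes
    "DP": {("Customer", "c_name"), ("Date", "d_year")},  # dimension properties
    "RP": {("Lineorder", "hasCustomer", "Customer"),
           ("Lineorder", "hasDate", "Date")},            # fact-dimension relationships
}
```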
4.2. Defining Evolution Operators
In this section we present the set of evolution operators that represent the type of change and the concept changed. The three possible changes are addition, deletion and rename. DW elements such as facts, dimensions and measures are subject to change, and their equivalent ontology concepts in the DWO need to be changed accordingly. Performing a change over the DWO may require additional changes to be executed over the ontology. For example, the addition of a new dimension (i.e., a class) to the DWO requires the addition of its data properties and object properties. The type of change, the element changed and the additional changes are given in Table 1.
Table 1. Evolution Operators

Type of Change | DW Schema Element | Equivalent Ontology Concept | Elementary Changes
Addition | Table (Fact, Dimension) | Class | Add Data Property; Add Object Property
Addition | Attributes (Measures, Descriptive) | Data Property | Add Property Domain; Add Property Range
Addition | Relationship (Primary Key, Foreign Key) | Object Property | Add Property Domain; Add Property Range
Deletion | Table (Fact, Dimension) | Class | Delete Data Property; Delete Object Property
Deletion | Attributes (Measures, Descriptive) | Data Property | Delete Property Domain; Delete Property Range
Deletion | Relationship (Primary Key, Foreign Key) | Object Property | Delete Property Domain; Delete Property Range
Rename | Table (Fact, Dimension) | Class | Rename Class (if required)
Rename | Attributes (Measures, Descriptive) | Data Property | Rename Data Property (if required)
Rename | Relationship (Primary Key, Foreign Key) | Object Property | Rename Object Property (if required)
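The operators of Table 1 can also be captured as a small data structure that drives the later propagation steps. The encoding below is our own illustration (the names and types are assumptions), not the paper's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ChangeType(Enum):
    ADDITION = "addition"
    DELETION = "deletion"
    RENAME = "rename"

class ConceptType(Enum):
    CLASS = "class"                      # tables (fact, dimension)
    DATA_PROPERTY = "data property"      # attributes (measures, descriptive)
    OBJECT_PROPERTY = "object property"  # relationships (primary/foreign keys)

@dataclass
class EvolutionOperator:
    change: ChangeType
    concept: ConceptType
    name: str                        # e.g. "Promotions" or "Promotion_p_id"
    new_name: Optional[str] = None   # only used for renames

# Elementary changes implied by each (change, concept) pair, following Table 1
ELEMENTARY_CHANGES = {
    (ChangeType.ADDITION, ConceptType.CLASS): ("add data property", "add object property"),
    (ChangeType.ADDITION, ConceptType.DATA_PROPERTY): ("add property domain", "add property range"),
    (ChangeType.ADDITION, ConceptType.OBJECT_PROPERTY): ("add property domain", "add property range"),
    (ChangeType.DELETION, ConceptType.CLASS): ("delete data property", "delete object property"),
    (ChangeType.DELETION, ConceptType.DATA_PROPERTY): ("delete property domain", "delete property range"),
    (ChangeType.DELETION, ConceptType.OBJECT_PROPERTY): ("delete property domain", "delete property range"),
    (ChangeType.RENAME, ConceptType.CLASS): ("rename class",),
    (ChangeType.RENAME, ConceptType.DATA_PROPERTY): ("rename data property",),
    (ChangeType.RENAME, ConceptType.OBJECT_PROPERTY): ("rename object property",),
}

# Example: adding the "Promotions" dimension table implies adding its properties too
op = EvolutionOperator(ChangeType.ADDITION, ConceptType.CLASS, "Promotions")
print(ELEMENTARY_CHANGES[(op.change, op.concept)])
```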
4.3. Update Changes in the Requirements or Source Ontology
Given the changes in the source schema or business requirements, the next step is to update them in
the corresponding ontology. Using an ontology editor such as Protégé [13], the changes can be applied
directly over the ontology. When there is a change in the business requirements, the DSO needs to be
verified for the existence of the particular concept before the change is propagated over the DWO.
Algorithm 2 (SearchOntology) verifies the existence of the concept: steps 1-6 check for class existence,
steps 7-12 for data property existence, and steps 13-17 for object property existence. If the concept is
not found, the business requirements need to be refined.
Figure 6. Algorithm Search Ontology
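As an illustration of the check performed by Algorithm 2, the sketch below assumes the DSO is held in an rdflib graph; the function name and its 'kind' argument are ours, not part of the paper's pseudocode.

```python
from rdflib import Graph
from rdflib.namespace import RDF, OWL

def search_ontology(dso: Graph, concept, kind: str) -> bool:
    """Check whether a concept referenced by a new requirement already exists in the DSO.

    kind is one of 'class', 'data property' or 'object property'. A False result
    means the concept is missing and the business requirement has to be refined.
    """
    owl_type = {
        "class": OWL.Class,
        "data property": OWL.DatatypeProperty,
        "object property": OWL.ObjectProperty,
    }[kind]
    return (concept, RDF.type, owl_type) in dso
```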
For our TPC-H domain, a new decision goal "Increase Revenue through Promotions" has been added
to the requirement ontology. "Study Revenue by Customer Promotions" is the information goal, and the
corresponding information requirement is "Analyse Revenue based on Customer for a given Promotion".
The contexts for the new information requirement are "Customer" and "Promotions", and the measures
for analysing Revenue are "ExtendedPrice" and "Discount". Figure 7 highlights the new requirement
added to the requirement ontology.
Table 2 details the changes made to the TPC-H source schema ontology: a new table/class "Promotions"
and its corresponding attributes/data properties have been added, together with further changes such as
renames and deletions.
Figure 7. Data Warehouse Requirement Ontology after Update
Table 2. Change Details

Data Source Change | Data Source Ontology Change | Entity Changed
ADDITION
Table | Class | Promotions
Attribute | Data Property | Promotion_p_id
Attribute | Data Property | Promotion_p_name
Attribute | Data Property | Promotion_p_category
Attribute | Data Property | Promotion_p_subcategory
Attribute | Data Property | Promotion_p_cost
Attribute | Data Property | Promotion_p_begdate
Attribute | Data Property | Promotion_p_enddate
Attribute | Data Property | Promotion_p_total
RENAME
Attribute | Data Property | OldName: Customer_c_comment, NewName: Customer_c_feedback
Attribute | Data Property | OldName: Part_p_category, NewName: Part_p_model
DELETION
Attribute | Data Property | Customer_c_mktsegment
Attribute | Data Property | Part_p_container
4.4. Extract Data Warehouse Schema Change
If the change type is addition, it is necessary to find the multidimensional element type of the concept.
Figure 8 presents the algorithm (Algorithm 3, Multidimensional Type) used to find the multidimensional
element of the change before it is propagated. Its inputs are the DSO, the change information and the
list of multidimensional elements available in the DWO.
First, the algorithm checks whether the concept added is a class and identifies it as a fact, a dimension
or a level. To find whether the class ci is a fact, the range class of ci is obtained. If the range class exists
in the DSO and belongs to a dimension in the DWO, ci is likely to be a fact. The data properties of ci are
then derived and checked to see whether they contain enough numerical properties to qualify ci as a fact;
if ci also has an n:1 relationship with the range class, it is identified as a fact (steps 2-16). To find
whether ci is a dimension, the domain class of ci is obtained. If the domain class cj belongs to a fact in
the DWO, ci is likely to be a dimension, and if cj has a 1:n relationship with the fact, ci is identified as a
dimension (steps 17-21). To find whether ci is a level, the domain class of ci is obtained; if the domain
class cj belongs to a dimension or a level in the DWO, ci is identified as a level (steps 18-24).
Next, the algorithm checks whether the concept added is a data property and identifies it as a fact,
dimension or level property. The domain d of dpi is obtained: if d is a fact, the added concept is
identified as a fact property; if d is a dimension, it is identified as a dimension property; and if d is a
level, it is identified as a level property.
Finally, the algorithm checks whether the concept added is an object property and identifies it as a fact,
dimension or level relation. The domain d and range r of opi are obtained: if d is a fact and r is a
dimension, the added concept is identified as a fact-dimension relation; if d is a dimension and r is a
fact, it is identified as a dimension-fact relation; and if d is a dimension and r is a level, it is identified
as a dimension-level relation.
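The sketch below illustrates the class and data-property cases of this classification under simplifying assumptions: the DSO and DWO are rdflib graphs, and the DWO marks existing classes as instances of Fact, Dimension or Level. It is an approximation of Algorithm 3, not the exact algorithm of Figure 8.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL, XSD

DWONS = Namespace("http://example.org/dwo#")

def role(dwo: Graph, cls):
    """Multidimensional role a class already plays in the DWO (assumption: the DWO
    marks such classes as instances of dwo:Fact, dwo:Dimension or dwo:Level)."""
    for r in ("Fact", "Dimension", "Level"):
        if cls is not None and (cls, RDF.type, DWONS[r]) in dwo:
            return r
    return None

def classify_added_class(dso: Graph, dwo: Graph, ci) -> str:
    """Simplified class case of the Multidimensional Type algorithm."""
    # Fact check: ci points at a class that is a dimension in the DWO and
    # has numeric data properties that can serve as measures.
    for op in dso.subjects(RDF.type, OWL.ObjectProperty):
        if (op, RDFS.domain, ci) in dso and role(dwo, dso.value(op, RDFS.range)) == "Dimension":
            numeric = [dp for dp in dso.subjects(RDF.type, OWL.DatatypeProperty)
                       if (dp, RDFS.domain, ci) in dso
                       and dso.value(dp, RDFS.range) in (XSD.integer, XSD.decimal)]
            if numeric:
                return "Fact"
    # Dimension / level check: a class already known to the DWO points at ci.
    for op in dso.subjects(RDF.type, OWL.ObjectProperty):
        if (op, RDFS.range, ci) in dso:
            r = role(dwo, dso.value(op, RDFS.domain))
            if r == "Fact":
                return "Dimension"
            if r in ("Dimension", "Level"):
                return "Level"
    return "Unknown"

def classify_added_data_property(dso: Graph, dwo: Graph, dpi) -> str:
    """Data-property case: the type follows the role of the property's domain class."""
    r = role(dwo, dso.value(dpi, RDFS.domain))
    return f"{r} data property" if r else "Unknown"
```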
Table 2 lists the addition changes for the TPC-H domain. Using the Multidimensional Type algorithm,
we are able to identify the multidimensional element type of the class "Promotions" and its data
properties. Table 3 gives the multidimensional type of each change that needs to be propagated to the
DWO.
Figure 8. Algorithm Multidimensional Type
Table 3. Multidimensional Type of Addition Change

Data Source Ontology Change | Entity Changed | Multidimensional Type
Class | Promotions | Dimension Class
Data Property | Promotion_p_id | Dimension Data Property
Data Property | Promotion_p_name | Dimension Data Property
Data Property | Promotion_p_category | Dimension Data Property
Data Property | Promotion_p_subcategory | Dimension Data Property
Data Property | Promotion_p_cost | Dimension Data Property
Data Property | Promotion_p_begdate | Dimension Data Property
Data Property | Promotion_p_enddate | Dimension Data Property
Data Property | Promotion_p_total | Dimension Data Property
4.5. Propagate Changes to Data Warehouse
To apply the changes over the DWO, we use different algorithms depending on the type of change. If
the change type is addition, the multidimensional element is identified in the previous step and the
change is propagated with the Apply Change Addition algorithm given in Figure 9. If the concept type
is a class, we retrieve the list of data properties and object properties of the class to be added; the new
class is added to the DWO, and for each data property and object property its domain and range are
included. If the concept type is a data property, the new data property is added to the corresponding
class in the DWO and its domain and range are included accordingly. The same applies to an object
property.
Figure 9. Algorithm Apply Change Addition
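A minimal sketch of the addition case is shown below, again assuming rdflib graphs for the DSO and DWO; it copies the new concept together with the domain and range of its properties, as described above.

```python
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

def apply_addition(dwo: Graph, dso: Graph, concept, concept_type: str) -> None:
    """Propagate an addition from the source ontology to the DW ontology (sketch)."""
    if concept_type == "class":
        dwo.add((concept, RDF.type, OWL.Class))
        # copy every data/object property whose domain is the new class,
        # together with its domain and range
        for prop_kind in (OWL.DatatypeProperty, OWL.ObjectProperty):
            for prop in dso.subjects(RDF.type, prop_kind):
                if (prop, RDFS.domain, concept) in dso:
                    dwo.add((prop, RDF.type, prop_kind))
                    dwo.add((prop, RDFS.domain, concept))
                    rng = dso.value(prop, RDFS.range)
                    if rng is not None:
                        dwo.add((prop, RDFS.range, rng))
    else:
        # single data or object property: copy its declaration, domain and range
        prop_kind = OWL.DatatypeProperty if concept_type == "data property" else OWL.ObjectProperty
        dwo.add((concept, RDF.type, prop_kind))
        for pred in (RDFS.domain, RDFS.range):
            value = dso.value(concept, pred)
            if value is not None:
                dwo.add((concept, pred, value))
```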
For propagating a deletion change we apply the Apply Change Deletion algorithm given in Figure 10.
If the concept type is a class, the class is deleted along with its corresponding data properties and
object properties. If the concept type is a data property or an object property, it can be deleted directly
from the given class.
Figure 10. Algorithm Apply Change Deletion
For propagating a rename change we apply the Apply Change Rename algorithm given in Figure 11.
Whether the concept type is a class, a data property or an object property, the old concept is deleted
and the new concept name is included in the ontology.
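The deletion and rename cases can be sketched in the same setting; the wildcard-based removal and the triple-rewriting rename below are our simplifications of the algorithms in Figures 10 and 11.

```python
from rdflib import Graph

def apply_deletion(dwo: Graph, concept) -> None:
    """Delete a class or property and every triple that mentions it (sketch)."""
    dwo.remove((concept, None, None))   # its own declaration, domain and range
    dwo.remove((None, None, concept))   # triples where it is used as a range or value
    dwo.remove((None, concept, None))   # triples where a property is used as predicate

def apply_rename(dwo: Graph, old, new) -> None:
    """Rename by rewriting every triple of the old concept under the new URI (sketch)."""
    for s, p, o in list(dwo):
        s2, p2, o2 = (new if t == old else t for t in (s, p, o))
        if (s2, p2, o2) != (s, p, o):
            dwo.remove((s, p, o))
            dwo.add((s2, p2, o2))
```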
Table 4 presents the changes that are propagated over the DWO. For adding the dimension class
"Promotions", its domain, range and data properties are added. For adding a dimension data property,
say "Promotion_p_id", its corresponding domain and range are included in the DWO. For renaming the
Customer dimension data property "Customer_c_comment", its old name is deleted and the new name
"Customer_c_feedback" is added. For deleting the Customer dimension data property
"Customer_c_mktsegment", its domain and range are removed from the DWO. Figure 12 represents the
final DWO after the changes are applied.
Figure 11. Algorithm Apply Change Rename
Table 4. Change Propagation
Figure 12. Updated Data Warehouse Ontology
4.6. Check the Consistency
This step checks the consistency of the DWO after the changes are applied. The ontology reasoner
available in Protégé [13] is used to check the consistency of the given ontology. The following steps
are used to check consistency:
• Load the DWO
• Load the reasoner
• Check the consistency of the ontology using the loaded reasoner
Any inconsistency in the DWO can then be resolved with the help of the DW designer's suggestions,
and the modified DWO can be kept as a new version of the DW schema.
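The same check can also be scripted outside Protégé, for example with owlready2 and the HermiT reasoner; the file path below is a placeholder.

```python
from owlready2 import get_ontology, sync_reasoner, default_world

# Load the modified DWO (the path is illustrative) and run a DL reasoner,
# mirroring the steps above: load the ontology, load the reasoner, check consistency.
onto = get_ontology("file:///path/to/updated_dwo.owl").load()
with onto:
    sync_reasoner()  # runs HermiT by default; requires a Java runtime

inconsistent = list(default_world.inconsistent_classes())
if inconsistent:
    print("Inconsistent classes to review with the DW designer:", inconsistent)
else:
    print("The DWO is consistent and can be kept as a new version of the DW schema.")
```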
5. EVALUATION OF THE PROPOSED APPROACH
To evaluate the efficiency of our approach, we compare the cost of manually handling evolution at the
physical level with the cost of handling evolution through the proposed ontological approach.
The manual effort comprises the detection, inspection and, where necessary, the rewriting of the
activities affected by an event. The human effort MC_c^e for manually handling schema evolution for a
change c over an event e is expressed in terms of:
A_X = the number of changes c affected by event e that must be manually detected
R_X = the number of changes c that must be manually updated for event e
For a set of evolution operators O over the affected activities A, the overall cost of manual adaptation
(CMA) is obtained by summing MC_c^e over every change c in O and every affected activity e in A.
The cost of automatically handling schema evolution using the proposed ontological approach is
quantified as the sum of the number of changes imposed on the DW schema (CS) and the cost AMC of
manually discovering and adjusting the activities A_d that escape the automation. The latter cost is
expressed as:

AMC = Σ_{c ∈ O} Σ_{e ∈ A_d} MC_c^e

The overall cost of automated adaptation is given by:

CAA = CS + AMC

The evolution operations that occurred in the TPC-H source schema included the addition, renaming
and deletion of attributes and tables. A total of 849 evolution operations were encountered, and the
distribution of occurrences per kind of operation is shown in Figure 13.
Figure 13. Distribution of occurrence per kind of evolution operations
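The relation between the cost measures can be illustrated with a toy calculation; all numbers below are hypothetical and serve only to show how CMA, AMC and CAA are computed.

```python
# All numbers are hypothetical; they only illustrate how the cost measures combine.
manual_costs = {
    # evolution operator -> manual cost of each affected activity (detect + rewrite)
    "attribute_addition": [4, 3, 5],
    "attribute_rename":   [3, 4],
    "table_addition":     [6, 5],
}

# Manual adaptation: every affected activity must be handled by hand.
CMA = sum(mc for costs in manual_costs.values() for mc in costs)

# Automated adaptation: CS changes are propagated on the DW schema automatically;
# only the activities that escape the automation still incur a manual cost (AMC).
CS = 7
escaped = {"table_addition": [6]}
AMC = sum(mc for costs in escaped.values() for mc in costs)
CAA = CS + AMC

print(f"CMA = {CMA}, CAA = {CAA}")
```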
In Table 5, we summarize our results for the different kinds of events. First, we note that most of the
activities were affected by attribute additions and renamings, since these kinds of operations were the
most common in our scenarios. More importantly, we can conclude that our framework effectively
adapts activities to the examined kinds of operations. Figure 14 and Figure 15 compare the number of
entities affected with the number corrected by the proposed approach, with the evolution operators
along the x-axis and the number of entities along the y-axis.
Table 5. Affected and Corrected Operations

Evolution Event Type | Total Affected | Total Corrected
Attribute Addition | 210 | 206
Attribute Deletion | 189 | 189
Attribute Rename | 196 | 194
Table Addition | 98 | 95
Table Deletion | 84 | 83
Table Rename | 72 | 69
Figure 14. Number of Attributes Affected and Corrected
Figure 15. Number of Tables Affected and Corrected
From Figure 16 it can be seen that the cost of automated adaptation (CAA) for the proposed approach
is considerably lower than the cost of manual adaptation (CMA) in the existing approach.
Figure 16. Comparison of adaptation cost for the existing and proposed approaches
6. CONCLUSIONS
The data warehouse is considered the core component of modern decision support systems. As the
information sources and business requirements from which the data warehouse is derived change
frequently, these changes impact the data warehouse schema. Existing work on DW evolution, such as
schema versioning and schema evolution, mainly concentrates on changing the schema structure at the
physical level. The proposed approach handles evolution of the data warehouse schema at the
ontological level. The ontological representation of the data source, the requirements and the data
warehouse schema allows the evolution task to be automated (or semi-automated). The impact that the
evolution has on the data warehouse schema is analysed, and the designer is left with the choice of
carrying the changes over to the existing physical schema of the data warehouse. Compared to existing
approaches that handle the evolution task manually, the proposed ontological approach incurs a lower
adaptation cost.
REFERENCES
[1] Banerjee, S., & Davis, K. C. (2009). Modeling data warehouse schema evolution over extended
hierarchy semantics. In Journal on Data Semantics XIII (pp. 72-96). Springer Berlin Heidelberg
[2] Bentayeb, F., Favre, C., & Boussaid, O. (2008). A user-driven data warehouse evolution approach for
concurrent personalized analysis needs. Integrated Computer-Aided Engineering, 15(1), 21-36.
[3] Chaudhuri, S., & Dayal, U. (1997). An overview of data warehousing and OLAP technology. ACM
Sigmod record, 26(1), 65-74.
[4] Chmiel, J., Morzy, T., & Wrembel, R. (2009). Multiversion join index for multiversion data
warehouse. Information and Software Technology, 51(1), 98-108.
[5] Transaction Processing Performance Council (2008). TPC-H benchmark specification. [Online]
www.tpc.org/tpch/ (Accessed 20 May 2014).
[6] Glorio, O., Pardillo, J., Mazón, J. N., & Trujillo, J. (2008, September). Dawara: An eclipse plugin for
using i* on data warehouse requirement analysis. In 16th IEEE International Requirements Engineering
Conference (RE'08) (pp. 317-318). IEEE.
[7] Gruber, T. R. (1993). A translation approach to portable ontology specifications. Knowledge
Acquisition, 5(2), 199-220.
[8] Inmon, W. H. (2005). Building the data warehouse. John Wiley & Sons; Golfarelli, M., Maio, D., &
Rizzi, S. (1998). The dimensional fact model: A conceptual model for data warehouses. International
Journal of Cooperative Information Systems, 7(2-3), 215-247.
[9] Janet, E., Ramirez, R., & Guerrero, E. (2006). A Model and Language for Bi-temporal Schema
Versioning in Data Warehouses. In Proceedings of the 15th International Conference on Computing
(CIC '06) (pp. 309-314). IEEE Computer Society.
[10] Kimball, R. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling.
New York, NY: John Wiley and Sons. 436 pp.
[11] Noy, N. F., & Musen, M. A. (2000, August). Algorithm and tool for automated ontology merging and
alignment. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00).
Available as SMI technical report SMI-2000-0831.
[12] O'Neil, P., O'Neil, E. J., & Chen, X. (2007). The Star Schema Benchmark. [Online]
http://www.cs.umb.edu/~poneil/StarSchemaB.PDF (Accessed 20 May 2014).
[13] Protégé Ontology Editor and Knowledge Acquisition System (2007). [Online]
http://protege.stanford.edu (Accessed 12 November 2013).
[14] Oueslati, W., & Akaichi, J. (2010). A survey on Data warehouse evolution. International Journal of
Database Management Systems (IJDMS), 2(4), 11-24.
[15] Oueslati, W., & Akaichi, J. (2011). A Multiversion Trajectory Data Warehouse to Handle Structure
Changes. International Journal of Database Theory & Application, 4(2).
[16] Smith, M. K., Welty, C., & McGuinness, D. L. (2004). OWL Web Ontology Language Guide. W3C.
[Online] http://www.w3.org/TR/owl-guide/ (Accessed 12 November 2013).
[17] Solodovnikova, D., & Niedrite, L. (2011). Evolution-Oriented User-Centric Data Warehouse. In
Information Systems Development (pp. 721-734). Springer New York.
[18] Thakur, G., & Gosain, A. (2011). DWEVOLVE: a requirement based framework for data warehouse
evolution. ACM SIGSOFT Software Engineering Notes, 36(6), 1-8.
[19] Wrembel, R. (2011). On Handling the Evolution of External Data Sources in a Data Warehouse
Architecture.
AUTHORS
M. Thenmozhi received her B.Tech in Computer Science and Engineering from Pondicherry University and
M.E in Computer Science and Engineering from Anna University. She is currently pursuing her Ph.D in
Computer Science and Engineering from Pondicherry Engineering College, affiliated to Pondicherry
University. Presently she is working as an Assistant Professor in the Department of Computer Science and
Engineering, Pondicherry Engineering College. Her research interests include data warehousing, data
modelling, data mining and ontologies.
Dr. K. Vivekanandan received his B.E from Bharathiyar University, M.Tech from the Indian Institute of
Technology, Bombay, and Ph.D from Pondicherry University. He has been a faculty member of the
Department of Computer Science and Engineering, Pondicherry Engineering College, since 1992.
Presently he is working as a Professor in the Department of Computer Science and Engineering. His
research interests include software engineering, object-oriented systems, information security and web
services. He has coordinated two AICTE-sponsored RPS projects on "Developing Product Line
Architecture and Components for e-Governance Applications of Indian Context" and "Development of a
framework for designing WDM Optical Network". He has published more than 60 papers in international
conferences and journals.