Most organizations rely on data in their daily transactions and operations. This data is retrieved from different source
systems in a distributed network and therefore arrives in varying data types and formats. The source data is prepared and cleaned by
subjecting it to algorithms and functions before it is transferred to the target systems, and this adds considerable time. Moreover, data
users within the data warehouse press for data to be made available quickly so that they can make appropriate decisions and forecasts. This has
not been the case, owing to the immense explosion of data from the millions of transactions generated by organizational business processes. The
current legacy systems cannot handle such data volumes because of limited processing capability and heavy customization, and the existing approach
also fails because there are no clear procedures for deciding which data to collect and which to exempt. This performance degradation
needs to be addressed, because organizations invest substantial resources in establishing a functioning data warehouse. Data staging is a
technological innovation within data warehouses where data manipulations are carried out before transfer to the target systems. It supports
data integration by harmonizing the staging functions of cleansing, verifying, and archiving source data. A Deterministic
Prioritization Approach will be employed to enhance data staging, and an experimental design will be used to test scenarios in the study and
clearly demonstrate the improvement. Previous studies in this field have mainly focused on data warehouse processes as a whole and paid less
attention to the specifics of the data staging area.
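The abstract names a Deterministic Prioritization Approach but does not spell out its rules, so the Python sketch below only illustrates the general idea under assumed criteria: staging jobs are ordered by a fixed, reproducible key (user-facing deadline, then estimated processing time), so the same inputs always yield the same processing order. The job names and fields are hypothetical, not the study's actual method.

```python
# Minimal sketch of deterministic prioritization of staging jobs.
# The priority key (deadline, then estimated processing time) is an
# illustrative assumption, not the study's published criteria.
from dataclasses import dataclass

@dataclass
class StagingJob:
    name: str
    deadline_hours: float   # how soon downstream users need the data
    est_minutes: float      # estimated cleansing/verification time

def prioritize(jobs):
    """Return jobs in a deterministic order: earliest deadline first,
    shorter jobs breaking ties, job name as the final tiebreaker."""
    return sorted(jobs, key=lambda j: (j.deadline_hours, j.est_minutes, j.name))

if __name__ == "__main__":
    jobs = [
        StagingJob("sales_transactions", deadline_hours=2, est_minutes=40),
        StagingJob("inventory_snapshot", deadline_hours=8, est_minutes=15),
        StagingJob("customer_updates",   deadline_hours=2, est_minutes=10),
    ]
    for job in prioritize(jobs):
        print(job.name)  # customer_updates, sales_transactions, inventory_snapshot
```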
A Data Quality Model for Asset Management in Engineering Organisations (Cyrus Sorab)
Abstract: Data Quality (DQ) is a critical issue for effective asset management. DQ problems can result in severe negative consequences for an organisation. This paper aims to explore DQ issues associated with the implementation of Enterprise Asset Management (EAM) systems.
Completed Paper - Authors
Andy Koronios
University of South Australia
andy.koronios@unisa.edu.au
Shien Lin
University of South Australia
shien.lin@unisa.edu.au
Jing Gao
University of South Australia
jing.gao@unisa.edu.au
Paper available for download here - https://pdfs.semanticscholar.org/cb3c/9209e2909033e21019157d279d5363b0db76.pdf
A ROBUST APPROACH FOR DATA CLEANING USED BY DECISION TREE (ijcsa)
Nowadays, trillions of bytes of data are generated every second by enterprises, especially on the internet. Access to that data in a convenient and interactive way, in order to reach the best decisions for business profit, has always been a dream of business executives and managers, and the data warehouse is the only viable solution that can bring that dream into reality. The ability to make sound decisions in future endeavours depends on the availability of correct information, which in turn rests on the quality of the underlying data. Quality data can only be produced by cleaning data prior to loading it into the data warehouse, since data collected from different sources will be dirty. Once the data has been pre-processed and cleansed, data mining queries applied to it produce accurate results. The accuracy of data is therefore vital for well-formed and reliable decision making. In this paper, we propose a framework that implements robust data quality checks to ensure consistent and correct loading of data into data warehouses, which in turn ensures accurate and reliable data analysis, data mining, and knowledge discovery.
This document describes the development of a Construction Management Decision Support System (CMDSS) that integrates a data warehouse and decision support system (DSS) to provide construction managers with timely analysis reports and insights to support both long-term and short-term decision making. It first reviews data warehouses, online analytical processing (OLAP), and DSS. It then outlines the design of the data warehouse using a star schema with fact and dimension tables, and the transformation of data into multidimensional cubes using OLAP for analysis. Finally, it discusses the design of the DSS frontend and reporting tools to enable construction managers to directly access and analyze data to make more informed decisions.
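As a rough illustration of the star-schema and cube ideas mentioned in this summary, the sketch below builds a tiny fact table with two dimension tables in SQLite and aggregates along both dimensions. The table and column names are invented for the example and are not taken from the CMDSS paper.

```python
# Illustrative star schema: one fact table with foreign keys into two
# dimension tables, then a GROUP BY that plays the role of a small
# OLAP cube slice (totals by project and quarter).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_project (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE dim_period  (period_id  INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_cost   (project_id INTEGER, period_id INTEGER, cost REAL);

INSERT INTO dim_project VALUES (1, 'Bridge A'), (2, 'Tower B');
INSERT INTO dim_period  VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO fact_cost   VALUES (1, 1, 120.0), (1, 2, 90.0), (2, 1, 200.0), (2, 2, 150.0);
""")

# Aggregate the fact table along both dimensions (project x quarter).
for row in con.execute("""
    SELECT p.project_name, d.quarter, SUM(f.cost) AS total_cost
    FROM fact_cost f
    JOIN dim_project p ON p.project_id = f.project_id
    JOIN dim_period  d ON d.period_id  = f.period_id
    GROUP BY p.project_name, d.quarter
"""):
    print(row)
```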
Study on potential capabilities of a nodb system (ijitjournal)
There is a need for an optimal data-to-query processing technique to handle increasing database size,
complexity, and diversity of use. With the growth of commercial websites and social networks, the expectation is
that more scalable and flexible databases will replace the RDBMS. Complex applications and BigTable-style
workloads require highly optimized queries, and users face increasing bottlenecks in their data analysis. A
growing part of the database community recognizes the need for significant and fundamental changes to
database design. A new philosophy for creating database systems, called NoDB, aims at minimizing the data-to-query
time, most prominently by removing the need to load data before launching queries: queries are
processed without any data preparation or loading steps, and the data may not even need to be stored, since users
can pipe raw data from websites, databases, or Excel sheets into the system without storing anything.
This study is based on PostgreSQL systems. A series of baseline experiments is executed to evaluate the
performance of such a system in terms of (a) data-loading cost, (b) query-processing time, (c) avoidance of
collision and deadlock, (d) support for big data storage, and (e) query-processing optimization. The study
found significant potential capabilities of the NoDB system over a traditional database management system.
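The core NoDB idea, answering queries directly over raw files with no load step, can be shown with a small stand-alone Python sketch (not the paper's PostgreSQL-based implementation); the file name and column names below are hypothetical.

```python
# Sketch of in-situ querying: answer an aggregate while streaming a raw
# CSV file, with no preparation or loading step beforehand.
import csv
from collections import defaultdict

def in_situ_total(path, group_col, value_col):
    """Stream the raw file and aggregate on the fly (no staging, no load)."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row[group_col]] += float(row[value_col])
    return dict(totals)

# Example (hypothetical file and columns):
# totals = in_situ_total("orders.csv", "region", "amount")
```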
Data Cleaning Service for Data Warehouse: An Experimental Comparative Study o... (TELKOMNIKA JOURNAL)
A data warehouse is a collective entity of data from various data sources, and the data in it is prone to several
complications and irregularities. Data cleaning is therefore a non-trivial activity needed to ensure data quality:
it involves identifying errors, removing them, and improving the quality of the data. One of the common methods is
duplicate elimination, and this research focuses on duplicate-elimination services applied to local data. It first
surveys data quality, focusing on quality problems, cleaning methodology, and the stages and services involved
within a data warehouse environment. It then compares several services through experiments on local data covering
different cases, such as different spellings with the same pronunciation, misspellings, name abbreviations,
honorific prefixes, common nicknames, split names, and exact matches. All services are evaluated against the
proposed quality-of-service metrics, such as performance, capacity in number of records processed, platform support,
data heterogeneity, and price, so that these services can in future be relied on to handle big data in the data warehouse.
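A minimal flavour of the matching cases listed above can be given with Python's standard difflib; the nickname table and similarity threshold here are illustrative assumptions, not the services evaluated in the paper.

```python
# Tiny duplicate-detection sketch for name records: handles nicknames,
# reordered (split) names, and small misspellings via string similarity.
from difflib import SequenceMatcher

NICKNAMES = {"bob": "robert", "liz": "elizabeth", "bill": "william"}

def normalize(name: str) -> str:
    tokens = []
    for t in name.lower().replace(".", " ").replace(",", " ").split():
        tokens.append(NICKNAMES.get(t, t))       # expand common nicknames
    return " ".join(sorted(tokens))              # order-insensitive for split names

def likely_duplicates(a: str, b: str, threshold: float = 0.85) -> bool:
    na, nb = normalize(a), normalize(b)
    return na == nb or SequenceMatcher(None, na, nb).ratio() >= threshold

print(likely_duplicates("Robert Smith", "Smith, Bob"))     # True (nickname + order)
print(likely_duplicates("Jon Andersen", "John Anderson"))  # True (misspelling)
```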
IRJET- Analysis of Big Data Technology and its Challenges (IRJET Journal)
This document discusses big data technology and its challenges. It begins by defining big data as large, complex data sets that are growing exponentially due to increased internet usage and digital data collection. It then outlines the key steps in big data analysis: acquisition, assembly, analysis, and action. Several big data technologies that support these steps are described, including Hadoop, MapReduce, Pig, and Hive. The document concludes by examining challenges of big data such as storage limitations, data lifecycle management, analysis difficulties, and ensuring data privacy and security when analyzing large datasets.
A Survey of Agent Based Pre-Processing and Knowledge Retrieval (IOSR Journals)
Abstract: Information retrieval is a major task in the present scenario, as the quantity of data is increasing at
tremendous speed. Managing and mining knowledge for different users according to their interests is therefore the
goal of every organization, whether it works with grid computing, business intelligence, distributed databases, or
any other field. Software agents have proved to be a strong pillar in achieving this goal of extracting quality
information from large databases. Over the decades, researchers have applied the concept of multi-agents to the
data mining process by focusing on its various steps, among which data pre-processing is the most sensitive and
crucial, since the quality of the knowledge to be retrieved depends entirely on the quality of the raw data. Many
methods and tools are available to pre-process data in an automated fashion using intelligent (self-learning)
mobile agents, in distributed as well as centralized databases, but various quality factors still need attention
in order to improve the quality of the retrieved knowledge. This article provides a review of the integration of
the two emerging fields of software agents and knowledge retrieval, with a focus on the data pre-processing step.
Keywords: Data Mining, Multi Agents, Mobile Agents, Preprocessing, Software Agents
Lecture 02 - The Data Warehouse Environment (phanleson)
The document summarizes key concepts from Chapter 2 of Inmon's book "Building the Data Warehouse". It discusses characteristics of a data warehouse, including being subject-oriented, integrated from multiple sources, and containing time-variant data. It also covers topics such as granularity, exploration and data mining capabilities, partitioning, structuring data, auditing, homogeneity versus heterogeneity, purging data, reporting, operational windows, and handling incorrect data.
Role of Data Cleaning in Data Warehouse (Ramakant Soni)
Data cleaning is an essential part of building a data warehouse as it improves data quality by detecting and removing errors and inconsistencies. Data warehouses integrate large amounts of data from various sources, so the probability of dirty data is high. Clean data is vital for decision making based on the data warehouse. The data cleaning process involves data analysis, defining transformation rules, verification of cleaning, applying transformations, and incorporating cleaned data. Tools can help support the different phases of data cleaning from data profiling to specialized cleaning of particular domains.
This document discusses big data characteristics, issues, challenges, and technologies. It describes the key characteristics of big data as volume, velocity, variety, value, and complexity. It outlines issues related to these characteristics like data volume and velocity. Challenges of big data include privacy and security, data access and sharing, analytical challenges, human resources, and technical challenges around fault tolerance, scalability, data quality, and heterogeneous data. The document also discusses technologies used for big data like Hadoop, HDFS, and cloud computing and provides examples of big data projects.
Data warehouses store integrated and consistent data in a subject-oriented repository dedicated
especially to supporting business intelligence processes. However, keeping these repositories updated usually
involves complex and time-consuming processes, commonly known as Extract-Transform-Load (ETL) tasks.
These data-intensive tasks normally execute in a limited time window, and their computational requirements
tend to grow over time as more data is handled. We therefore believe that a grid environment could serve
well as the backbone of the technical infrastructure, with the clear financial advantage of using desktop
computers the organization has already acquired. This article proposes a different approach to distributing
ETL processes in a grid environment, taking into account not only the processing performance of its nodes but
also the available bandwidth, in order to estimate grid availability in the near future and thereby optimize
workflow distribution.
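The kind of placement decision this abstract describes can be sketched with a simple greedy scheduler: estimate each node's finish time for a task from its CPU rate and the bandwidth needed to move the data, then assign the task to the node that finishes it earliest. The cost model, node names, and numbers below are illustrative assumptions, not the article's actual algorithm.

```python
# Greedy placement of ETL tasks on grid nodes using an assumed cost model:
# time = data_to_move / bandwidth + cpu_work / processing_rate.
def assign_tasks(tasks, nodes):
    """tasks: [(name, mb_to_move, cpu_work)]; nodes: {name: (mb_per_s, work_per_s)}."""
    busy_until = {n: 0.0 for n in nodes}     # when each node becomes free
    plan = {}
    for name, mb, work in sorted(tasks, key=lambda t: -t[2]):   # big tasks first
        def finish(node):
            bw, rate = nodes[node]
            return busy_until[node] + mb / bw + work / rate
        best = min(nodes, key=finish)
        busy_until[best] = finish(best)
        plan[name] = best
    return plan

nodes = {"desktop-1": (10.0, 5.0), "desktop-2": (50.0, 2.0)}
tasks = [("load_sales", 500, 100), ("clean_customers", 50, 300)]
print(assign_tasks(tasks, nodes))
# {'clean_customers': 'desktop-1', 'load_sales': 'desktop-2'}
```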
Applying Classification Technique using DID3 Algorithm to improve Decision Su... (IJMER)
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
The document discusses the systems life cycle, which describes the stages of an IT project. The six main stages are analysis, design, development and testing, implementation, documentation, and evaluation. In the analysis stage, methods like observation, examining documents, questionnaires, and interviews are used to understand the current system and identify problems. In the design stage, requirements are specified and designs are made for hardware/software, data collection, reports, screens, and validation routines. In development and testing, data structures and programs are created and extensively tested. Finally, in implementation, the new system is transitioned into use, either through parallel running, direct changeover, or phased implementation. The systems life cycle aims to maximize the chances of a successful project.
Efficient Cost Minimization for Big Data Processing (IRJET Journal)
This document discusses efficient cost minimization techniques for big data processing. It characterizes big data processing using a two-dimensional Markov chain model to evaluate expected completion time. The problem is formulated as a mixed non-linear programming problem to optimize data assignment, placement, and migration across distributed data centers. A weighted bloom filter approach is presented to reduce communication costs through distributed incomplete pattern matching.
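The weighted Bloom filter mentioned here is a specialization of the ordinary Bloom filter; the sketch below shows only the plain version, to illustrate how a compact bit array supports probabilistic membership tests that can reduce communication in distributed matching. Sizes and hash counts are arbitrary choices for the example.

```python
# Plain Bloom filter sketch: k hash positions per item in a fixed bit array.
# False positives are possible; false negatives are not.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

bf = BloomFilter()
bf.add("record-42")
print(bf.might_contain("record-42"))   # True
print(bf.might_contain("record-99"))   # almost certainly False
```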
Characterizing and Processing of Big Data Using Data Mining Techniques (IJTET Journal)
The document discusses big data and techniques for processing it, including data mining. It begins by defining big data and its key characteristics of volume, variety, and velocity. It then discusses various data mining techniques that can be used to process big data, including clustering, classification, and prediction. It introduces the HACE theorem for characterizing big data based on its huge size, heterogeneous and diverse sources, decentralized control, and complex relationships within the data. The document proposes a big data processing model involving data set aggregation, pre-processing, connectivity-based clustering, and subset selection to efficiently retrieve relevant data. It evaluates the performance of subset selection versus deterministic search methods.
Certain Investigation on Dynamic Clustering in Dynamic Datamining (ijdmtaiir)
Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a
newer research area concerned with datasets that have dynamic aspects: the clusters must be updated whenever new
data records are added to the dataset, which may change the clustering over time. When there are continuous updates
and a huge amount of dynamic data, rescanning the database is not feasible in static data mining, but it is possible
in a dynamic data mining process. Dynamic data mining arises when the derived information is kept available for
analysis while the environment is dynamic, i.e. many updates occur. Since this has now been established by most
researchers, attention is turning to solving some of the remaining problems, and research is concentrating on the
problem of mining dynamic databases. This paper presents an investigation of existing work reported in papers on
dynamic clustering and incremental data clustering.
Big Data has triggered an influx of research and perspectives on concepts and processes that previously belonged to the Data Warehouse field. Some conclude that the Data Warehouse as such will disappear; others present Big Data as the natural evolution of the Data Warehouse (perhaps without identifying a clear division between the two); and finally, some others foresee a future of convergence, partially exploring the
possible integration of both. In this paper, we review the underlying technological features of Big Data and the Data Warehouse, highlighting their differences and areas of convergence. Even though some differences exist, both technologies could (and should) be integrated, because they both aim at the same purpose: data exploration and decision-making support. We explore some convergence strategies based on the common elements in both technologies. We present a review of the state of the art in integration proposals from the point of view of purpose, methodology, architecture, and underlying technology, highlighting the
common elements that support both technologies and that may serve as a starting point for full integration, and we put forward a proposal for integrating the two technologies.
Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex task that requires a complete understanding of each contributing database’s schema in order to define the structure of its information. The key aim is to approximate the structure
and content of the induced data into a concise synopsis so that meaningful data-driven facts can be extracted and linked. We identify four major research issues in Data Linkage: the associated costs of pairwise matching, record-matching overheads, restrictions on the semantic flow of information, and the limitations of single-order classification. In this paper, we give a literature review of research in Data Linkage. The
purpose of this review is to establish a basic understanding of Data Linkage and to discuss the
background of the Data Linkage research domain. In particular, we focus on the literature related to recent advancements in Approximate Matching algorithms at the attribute level and the structure level. Their efficiency, functionality, and limitations are critically analysed, and open problems are
exposed.
IRJET- Fault Detection and Prediction of Failure using Vibration Analysis (IRJET Journal)
This document discusses fault detection and prediction of failures in rotating equipment using vibration analysis. It begins by introducing vibration analysis as a method to monitor machines and detect faults in rotating components that may cause failures. It then discusses how motor vibration is measured and analyzed using techniques like spectrum analysis to identify faults like unbalance, bearing issues, or broken rotor bars. The document proposes decomposing vibration signals using intrinsic mode functions and calculating the Gabor representation's frequency marginal to identify fault types using classifiers like support vector machines or random forests. It provides context on data mining techniques relevant to this type of fault prediction problem.
The document discusses building a data warehouse by migrating data from legacy systems using an iterative methodology. It emphasizes the importance of high quality metadata to handle changes during the migration process and minimize errors. Uniform data access times across all machines are optimal for parallel query execution to avoid data skew. The crossbar switch architecture connects all machines equally, eliminating data skew issues seen in other architectures.
Topics in Data Management include data analysis, database management systems, data modeling, database administration, data warehousing, data mining, data quality assurance, data security, and data architecture. Data analysis involves looking at and summarizing data to extract useful information and develop conclusions. Database management systems are used to manage databases and are used by over 90% of people using computers. Data modeling is the process of structuring and organizing data to be implemented in a database. Database administrators are responsible for ensuring the security, performance, and availability of organizational data.
Review of the Introduction and Use of RFID (Editor IJCATR)
We live in an age where technology is an integral part of daily human life. Technology aims at the secure, accurate,
and time-saving service of mankind and nature, but it can also bring disadvantages and challenges. In this paper we
describe RFID technology. Today this technology is used in hospitals, shops, airports, and for tracking birds. It can
be used in hospital intensive care units for monitoring and remote patient care, so that patients and physicians do
not need to be in very close proximity, which can improve the safety of both patients and physicians. Implementing
the technology securely remains a challenge. This paper introduces the applications of the technology and the
challenges it presents.
Crustal Structure from Gravity and Magnetic Anomalies in the Southern Part of... (Editor IJCATR)
Gravity and magnetic data along a profile across the southern part of the Cauvery basin have been
collected, and the data are interpreted for crustal structure depths. The first profile runs from Karikudi to
Embale, covering a distance of 50 km. The gravity lows and highs clearly indicate various sub-basins and ridges.
Density logs from ONGC, Chennai, show that the density contrast decreases with depth in the sedimentary basin,
and hence the gravity profiles are interpreted using a density contrast that varies with depth. From the Bouguer
gravity anomaly, the residual anomaly is constructed by a graphical method, correlating with well data and
subsurface geology. The residual anomaly profiles are interpreted using polygon and prismatic models. The maximum
depth to the granitic gneiss basement is obtained as 3.00 km. The regional anomaly is interpreted as a Moho rise
towards the coast. The aeromagnetic anomaly profiles are also interpreted for the charnockite basement below the
granitic gneiss group of rocks using a prismatic model.
George Eldon Ladd, Teologia do Novo Testamento (pp. 1 to 211) (dyguo)
The document discusses the importance of education for a country's economic and social development. It argues that investments in education improve the productivity and competitiveness of the workforce, leading to higher long-term rates of economic growth.
Facile Synthesis and Characterization of Pyrolusite, β-MnO2, Nano Crystal wi... (Editor IJCATR)
MnO2 nanoparticles have been synthesized by a simple combustion method using MnSO4.4H2O. The crystalline phase,
morphology, optical properties, and magnetic properties of the as-prepared nanoparticles were characterized using
XRD, FT-IR, FT-Raman, SEM, UV-Vis, PL, and VSM respectively. Structural studies by XRD indicate that the synthesized
material has a tetragonal rutile crystal structure. FT-IR and FT-Raman analysis revealed the stretching vibrations of
metal ions in tetrahedral co-ordination, confirming the crystal structure. PL and UV analysis showed an emission band
at 390 nm, a prominent blue peak at 453 nm, and a green emission line at 553 nm, with a band-gap energy of 3.2 eV.
Magnetic measurements indicate that the Néel temperature of the β-MnO2 structures is 92.5 K at Hc = 100 Oe, showing
antiferromagnetic behaviour.
Evaluation of Iris Recognition System on Multiple Feature Extraction Algorith... (Editor IJCATR)
A multi-algorithmic approach to enhancing the accuracy of an iris recognition system is proposed and
investigated. In this system, features are extracted from the iris using several feature extraction algorithms,
namely LPQ, LBP, Gabor filters, Haar, Db8, and Db16. The experimental results demonstrate that the multi-algorithm
iris recognition system performs better than the unimodal system. The results also show that using more than two
feature extraction algorithms might decrease system performance, owing to redundant features. The paper presents a
detailed description of the experiments and provides an analysis of the performance of the proposed method.
1. Frederick Winslow Taylor was an American engineer who pioneered the scientific study of management. 2. He developed principles such as the scientific selection of workers, the establishment of production quotas, and wage incentives. 3. His contributions revolutionized management but were also criticized for dehumanizing work.
This document discusses the importance of education for a country's economic and social development. Education is essential for promoting innovation, entrepreneurship, and global competitiveness. Investments in quality education yield high long-term economic returns.
Pattern Recognition of Japanese Alphabet Katakana Using Airy Zeta Function (Editor IJCATR)
This document summarizes research on pattern recognition of Japanese Katakana characters using the Airy Zeta function. The researchers tested the system on 460 Katakana characters with varying levels of training data. With 230 training characters, the system achieved an accuracy of 55.65%. With 460 training characters, the accuracy improved to 64.56%. More training data led to higher accuracy, as the system could better distinguish patterns. The highest accuracy was 95% for one character. The Airy Zeta function method produced reasonably high accuracy levels for recognizing handwritten Katakana patterns compared to other methods.
The document discusses productivity and how to measure it. It explains that measuring productivity is not easy, especially in the service sector, where it depends on each customer. It also discusses ways to improve productivity, such as investment and organizational culture. Finally, it acknowledges that measuring the productivity of services is complicated because it depends on how satisfied each customer is.
Nearly half of the world's population currently lives in cities, and that proportion is projected to rise to 75% by 2050. Some of the major challenges facing cities in the future include growing income inequality, lack of resources, infrastructure decay, and environmental issues. The document outlines various strategies that could help address these challenges, such as developing smarter transportation, energy, education and government systems that are more integrated and efficient. It emphasizes that the future depends on how we address current issues.
Strategic Management of Intellectual Property: R&D Investment Appraisal Using... (Editor IJCATR)
Company executives are under increasing pressure to proactively evaluate the benefits of the huge amounts invested
in intellectual property (IP). The main goal of this paper is to propose a Dynamic Bayesian Network as a tool for
modelling the forecast distribution of Research and Development (R&D) investment efficiency in support of the
strategic management of IP. A Dynamic Bayesian Network provides a framework for handling the uncertainty and
imprecision in the qualitative and quantitative data that affect the effectiveness and efficiency of R&D investments.
The paper specifies the process of creating the graphical representation from impactful variables, specifying the
numerical links between the variables, and drawing inferences from the network.
Data mining involves discovering hidden patterns in data, while data warehousing involves integrating data from multiple sources and storing it in a centralized location to support analysis. Some key differences are:
- Data mining uses techniques like classification, clustering, and association to discover insights from data, while data warehousing focuses on data integration and OLAP tools.
- Data mining looks for unknown relationships and makes predictions, while data warehousing provides a way to extract and analyze historical data.
- Data warehousing involves extracting, cleaning, and transforming data during an ETL process before loading it into a separate database optimized for analysis. Data mining builds on the outputs of data warehousing.
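To make the ETL step named in the last point concrete, here is a minimal extract-clean-load sketch in Python using only the standard library; the source file, column names, and cleaning rules are hypothetical.

```python
# Minimal ETL sketch: extract rows from a source CSV, clean and reshape
# them, then load them into a separate analysis database.
import csv, sqlite3

def etl(source_csv, warehouse_db):
    # Extract
    with open(source_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    # Transform: drop incomplete rows, normalise types and casing
    cleaned = [
        (r["order_id"], r.get("customer", "").strip().title(), float(r["amount"]))
        for r in rows
        if r.get("order_id") and r.get("amount")
    ]
    # Load
    con = sqlite3.connect(warehouse_db)
    con.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, customer TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
    con.commit()
    con.close()

# etl("daily_orders.csv", "warehouse.db")
```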
Comparing and analyzing various method of data integration in big data (IRJET Journal)
This document discusses various methods for integrating data from multiple heterogeneous sources in big data systems. It begins by defining data integration and explaining the challenges posed by big data's volume, velocity, variety and veracity. The document then examines different techniques for big data integration including Extract-Transform-Load (ETL), Enterprise Information Integration (EII), data migration, data consolidation, data propagation, data federation and change data capture. It analyzes the advantages and limitations of each technique. In conclusion, the document states that data integration is a major challenge and there is no single best method, as the appropriate solution depends on the specific use case and data characteristics.
This document provides an introduction to data warehousing. It defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data from multiple sources designed to support analysis and decision making. Data warehouses centralize data for analysis, allow analysis of broad business data over time, and are a core component of business intelligence. They improve decision making, increase productivity and efficiency, and provide competitive advantages for organizations. While data warehouses provide benefits, they also face challenges related to scalability, speed, and security.
This document discusses testing of data warehouses. It describes how data warehouse testing is an important part of the design and ongoing maintenance of a data warehouse. The key components that require testing include the extract, transform, load (ETL) process, online analytical processing (OLAP) engine, and client applications. The document outlines different phases of data warehouse testing including ETL testing, data load testing, initial data load testing, user interface testing, and regression testing during ongoing data feeds. It emphasizes the importance of testing data quality throughout the data warehouse lifecycle.
Advances And Research Directions In Data-Warehousing Technology (Kate Campbell)
This document provides an overview of data warehousing technology and discusses current research directions in this area. It describes how data warehouses integrate data from multiple sources to support decision making, and how OLAP and data mining tools analyze this data. The document outlines the typical components of a data warehousing system, including methods for collecting and managing source data, storing aggregated data in the warehouse, and providing multi-dimensional views of the data to analysts. It also discusses open research issues in areas like data warehouse modeling, querying, indexing, and parallelization.
This document discusses modern data pipelines and compares Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT) approaches. ETL extracts data from sources, transforms it, and loads it into a data warehouse. ELT extracts and loads raw data directly into a data warehouse or data lake and then transforms it. The document argues that ELT is better suited for modern data as it handles both structured and unstructured data, supports cloud-based data warehouses/lakes, and is more efficient and cost-effective than ETL. Key advantages of ELT over ETL include lower costs, faster data loading, and better support for large, heterogeneous datasets.
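For contrast with the ETL sketch above, here is an equally minimal ELT flow: raw rows are loaded first and the transformation is expressed afterwards inside the warehouse engine (SQLite standing in for a cloud warehouse or lake). The schema and cleaning rule are illustrative assumptions.

```python
# Minimal ELT sketch: load raw data as-is, then transform it in place
# with SQL run by the warehouse engine itself.
import csv, sqlite3

def elt(source_csv, warehouse_db):
    con = sqlite3.connect(warehouse_db)
    con.execute("CREATE TABLE IF NOT EXISTS raw_sales (order_id TEXT, customer TEXT, amount TEXT)")
    with open(source_csv, newline="") as f:
        con.executemany(
            "INSERT INTO raw_sales VALUES (?, ?, ?)",
            ((r["order_id"], r["customer"], r["amount"]) for r in csv.DictReader(f)),
        )
    # Transform after loading, using the warehouse's own engine
    con.execute("""
        CREATE VIEW IF NOT EXISTS sales AS
        SELECT order_id, TRIM(customer) AS customer, CAST(amount AS REAL) AS amount
        FROM raw_sales
        WHERE order_id IS NOT NULL AND amount <> ''
    """)
    con.commit()
    con.close()
```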
This document discusses data warehousing, including its definition, importance, components, strategies, ETL processes, and considerations for success and pitfalls. A data warehouse is a collection of integrated, subject-oriented, non-volatile data used for analysis. It allows more effective decision making through consolidated historical data from multiple sources. Key components include summarized and current detailed data, as well as transformation programs. Common strategies are enterprise-wide and data mart approaches. ETL processes extract, transform and load the data. Clean data and proper implementation, training and maintenance are important for success.
This document summarizes a survey on data mining. It discusses how data mining helps extract useful business information from large databases and build predictive models. Commonly used data mining techniques are discussed, including artificial neural networks, decision trees, genetic algorithms, and nearest neighbor methods. An ideal data mining architecture is proposed that fully integrates data mining tools with a data warehouse and OLAP server. Examples of profitable data mining applications are provided in industries such as pharmaceuticals, credit cards, transportation, and consumer goods. The document concludes that while data mining is still developing, it has wide applications across domains to leverage knowledge in data warehouses and improve customer relationships.
This document provides information about testing a data warehouse. It discusses the different components that need to be tested, including ETL processes, the OLAP engine, and client applications. It outlines the various phases of data warehouse testing, such as testing the initial data load and periodic updates. Challenges of data warehouse testing include dealing with large volumes of heterogeneous data from multiple sources and ensuring data quality and temporal consistency. Best practices include focusing on data quality testing and identifying critical business scenarios to test.
Traditionally, data integration has meant compromise. No matter how rapidly data architects and developers could complete a project before its deadline, speed would always come at the expense of quality. On the other hand, if they focused on delivering a quality project, it would generally drag on for months, thus exceeding its deadline. Finally, if the teams concentrated on both quality and rapid delivery, the costs would invariably exceed the budget. Regardless of which path was chosen, the end result would be less than desirable. This led some experts to revisit the scope of data integration. This write-up focuses on that issue.
Paper Final Taube Bienert GridInterop 2012 - Bert Taube
This document discusses using NoSQL data management and advanced analytics for automated demand response (ADR) programs. It notes that ADR will require good data management and analytics to support program execution, accounting, and fault detection. NoSQL databases are proposed as they can better handle the large volumes of diverse data from multiple sources in ADR, including telemetry, usage, events and metadata. Object-oriented data models in NoSQL allow fast, reliable access to different data types and relationships needed for ADR strategies, management and compliance with standards like CIM.
Data Lake-based Approaches to Regulatory-Driven Technology Challenges - Booz Allen Hamilton
The document discusses how a data lake approach can help financial institutions address regulatory challenges more effectively than traditional ETL approaches. A data lake allows raw data to be ingested rapidly and indexed as needed for analysis, reducing preparation time. It also enables unified queries across all data sources and quick fusion of multiple sources. This significantly reduces operational complexity and costs while improving security, flexibility, and the ability to address evolving requirements. The data lake approach is well-suited for challenges involving streaming analytics, point-to-point data marts, or data-heavy ETL requirements. Booz Allen has successfully implemented this approach for government clients to prototype solutions around critical applications.
The document discusses databases versus data warehousing. It notes that databases are for operational purposes like storage and retrieval for applications, while data warehouses are used for informational purposes like business reporting and analysis. A data warehouse contains integrated, subject-oriented data from multiple sources that is used to support management decisions.
The document discusses data integration and the ETL process. It provides details on:
1. Data integration, which combines data from different sources to create a unified view, supporting business analysis. It involves extracting, transforming, and loading data.
2. The general approach of integration, which can be achieved through application, business process, and user interaction integration. Techniques include ETL, data federation, and data propagation.
3. Data integration for data warehousing, focusing on the "reconciled data layer" which harmonizes data from sources before loading into the warehouse. This involves transforming operational data characteristics.
- Data lakes emerged as a concept during the Big Data era and offer a highly flexible way to store both structured and unstructured data using a schema-on-read approach. However, they lack adequate security and authentication mechanisms.
- The document discusses the key concepts of data lakes including how they ingest and store raw data without transforming it initially. It also covers the typical architectural layers of a data lake and some challenges in ensuring proper governance and management of data in the lake.
- Improving data quality, metadata management, and security/access controls are identified as important areas to address some of the current limitations of data lakes.
Three reasons why data virtualization is poised to play a key role in data management:
1) Data management challenges are increasing due to needs for quick response times, large and diverse data sources like social media and sensors, and many data management tools.
2) Data virtualization can address these challenges by providing a unified, secure access layer and delivering data as a service to meet business needs.
3) Data virtualization allows for a hybrid data storage model with data stored in both data warehouses and cheaper storage like Hadoop, and provides a common way to access both through its virtualization layer.
Enhancement techniques for data warehouse staging area - IJDKP
This document discusses techniques for enhancing the performance of data warehouse staging areas. It proposes two algorithms: 1) A semantics-based extraction algorithm that reduces extraction time by pruning useless data using semantic information. 2) A semantics-based transformation algorithm that similarly aims to reduce transformation time. It also explores three scheduling techniques (FIFO, minimum cost, round robin) for loading data into the data warehouse and experimentally evaluates their performance. The goal is to enhance each stage of the ETL process to maximize overall performance.
1. The document discusses data warehousing and data mining. Data warehousing involves collecting and integrating data from multiple sources to support analysis and decision making. Data mining involves analyzing large datasets to discover patterns.
2. Web mining is discussed as a type of data mining that analyzes web data. There are three domains of web mining: web content mining, web structure mining, and web usage mining. Common techniques for web mining include clustering, association rules, path analysis, and sequential patterns.
3. Web mining has benefits like addressing ineffective search engines and monitoring user visit habits to improve website design. Data warehousing and data mining can provide useful business intelligence when the right analysis techniques are applied to large amounts of integrated data.
This document discusses data warehousing and data mining. It defines data warehousing as the process of centralizing data from different sources for analysis. Data mining is described as the process of analyzing data to uncover hidden patterns and relationships. The document provides examples of how data mining and data warehousing can be used together, with data warehousing collecting and organizing data that is then analyzed using data mining techniques to generate useful insights. Applications of data mining and data warehousing discussed include medicine, finance, marketing, and scientific discovery.
IOUG93 - Technical Architecture for the Data Warehouse - Paper - David Walker
This document provides an overview of the technical architecture for implementing a data warehouse. It discusses the key elements needed, including business analysis, database schema design, and project management. It then focuses on the technical architecture, describing the process of data acquisition which includes extraction, transformation, collation, migration, and loading of data from source systems into the data warehouse. It notes some considerations for each step, such as common data formats, network bandwidth, and handling exceptions during the loading process.
Enhancing Data Staging as a Mechanism for Fast Data Access
International Journal of Computer Applications Technology and Research, Volume 4, Issue 8, 584-591, 2015, ISSN: 2319-8656, www.ijcat.com
Reagan Muriithi Gatimu, Wilson Cheruiyot, Michael Kimwele
School of Computer Science and Information Technology, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya
Abstract: Most organizations rely on data in their daily transactions and operations. This data is retrieved from different source
systems in a distributed network hence it comes in varying data types and formats. The source data is prepared and cleaned by
subjecting it to algorithms and functions before transferring it to the target systems which takes more time. Moreover, there is pressure
from data users within the data warehouse for data to be availed quickly for them to make appropriate decisions and forecasts. This has
not been the case due to immense data explosion in millions of transactions resulting from business processes of the organizations. The
current legacy systems cannot handle large data levels due to processing capabilities and customizations. This approach has failed
because there lacks clear procedures to decide which data to collect or exempt. It is with this concern that performance degradation
should be addressed because organizations invest a lot of resources to establish a functioning data warehouse. Data staging is a
technological innovation within data warehouses where data manipulations are carried out before transfer to target systems. It carries
out data integration by harmonizing the staging functions, cleansing, verification, and archiving source data. Deterministic
Prioritization Approach will be employed to enhance data staging, and to clearly prove this change Experiment design is needed to test
scenarios in the study. Previous studies in this field have mainly focused in the data warehouses processes as a whole but less to the
specifics of data staging area.
Keywords: Data Staging; Source System; Deterministic Prioritization; Data Warehouse
1. INTRODUCTION
The growing number of business transactions in any enterprise is directly proportional to the growth of its data size. This data comes from varied source systems and applications and needs to be organized into a workable state so that it remains relevant and meaningful to its users. Technological development has led to the rise of the Data Warehouse (DW). (Inmon, 2002) defines a data warehouse as a "collection of integrated, subject-oriented databases designated to support the decision making process". Both Kimball and Inmon (2002) agree that a DW has to be integrated, subject-oriented, non-volatile and time-variant. The concept of time-variance is of ultimate concern and sets the basis for this research, since the research focuses on improved data access. The foundations of a DW, as explained by (Zineb, Esteban, Jose-Norberto, Juan, 2011), encompass the integration of multiple different data sources. This provides a complete and correct view of the enterprise operational data, which is synthesized into a set of strategic indicators and measures that the users of the data can associate with.
A DW implements business intelligence in three major processes used to prepare data to match users' needs, commonly referred to as the ETL processes: Extraction, Transformation and Loading. The extraction process retrieves data as-is from source systems before subjecting it to any manipulations. The transformation process, also referred to as the transportation phase, is the operational base and the most intriguing of all; business rules and functions are among the operations applied to the extracted data. The loading process involves moving the desired data, as determined by the users, to the DW. It is important to note that this flow of data is not as simple and smooth as it sounds, owing to performance issues raised by several observed bottlenecks.
(El-Wessimy et al., 2013) show the relevance of the DW in decision making in today's environment: "The best decisions are made when all the relevant data is taken into consideration. Today, the biggest challenge in any organization is to achieve better performance with least cost, and to make better decisions than competitors. That is why data warehouses are widely used within the largest and most complex businesses in the world."
In this paper, the SQL Server Integration Services (SSIS) tool is used to experimentally show the impact of prioritizing the data from sources according to the confidence levels and the distinctiveness of the data. The measurements also cover the general performance of the whole ETL process after enhancement of the data staging area.
2. RELATED WORK
2.1 Extraction Stage
This is the initial stage of data migration to a data warehouse. (Kimball et al., 1998) inform that the extraction process consists of two phases: initial extraction and changed data extraction. In the initial extraction, data from the different operational sources to be loaded into the DW is captured for the first time. This process is done only once, after building the DW, to populate it with a huge amount of data from the source systems. The next phase involves incremental extraction, also referred to as changed data capture (CDC). (Stephen, 2013) informs that "The staging tables usually get populated by some outside source, by either pulling or pushing the data from the source systems. This process is usually an insert only process and therefore does not rely on statistics for its successful execution."
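As a minimal illustration of the changed data capture idea, the T-SQL sketch below pulls only rows modified since the previous run into a staging table; the table names, the LastModified column, and the control table are hypothetical and are not taken from this paper.

DECLARE @LastRun DATETIME2;

-- Time of the previous extraction, kept in a small control table
SELECT @LastRun = LastExtractionTime
FROM   etl.ExtractionControl
WHERE  SourceTable = 'Sales';

-- Insert-only load of the rows changed since that time
INSERT INTO staging.Sales (SaleID, CustomerID, Amount, SaleDate)
SELECT SaleID, CustomerID, Amount, SaleDate
FROM   src.Sales
WHERE  LastModified > @LastRun;

-- Record the new high-water mark for the next cycle
UPDATE etl.ExtractionControl
SET    LastExtractionTime = SYSUTCDATETIME()
WHERE  SourceTable = 'Sales';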
2.2 Transformation Stage
Once the data is extracted to the staging area, there are numerous potential transformations, such as cleansing the data (correcting misspellings, resolving domain conflicts, dealing with missing elements, or parsing into standard formats), combining data from multiple sources, de-duplicating data, and assigning warehouse keys. These transformations are all precursors to loading the data into the data warehouse presentation area. Unfortunately, there is still considerable industry consternation about whether the data that supports or results from this process should be instantiated in physical normalized structures prior to loading into the presentation area for querying and reporting.
(Erhard and Hong, 2000) elaborate on the activities within the transformation phase that work towards clean data. These include data analysis, which focuses on metadata; because few integrity rules are typically enforced, metadata alone cannot guarantee sufficient data quality of a source. Two approaches have been put forward to assist in data analysis: data profiling and data mining. Data profiling focuses on the instance analysis of individual attributes. It derives information such as the data type, length, value range, discrete values and their frequency, variance, uniqueness, occurrence of null values, and typical string patterns, providing an exact view of various quality aspects of the attribute. Data mining helps discover specific data patterns in large data sets, e.g., relationships holding between several attributes.
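As a small illustration of such instance-level profiling, the T-SQL sketch below derives a few of the statistics listed above for a single attribute; the staging table and column names are hypothetical.

SELECT
    COUNT(*)                                               AS TotalRows,      -- volume
    COUNT(DISTINCT CustomerEmail)                          AS DistinctValues, -- uniqueness
    SUM(CASE WHEN CustomerEmail IS NULL THEN 1 ELSE 0 END) AS NullCount,      -- null occurrence
    MIN(LEN(CustomerEmail))                                AS MinLength,      -- value length range
    MAX(LEN(CustomerEmail))                                AS MaxLength
FROM staging.Customer;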
2.3 Loading Stage
Loading in the data warehouse environment usually takes the form of presenting the quality-assured dimensional tables to the bulk loading facilities of each data mart. The target data mart must then index the newly arrived data for query performance. When each data mart has been freshly loaded, indexed, supplied with appropriate aggregates, and further quality assured, the user community is notified that the new data has been published.
When it comes to moving data to the DW, (Stephen, 2013) informs: "The biggest question for the staging area is – how do we keep the statistics up-to-date such that the statistics for a particular daily load are always available and reasonably accurate. This is actually more difficult than it sounds. The partitions would only be analyzed in the first quarter of the month each night, going to every other night and eventually each week because of the 10% stale setting. This obviously leaves us with a problem. In order to have the statistics available for the latest day which is loaded, the statistics would have to be gathered after the staging tables have been loaded but before the ETL process starts."
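In a SQL Server based staging environment such as the one used later in this paper, one way to realize this timing is a statistics refresh on the staging tables immediately after the nightly load and before the downstream ETL packages run; a minimal sketch, with a hypothetical table name:

-- Refresh optimizer statistics right after the staging load completes
UPDATE STATISTICS staging.Customer WITH FULLSCAN;
-- The transformation and loading packages are started only after this step,
-- so the optimizer plans against current row counts and distributions.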
2.4 Data Staging
Data staging emerged as a technological development that attempts to handle the low-performance issues noted above. Its location within the ETL process differs: (Kimball and Ross, 2002) state that data staging is present in the extraction and transformation phases of the ETL framework. In some current systems a data staging area exists as a location that interconnects Online Transaction Processing systems (OLTPs) with Online Analytical Processing systems (OLAPs).
2.5 Reasons for Data Staging Enhancement
Although data staging is not a new technology and has been researched before, the focus has shifted to the design and development of data staging frameworks. Little attention has been given to its operability and its significant role in speeding up the ETL process.
In a production environment, especially a busy organization that handles large volumes of transactions in its daily operations, data flow to the DW and storage repositories becomes an issue. Some organizations may not experience performance problems initially, but as their data levels grow they start to see intermittent performance degradation. This should be handled early to maintain a stable work path. Selection algorithms that can pre-determine the data needed at the target system remain a challenge. The proposed solution in data staging, which forms the enhancement, is to work on the pre-determination and prioritization mechanisms for the data to load.
(Aksoy et al., 2001) introduce a more workable approach to data staging concerns. They based their work on broadcast scheduling and data staging. According to them, the key design consideration in developing a large-scale on-demand broadcast server was the choice of the scheduling algorithm used to select the items to be broadcast. However, this solution assumes that data will be available beforehand, which does not hold because of the dynamic nature of the data.
3. PROPOSED APPROACH IN DATA STAGING
Considering the improvements already observed in the works of other authors, it is important to appreciate their efforts in finding gaps in existing systems. Relating to the recurring problems within data staging, the researcher identifies the following dependent and independent variables, which assist in choosing the deterministic prioritization approach as a probable solution.
3.1 Dependent and Independent Variables
Figure 1. Dependent and Independent variables
Legacy systems use stacking, where a job is pushed into the pool of tasks and popped out sequentially when its time of execution arrives according to the queue structure. The order of execution does not depend on a pre-arranged structure but on this procedural mode, which degrades performance.
The cache offers a pre-fetching capability where the most used data is stored temporarily. The scheduled procedure looks first in the cache memory before checking secondary storage locations, minimizing the search time. The limitation lies in the cache size and the amount of data that can be maintained in the cache at a time.
To achieve concurrent operation, there need to be selective algorithms that decide the priority of jobs moving from the source systems to the data staging area. With opportunistic scheduling there is a high probability of improving the speed of data retrieval and access.
Remote connection to source systems affects the speed of retrieval, and query execution is delayed by the time-lapse inherent in distributed systems. This impacts the ordering of results from query execution, so optimization should be introduced to work with stored procedures and the cache facility.
The data characteristics are defined by type and format, since the data comes from disparate systems. Destination requirements must be matched before data is moved to the target systems. Poor data manipulation functions result in longer processing times, slowing down the systems. The functions for manipulating flat files are different from those for relational tables and databases due to the underlying data formats.
3.2 Deterministic Prioritization (DP)
After thorough consideration of the above variables and the research gaps identified, the Deterministic Prioritization approach is put forward as a solution to the data access problem. The implementation of deterministic prioritization in the data staging area builds on its relationship to the other ETL processes. The approach manipulates data as soon as it is collected and made available to the staging area. Less activity is experienced in the extraction phase; the actual data work area is the staging area. With this approach, data selection is coordinated to filter out unwanted "dirty" data and to assign priority to the important "clean" data that is moved to the data warehouse or data marts. The previous activities within the staging area remain important, so the approach aims to improve the order of execution to avoid redundancy and repetition of tasks. This concept is illustrated in the following diagram, which shows the relationship among the ETL processes of a data warehouse when the DP rules are applied.
Figure 2. Proposed Conceptual Framework
The implementation has two sections, the deterministic section and the prioritization section, as discussed below.
3.3 Deterministic Section
According to the Macmillan dictionary, the term deterministic means "using or believing in the idea that everything is caused by another event or action and so you are not free to choose what you do". The deterministic approach relies on data selections based on confidence levels or on the behaviour of data accesses in relation to previous selections. The key characteristic of determinism is that the output remains the same for the same inputs. When applied to data staging, the selection of column values from a table is based on the distinctiveness of the stored data and the frequency of previous accesses. The more distinct the values (fewer NULLs), the higher the confidence level for loading to the next stage of ETL. Data columns with high nullity are excluded from the selection, reducing the amount of data loaded to the next stage.
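A minimal T-SQL sketch of this nullity check for one column is given below; the staging table, the Phone column, and the 30% cut-off are hypothetical choices, since the paper does not fix a threshold.

DECLARE @Threshold FLOAT = 0.30;   -- assumed nullity cut-off, not specified in the paper
DECLARE @NullRatio FLOAT;

-- Fraction of NULLs in the candidate column
SELECT @NullRatio =
       CAST(SUM(CASE WHEN Phone IS NULL THEN 1 ELSE 0 END) AS FLOAT)
       / NULLIF(COUNT(*), 0)
FROM staging.Customer;

IF @NullRatio > @Threshold
    PRINT 'Phone excluded: nullity too high for a confident load';
ELSE
    PRINT 'Phone retained for the next ETL stage';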
3.4 Prioritization Section
According to the Macmillan dictionary, the term prioritize means "to decide in what order you should do things, based on how important or urgent they are".
The main goal of implementing prioritization is to reduce the amount of active data being manipulated at a time by focusing on the minimum meaningful details. Once the data to be loaded has been determined, the order in which processing occurs is vital to reduce delays in data handling. The data selection from the raw tables needs to be prioritized by altering the query execution plan to give more priority to the data columns with high confidence levels. To implement this, clustered indexes are introduced based on these distinct columns. These indexes alter the normal execution plan for the queries, improving performance and efficiency. The execution plan is explained by (Grant, 2009): "an execution plan is the result of query optimizer's attempt to calculate the most efficient way to implement the request represented by the T-SQL query...".
The query executor optimizes query execution on its own, and by altering the data selection the processing time is reduced. Prioritization affects the logical aspects enforced by the business rules that maintain the dimensional model, giving way to a faster way of retrieving data.
3.5 Included Improvements
The work creates a new, stable staging framework that is freely available to everyone, runs across different hardware platforms (cross-platform), and supports concurrent processing. This will magnify the core benefits of having intelligent data warehouses that support top-level management systems, largely through the development of the staging area.
(Stephen, 2013) elaborates on this approach in an Oracle environment: "Most ETL applications use a staging area to stage source system data before loading it into the warehouse or marts. When implemented within an Oracle environment a partitioning strategy is usually employed such that data that is not required any longer can be removed from the tables with the minimum amount of effort."
The proposed framework will have forecasting and prioritization mechanisms to decide which data is necessary before transfer begins, hence saving on network services and bandwidth.
Current systems such as HANA databases have high processing capability, which is meaningless if there is no proper scheduling of resources; poor scheduling can result in substantial loss of resources that are not properly managed. Hardware is static while data is dynamic, and at some point the available hardware would not be sufficient to handle the data, affecting performance. Ultimately, scheduling plays a vital role in the performance of any system, and its effect is measurable, which makes comparison possible.
3.6 Prioritization by indexing specific columns
The newly derived distinct columns added to each staging table (external columns, so they do not affect the source data in any way, ensuring consistency and integrity) are used to create indexes as shown below.
Figure 3. Index Creation Sample
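A minimal T-SQL sketch of the index creation illustrated in Figure 3, using the index, table, and column names quoted in the next paragraph (the exact statement in the figure may differ, and the schema name is assumed):

-- Clustered index on the derived distinct column added at the staging stage
CREATE CLUSTERED INDEX IX_newStaging_Customer
    ON dbo.newStaging_Customer (STCustomerID);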
The created index, named "IX_newStaging_Customer", prioritizes the derived unique column "STCustomerID" on the staging table "newStaging_Customer". The user-defined index hints the execution order of the queries submitted to the server, hence overriding the server's default query execution plan. The index is also not affected by future re-creations of the source data from the extraction stage, since it is built on an external derived column. Prioritization by distinct columns enhances the efficiency and performance of the query execution plan produced by the server during query search. This results in optimized selection costs while maintaining the quality of the data moved to the data warehouse. The measure of improved efficiency is shown below for a selection query.
Figure 4. New Execution plan for Customer Table
3.7 Test scenario preparation in SSIS tool
The ETL processes of a data warehouse are demonstrated using the SQL Server Integration Services (SSIS) tool. The first scenario setup reflects the current situation before enhancement, and the second scenario setup reflects the new situation after enhancement. The following is a demonstration of the scenario used in the experiments in run mode.
Figure 5. Test Scenario Setup.
4. RESULTS
The experiments are performed and the results for the time variables are collected in a Microsoft Excel file. This file is generated automatically, when the scenario is run, by scripts written in Visual Studio 2008 using the C# programming language. The experiment is run fifteen times, and for each cycle the time change for the different variables is recorded to the file. Finally, a comparison is made between the results from the two scenarios to note the impact of the introduced change, as shown below.
Table 1. Before Enhancement ETL Processes results
TestRunNumber    prevExtractionTime    prevStagingTime    prevLoadingTime
1 63368.28 68717.21 92851.52
2 97940.12 79815.68 132989.3
3 52449.29 45874.92 62384.48
4 54358.59 87338.59 49553.42
5 49600.26 50350.14 45086.72
6 48285.48 54800.62 45312.21
7 52569.22 48670 48858.79
8 53581.86 51044.78 66801.6
9 88992.69 96697.22 60973.71
10 72884.88 56306.1 52252.09
11 58839.45 54556.57 59913
12 69740.99 58111.31 53562.38
13 59401.77 50840.68 51099.61
14 65250.34 55357.5 52560.51
15 56115.72 71712.82 58035.14
Table 2. After Enhancement ETL Processes results
TestRunNumber    newExtractionTime    newStagingTime    newLoadingTime    newTotalTime
1 62261.37 50691.27 4824.541 117777.2
2 59639.95 53591.02 4299.265 117530.2
3 55766.46 54468.57 3652.78 113887.8
4 54012.87 53544.93 3552.135 111109.9
5 61811.67 54836.81 3340.999 119989.5
6 59963.43 57082.69 3374.887 120421
7 63116.5 69659.48 4147.719 136923.7
8 61813.26 61603.78 3455.976 126873
9 69365.94 54521.06 3623.27 127510.3
10 56561.51 52100.49 5690.811 114352.8
11 54106.91 59241.29 3701.506 117049.7
12 59291.2 86575.51 3493.712 213390.8
13 66104.49 89692.62 4130.86 159928
14 72926.03 56987.29 5984.634 135898
15 57962.95 50812.86 3926.081 112701.9
5. DISCUSSION
The following discussion is based on a comparison of the results obtained from the tests. Each ETL stage is compared separately for the two situations and shown graphically in the following figures to give a clear distinction between them.
5.1 Comparison of Extraction stage
The following is an individual comparison per stage of the ETL processes, to bring out a clear view of the improvement made; an explanation follows each illustration. A negative sign indicates a reduction, showing that the enhancement has taken place by reducing the running time of that particular stage.
Table 3. Extraction stage results analysis
Test prevExtractionTime newExtractionTime
Average 62891.92931 60980.30203
Change -1911.627273
Change % -3.039543061
Figure 6. Extraction stage comparison
The above illustrations show that the time taken for extraction using the Deterministic Prioritization approach has reduced by 3.04% compared to the previous extraction time.
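For clarity, the change percentage in Table 3 (and in the corresponding tables for the other stages) is obtained from the two averages as follows:

\[
\text{Change \%} = \frac{\overline{\text{new}} - \overline{\text{prev}}}{\overline{\text{prev}}} \times 100
= \frac{60980.30 - 62891.93}{62891.93} \times 100 \approx -3.04\%
\]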
5.2 Comparison of Staging stage
Table 4. Staging stage results analysis
Test prevStagingTime newStagingTime
Average 62012.94289 60360.64505
Change -1652.29784
Change % -2.664440297
Figure 7. Staging Stage Comparison
The above illustrations show that the time taken in the staging area using the Deterministic Prioritization approach has reduced by 2.66% compared to the previous staging time.
5.3 Comparison of Loading stage
Table 5. Loading stage results analysis
Test    prevLoadingTime    newLoadingTime
Average 62148.96424 4079.94504
Change -58069.0192
Change % -93.43521635
Figure 8. Loading Stage Comparison
The above illustrations show that the time taken in the loading stage using the Deterministic Prioritization approach has reduced immensely, by 93.44%, compared to the previous loading time. This large change is attributed to the minimal number of operations taking place at this stage and the reduced traffic to the destination targets of the data, i.e. the data warehouses. However, with increased data traffic, especially over the network, the percentage change in loading time may be smaller. The loading process has benefited greatly from the changes made in the previous stages.
6. SUMMARY OF FINDINGS
There has been a significant change in every stage of the ETL processes after implementation of the deterministic prioritization approach. Moreover, the subsequent impact of this improvement is most visible at the loading stage, where the rate of transfer improved by a large margin of 93.44%. This shows that any change occurring in the staging area will affect the entire process flow. If more time is invested in proper design of the data staging area, further performance improvements beyond those achieved by the researcher can be expected.
Given the limited resources and the specifications of the workstation used to run the tests, the performance levels are considered acceptable for an organization with fast-growing data sets. The setup is limited by the resources available at run time, but if disk space is increased by the owners as the data size grows, the total data access time in the data warehouse will be greatly improved.
The choice of a data prioritization mechanism based on column indexes is supported by Cecilia and Mihai (2011), who state that using indexes on database queries improves the performance of the whole system. Clustered indexes perform better than nonclustered indexes when many records are expected to be returned, and should be set on the most unique column of a table. This proposition is in line with this research, which uses clustered indexes in the staging area mainly because large volumes of data must be fetched and processed.
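A hedged T-SQL illustration of that distinction follows; the clustered statement repeats the choice made in Section 3.6, while the column "Region" and the second index name are hypothetical, introduced only for contrast.

-- Clustered index (one per table): orders the rows themselves on the most
-- unique column, favoured here because staging queries return many rows
CREATE CLUSTERED INDEX IX_newStaging_Customer
ON newStaging_Customer (STCustomerID);

-- Nonclustered index: a separate lookup structure pointing back to the rows,
-- usually preferred for selective queries on other columns
CREATE NONCLUSTERED INDEX IX_newStaging_Customer_Region
ON newStaging_Customer (Region);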
Costel et al. (2014) researched query execution and optimization in Microsoft SQL Server and identified missing indexes as a contributor to poor query execution performance. They note that when a table lacks indexes, the engine has to scan the entire table step by step to find the searched value. The resources spent on this process are enormous and considerably increase the time needed to execute the queries.
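In SQL Server this effect can be inspected through the missing-index dynamic management views; the query below is a sketch (not part of the cited study) that lists tables for which the optimizer recorded a missing index while compiling plans.

-- Candidate missing indexes recorded by the optimizer (SQL Server DMVs)
SELECT  d.[statement]        AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact    -- estimated % improvement if the index existed
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
  ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats s
  ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact DESC;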
Grant (2012) explains the execution plan management performed by the query optimizer. The relational engine performs logical reads within cache memory, while the storage engine performs physical reads directly from disk. Improvements are realized mostly for data manipulation language statements, since the engine must parse each query for correctness. SQL Server generates statistics against the indexes and passes them to the optimizer to determine the execution plan.
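As a hedged example of observing these two kinds of reads and of refreshing the statistics that feed the optimizer, the T-SQL below reuses the staging table and index names from Section 3.6; the lookup value is hypothetical.

-- Report logical reads (from cache) and physical reads (from disk) per query
SET STATISTICS IO ON;
SELECT * FROM newStaging_Customer WHERE STCustomerID = 1045;
SET STATISTICS IO OFF;

-- Refresh the statistics the optimizer uses when choosing an execution plan
UPDATE STATISTICS newStaging_Customer;

-- Inspect the statistics held for the prioritization index
DBCC SHOW_STATISTICS ('newStaging_Customer', 'IX_newStaging_Customer');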
El-Wessimy et al. (2013) similarly enhanced the data warehouse staging area by using different techniques (FIFO, MC, RR time and record rotation) targeting the loading phase. Their tests captured the time taken to transfer data in each stage of the ETL process and suggested the most suitable technique. Comparing all techniques, they noted that FIFO performed better for smaller data sets, while Record Limit Based Round Robin was best for large data sets. Their research supports further reduction of the overall time taken to deliver data from source to destination. The uniqueness of the present study is its ability to handle large data sets from the start, as well as new inclusions of data from the sources, without any readjustment of the system structure setup.
7. CONCLUSION
The adoption of the Deterministic Prioritization approach in the staging area has shown promising results, and users at the presentation level of the data warehouse can be assured of fast data access and retrieval. They can make decisive conclusions and reports based on current data that is made available in a timely fashion, similar to real-time systems. They will also benefit from the wide range of data availed, since the ETL processes aim to denormalize the data comprehensively before it is delivered to users. This forms an association that is deterministic in nature for future priority loads. The researcher was keen to avoid compromising data quality for high performance gains, and this resulted in a more balanced system setup where the data still meets the quality assurance defined by the users' requirements.
8. RECOMMENDATIONS AND FUTURE WORK
In view of the results and findings of the experiments undertaken in this research, the researcher recommends incorporating the deterministic prioritization approach in the design and development phase of data staging frameworks. This is because the approach is portable across all database management systems on the market today that support the SQL language. The change also leaves room for further customization, since it is applied at design time, before query execution. Notably, for well-formed queries the performance will be even better.
The data staging area has not been researched exhaustively, and the impact of additional resources should be considered as a next check on performance gain versus cost. The experiments here were carried out on data sources of the same format; further studies should use varied data sources and a staging framework other than the SSIS tool used here. The scenario in this test was demonstrated on a local workstation, and it would be essential to note the performance levels in a distributed system with sources and targets widely separated by networks.
9. REFERENCES
[1] Abbasi, H., Wolf, M., Eisenhauer, G., Klasky, S.,
Schwan, K., & Zheng, F. (2010). Datastager: scalable
data staging services for petascale applications. Cluster
Computing, 13(3), 277-290.
[2] Akkaoui, Z. E., Munoz, E. Z. J.-N., and Trujillo, J. A. (2011). Model-Driven Framework for ETL Process Development. In Proceedings of the International Workshop on Data Warehousing and OLAP, pp. 45–52, Glasgow, Scotland, UK.
[3] Aksoy, D., Franklin, M. J., & Zdonik, S. (2001). Data
staging for on-demand broadcast. In VLDB (Vol. 1, pp.
571-580).
[4] Bézivin, J. (2005). On the unification power of models. Software and System Modeling, 4(2):171–188.
[5] Cecilia, C., Mihai, G. (2011). Increasing Database
Performance using Indexes. Database Systems Journal
vol. II, no. 2/2011. Economic Informatics Department,
Academy of Economic Studies Bucharest, Romania.
[6] Costel, G.C., Marius, M. L., Valentina, L., Octavian, T.
P.(2014). Query Optimization Techniques in Microsoft
SQL Server. Database Systems Journal vol. V, no.
2/2014. University of Economic Studies, Bucharest,
Romania.
[7] Da Silva, M. S., Times, V. C., Kwakye, M. M. (2012). Journal of Information and Data Management, 3(3).
[8] Deterministic. (2009-2015). In Macmillan Dictionary. Macmillan Publishers Limited. Accessed from www.macmillandictionary.com on 20th July 2015.
[9] Eckerson, W., & White, C. (2003). Evaluating ETL and
data integration platforms. Seattle: The DW Institute.
[10] El-Wessimy, M., Mokhtar, H. M., & Hegazy, O. (2013). Enhancement Techniques for Data Warehouse Staging Area. International Journal of Data Mining & Knowledge Management Process, 3(6).
[11] Erhard, R., and Hong, H.D. (2000). Data Cleaning:
Problems and Current Approaches. Journal IEEE Data
Eng. Bull.23 (4), 3-13.
[12] Grant, F.(2009).The Art of High Performance SQL
Code: SQL Server Execution Plans. Simple-Talk
Publishing. ISBN 978-1-906434-02-1
[13] Grant, F. (2012). SQL Server Execution Plans. Second
Ed. ISBN: 978-1-906434-92-2. Simple Talk
Publishing.
[14] Firestone, J. M. (1998). Dimensional modeling and ER modeling in the data warehouse. White Paper No. Eight, June 22.
[15] Flinn, J., Sinnamohideen, S., Tolia, N., & Satyanarayanan, M. (2003). Data Staging on Untrusted Surrogates. In FAST (Vol. 3, pp. 15-28).
[16] Inmon, W. H. (2002). Building the Data Warehouse. Wiley.
[17] Inmon, W. H. (2005). Building the Data Warehouse. John Wiley & Sons.
[18] Kimball, R., & Caserta, J. (2004). The Data Warehouse ETL Toolkit. John Wiley & Sons.
[19] Kimball, R., Reeves, L., Ross, M., & Thornthwaite, W. (2008). The Data Warehouse Lifecycle Toolkit, 2nd ed. Practical Techniques for Building Data Warehouse and Business Intelligence Systems.
[20] Muller, P. A., Studer, P., Fondement, F., & Bézivin, J. (2005). Platform independent Web application modeling and development with Netsilon. Software & Systems Modeling, 4(4), 424-442.
[21] Per-Åke, L., Cipri, C., Campbell, F., Eric, N. H., Mostafa, M., Michal, N., Vassilis, P., Susan, L. P., Srikumar, R., Remus, R., Mayukh, S. (2013). Enhancements to SQL Server Column Stores. ACM 978-1-4503-2037-5/13/06. New York, USA.
[22] Prioritize. (2009-2015). In Macmillan Dictionary. Macmillan Publishers Limited. Accessed from www.macmillandictionary.com on 20th July 2015.
[23] Ralph, K. and Margy, R. (2002). The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling. Second Edition. John Wiley and Sons, Inc., Canada.
[24] Russom, P. (2012). BI Experts: Big Data and Your Data Warehouse's Data Staging Area. TDWI Best Practices Report, Fourth Quarter. Retrieved from http://tdwi.org/articles/2012/07/10/big-data-staging-area.aspx
[25] SAP AG. (2002). Business Information Warehouse – Data Staging. Retrieved from http://scn.sap.com/docs/DOC-8100.
[26] Stephen, B. (2013). Staging, Statistics & Common Sense: Oracle Statistics Maintenance Strategy in an ETL Environment. Retrieved from http://www.seethehippo.com/
[27] Vassiliadis, P. (2009). A survey of Extract–transform–Load technology. International Journal of Data Warehousing and Mining (IJDWM), 5(3), 1-27.
[28] Zineb, A., Esteban, Z., Jose-Norberto, M., Juan, T. (2011). A Model-Driven Framework for ETL Process Development. In DOLAP '11: Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP, pp. 45-52. ACM, New York, NY, USA. ISBN: 978-1-4503-0963-9.