The document discusses a system for integrating structured and unstructured data from heterogeneous environments. The system uses OGSA-DAI services and the Globus Toolkit to provide an abstraction layer that allows database operations on both structured data from databases and unstructured file-based data. It generates metadata from unstructured data and configures the abstraction layer to query across the different data sources. This provides users an integrated view of both structured and unstructured data through a single interface.
The huge volume of text documents available on the internet has made it difficult for users to find valuable
information. Efficient applications for extracting knowledge of interest from textual documents are therefore
vitally important. This paper addresses the problem of responding to user queries by fetching the most
relevant documents from a clustered set of documents. For this purpose, a cluster-based information
retrieval framework is proposed in order to design and develop a system for analysing and extracting
useful patterns from text documents. In this approach, a pre-processing step is first performed to find
frequent and high-utility patterns in the data set, and a Vector Space Model (VSM) is then applied to
represent the dataset. The system is implemented in two main phases. In phase 1, a clustering analysis
process groups the documents into several clusters; in phase 2, an information retrieval process ranks the
clusters against the user query and retrieves the relevant documents from the clusters deemed relevant.
The results are evaluated using recall and precision (P@5, P@10) of the retrieved results: P@5 was 0.660
and P@10 was 0.655.
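The retrieval pipeline described above (VSM representation, similarity-based ranking, and P@k evaluation) can be sketched in a few lines. The toy corpus, tokenisation, and TF-IDF weighting below are illustrative assumptions, not the paper's actual implementation:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build simple TF-IDF vectors (sparse dicts) for tokenised documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                     # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def precision_at_k(ranked_ids, relevant_ids, k):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

# Hypothetical mini-corpus and query, for illustration only.
docs = [["grid", "data"], ["text", "mining", "data"], ["text", "clustering"]]
vecs, idf = tf_idf_vectors(docs)
query = ["text", "data"]
qv = {t: idf.get(t, 0.0) for t in query}
ranked = sorted(range(len(docs)), key=lambda i: -cosine(qv, vecs[i]))
```

Here document 1 ranks first because it shares both query terms; P@k then measures how many of the top-k ranked documents a human judged relevant.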
This document provides a survey of file replication techniques used in grid systems. It begins with an introduction to grid systems and discusses their use of replication to improve response times and reduce bandwidth consumption. It then categorizes replication techniques as static or dynamic and describes challenges of replication including maintaining consistency and overhead. The document surveys various replication strategies for different grid topologies like peer-to-peer, tree and hybrid. It evaluates strategies based on factors like access latency, bandwidth consumption and fault tolerance. Specific replication techniques are discussed for peer-to-peer architectures aimed at availability, placement strategies and balancing workloads.
MataNui - Building a Grid Data Infrastructure that "doesn't suck!" - Guy K. Kloss
This document discusses the development of a grid data infrastructure called MataNui to manage large amounts of observational astronomical data and metadata from a collaboration between researchers in New Zealand and Japan. The infrastructure uses existing open-source tools like MongoDB, GridFTP, and the DataFinder GUI client to allow distributed storage and access of data while meeting requirements like handling large data volumes, metadata, and remote access. This approach provides a robust, reusable, and user-friendly system to address common data management challenges in scientific collaborations.
IJRET : International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
A flexible, efficient and secure networking architecture is required in order to process big data, yet existing network architectures are mostly unable to handle it. As big data pushes network resources to their limits, the result is network congestion, poor performance, and a detrimental user experience. This work presents the current state-of-the-art research challenges and possible solutions in big data networking. More specifically, it presents the state of networking issues of big data related to requirements, capacity, management and data processing; introduces the architectures of the MapReduce and Hadoop paradigm along with their research challenges, as well as fabric networks and software-defined networks used in today's rapidly growing digital world; and compares and contrasts them to identify relevant drawbacks and solutions.
Welcome to International Journal of Engineering Research and Development (IJERD) - IJERD Editor
The document summarizes research on vertical fragmentation, allocation, and re-fragmentation in distributed object relational database systems. It proposes an algorithm for vertical fragmentation and allocation that considers the usage of attributes and methods by queries at different sites. The algorithm forms usage matrices, calculates affinity between methods, clusters methods, and partitions the data into fragments that are allocated to sites where they see the most demand. It also describes handling update queries by redirecting them to a server for processing and then propagating the updates to relevant fragments.
This document describes a proposed tool called Warehouse Creator that can automatically generate data warehouses from heterogeneous data sources within an enterprise. The tool extracts data from various data sources like databases and files, integrates the data by generating dimension and fact tables, and provides a web interface for users to search and retrieve information from the warehouse without needing direct access to the underlying data sources. The tool aims to address issues like the need for users to have detailed knowledge of different data sources and query languages by providing a centralized warehouse that integrates data from multiple sources.
This document describes a technique for mining intensional knowledge from XML documents to improve query answering performance. The technique involves:
1. Parsing an XML document and generating a tree model. Frequent subtrees are then mined from the tree to extract Tree-based Association Rules (TARs).
2. Storing the mined TARs in an XML format. An index is also created to enable faster access to the knowledge during querying.
3. Transforming XML queries to operate on the mined TARs instead of the original XML document. This allows queries to be answered more quickly.
4. The method is evaluated on sample XML data and queries. Results show the approach answers queries significantly faster than querying the original documents.
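As a rough illustration of steps 1 and 3, the sketch below parses an XML document into a tree and keeps only parent-child tag edges that occur at least `min_support` times, then answers a structural query from those rules alone. This is a deliberately simplified stand-in for full frequent-subtree mining and TAR extraction, and the sample document and threshold are invented for the example:

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title><author>Y</author></book>
  <journal><title>C</title></journal>
</library>"""

def mine_edge_rules(xml_text, min_support=2):
    """Count parent->child tag edges in the XML tree and keep the frequent ones.
    Real TAR mining extracts whole frequent subtrees; edges are the 2-node case."""
    edges = Counter()
    def walk(node):
        for child in node:
            edges[(node.tag, child.tag)] += 1
            walk(child)
    walk(ET.fromstring(xml_text))
    return {edge: n for edge, n in edges.items() if n >= min_support}

rules = mine_edge_rules(SAMPLE)
# A structural query ("which children does <book> usually have?") is now
# answered from the mined rules without touching the original document:
book_children = {child for (parent, child) in rules if parent == "book"}
```

The infrequent `library -> journal` edge (support 1) is pruned, while the `book` substructure survives and can serve as approximate intensional knowledge at query time.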
Research Inventy : International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open access journal, available both online and in print, that provides rapid monthly publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published through a rapid process within 20 days of acceptance, and the peer-review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
This chapter discusses managing organizational data and information. It covers the traditional file environment and its problems, how databases provide a modern approach, database management systems, and logical data models including hierarchical, network and relational models. The key topics are data arrangement, traditional file problems like redundancy and inconsistency, how databases solve these with concepts like entities and relationships, data definition and manipulation languages, and the advantages of relational modeling.
Study on potential capabilities of a noDB system - ijitjournal
There is a need for an optimal data-to-query processing technique to handle increasing database size,
complexity, and diversity of use. With the rise of commercial websites and social networks, the expectation
is that highly scalable, more flexible databases will replace the RDBMS. Complex applications and
BigTable-style workloads require highly optimized queries, and users face growing bottlenecks in their data
analysis. A growing part of the database community recognizes the need for significant and fundamental
changes to database design. A new philosophy for building database systems, called noDB, aims at
minimizing the data-to-query time, most prominently by removing the need to load data before launching
queries: queries are processed without any data preparation or loading step, and data need not even be
stored first, since users can pipe raw data from websites, databases, or spreadsheets straight into queries.
This study is based on PostgreSQL systems. A series of baseline experiments is executed to evaluate the
performance of the system in terms of (a) data loading cost, (b) query processing time, (c) avoidance of
collision and deadlock, (d) support for big data storage, and (e) query optimization. The study
found significant potential capabilities of the noDB system over traditional database management systems.
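The core noDB idea, answering queries over raw files with no upfront load step, can be caricatured in a few lines. The CSV data and the single-level cache below are illustrative assumptions; the real system works with in-situ access paths inside PostgreSQL, not a Python wrapper:

```python
import csv
import io

RAW = "id,city,temp\n1,Rome,30\n2,Oslo,12\n3,Cairo,35\n"

class InSituTable:
    """Answer queries directly over raw CSV text: there is no load phase.
    Rows are parsed lazily on the first query and cached afterwards, a crude
    analogue of noDB-style adaptive loading."""
    def __init__(self, raw):
        self._raw = raw
        self._cache = None          # nothing is parsed until a query arrives
    def _rows(self):
        if self._cache is None:     # first query pays the parsing cost
            self._cache = list(csv.DictReader(io.StringIO(self._raw)))
        return self._cache
    def select(self, pred):
        """Filter rows with an arbitrary predicate over the parsed dicts."""
        return [r for r in self._rows() if pred(r)]

t = InSituTable(RAW)
hot = t.select(lambda r: int(r["temp"]) > 25)   # triggers the lazy parse
```

The point of the design is that the data-to-query time is just the cost of the first scan; subsequent queries reuse whatever the first one parsed.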
This document provides a survey of techniques for transferring big data. It discusses using grids and parallel transfers to distribute large datasets. Grid computing allows for coordinated sharing of computational and storage resources across distributed systems. Parallel transfer techniques divide files into segments and transfer portions simultaneously from multiple servers to improve download speeds. However, these techniques require significant user involvement. The document then introduces a new NICE model for big data transfers. This store-and-forward approach transfers data to staging servers during periods of low network traffic to avoid impacting other users. It can accommodate different time zones and bandwidth variations between senders and receivers.
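The parallel-transfer idea the survey describes, splitting a file into byte ranges and fetching the ranges concurrently from replica servers, looks roughly like this. The in-memory `DATA` buffer and the `fetch_segment` stub stand in for real ranged reads (e.g. HTTP partial GETs or GridFTP partial-file transfers) and are assumptions of the sketch:

```python
from concurrent.futures import ThreadPoolExecutor

DATA = bytes(range(256)) * 4    # stand-in for a file replicated on several servers

def fetch_segment(server_id, start, end):
    """Stub for a ranged read; a real client would issue a partial GET or a
    GridFTP partial-file transfer against replica server `server_id`."""
    return DATA[start:end]

def parallel_download(size, n_segments):
    """Split [0, size) into n_segments contiguous ranges and fetch them
    concurrently, assigning range i to server i, then reassemble in order."""
    bounds = [(i * size // n_segments, (i + 1) * size // n_segments)
              for i in range(n_segments)]
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        parts = pool.map(lambda ib: fetch_segment(ib[0], *ib[1]),
                         enumerate(bounds))
    return b"".join(parts)
```

Because `pool.map` preserves input order, concatenating the parts reconstructs the file even though the segments may arrive out of order.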
This document discusses cloud databases and database-as-a-service (DBaaS). It outlines the benefits of moving databases to the cloud, such as reduced costs and increased flexibility. Popular cloud databases mentioned include MySQL, PostgreSQL, Google CloudSQL, and MongoLab. The document also discusses features of cloud computing like on-demand self-service, broad network access, resource pooling, rapid elasticity, and monitored service. Associating databases with the cloud provides organizations with a flexible, always-available backend without worrying about hardware and software maintenance.
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES - ijwmn
A flexible, efficient and secure networking architecture is required in order to process big data. However, existing network architectures are mostly unable to handle big data. As big data pushes network resources
to the limits it results in network congestion, poor performance, and detrimental user experiences. This paper presents the current state-of-the-art research challenges and possible solutions on big data networking theory. More specifically, we present the state of networking issues of big data related to
capacity, management and data processing. We also present the architectures of the MapReduce and Hadoop paradigm with their research challenges, as well as fabric networks and software defined networks (SDN) that are used to handle today's rapidly growing digital world, and compare and contrast them to identify relevant problems and solutions.
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
NewSQL systems seek to provide the scalability of NoSQL for online transaction processing while maintaining the ACID guarantees of a traditional database. There are three defining properties of big data: volume, velocity, and variety. Volume refers to the large amounts of data created each day. Velocity measures how fast data comes in, which can be real-time or in batches. Variety means data now comes in non-traditional forms like video or from devices.
CONTENT BASED DATA TRANSFER MECHANISM FOR EFFICIENT BULK DATA TRANSFER IN GRI... - ijgca
A new class of Data Grid infrastructure is needed to support the management, transport, distributed access, and analysis of terabyte- and petabyte-scale data collections by thousands of users. Although some existing data management systems (DMS) in Grid computing infrastructures provide methodologies for bulk data transfer, these technologies cannot address certain simultaneous data
access requirements. Often, in scientific computing environments, common data must be accessed from different locations. Moreover, many computing entities wait for a common piece of scientific data (such as data on an astronomical phenomenon) that is published only
when it becomes available. These data access needs were not addressed in the design of the Grid Access to Secondary Storage (GASS) data component or GridFTP. In this paper, we propose an application-layer content-based data transfer scheme for grid computing environments. Using the
proposed scheme in a grid computing environment, bulk data can be moved simultaneously and efficiently with a simple subscribe-and-publish mechanism.
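A minimal in-memory version of such a subscribe-and-publish mechanism might look as follows. The broker API and the topic name are invented for illustration; the paper's actual scheme operates at the application layer of a grid environment, not in a single process:

```python
from collections import defaultdict

class Broker:
    """Minimal publish/subscribe broker: consumers subscribe to a data topic
    before the data exists and are notified once it is published; consumers
    arriving after publication are served immediately."""
    def __init__(self):
        self._subs = defaultdict(list)   # topic -> waiting callbacks
        self._published = {}             # topic -> already-available data
    def subscribe(self, topic, callback):
        if topic in self._published:     # data already available: deliver now
            callback(self._published[topic])
        else:                            # otherwise wait for publication
            self._subs[topic].append(callback)
    def publish(self, topic, data):
        self._published[topic] = data
        for cb in self._subs.pop(topic, []):
            cb(data)

received = []
broker = Broker()
broker.subscribe("astronomy/event42", received.append)  # data not yet available
broker.publish("astronomy/event42", b"observation")     # delivery happens here
```

This captures the access pattern the abstract describes: many entities waiting on one piece of scientific data that is pushed to all of them the moment it is published.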
Data Integration in Multi-sources Information Systems - ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
Applications of SOA and Web Services in Grid Computing - yht4ever
This document discusses applications of service-oriented architecture (SOA) and web services in grid computing. It provides an overview of key concepts like SOA, web services, OGSA, WSRF, and how they have evolved and been applied in grid computing. Specifically, it describes how early specifications like OGSI aimed to standardize grid services but faced issues aligning with web services standards. This led to the development of the Web Services Resource Framework (WSRF) to better integrate grid and web service standards by treating stateful resources as web services.
A scalable and cost effective framework for privacy preservation over big d... - amna alhabib
This document proposes a scalable and cost-effective framework called SaC-FRAPP for preserving privacy over big data on the cloud. The key idea is to leverage cloud-based MapReduce to anonymize large datasets before releasing them to other parties. Anonymized datasets are then managed using HDFS to avoid re-computation costs. A prototype system is implemented to demonstrate that the framework can anonymize and manage anonymized big data sets in a highly scalable, efficient and cost-effective manner.
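The map/generalise then reduce/suppress shape of such an anonymization pipeline can be sketched without Hadoop at all. The quasi-identifier coarsening below (age bands, zip-code prefixes), the value of k, and the toy records are illustrative assumptions, not SaC-FRAPP's actual operators:

```python
from collections import defaultdict

def map_generalise(record):
    """Map step: coarsen quasi-identifiers (age -> 10-year band,
    zip -> 3-digit prefix) so that similar records share a key."""
    age, zip_code, diagnosis = record
    band = f"{age // 10 * 10}-{age // 10 * 10 + 9}"
    return (band, zip_code[:3] + "**"), diagnosis

def reduce_k_anonymise(groups, k=2):
    """Reduce step: release only equivalence classes holding at least k
    records; smaller classes are suppressed to preserve k-anonymity."""
    return {key: vals for key, vals in groups.items() if len(vals) >= k}

records = [(34, "90210", "flu"), (37, "90214", "cold"), (52, "10001", "flu")]
groups = defaultdict(list)
for rec in records:                      # shuffle-by-key, as MapReduce would
    key, val = map_generalise(rec)
    groups[key].append(val)
released = reduce_k_anonymise(groups, k=2)
```

On a cluster, the grouping step is exactly the MapReduce shuffle, which is what makes the approach scale to big datasets.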
International Journal of Computational Engineering Research (IJCER) - ijceronline
International Journal of Computational Engineering Research (IJCER) is an international, online, monthly English-language journal. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING - ijgca
Big data storage management is one of the most challenging issues for Grid computing environments, since data-intensive applications frequently involve a high degree of data access locality and Grid applications typically deal with large amounts of data. Traditional high-performance computing approaches rely on dedicated servers for data storage and replication. In this paper we present a new mechanism for distributed big data storage and resource discovery services, proposing an architecture named Dynamic and Scalable Storage Management (DSSM) for grid environments. It allows grid computing to share not only computational cycles but also storage space; the storage can be transparently accessed from any grid machine, allowing easy data sharing among grid users and applications. The concept of virtual IDs, which allows the creation of virtual spaces, is introduced and used. The DSSM divides all Grid Oriented Storage devices (nodes) into multiple geographically distributed domains to exploit locality and simplify intra-domain storage management. Grid-service-based storage resources are adopted so that simple modular service pieces can be stacked as demand grows. To this end, we address four axes: the DSSM architecture and its algorithms; the integration of storage resources and resource discovery into Grid services; the evaluation of a prototype system for dynamics, scalability, and bandwidth; and a discussion of the results. Algorithms at the bottom and upper levels for standardized dynamic and scalable storage management, along with higher bandwidths, have been designed.
This document summarizes statistical disclosure control techniques for protecting private data, specifically microaggregation. Microaggregation involves clustering individual records into small groups to anonymize the data before release. It aims to minimize information loss while preventing re-identification of individuals. The document discusses challenges with multivariate microaggregation and reviews different heuristic approaches. It also covers related topics like k-anonymity algorithms, various clustering techniques for microaggregation like k-means, and using genetic algorithms to handle large datasets.
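In the univariate case, the fixed-size microaggregation heuristic the document reviews reduces to: sort the values, cut them into groups of at least k, and replace each value by its group mean. The sketch below implements that heuristic (not an optimal-partition algorithm, and k is a free parameter):

```python
def microaggregate(values, k=3):
    """Univariate fixed-size microaggregation: sort, partition into groups of
    size >= k, and replace every value by its group mean. The trailing
    remainder (fewer than k values) is folded into the last group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        group = order[i:i + k]
        if len(order) - i - k < k:      # too few left for another full group
            group = order[i:]
        mean = sum(values[j] for j in group) / len(group)
        for j in group:                 # anonymised value replaces original
            out[j] = mean
        i += len(group)
    return out
```

Each released value is now shared by at least k records, which is exactly the re-identification barrier microaggregation aims for, at the cost of the within-group information loss the document discusses.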
This document summarizes research on palmprint identification. It begins by introducing palmprint biometrics and principal line features. It then summarizes several existing approaches that extract principal lines using techniques like finite radon transform, gradient images, and morphological operators. The proposed approach is described which uses Canny edge detection to extract principal lines based on edge direction. It preprocesses images, applies Canny edge detection, divides the output into blocks to generate templates, and performs matching. Experimental results on a public database achieve an accuracy of 86% for personal identification.
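The edge-then-template pipeline of that approach can be approximated with a plain gradient edge detector standing in for Canny. The toy image, the threshold, and the block size below are invented parameters, and real palmprint systems operate on much larger grayscale images:

```python
def edge_map(img, thresh=2):
    """Simple gradient-magnitude edge detector (a crude stand-in for Canny):
    marks a pixel when its horizontal or vertical intensity jump >= thresh."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = abs(img[y][x + 1] - img[y][x])   # horizontal gradient
            gy = abs(img[y + 1][x] - img[y][x])   # vertical gradient
            edges[y][x] = 1 if max(gx, gy) >= thresh else 0
    return edges

def block_template(edges, bs=2):
    """Divide the edge map into bs x bs blocks; a block is 'on' if it contains
    any edge pixel. Assumes dimensions are divisible by bs."""
    h, w = len(edges), len(edges[0])
    return [[int(any(edges[y + dy][x + dx]
                     for dy in range(bs) for dx in range(bs)))
             for x in range(0, w, bs)]
            for y in range(0, h, bs)]

# A 4x4 toy "palm" with one vertical principal line between columns 1 and 2.
img = [[0, 0, 9, 9] for _ in range(4)]
template = block_template(edge_map(img))
```

Matching two palms then reduces to comparing their compact block templates (e.g. by Hamming similarity) instead of full edge maps.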
The document proposes an Earthquake Disaster Based Resource Scheduling (EDBRS) framework for efficiently allocating cloud computing resources during earthquake disasters. The framework aims to minimize execution costs and times of cloud workloads by prioritizing urgent workloads related to emergency response. It models the resource scheduling problem and considers factors like workload deadlines, resource speeds and costs. The framework also presents algorithms for optimally assigning equal-length and variable-length workloads across multiple public and private cloud resources to balance performance and cost. The goal is to efficiently allocate cloud resources to disaster response zones based on urgency to reduce loss of life during earthquakes.
The document summarizes notes from a PrintVis partner meeting in Milan on February 23rd 2009. It discusses Marvia, a web-based service that allows creating documents from flexible design templates. Marvia integrates with PrintVis and allows clients to customize workflows. Pricing plans include Personal, Premium and Pro tiers. The technology uses Amazon Web Services and is scalable and reliable.
The document discusses the laws that govern human life and the purpose of existence. It argues that just as the laws of science govern the physical world, divine laws govern human life and suffering. These laws are the evangelical moral principles of Christ, which science is now studying. The purpose of life is to evolve spiritually through experience, not merely through pleasure, in order to build the conscious human being.
Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.
This chapter discusses managing organizational data and information. It covers the traditional file environment and its problems, how databases provide a modern approach, database management systems, and logical data models including hierarchical, network and relational models. The key topics are data arrangement, traditional file problems like redundancy and inconsistency, how databases solve these with concepts like entities and relationships, data definition and manipulation languages, and the advantages of relational modeling.
Study on potential capabilities of a nodb systemijitjournal
There is a need of optimal data to query processing technique to handle the increasing database size,
complexity, diversity of use. With the introduction of commercial website, social network, expectations are
that the high scalability, more flexible database will replace the RDBMS. Complex application and Big
Table require highly optimized queries. Users are facing the increasing bottlenecks in their data analysis. A
growing part of the database community recognizes the need for significant and fundamental changes to
database design. A new philosophy for creating database systems called noDB aims at minimizing the datato-
query time, most prominently by removing the need to load data before launching queries. That will
process queries without any data preparation or loading steps. There may not need to store data. User can
pipe raw data from websites, DBs, excel sheets into two promise sample inputs without storing anything.
This study is based on PostgreSQL systems. A series of the baseline experiment are executed to evaluate the
Performance of this system as per -a. Data loading cost, b-Query processing timing, c-Avoidance of
Collision and Deadlock, d-Enabling the Big data storage and e-Optimize query processing etc. The study
found significant possible capabilities of noDB system over the traditional database management system.
This document provides a survey of techniques for transferring big data. It discusses using grids and parallel transfers to distribute large datasets. Grid computing allows for coordinated sharing of computational and storage resources across distributed systems. Parallel transfer techniques divide files into segments and transfer portions simultaneously from multiple servers to improve download speeds. However, these techniques require significant user involvement. The document then introduces a new NICE model for big data transfers. This store-and-forward approach transfers data to staging servers during periods of low network traffic to avoid impacting other users. It can accommodate different time zones and bandwidth variations between senders and receivers.
This document discusses cloud databases and database-as-a-service (DBaaS). It outlines the benefits of moving databases to the cloud, such as reduced costs and increased flexibility. Popular cloud databases mentioned include MySQL, PostgreSQL, Google CloudSQL, and MongoLab. The document also discusses features of cloud computing like on-demand self-service, broad network access, resource pooling, rapid elasticity, and monitored service. Associating databases with the cloud provides organizations with a flexible, always-available backend without worrying about hardware and software maintenance.
BIG DATA NETWORKING: REQUIREMENTS, ARCHITECTURE AND ISSUES
A flexible, efficient and secure networking architecture is required in order to process big data. However, existing network architectures are mostly unable to handle big data. As big data pushes network resources to the limits, it results in network congestion, poor performance, and detrimental user experiences. This paper presents the current state-of-the-art research challenges and possible solutions in big data networking. More specifically, we present the state of networking issues of big data related to capacity, management and data processing. We also present the architectures of the MapReduce and Hadoop paradigm with research challenges, fabric networks, and software-defined networks (SDN) that are used to handle today's rapidly growing digital world, and compare and contrast them to identify relevant problems and solutions.
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
NewSQL systems seek to provide the scalability of NoSQL for online transaction processing while maintaining the ACID guarantees of a traditional database. There are three defining properties of big data: volume, velocity, and variety. Volume refers to the large amounts of data created each day. Velocity measures how fast data comes in, which can be real-time or in batches. Variety means data now comes in non-traditional forms like video or from devices.
CONTENT BASED DATA TRANSFER MECHANISM FOR EFFICIENT BULK DATA TRANSFER IN GRI...
A new class of Data Grid infrastructure is needed to support the management, transport, distributed access, and analysis of terabytes and petabytes of data collections by thousands of users. Although some existing data management systems (DMS) of grid computing infrastructures provide methodologies to handle bulk data transfer, these technologies cannot address certain simultaneous data access requirements. Often, in scientific computing environments, common data must be accessed from different locations. Further, many such computing entities wait for common scientific data (such as data belonging to an astronomical phenomenon) that is published only when it becomes available. These kinds of data access needs have not yet been addressed in the design of the data components Grid Access to Secondary Storage (GASS) or GridFTP. In this paper, we present an application-layer, content-based data transfer scheme for grid computing environments. Using the proposed scheme in a grid computing environment, bulk data can be moved simultaneously and efficiently using a simple subscribe-and-publish mechanism.
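A minimal sketch of the subscribe-and-publish idea, assuming a toy in-process broker (the paper's actual grid-level protocol is not shown here):

```python
from collections import defaultdict

class Broker:
    """Minimal topic-based publish/subscribe broker -- a toy stand-in for
    the content-based scheme described above, not its actual protocol."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Every site waiting on this content receives it as soon as it is
        # published -- no repeated polling by individual consumers.
        for cb in self.subscribers[topic]:
            cb(payload)

received = []
b = Broker()
b.subscribe("astro/observation-42", received.append)   # hypothetical topic
b.publish("astro/observation-42", b"...bulk dataset bytes...")
print(len(received))  # 1
```

The design point is that the publisher pushes once, regardless of how many grid sites have subscribed to the content.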
Data Integration in Multi-sources Information Systems
Applications of SOA and Web Services in Grid Computing
This document discusses applications of service-oriented architecture (SOA) and web services in grid computing. It provides an overview of key concepts like SOA, web services, OGSA, WSRF, and how they have evolved and been applied in grid computing. Specifically, it describes how early specifications like OGSI aimed to standardize grid services but faced issues aligning with web services standards. This led to the development of the Web Services Resource Framework (WSRF) to better integrate grid and web service standards by treating stateful resources as web services.
A scalable and cost-effective framework for privacy preservation over big d...
This document proposes a scalable and cost-effective framework called SaC-FRAPP for preserving privacy over big data on the cloud. The key idea is to leverage cloud-based MapReduce to anonymize large datasets before releasing them to other parties. Anonymized datasets are then managed using HDFS to avoid re-computation costs. A prototype system is implemented to demonstrate that the framework can anonymize and manage anonymized big data sets in a highly scalable, efficient and cost-effective manner.
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
Big data storage management is one of the most challenging issues in grid computing environments, since data-intensive applications frequently involve a high degree of data access locality. Grid applications typically deal with large amounts of data. Traditional high-performance computing approaches rely on dedicated servers for data storage and replication. In this paper we present a new mechanism for distributed big data storage and resource discovery services, proposing an architecture named Dynamic and Scalable Storage Management (DSSM) for grid environments. It allows grid computing to share not only computational cycles but also storage space: storage can be transparently accessed from any grid machine, allowing easy data sharing among grid users and applications. The concept of virtual IDs, which allows the creation of virtual spaces, is introduced and used. DSSM divides all Grid Oriented Storage devices (nodes) into multiple geographically distributed domains to exploit locality and simplify intra-domain storage management. Grid-service-based storage resources are stacked as simple modular services, piece by piece, as demand grows. To this end, we cover four axes: the DSSM architecture and its algorithms; the wrapping of storage resources and resource discovery into grid services; the evaluation of a prototype system in terms of dynamics, scalability, and bandwidth; and a discussion of the results. Algorithms at the bottom and upper levels for dynamic and scalable storage management, along with higher bandwidths, have been designed.
This document summarizes statistical disclosure control techniques for protecting private data, specifically microaggregation. Microaggregation involves clustering individual records into small groups to anonymize the data before release. It aims to minimize information loss while preventing re-identification of individuals. The document discusses challenges with multivariate microaggregation and reviews different heuristic approaches. It also covers related topics like k-anonymity algorithms, various clustering techniques for microaggregation like k-means, and using genetic algorithms to handle large datasets.
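The clustering idea behind microaggregation can be sketched for the univariate case: sort the records, partition them into groups of at least k, and replace each value by its group mean. This is only an illustration of the anonymize-by-aggregation principle; practical multivariate microaggregation relies on heuristics such as MDAV.

```python
def microaggregate(values, k=3):
    """Univariate microaggregation sketch: each published value is the mean
    of a group of at least k records, so no individual value stands alone."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        if len(group) < k and start > 0:
            group = order[start - k:]   # fold a short tail into the last group
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            out[i] = mean
    return out

print(microaggregate([10, 12, 11, 40, 41, 39], k=3))
# [11.0, 11.0, 11.0, 40.0, 40.0, 40.0]
```

Information loss is the spread within each group; the k-anonymity-style protection is that every released value is shared by at least k records.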
This document summarizes research on palmprint identification. It begins by introducing palmprint biometrics and principal line features. It then summarizes several existing approaches that extract principal lines using techniques like finite radon transform, gradient images, and morphological operators. The proposed approach is described which uses Canny edge detection to extract principal lines based on edge direction. It preprocesses images, applies Canny edge detection, divides the output into blocks to generate templates, and performs matching. Experimental results on a public database achieve an accuracy of 86% for personal identification.
The document proposes an Earthquake Disaster Based Resource Scheduling (EDBRS) framework for efficiently allocating cloud computing resources during earthquake disasters. The framework aims to minimize execution costs and times of cloud workloads by prioritizing urgent workloads related to emergency response. It models the resource scheduling problem and considers factors like workload deadlines, resource speeds and costs. The framework also presents algorithms for optimally assigning equal-length and variable-length workloads across multiple public and private cloud resources to balance performance and cost. The goal is to efficiently allocate cloud resources to disaster response zones based on urgency to reduce loss of life during earthquakes.
The document summarizes notes from a PrintVis partner meeting in Milan on February 23rd 2009. It discusses Marvia, a web-based service that allows creating documents from flexible design templates. Marvia integrates with PrintVis and allows clients to customize workflows. Pricing plans include Personal, Premium and Pro tiers. The technology uses Amazon Web Services and is scalable and reliable.
The document discusses the laws that govern human life and the purpose of existence. It argues that, just as the laws of science govern the physical world, divine laws govern human life and suffering. These laws are the moral evangelical principles of Christ, which science is now studying. The purpose of life is to evolve spiritually through experience, not merely to seek pleasure, in order to build the conscious man.
A Survey of File Replication Techniques In Grid Systems
Grid is a type of parallel and distributed system designed to provide reliable access to data and computational resources in wide area networks. These resources are distributed across different geographical locations. Efficient data sharing in global networks is complicated by erratic node failure, unreliable network connectivity and limited bandwidth. Replication is a technique used in grid systems to improve applications' response time and to reduce bandwidth consumption. In this paper, we present a survey of basic and new replication techniques that have been proposed by other researchers, followed by a full comparative study of these replication strategies.
With the rapid development of Geographic Information Systems (GISs) and their applications, more and more geographical databases have been developed by different vendors. However, data integration and access remain a major problem for the development of GIS applications, as no interoperability exists among different spatial databases. In this paper we propose a unified approach for spatial data query. The paper describes a framework for integrating information from repositories containing different vector data set formats and repositories containing raster datasets. The presented approach converts different vector data formats into a single unified format (File Geodatabase, "GDB"). In addition, we employ metadata to support a wide range of user queries to retrieve relevant geographic information from heterogeneous and distributed repositories. This use of metadata enhances both query processing and performance.
An Efficient Approach to Manage Small Files in Distributed File Systems
This document proposes an efficient approach to manage small files in distributed file systems. It discusses challenges in storing and retrieving a large number of small files due to their metadata size. The proposed system aims to improve performance by designing a new metadata structure that decreases original metadata size and increases file access speed. It presents a system architecture with clients, metadata server and data servers. The system classifies files into different storage types and combines existing techniques into a single architecture. It provides algorithms for reading local metadata and file data in two phases. The performance of the approach is evaluated on a distributed file system environment.
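The general trick of shrinking per-file metadata by packing many small files into one object with a compact index can be sketched as follows; the layout below is invented for illustration and is not the paper's metadata structure.

```python
import io, json, struct

def pack(files: dict) -> bytes:
    """Pack many small files into one blob with a tiny JSON index mapping
    name -> (offset, length). One object replaces many metadata entries."""
    index, body = {}, io.BytesIO()
    for name, data in files.items():
        index[name] = (body.tell(), len(data))
        body.write(data)
    header = json.dumps(index).encode()
    return struct.pack(">I", len(header)) + header + body.getvalue()

def read_one(blob: bytes, name: str) -> bytes:
    """Read a single small file back using only the index -- one lookup,
    one contiguous read, no per-file metadata round trip."""
    hlen = struct.unpack(">I", blob[:4])[0]
    index = json.loads(blob[4:4 + hlen])
    off, length = index[name]
    start = 4 + hlen + off
    return blob[start:start + length]

blob = pack({"a.txt": b"hello", "b.txt": b"world!"})
print(read_one(blob, "b.txt"))  # b'world!'
```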
Big data service architecture: a survey
This document discusses big data service architecture. It begins with an introduction to big data services and their economic benefits. It then describes the key components of big data service architecture, including data collection and storage, data processing, and applications. For data collection and storage, it covers Extract-Transform-Load tools, distributed file systems, and NoSQL databases. For data processing, it discusses batch, stream, and hybrid processing frameworks like MapReduce, Storm, and Spark. It concludes by noting big data applications in various fields and cloud computing services for big data.
The document discusses security issues in distributed database systems. It begins by defining distributed databases and their architecture. It then discusses three main security aspects: access control, authentication, and encryption. The document also discusses distributed database system design considerations like concurrency control and data fragmentation. Emerging security tools for distributed databases mentioned include data warehousing, data mining, collaborative computing, distributed object systems, and web applications. Maintaining security when building and querying data warehouses from multiple sources is highlighted as a key challenge.
This document discusses using Hidden Markov Model (HMM) forward chaining techniques for prefetching in distributed file systems (DFS) for cloud computing. It begins by introducing DFS for cloud storage and issues like load balancing. It then discusses using HMM to analyze client I/O and predict future requests to prefetch relevant data. The HMM forward algorithm would be used to prefetch data from storage servers to clients proactively. This could improve performance by reducing client wait times for requested data in DFS for cloud applications.
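The forward-algorithm idea can be sketched with a toy two-state HMM over client requests; all probabilities below are invented, and a real prefetcher would train them from observed I/O traces.

```python
# Toy HMM over client I/O: two hidden "access pattern" states and two
# observable file requests. All matrices are illustrative assumptions.
A  = [[0.8, 0.2], [0.3, 0.7]]   # state transition probabilities
B  = [[0.9, 0.1], [0.2, 0.8]]   # P(observed file | hidden state)
pi = [0.5, 0.5]                 # initial state distribution

def forward(obs):
    """HMM forward pass: normalized belief over hidden states after `obs`."""
    alpha = [pi[s] * B[s][obs[0]] for s in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in range(2)) * B[t][o]
                 for t in range(2)]
    total = sum(alpha)
    return [a / total for a in alpha]

def predict_next(obs):
    """Distribution over the next request: the most probable file is the
    candidate a DFS client would prefetch from the storage servers."""
    belief = forward(obs)
    nxt = [sum(belief[s] * A[s][t] for s in range(2)) for t in range(2)]
    return [sum(nxt[s] * B[s][o] for s in range(2)) for o in range(2)]

probs = predict_next([0, 0, 1])
print(probs.index(max(probs)))  # index of the file to prefetch
```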
A novel cloud storage system with support of sensitive data application
Most users are willing to store their data in the cloud storage system and use many facilities of the cloud, but their sensitive data applications face potentially serious security threats. In this paper, the security requirements of sensitive data applications in the cloud are analyzed and an improved structure for the typical cloud storage system architecture is proposed. A hardware USB key is used in the proposed architecture to enhance the security of user identity and of the interaction between users and the cloud storage system. Moreover, drawing on the idea of active data protection, a data security container is introduced into the system to enhance the security of the data transmission process by encapsulating the encrypted data and adding appropriate access control and data management functions. Static data blocks are replaced with a dynamic, executable data security container. An enhanced security architecture for the cloud storage terminal software is then proposed to better adapt to users' specific requirements; its functions and components are customizable. Moreover, the proposed architecture can detect whether the execution environment conforms to the predefined environment requirements.
A relational model of data for large shared data banks
This document introduces the relational model of data organization for large shared databases. It discusses inadequacies of existing tree-structured and network models, including ordering, indexing, and access path dependencies that impair data independence. The relational model represents data as mathematical n-ary relations and relationships between domains, providing independence from representation changes. It allows a clearer evaluation of existing systems and competing internal representations. The relational view forms a basis for treating issues like derivability, redundancy, and consistency in a sound way.
Survey on Synchronizing File Operations Along with Storage Scalable Mechanism
The document summarizes research on efficient file operations and storage scalability mechanisms. It discusses how data is divided into chunks and distributed to nodes for transmission in peer-to-peer networks. The proposed system aims to provide efficient load balancing, eliminate single points of failure, and ensure synchronization and security during data transmission. It uses synchronization algorithms and a hybrid distribution model combining features of peer-to-peer and client-server networks. The system is designed to securely handle insertions, deletions, splits, and concatenations of file chunks in a distributed storage system.
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
The document discusses techniques for detecting similarity and deduplication in document analysis using vector analysis. It proposes analyzing documents by extracting abstract content, separating words and combining them in a word cloud to determine frequency. This approach aims to identify whether documents are duplicates by analyzing word vectors at the word, sentence and paragraph level while also applying techniques like stemming, stopping words and semantic similarity.
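The word-vector comparison at the heart of such deduplication checks can be sketched with term-frequency vectors and cosine similarity (naive tokenization here, without the stemming, stop-word removal, or semantic matching the document mentions):

```python
import math, re
from collections import Counter

def word_vector(text):
    """Word-frequency vector for a document (lowercased, naive tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word vectors: 1.0 for identical
    word distributions, 0.0 for no shared vocabulary."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

v1 = word_vector("the cloud stores the data")
v2 = word_vector("the data stores the cloud")
print(cosine_similarity(v1, v2))  # 1.0 -- identical bags of words
```

The example also shows the approach's blind spot: a pure bag-of-words vector cannot distinguish reordered sentences, which is why the document layers sentence- and paragraph-level analysis on top.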
A systematic review of in-memory database over multi-tenancy
Significant cost and time are required to obtain a comprehensive response, and the response time of a query across a peer-to-peer database is one of the most challenging issues. This is particularly true when dealing with large-scale data processing, where the traditional approach of processing data on a single machine may not be sufficient. The need for a scalable, reliable, and secure data processing system is becoming increasingly important. Managing a single in-memory database instance for multiple tenants is often easier than managing separate databases for each tenant. This research work focuses on scalability with multi-tenancy and on more efficient, faster query performance using an in-memory database approach. We compare the performance of a row-oriented approach and a column-oriented approach on our benchmark human resources (HR) schema using the Oracle TimesTen in-memory database. We also capture key advantages along several optimization dimensions: the traditional approach, late materialization, compression, and the invisible join, on both column-store (C-Store) and row-store layouts. When compression and late materialization are enabled for a query set, the overall performance of the query set improves. In particular, the paper aims to elucidate the motivations behind multi-tenant application requirements concerning the database engine and to highlight major designs of in-memory databases for the tenancy approach in the cloud.
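The benefit of late materialization can be illustrated with a toy row store versus column store over an invented three-row table (not the paper's HR benchmark): the column layout filters on a single column and touches the name column only for the qualifying positions.

```python
# The same toy table in both layouts (rows and values are invented).
rows = [(1, "Ana", 900), (2, "Bo", 1500), (3, "Cy", 1200)]           # row-oriented
cols = {"id": [1, 2, 3],
        "name": ["Ana", "Bo", "Cy"],
        "salary": [900, 1500, 1200]}                                  # column-oriented

# Row store: every full row is touched even though only `salary` is filtered.
row_result = [r[1] for r in rows if r[2] > 1000]

# Column store with late materialization: filter on the salary column alone,
# then fetch names only for the positions that qualified.
positions = [i for i, s in enumerate(cols["salary"]) if s > 1000]
col_result = [cols["name"][i] for i in positions]
print(col_result)  # ['Bo', 'Cy']
```

At this scale the difference is invisible, but with millions of rows the column scan reads far less data, which is the effect the benchmark measures.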
A Reconfigurable Component-Based Problem Solving EnvironmentSheila Sinclair
This technical report describes a reconfigurable component-based problem solving environment called DISCWorld. The key features discussed are:
1) DISCWorld uses a data flow model represented as directed acyclic graphs (DAGs) of operators to integrate distributed computing components across networks.
2) It supports both long running simulations and parameter search applications by allowing complex processing requests to be composed graphically or through scripting and executed on heterogeneous platforms.
3) Operators can be simple "pure Java" implementations or wrappers to fast platform-specific implementations, and some operators may represent sub-graphs that can be reconfigured to run across multiple servers for faster execution.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN...
Apriori is one of the key algorithms to generate frequent itemsets. Analysing frequent itemset is a crucial
step in analysing structured data and in finding association relationship between items. This stands as an
elementary foundation to supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understand the behaviour of structured data. Most of the
structured data in scientific domain are voluminous. Processing such kind of data requires state of the art
computing machines. Setting up such an infrastructure is expensive. Hence a distributed environment
such as a clustered setup is employed for tackling such scenarios. Apache Hadoop distribution is one of
the cluster frameworks in distributed environment that helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on map/reduce design and implementation of
Apriori algorithm for structured data analysis.
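One map/reduce round of candidate counting, the core of a distributed Apriori pass, can be sketched as follows; the transactions are invented, and on Hadoop the input splitting and the shuffle between the two phases would be handled by the framework.

```python
from collections import Counter
from itertools import combinations

# Invented toy transactions standing in for voluminous structured data.
transactions = [{"bread", "milk"}, {"bread", "butter", "milk"},
                {"milk", "butter"}, {"bread", "butter"}]

def map_phase(chunk, k):
    """Mapper: emit (candidate k-itemset, 1) for every transaction."""
    for t in chunk:
        for itemset in combinations(sorted(t), k):
            yield itemset, 1

def reduce_phase(pairs, min_support):
    """Reducer: sum counts per itemset and keep only the frequent ones."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {i: c for i, c in counts.items() if c >= min_support}

# Single-machine stand-in for one map/reduce round with k = 2.
frequent_pairs = reduce_phase(map_phase(transactions, k=2), min_support=2)
print(frequent_pairs)
```

A full Apriori run iterates this round, using the frequent (k-1)-itemsets to prune the candidate k-itemsets before the next mapper pass.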
The document discusses the Open Grid Services Architecture (OGSA) standard. It describes OGSA's layered architecture including the physical/logical resources layer, web services layer using OGSI, OGSA services layer for core, program execution and data services, and applications layer. It also outlines the functional requirements of OGSA such as interoperability, resource sharing, optimization, quality of service, job execution, data services, security, cost reduction, scalability, and availability.
CLOUD STORAGE AND RETRIEVAL USING BLOCKCHAIN
This document discusses cloud storage and retrieval using blockchain. It begins with an abstract that outlines the benefits of decentralized storage systems like improved security, accessibility, and reduced costs compared to traditional centralized storage. The document then discusses challenges with data privacy and security in third-party cloud storage. It proposes a system that would encrypt and distribute data across multiple nodes on a blockchain network. This would improve security, privacy, availability and allow for unused storage resources to be utilized. The document reviews several related works applying blockchain concepts like encryption, smart contracts and distributed ledgers to decentralized storage and highlights some limitations of existing approaches.
A New Architecture for Group Replication in Data Grid
Nowadays, grid systems are a vital technology for high-performance computing and large-scale problem solving in science, engineering, and business. In grid systems, heterogeneous computational resources and data must be shared between independent organizations that are scattered geographically. A data grid is a type of grid that relates computational and storage resources. Data replication is an efficient way to obtain high performance and high availability in a data grid by saving numerous replicas in different locations, e.g. grid sites. In this research, we propose a new architecture for dynamic group data replication. In our architecture, we add two components to the OptorSim architecture: a Group Replication Management (GRM) component and a Management of Popular Files Group (MPFG) component. OptorSim was developed by the European DataGrid project to evaluate replication algorithms. Using this architecture, popular file groups are replicated to grid sites at the end of each predefined time interval.
The document discusses privacy and security issues related to cloud storage. It proposes a new privacy-preserving auditing scheme for cloud storage that uses an interactive challenge-response protocol and verification protocol. This allows a third party auditor to verify the integrity and identify corrupted data for a cloud storage user, while preserving data privacy. The scheme aims to be efficient, lightweight and privacy-preserving. Experimental results show the protocol is efficient and achieves its goals.
Electrically small antennas: The art of miniaturization
We are living in a technological era where we prefer portable devices to fixed ones; we are freeing ourselves from wires and becoming accustomed to a wireless world. What makes a device portable? The physical (mechanical) dimensions of the device matter, but its electrical dimension is of equal importance. Reducing only the physical dimension of an antenna would yield a small antenna, but not an electrically small one. Several definitions of the electrically small antenna exist, but the most appropriate is ka <= 0.5, where k is the wave number, equal to 2*pi/lambda, and a is the radius of the imaginary sphere circumscribing the maximum dimension of the antenna. As present-day electronic devices continue to shrink in size, designers have become increasingly focused on electrically small antenna (ESA) designs to reduce the size of the antenna in the overall electronic system. Researchers in many fields, including RF and microwave engineering, biomedical technology and national intelligence, can benefit from electrically small antennas, as long as the performance of the designed ESA meets the system requirements.
This document provides a comparative study of two-way finite automata and Turing machines. Some key points:
- Two-way finite automata are similar to read-only Turing machines in that they have a finite tape that can be read in both directions, but cannot write to the tape.
- Turing machines have an infinite tape that can be read from and written to, allowing them to recognize recursively enumerable languages.
- Both models are examined in their ability to accept the regular language L = {a^n b^m | m, n > 0}.
- The time complexity of a two-way finite automaton for this language is O(n^2) due to making two passes over the input.
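A two-pass check of this language can be sketched directly; this is a toy simulation of the head sweeping right and then left, not a formal 2DFA construction.

```python
def accepts(s: str) -> bool:
    """Accept L = {a^n b^m | n, m > 0} with one rightward sweep over the
    a-block and one leftward sweep over the b-block, loosely mimicking a
    two-way automaton's head movement."""
    i = 0
    while i < len(s) and s[i] == "a":   # pass 1: scan the a-block rightwards
        i += 1
    if i == 0 or i == len(s):           # need at least one a and one b
        return False
    j = len(s) - 1
    while j >= 0 and s[j] == "b":       # pass 2: scan the b-block leftwards
        j -= 1
    return j == i - 1                   # the two blocks must meet exactly

print([accepts(w) for w in ["ab", "aaabb", "ba", "abab", "a"]])
# [True, True, False, False, False]
```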
This document analyzes and compares the performance of the AODV and DSDV routing protocols in a vehicular ad hoc network (VANET) simulation. Simulations were conducted using NS-2, SUMO, and MOVE simulators for a grid map scenario with varying numbers of nodes. The results show that AODV performed better than DSDV in terms of throughput and packet delivery fraction, while DSDV had lower end-to-end delays. However, neither protocol was found to be fully suitable for the highly dynamic VANET environment. The document concludes that further work is needed to develop improved routing protocols optimized for VANETs.
This document discusses the digital circuit layout problem and approaches to solving it using graph partitioning techniques. It begins by introducing the digital circuit layout problem and how it has become more complex with increasing circuit sizes. It then discusses how the problem can be decomposed into subproblems using graph partitioning to assign geometric coordinates to circuit components. The document reviews several traditional approaches to solve the problem, such as the Kernighan-Lin algorithm, and discusses their limitations for larger circuit sizes. It also discusses more recent approaches using evolutionary algorithms and concludes by analyzing the contributions of various approaches.
This document summarizes various data mining techniques that have been used for intrusion detection systems. It first describes the architecture of a data mining-based IDS, including sensors to collect data, detectors to evaluate the data using detection models, a data warehouse for storage, and a model generator. It then discusses supervised and unsupervised learning approaches that have been applied, including neural networks, support vector machines, K-means clustering, and self-organizing maps. Finally, it reviews several related works applying these techniques and compares their results, finding that combinations of approaches can improve detection rates while reducing false alarms.
ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Volume 2, No 5, May 2013
www.ijarcet.org

Design and Implementation of Structured and Unstructured Data Querying System in Heterogeneous Environment
Thu Zar Mon
Abstract— In real-world applications, the design of data sharing and integration systems is a challenging task. Existing systems are typically used only to integrate and query structured data (e.g., the data of a relational database). To address this problem, this paper proposes an integrated environment for accessing, querying and sharing both structured data (e.g., data of a relational database) and unstructured file-based data (e.g., data stored in a text or binary file). The system is provided via Open Grid Service Architecture Data Access and Integration (OGSA-DAI) services, is supported by the Globus Toolkit, and allows database operations on both the structured and the unstructured data.
Index Terms— OGSA-DAI, structured data, unstructured
file-based data, data integration and Globus toolkit
I. INTRODUCTION
In a variety of scientific fields, grids have been developed for the storage, processing, and availability of data. A grid is a collection of distributed computing resources available over a local or wide-area network that appears to an end user or application as one large virtual computing system. For many research purposes, data should not only be stored but also needs to be readily accessible and integrable. Both structured and unstructured data are included in these resources, although the shared resources usually considered are only files, i.e., unstructured data. Data can be stored in various formats, including in a relational database or as a file. A relational database comprises a collection of relations, frequently known as tables, that correspond to a logical structure in which data can be stored. Unstructured data consists of any data stored in an unstructured format at an atomic level. Furthermore, unstructured data can be divided into two basic categories: bitmap objects (such as video, image, and audio files) and textual objects (such as spreadsheets, presentations, documents and email). Both can be treated as a string of bits. Unstructured data is usually managed by the operating system.
The system provides a data abstraction to overcome the heterogeneity of the data sets, which include relational tables and text files (comma-separated files, CSV); this is achieved via Open Grid Services Architecture Data Access and Integration (OGSA-DAI) services and supported by the Globus Toolkit.
Manuscript received May, 2013.
Ms. Thu Zar Mon, Faculty of Information and Communication Technology, University of Technology (Yatanarpon Cyber City), Pyin Oo Lwin, Myanmar, 09-448542823,
Globus provides diverse services related to security and data management based on the standard specifications of OGSA. The Globus Toolkit is the most popular middleware for file-based data access and sharing. OGSA-DAI is a middleware that adopts a service-oriented architecture (SOA) solution for integrating data and grids through the use of web services. When a client wants to make a request to an OGSA-DAI data service, it invokes a web service operation on the data service using a perform document. A perform document is an XML document describing the request the client wants executed, defined by linking together a sequence of activities. An activity is an OGSA-DAI construct corresponding to a specific task to be performed. The output of one activity can be linked to the input of another to perform a number of tasks in sequence. A range of activities is supported by OGSA-DAI, falling into the broad categories of relational activities, XML activities, delivery activities, transformation activities and file activities [10]. To support the concept of a data grid, the system uses a series of protocols and services (middleware) as well as a virtual repository. The data sources are hosted in MySQL Server 5.0, Oracle 10g and Microsoft SQL Server 2005 databases, and the three main nodes are connected via a Fast Ethernet switch (100 Mbps). Data sharing among large natural resource and environment data sets supports the ability to find and acquire the desired data quickly.
The structure of this paper is as follows. Section II introduces work related to the proposed system. The overview and implementation of the system are discussed in Sections III and IV. Section V describes the system operation, followed by some current applications. Section VI concludes the paper.
II. RELATED WORKS
A data integration system is an automated method for querying across multiple heterogeneous databases in a uniform way. Much research argues that a data integration solution should be transparent to the user. Such solutions provide a convenient way of exposing data resources and are often used to implement wrappers. In essence, a mediated schema is created to represent a particular application domain, and data sources are mapped as views over the mediated schema; this approach is taken in [1], [2] and [3].
The first proposal to use distributed query processing in a grid setting was Polar*, which was accessed using a grid-enabled version of the Message Passing Interface (MPI) and supported the execution of distributed queries. The GRelC Data Gather Service (DGS) [5] was another middleware approach for database access, management and integration.
All the approaches described above are used only to integrate and query structured data. They still have gaps in transparency across both structured and unstructured data. Moreover, a major difficulty in sharing databases and files stems from the inherently heterogeneous, semantically diverse environment. Enabling the sharing and querying of distributed data without using a global schema is also one of the challenging tasks facing the grid research community.
III. OVERVIEW OF PROPOSED SYSTEM
The proposed system consists of three parts: (1) the data grid infrastructure, (2) the run-time system, and (3) the user interface.
1) The data grid infrastructure consists of the grid middleware, achieved via Open Grid Services Architecture Data Access and Integration (OGSA-DAI) services and supported by the Globus Toolkit, deployed in different nodes of the network.
2) The run-time system consists of a web-based user interface and a document repository, components deployed in a web server (Tomcat) and developed with Java technologies.
3) The user interface, from which users can submit search queries and receive the results.
Figure 1. Overview of Proposed System
A. Method
The method comprises: receiving structured data and unstructured data; integrating them by generating metadata from the unstructured data and, based on the generated metadata, configuring an abstraction layer to perform database operations on both the structured and the unstructured data; and providing an integrated view of the structured and unstructured data for display, the integrated view including a user interface that allows the user to control the database operations of the abstraction layer.
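As a rough illustration of this method (a sketch, not code from the paper), the fragment below generates metadata from an unstructured file, registers it alongside structured rows in a toy abstraction layer, and runs a single query over both; the names `generate_metadata` and `AbstractionLayer` are hypothetical.

```python
import os
import time

# Hypothetical helper: derive metadata from an unstructured file,
# mirroring the attributes the paper lists (file name, type, size, dates).
def generate_metadata(path):
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "file_type": os.path.splitext(path)[1].lstrip("."),
        "file_size": stat.st_size,
        "modification_date": time.ctime(stat.st_mtime),
    }

class AbstractionLayer:
    """Toy abstraction layer: one list stands in for structured rows,
    another holds metadata plus a reference for each unstructured file."""
    def __init__(self):
        self.structured = []   # e.g. rows from a relational database
        self.documents = []    # metadata + reference for each file

    def add_structured(self, row):
        self.structured.append(row)

    def register_file(self, path):
        meta = generate_metadata(path)
        meta["doc_ref"] = path  # reference back to the unstructured data
        self.documents.append(meta)

    def query(self, predicate):
        # The same operation runs over both kinds of data, giving the
        # integrated view described in the text.
        return [r for r in self.structured + self.documents if predicate(r)]
```

A single `query` call thus returns matches from both the relational side and the file side, which is the essence of the integrated view.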
The main objectives of this paper are (1) to find and acquire the desired data quickly through data sharing among large natural resource and environment data sets, (2) to solve the lack of collaboration and storage capability, (3) to deploy the integration framework in a service-based grid architecture, and (4) to develop a decentralized framework for integrating heterogeneous data sources that meets the requirements of scalability, robustness and autonomy.
B. Data Abstraction Layer
Both the structured and the unstructured data are handled by a data abstraction layer. Inside the implementation of the OGSA-DAI server, one node has been selected as the coordinator of the abstraction and of the unstructured-data request process on this layer. The layer provides a document catalog associated with a schema that defines a set of fields tracking the most common attributes of documents, e.g., file name, file type, file size, creation date and modification date, as well as metadata including author, title, subject and copyright information.
The data abstraction layer includes a search interface that provides basic search functions and organizes the generation of an index over the full text of each document being uploaded. The basic search can query both the attributes of the structured data and the content of the unstructured data.
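The combination of attribute search and full-text search can be sketched as follows; this is an illustrative toy, not the system's implementation, and the `SearchInterface` class and its fields are invented for the example.

```python
import re

class SearchInterface:
    """Sketch of the basic search: an inverted index over document
    full text, combined with a filter over structured attributes."""
    def __init__(self):
        self.attrs = {}   # doc_id -> attribute dict (structured side)
        self.index = {}   # term -> set of doc_ids (unstructured side)

    def upload(self, doc_id, attributes, full_text):
        # Index generation happens at upload time, as in the text.
        self.attrs[doc_id] = attributes
        for term in re.findall(r"\w+", full_text.lower()):
            self.index.setdefault(term, set()).add(doc_id)

    def search(self, term=None, **attr_filters):
        ids = set(self.attrs)
        if term is not None:                      # content condition
            ids &= self.index.get(term.lower(), set())
        return sorted(                            # attribute conditions
            d for d in ids
            if all(self.attrs[d].get(k) == v for k, v in attr_filters.items())
        )
```

For example, `search(term="grid", file_type="txt")` intersects a content hit list with an attribute filter, which is exactly the "both attributes and content" behaviour described above.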
Generating the metadata from the unstructured data includes extracting the metadata from the document and incorporating user-created document attributes into the extracted metadata. Extracting the metadata includes determining the file type of the document. Configuring the abstraction layer includes storing the metadata of the document and a reference to the document in one or more document-description data fields of the database. The system inserts the metadata and a document reference that refers to the unstructured data into a table of the database. The document reference includes a file name and a path relative to a root directory of the document repository.
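A minimal sketch of this configuration step, assuming an SQLite stand-in for the actual DBMS; the `document` table and `register` helper are hypothetical, but they show how extracted metadata and a repository-relative document reference land in one row.

```python
import os
import sqlite3

# Illustrative schema (not from the paper): document-description fields
# hold the extracted metadata plus a reference composed of a file name
# and a path relative to the repository root.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document (
        file_name TEXT,
        file_type TEXT,
        author    TEXT,
        doc_ref   TEXT   -- path relative to the repository root
    )
""")

def register(conn, repo_root, abs_path, author):
    rel = os.path.relpath(abs_path, repo_root)
    conn.execute(
        "INSERT INTO document VALUES (?, ?, ?, ?)",
        (os.path.basename(abs_path),
         os.path.splitext(abs_path)[1].lstrip("."),
         author,
         rel),
    )

register(conn, "/repo", "/repo/reports/2013/env.txt", "Thu Zar Mon")
row = conn.execute(
    "SELECT doc_ref FROM document WHERE file_type = 'txt'"
).fetchone()
# row[0] is the repository-relative reference, e.g. reports/2013/env.txt
```

Storing the path relative to the repository root means the repository can be relocated without rewriting every reference.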
C. Flow Diagram of needed functionality
Figure 2 shows the required functionality as a sequence of steps:
1) Query the database.
2) Create tuples from information provided in a user-friendly format.
3) Separate coordinates from filenames.
4) Deliver files to the FTP server.
5) Read files from the file system.
6) Deliver coordinates to the request status.
Figure 2. Functionality Flow Diagram
A relational database contains information about the files:
1) Coordinate columns, e.g., x, y, z
2) Filename column, e.g., filename
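The query-and-separate part of this flow can be mimicked against such a table; the sketch below is illustrative (the table contents are made up), splitting each row into coordinates bound for the request status and filenames bound for FTP delivery.

```python
import sqlite3

# Illustrative table with coordinate columns (x, y, z) and a filename
# column, as described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (x REAL, y REAL, z REAL, filename TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [(1.0, 2.0, 3.0, "a.dat"), (4.0, 5.0, 6.0, "b.dat")],
)

# Query the database and create tuples.
rows = conn.execute("SELECT x, y, z, filename FROM files").fetchall()

# Separate coordinates from filenames.
coordinates = [(x, y, z) for x, y, z, _ in rows]   # -> request status
filenames = [name for *_, name in rows]            # -> FTP delivery
```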
To provide data access and integration, the system uses the Globus Toolkit and OGSA-DAI software as the grid middleware and accesses flat files such as CSV, EMBL and SwissProt files. The files are at present stored in a file system and their paths are stored in the database. Usually the database and file servers are kept behind firewalls. One possibility to evade this severe security threat is to use OGSA-DAI to deploy the database and to use GridFTP or Grid Reliable File Transfer in the Globus Toolkit to move files around the grid. In both cases the database and file servers need to open ports only for the limited number of machines where the OGSA-DAI and GridFTP servers are running.
D. Architecture of the Model
The architecture of the model is shown in figure 3. Databases are deployed as data service resources, which contain all the information about the database, such as its physical location and ports, the JDBC drivers required to access it, and the user access rights. A data service exposes the data service resource in a web container, which could be a Globus container or an Apache Tomcat server.
Figure 3. Architecture of the model. The figure depicts a layered stack: the user interface and toolkit APIs on top; the functional service components (OGSA-DAI service, metadata service, transaction service and query service); the OGSA-DAI core; a grid data service for each data source; and, at the bottom, the Oracle, MySQL and SQL Server databases and the file store.
The data sources are hosted in MySQL Server 5.0, Oracle 10g and Microsoft SQL Server 2005 databases, and the three main nodes are connected via a Fast Ethernet switch (100 Mbps). The data service resources perform requests on behalf of a client. A factory is a service that creates a data service instance to access a specific data source.
E. Process of OGSA-DAI
Figure 4. OGSA-DAI process
The process of OGSA-DAI is shown in figure 4.
GDS (Grid Data Service): provides the access end point for a client and holds the client's session with that data resource. A GDS is created by a GDSF.
GDSF (Grid Data Service Factory): represents the point of presence of a data resource on a grid. It is through a GDSF that a data resource's capabilities and metadata are exposed.
DAISGR (Data Access and Integration Service Group Registry): GDSFs may be located on the grid through a DAISGR, with which they register to expose their capabilities and metadata and so aid service and data discovery.
The client sends an XML document, called a perform document, which specifies the activities to be executed on the data service. Unstructured data are stored in a file system and their file location addresses are stored in a database.
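A perform document links activities so that the output of one feeds the input of the next. The fragment below sketches only the general shape; the exact element names, namespaces and activity set depend on the OGSA-DAI version, so every name here should be read as illustrative rather than authoritative.

```xml
<!-- Illustrative shape of a perform document: a relational query
     activity whose result stream feeds a delivery activity. -->
<perform>
  <sqlQueryStatement name="query">
    <expression>SELECT x, y, z, filename FROM files</expression>
    <resultStream name="results"/>
  </sqlQueryStatement>
  <deliverToURL name="deliver">
    <fromLocal from="results"/>
    <toURL>ftp://host/outgoing/results.xml</toURL>
  </deliverToURL>
</perform>
```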
Figure 5. Grid Data Service (GDS) mode of operation: the client sends a query (perform) document to the grid data service, which executes it against the data resource and returns a result document.
Figure 6. GDS delivery of the file system: a Grid Data Service, created by a Grid Data Service Factory, exposes the DBMS, the FTP server and the file system.
Figures 5 and 6 show a simple data access scenario:
1) A client first contacts a DAISGR to locate the GDSFs.
2) It accesses suitable GDSFs directly to find out more about
their properties and the data resources they represent.
3) It asks a GDSF to instantiate a GDS.
ISSN: 2278 – 1323, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 2, No 5, May 2013
4) It accesses the resource by sending the GDS a
GDS-perform document.
F. Metadata and Metadata Storage Model
Data integration is done through the filters' capability
metadata. Metadata is stored in the local file system as a flat
file. OGSA-DAI services provide metadata about the DBMS,
about the capabilities of that DBMS that are exposed to the
grid through the service interfaces, and about any inherent
capabilities of the services themselves.
Metadata storage model:
1) Metadata is kept in a Metadata Catalog Service (MCS).
2) The MCS enables attribute-based querying.
3) Metadata describes the datasets; the data itself can be
anything (binary, text, ...).
4) Data integration is done through an XML-based activity
file mixing activities (SQL queries) and metadata.
For relational databases, the database schema may be
extracted from the service, which can be helpful for
higher-level services such as distributed query processing.
OGSA-DAI can use any data, and a predefined database
schema enables querying and accessing that data. Global
persistent identifiers for naming files, metadata describing
the location and ownership of files, and descriptive metadata
together support discovery through digital library query
mechanisms.
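To make the flat-file storage concrete, one metadata record for an unstructured file might carry the location and ownership fields described above. The field names and values below are illustrative inventions, not part of OGSA-DAI or the MCS:

```
# Hypothetical flat-file metadata record for one unstructured file
fileId      = doc-000142
location    = ftp://node2.example.org/data/reports/survey-2007.pdf
owner       = ict-dept
format      = application/pdf
sizeBytes   = 1048576
description = natural resource survey report, 2007
```

Attribute-based querying then means matching records on fields such as owner or format and returning the location, from which the file itself is fetched.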
IV. IMPLEMENTATION OF THE SYSTEM
In the proposed system, the Globus Toolkit offers a core set
of services for file access and management. OGSA-DAI can
run alongside the Globus Toolkit when data security or fast
data transfer (using GridFTP) is needed. All nodes in the
system are connected via a Fast Ethernet switch (100 Mbps).
One node is the master node, which manages the data
sources. The document repository includes relational
database tables defined for the metadata management of
unstructured data. The following tools are needed to
implement the services of the system: (1) the Java
programming language, (2) the Apache Tomcat web
container, and (3) OGSA-DAI 4.1 for GT 4.2.0, which only
works with Globus Toolkit 4.2.0.
V. SYSTEM OPERATIONS
The system has static features as well as dynamic snapshots,
such as the system interfaces at various levels, grid nodes and
file systems.
Figure 7 shows the operation classes in the system. The
operations fall into two groups: the first is information
querying and the second is data transferring. Dividing
operations into classes and groups gives a better and clearer
logical structure and maximizes code reuse.
Figure 7. System Operation Schema: information query operations (metadata query, grid resource query) and data transferring operations (data transfer on the file system, data transfer on a grid node).
A. The Interaction of Executing an SQL Query on a Remote
Server
To access data stored in a relational database, the
interaction of executing an SQL query on a remote server is
needed. A typical OGSA-DAI client-service interaction
involves a client running an SQL query through a remote
OGSA-DAI service, which then returns the query response,
typically some data, in an XML document. This interaction
involves the following six steps:
1) The client sends a request containing the SQL query in a
SOAP message to an OGSA-DAI service.
2) The server extracts the request from the SOAP message,
and the SQL query is executed on the relational database.
3) The query results are returned from the relational
database to the OGSA-DAI server as a set of Java
ResultSet objects.
4) The server converts the Java ResultSet objects into a
format suitable for transmission back to the client, such
as WebRowSet.
5) This data is sent back to the client in a SOAP message.
6) The client receives the SOAP message, unpacks the data,
and converts it back to a ResultSet object (assuming a
Java client).
The use of an alternative intermediate delivery format,
Comma Separated Values (CSV), was investigated. Although
CSV supports embedded metadata less well than the
WebRowSet format, it uses space more efficiently and is
easier to parse.
Figure 8. Activity inputs, outputs and resources: an SQL query activity (expression strings such as "SELECT * FROM table1;", "SELECT * FROM table2;") runs against a MySQL resource and streams tuples into a TupleToCSV activity, which takes a nullDataString (e.g. "NULL") and outputs the result as character data.
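To make the TupleToCSV step concrete, the following self-contained Java sketch converts a list of tuples (a `List<Object[]>` standing in for the server's ResultSet data) to CSV text, substituting a nullDataString for SQL NULLs. It illustrates the idea only and is not the OGSA-DAI activity implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a TupleToCSV-style conversion: each tuple becomes one CSV line,
// with null column values replaced by a configurable nullDataString.
public class TupleToCsv {

    static String toCsv(List<Object[]> tuples, String nullDataString) {
        StringBuilder sb = new StringBuilder();
        for (Object[] row : tuples) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(row[i] == null ? nullDataString : row[i].toString());
            }
            sb.append('\n'); // one tuple per line
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Object[]> tuples = new ArrayList<>();
        tuples.add(new Object[]{1, "alpha"});
        tuples.add(new Object[]{2, null});
        System.out.print(toCsv(tuples, "NULL")); // prints: 1,alpha\n2,NULL
    }
}
```

A real implementation would also need to quote values containing commas or newlines, which the sketch omits for brevity.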
In analyzing the change of data format, we also examined the
difference between using WebRowSet and Comma Separated
Values (CSV) as the intermediate delivery format. Using
WebRowSet added a significant amount of XML mark-up,
which increased the amount of data transferred between
client and server, and parsing the messages out of XML
could be slow, possibly incurring unnecessary overhead. By
calculating the space required to represent the same result in
each format, the reduction in data size in going from
WebRowSet to CSV can be estimated. This can be done by
calculating the number of extra characters needed to describe
a row of data.
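The character-counting estimate can be sketched in Java. The XML tag names below are hypothetical stand-ins for the real WebRowSet mark-up, so the figures indicate relative rather than exact sizes:

```java
// Sketch: per-row size of a WebRowSet-style XML encoding versus CSV.
// The tag names are invented placeholders, not the actual WebRowSet schema.
public class FormatOverhead {

    // Characters needed to wrap one row's values in row/column XML elements.
    static int xmlRowSize(String[] values) {
        int size = "<currentRow></currentRow>".length();
        for (String v : values) {
            size += "<columnValue></columnValue>".length() + v.length();
        }
        return size;
    }

    // CSV needs only (n - 1) commas plus one newline around the same values.
    static int csvRowSize(String[] values) {
        int size = values.length; // (n - 1) commas + 1 newline
        for (String v : values) {
            size += v.length();
        }
        return size;
    }

    public static void main(String[] args) {
        String[] row = {"1001", "Aung", "2013-05-01"};
        System.out.println("XML row chars: " + xmlRowSize(row));
        System.out.println("CSV row chars: " + csvRowSize(row));
    }
}
```

Multiplying the per-row difference by the number of rows gives a first-order estimate of the transfer saving from switching formats.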
B. Transferring Binary Data
A slight variation on the previous use case involves using
OGSA-DAI to provide access to files stored on a server's file
system. These are typically large binary data files stored in a
file system, with the associated metadata stored separately in
a relational database. Using the OGSA-DAI delivery
mechanisms, the client queries the databases to locate any
files of interest, and the files are then retrieved separately
from the SOAP interactions for data transport efficiency.
However, it can be more convenient for a client to receive the
data back in a SOAP response message rather than via an
alternative delivery mechanism.
Figure 9. Overview of OGSA-DAI's process
We address both of these concerns by using SOAP messages
with attachments. This approach significantly reduces the
time required to process SOAP messages and permits binary
data transfer without requiring Base64 encoding.
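The Base64 point can be checked directly: encoding n raw bytes yields roughly 4n/3 characters. This plain-JDK sketch (unrelated to OGSA-DAI itself) shows the inflation that attachments avoid:

```java
import java.util.Base64;

// Sketch: size inflation from Base64-encoding binary data into a SOAP body.
// Encoding n bytes produces ceil(n / 3) * 4 characters, about a 33% overhead.
public class Base64Overhead {

    static int encodedSize(int rawBytes) {
        return Base64.getEncoder().encode(new byte[rawBytes]).length;
    }

    public static void main(String[] args) {
        int raw = 3_000_000; // a 3 MB file
        System.out.println("raw bytes:     " + raw);
        System.out.println("encoded bytes: " + encodedSize(raw)); // 4_000_000
    }
}
```

Sending the same bytes as a binary SOAP attachment avoids both this expansion and the cost of encoding and decoding the payload.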
Figure 10. A simple example of listing files in a file system: a ListDirectory activity on a FileResource takes a directory name (e.g. "someDirectory") and boolean Recursive and Include Path inputs, and delivers the file names (e.g. ["someDirectory/file1.txt", "someDirectory/file2.txt", "someDirectory/file3.txt"]) via DeliverToRequestStatus.
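The behaviour shown in Figure 10 can be imitated with plain Java I/O. This local sketch mirrors the Recursive and Include Path inputs of the ListDirectory activity but uses only the JDK, not the OGSA-DAI activity API:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of what a ListDirectory-style activity does conceptually: walk a
// directory (optionally recursively) and emit file names as strings.
public class ListDirectorySketch {

    static List<String> listDirectory(File dir, boolean recursive, boolean includePath) {
        List<String> result = new ArrayList<>();
        File[] entries = dir.listFiles(); // null if dir does not exist
        if (entries == null) return result;
        for (File f : entries) {
            if (f.isFile()) {
                result.add(includePath ? f.getPath() : f.getName());
            } else if (f.isDirectory() && recursive) {
                result.addAll(listDirectory(f, true, includePath));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "someDirectory" is the example name from Figure 10.
        for (String name : listDirectory(new File("someDirectory"), true, true)) {
            System.out.println(name);
        }
    }
}
```

In the real system the equivalent output stream would be delivered to the client through the request status rather than printed locally.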
VI. CONCLUSION
In this paper, we proposed a system for integrating data
resources in a heterogeneous environment. The system
distinguishes the managed data into two categories:
structured data and unstructured data (flat files). The data
grid middleware used for virtualization is separated
according to these two categories. In our design, we use
OGSA-DAI and Globus as the data grid middleware; the
combination of the two data grids handles all kinds of data
types. Hence the system can improve the accessibility,
integration and management of heterogeneous data sources.
ACKNOWLEDGMENT
Our foremost thanks go to Professor Dr. Aung Win, the
Principal of the University of Technology in Yatanarpon
Cyber City, for welcoming our research and lending us a
hand. Next, we would like to thank Professor Dr. Soe Soe
Khaing, the Head of the Department of Information and
Communication Technology at our university, for giving us
the chance to fulfill our goal. We also wish to thank the other
members of our department for encouraging us and offering
guidance on our research. Finally, our thanks go to our
families and friends for all the love and kindness they gave
us.