The document discusses metascheduling on the grid. A metascheduler coordinates communication among local schedulers and presents users with a single resource pool, hiding the underlying scheduling details. It abstracts grid resources through cooperation with grid information services. Ideally, the metascheduler asks users only three questions (their requirements, their input, and their identity) and handles all grid-specific details itself.
Learning objectives
• Understand how to handle massive amounts of data using a data grid.
• Explain data replication and namespaces.
• Identify the various data access models.
The document discusses the Open Grid Services Architecture (OGSA) standard. It describes OGSA's layered architecture including the physical/logical resources layer, web services layer using OGSI, OGSA services layer for core, program execution and data services, and applications layer. It also outlines the functional requirements of OGSA such as interoperability, resource sharing, optimization, quality of service, job execution, data services, security, cost reduction, scalability, and availability.
MataNui - Building a Grid Data Infrastructure that "doesn't suck!" - Guy K. Kloss
This document discusses the development of a grid data infrastructure called MataNui to manage large amounts of observational astronomical data and metadata from a collaboration between researchers in New Zealand and Japan. The infrastructure uses existing open-source tools like MongoDB, GridFTP, and the DataFinder GUI client to allow distributed storage and access of data while meeting requirements like handling large data volumes, metadata, and remote access. This approach provides a robust, reusable, and user-friendly system to address common data management challenges in scientific collaborations.
This document discusses grid architecture and service modeling. It begins with a brief history of grid computing and identifies four main grid families (computational, information, business, and peer-to-peer grids). It then describes a layered grid architecture modeled after the Internet architecture. Next, it examines Open Grid Services Architecture (OGSA) and some of its core interfaces. It also discusses security models in OGSA. Finally, it covers data-intensive grid service models, data replication strategies, and different grid data access models.
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl... - Editor IJLRES
The cloud database as a service is a novel paradigm that can support several Internet-based applications, but its adoption requires the solution of information confidentiality problems. We propose a novel architecture for adaptive encryption of public cloud databases that offers an interesting alternative to the tradeoff between the required data confidentiality level and the flexibility of the cloud database structures at design time. We demonstrate the feasibility and performance of the proposed solution through a software prototype. Moreover, we propose an original cost model that is oriented to the evaluation of cloud database services in plain and encrypted instances and that takes into account the variability of cloud prices and tenant workloads during a medium-term period.
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C... - ijccsa
Cloud collaboration is an emerging technology that enables sharing of computer files using cloud computing: cloud resources are pooled, cloud services are provided on top of them, and users can share documents through those services. Resource allocation in the cloud is challenging because resources offer different Quality of Service (QoS), and services running on these resources may fail to meet user demands. We propose a solution for resource allocation based on multi-attribute QoS scoring, considering parameters such as the distance from the user site to the resource, the reputation of the resource, task completion time, task completion ratio, and the load at the resource. The proposed algorithm, referred to as Multi Attribute QoS Scoring (MAQS), uses a neuro-fuzzy system. We have also included a speculative manager to handle fault tolerance. In this paper it is shown that the proposed algorithm performs better than others, including PowerTrust reputation-based algorithms and the harmony method, which use a single attribute to compute the reputation score of each allocated resource.
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C... - neirew J
This paper proposes a neuro-fuzzy system called Multi Attribute QoS scoring (MAQS) for dynamic resource allocation in collaborative cloud computing. MAQS uses a 3-layer neural network trained on 5 quality of service attributes - distance, reputation, task completion time, completion ratio, and load - to provide a QoS score for each resource. Resources are then allocated based on this score. The algorithm collects data periodically from nodes and calculates QoS scores for incoming tasks to select the highest scoring node for task allocation. The paper argues this approach considers multiple attributes and heterogeneity of resources better than previous single-attribute methods.
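The selection step described above can be sketched in plain Python. This is not the paper's trained neuro-fuzzy model; it is a minimal weighted-scoring stand-in, and the attribute values and weights below are illustrative assumptions.

```python
# Minimal sketch of multi-attribute QoS scoring for node selection.
# The weights stand in for the trained neuro-fuzzy model; all numbers
# here are illustrative assumptions, normalized to [0, 1].

def qos_score(node, weights):
    """Weighted score: higher is better. Distance, completion time,
    and load are costs, so they enter with a negative sign."""
    return (weights["reputation"] * node["reputation"]
            + weights["completion_ratio"] * node["completion_ratio"]
            - weights["distance"] * node["distance"]
            - weights["completion_time"] * node["completion_time"]
            - weights["load"] * node["load"])

def select_node(nodes, weights):
    """Allocate the incoming task to the highest-scoring node."""
    return max(nodes, key=lambda n: qos_score(n, weights))

nodes = [
    {"name": "n1", "reputation": 0.9, "completion_ratio": 0.95,
     "distance": 0.2, "completion_time": 0.4, "load": 0.7},
    {"name": "n2", "reputation": 0.8, "completion_ratio": 0.90,
     "distance": 0.1, "completion_time": 0.3, "load": 0.2},
]
weights = {"reputation": 1.0, "completion_ratio": 1.0,
           "distance": 0.5, "completion_time": 0.5, "load": 0.5}

best = select_node(nodes, weights)  # n2 wins: lightly loaded and close
```

Here n1 has the better reputation, but n2's lower distance, completion time, and load give it the higher overall score, which is the multi-attribute effect the paper argues single-attribute reputation schemes miss.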
Open source grid middleware packages – Globus Toolkit (GT4) architecture, configuration – usage of Globus – main components and programming model – introduction to the Hadoop framework – MapReduce, input splitting, map and reduce functions, specifying input and output parameters, configuring and running a job – design of the Hadoop file system, HDFS concepts, command-line and Java interfaces, dataflow of file read and file write.
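The MapReduce flow named in this syllabus (input splitting, map and reduce functions, running a job) can be illustrated with a plain-Python word count. This mimics the programming model only; a real Hadoop job would use the Hadoop Java API or Hadoop Streaming, and the splitting here is a deliberate simplification.

```python
# Plain-Python sketch of the MapReduce flow: input splitting, a map
# function, a shuffle that groups values by key, and a reduce function.
from collections import defaultdict

def split_input(text, n_splits=2):
    """Crude input splitting: deal lines out among n splits."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]

def map_fn(line):
    """Emit (word, 1) pairs for each word in the line."""
    return [(w.lower(), 1) for w in line.split()]

def reduce_fn(word, counts):
    """Combine all counts emitted for one word."""
    return (word, sum(counts))

def run_job(text):
    # Map phase over each split, then shuffle: group values by key.
    groups = defaultdict(list)
    for split in split_input(text):
        for line in split:
            for key, value in map_fn(line):
                groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = run_job("the quick fox\nthe lazy dog")
# result == {"the": 2, "quick": 1, "fox": 1, "lazy": 1, "dog": 1}
```

In Hadoop the same three roles appear as the `Mapper` and `Reducer` classes and the framework-provided shuffle; the splits would be HDFS blocks rather than alternating lines.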
‘Grids’ are an approach for building dynamically constructed problem-solving environments using geographically and organizationally dispersed, high-performance computing and data-handling resources. Grids also provide important infrastructure supporting multi-institutional collaboration.
MongoDB .local London 2019: Best Practices for Working with IoT and Time-seri... - MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
• Common components of an IoT solution
• The challenges involved with managing time-series data in IoT applications
• Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance
• How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
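One concrete illustration of the schema-design point above is the bucket pattern commonly used for IoT time-series data in MongoDB: many readings are grouped into one document per sensor per time window, reducing document count and index size compared with one document per reading. The field names and hourly granularity below are illustrative assumptions, shown with plain dictionaries rather than a live MongoDB connection.

```python
# Sketch of the "bucket" schema pattern for time-series data: readings
# are grouped into per-sensor, per-hour bucket documents instead of
# being stored one document per reading. Field names are illustrative.
from datetime import datetime

def bucket_id(sensor_id, ts):
    """One bucket per sensor per hour."""
    return f"{sensor_id}:{ts:%Y-%m-%dT%H}"

def add_reading(buckets, sensor_id, ts, value):
    """Append a reading to its hour bucket, creating it if needed.
    With a real database this would be a single upsert with $push."""
    key = bucket_id(sensor_id, ts)
    bucket = buckets.setdefault(key, {
        "_id": key,
        "sensor_id": sensor_id,
        "start": ts.replace(minute=0, second=0, microsecond=0),
        "readings": [],
    })
    bucket["readings"].append({"ts": ts, "value": value})
    return bucket

buckets = {}
add_reading(buckets, "s1", datetime(2019, 6, 1, 10, 5), 21.4)
add_reading(buckets, "s1", datetime(2019, 6, 1, 10, 20), 21.9)
add_reading(buckets, "s1", datetime(2019, 6, 1, 11, 2), 22.3)
# Three readings, but only two documents: the 10:00 and 11:00 buckets.
```

The trade-off the talk alludes to: larger buckets mean fewer index entries and better compression, but unbounded arrays grow documents, so the window size has to match the ingest rate.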
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser... - MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
• Common components of an IoT solution
• The challenges involved with managing time-series data in IoT applications
• Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance
• How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
This document provides an introduction to grid architecture. It discusses key components of a grid architecture including the layered model and standard protocols. The document outlines requirements for grid architecture including heterogeneous and distributed resources. It also compares web services and grid services, describing standards like OGSA, OGSI, and WSRF. Finally, it provides examples of creating stateful web services.
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey - IJECEIAES
In the modern era, workflows are adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, including scientific, data-intensive, and big data applications such as MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emergence of cloud computing, scheduling in the cloud has become an important research topic; consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in the cloud environment, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and insight into research challenges that offer possible future directions for researchers.
Mining Of Big Data Using Map-Reduce Theorem - IOSR Journals
This document discusses using MapReduce to efficiently extract large and complex data from big data sources. It proposes a MapReduce theorem for big data mining that is more efficient than the Heterogeneous Autonomous Complex and Evolving (HACE) theorem. MapReduce libraries support different programming languages and platforms, allowing for portable big data processing. The document outlines how MapReduce connects to Big Query to allow SQL queries to efficiently extract and analyze large datasets stored in the cloud. It also discusses data cleaning, sampling, and normalization as part of the big data mining process.
This document proposes CATCH, a cloud-based system to improve data transfer efficiency for high-performance computing (HPC) workloads. CATCH uses cloud storage to stage input data for HPC jobs and offload output data, in order to reduce storage usage at HPC centers and improve data transfer times. Evaluation of CATCH using a real cloud platform and HPC workload logs showed it could reduce average transfer times by up to 81.1% and decrease wait times and storage usage at HPC centers.
Data-Intensive Technologies for Cloud Computing - huda2018
This document provides an overview of data-intensive computing technologies for cloud computing. It discusses key concepts like data-parallelism and MapReduce architectures. It also summarizes several data-intensive computing systems including Google MapReduce, Hadoop, and LexisNexis HPCC. Hadoop is an open source implementation of MapReduce while HPCC provides distinct processing environments for batch and online query processing using its proprietary ECL programming language.
Data Partitioning in Mongo DB with Cloud - IJAAS Team
Cloud computing offers various useful services, such as IaaS, PaaS, and SaaS, for deploying applications at low cost, making them available anytime and anywhere with the expectation of scalability and consistency. One technique for improving scalability is data partitioning; the existing techniques, however, are not capable of tracking the data access pattern. This paper implements a scalable, workload-driven technique for improving the scalability of web applications. The experiments are carried out over the cloud using the NoSQL data store MongoDB to scale out. The approach offers low response time, high throughput, and fewer distributed transactions. The partitioning technique is evaluated using the TPC-C benchmark.
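The basic partitioning idea can be sketched as a hash-based router that maps a shard key onto one of N partitions, which is how a document store such as MongoDB distributes data under hashed sharding. This is a generic sketch, not the paper's workload-driven scheme; the key field and partition count below are illustrative assumptions.

```python
# Minimal sketch of hash-based data partitioning: a shard key is
# hashed to pick one of N partitions, so documents with the same key
# always land on the same partition. Parameters are illustrative.
import hashlib

def partition_for(shard_key, n_partitions):
    """Stable hash of the shard key mapped onto a partition index.
    md5 is used only for a stable, well-spread hash, not security."""
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % n_partitions

def route(documents, key_field, n_partitions):
    """Assign each document to its partition by hashing key_field."""
    partitions = {i: [] for i in range(n_partitions)}
    for doc in documents:
        idx = partition_for(doc[key_field], n_partitions)
        partitions[idx].append(doc)
    return partitions

docs = [{"customer_id": i, "order": f"o{i}"} for i in range(100)]
parts = route(docs, "customer_id", 4)
```

Hash partitioning spreads load evenly but scatters range queries across all partitions; the workload-driven technique the paper proposes exists precisely because the access pattern, not just the key distribution, determines which co-placements minimize distributed transactions.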
This document describes the design and implementation of a service monitoring console within a service oriented architecture framework. It discusses using Cassandra for data collection and storage, implementing message queues for data collection, and using OpenTSDB as a time series database. It also provides an implementation schedule and references several technologies including MapReduce, Cassandra, Turmeric, and Google Web Toolkit.
This document presents a comparative study on parallel data processing for resource allocation in cloud computing. It discusses Nephele, an open source framework for parallel data processing in the cloud. The study analyzes Nephele's performance compared to Hadoop and how its ability to dynamically allocate virtual machine resources based on task requirements can improve efficiency. Experimental results show how Nephele can leverage heterogeneous cloud resources and automatic scaling to reduce processing costs compared to static allocation and Hadoop. The paper concludes Nephele is an efficient framework for parallel data processing in cloud computing.
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base... - IOSR Journals
This document summarizes a research paper that proposes using Google File System (GFS) and MapReduce in a cloud computing environment to improve resource utilization and processing of large datasets. The paper discusses GFS architecture with a master node and chunk servers, and how MapReduce can split large files into chunks and process them in parallel across idle cloud nodes. It also proposes encrypting data for security and using a third party to audit client files. The goal is to provide fault tolerance, optimize workload processing time, and maximize utilization of cloud resources for data-intensive applications.
The integration of data from multiple distributed and heterogeneous sources has long been an important issue in information system research. In this study, we considered query access and its optimization in such an integration scenario in the context of energy management by using SPARQL. Specifically, we provide a federated approach, a mediator server, that allows users to query multiple heterogeneous data sources, covering four typical types of databases in energy data resources: relational databases, triplestores, NoSQL databases, and XML. A MUSYOP architecture based on this approach is then presented; our solution realizes data acquisition and integration without the need to rewrite or transform the local data into a unified data model.
COMBINING EFFICIENCY, FIDELITY, AND FLEXIBILITY IN RESOURCE INFORMATION SERV... - Nexgen Technology
This document discusses a resource information service that aims to provide high efficiency, fidelity, and flexibility for resource discovery in large-scale distributed systems like cloud computing and grids. It proposes using Locality-Sensitive Hashing (LSH) techniques to map resource descriptions to IDs in a way that preserves similarity, allowing efficient discovery of similar resources. The system is built on a Distributed Hash Table (DHT) for scalable storage and querying of resource information. Simulation and experimental results show the proposed LSH-based service outperforms other approaches in terms of efficiency, fidelity, and flexibility.
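The LSH idea can be sketched with random-hyperplane hashing: each hyperplane contributes one bit of the ID, determined by which side of the hyperplane the resource's attribute vector falls on, so vectors pointing in similar directions tend to receive the same bit string and collide in the DHT. This is a generic sketch of the technique, not the paper's exact scheme; the dimensions, bit count, and seed are illustrative.

```python
# Sketch of locality-sensitive hashing via random hyperplanes.
# Similar resource-attribute vectors receive the same (or nearby)
# bit-string IDs with high probability, enabling similarity-preserving
# lookup in a DHT. All parameters here are illustrative.
import random

def make_hyperplanes(dim, n_bits, seed=42):
    """One random Gaussian normal vector per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_id(vector, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    bits = ""
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits += "1" if dot >= 0 else "0"
    return bits

planes = make_hyperplanes(dim=3, n_bits=8)
a = lsh_id([1.0, 2.0, 0.5], planes)    # a resource description vector
b = lsh_id([1.1, 2.1, 0.55], planes)   # a very similar resource
c = lsh_id([-5.0, 0.1, -3.0], planes)  # a dissimilar resource
```

For two vectors at angle θ, each bit differs with probability θ/π, so near-parallel vectors like `a` and `b` are very likely to share an ID, while `c` is likely to diverge in many bits; more bits sharpen the distinction at the cost of recall.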
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B... - Dirk Petersen
Dirk Petersen, Scientific Computing Manager, Fred Hutchinson Cancer Research Center (FHCRC)
Joe Arnold, President and Chief Product Officer, SwiftStack
Considering deploying a multi-petabyte storage-as-a-service offering in your research environment? Learn how an industry-leading software-defined object storage solution, architected by SwiftStack and Silicon Mechanics, helped shift hundreds of users to an object-based workflow for their archival data. With an emphasis on cost efficiencies, scalability, and manageability, see how this implementation at Fred Hutchinson Cancer Research Center (FHCRC) is continually evolving across new use cases and access methods.
This document discusses Fred Hutchinson Cancer Research Center's implementation of the OpenStack Swift object storage system managed by SwiftStack to build a large-scale active archive. Researchers were concerned about rising storage costs and the need for cheaper storage for large, old files. Finance was concerned about predictable costs as data growth increased costs by $1M per year. SwiftStack was implemented in 2014 and saw strong growth as researchers sought to avoid expensive enterprise storage costs after chargebacks were introduced. SwiftStack provides automated management of Swift clusters and supports scientific computing workloads through tools like Swift Commander and integration with HPC systems and Galaxy for scientific workflows.
BIOIT14: Deploying very low cost cloud storage technology in a traditional re... - Dirk Petersen
When implementing storage charge backs we wanted to offer researchers an alternative storage solution that would not cost more than AWS Glacier. We also wanted it to be long term durable, self-protecting, easy to manage, store petabytes, survive the loss of an entire data center and deliver predictable performance. Learn how to avoid pitfalls and be able to determine if a solution like this makes sense for your organization.
Those who out-compute can many times out-compete. The cloud gives you access to a massive amount of compute power when you need it. This talk will present an introduction to HPC in the cloud, including, the benefits of HPC in the cloud, how to get started, some tools to use, and how you can manage data. We will showcase several examples of HPC in the cloud by a number of public sector and commercial customers.
Created by: Dr. Jeff Layton, Principal, Solutions Architect
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
‘Grids’areanapproachforbuildingdynamicallyconstructedproblem-solvingenvironmentsusing
geographically and organizationally dispersed,
high-performance computing and
data handling resources.
Gridsalsoprovideimportantinfrastructuresupportingmulti-institutionalcollaboration.
MongoDB .local London 2019: Best Practices for Working with IoT and Time-seri...MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
• The challenges involved with managing time-series data in IoT applications
• Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
• How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
• At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
MongoDB .local Houston 2019: Best Practices for Working with IoT and Time-ser...MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
This document provides an introduction to grid architecture. It discusses key components of a grid architecture including the layered model and standard protocols. The document outlines requirements for grid architecture including heterogeneous and distributed resources. It also compares web services and grid services, describing standards like OGSA, OGSI, and WSRF. Finally, it provides examples of creating stateful web services.
Workflow Scheduling Techniques and Algorithms in IaaS Cloud: A Survey IJECEIAES
In the modern era, workflows are adopted as a powerful and attractive paradigm for expressing/solving a variety of applications like scientific, data intensive computing, and big data applications such as MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emerging model of cloud computing technology, scheduling in the cloud becomes the important research topic. Consequently, workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters, grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lies in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work focuses on the complete study of the resource provisioning and scheduling algorithms in cloud environment focusing on Infrastructure as a service (IaaS). We provided a comprehensive understanding of existing scheduling techniques and provided an insight into research challenges that will be a possible future direction to the researchers.
Mining Of Big Data Using Map-Reduce TheoremIOSR Journals
This document discusses using MapReduce to efficiently extract large and complex data from big data sources. It proposes a MapReduce theorem for big data mining that is more efficient than the Heterogeneous Autonomous Complex and Evolving (HACE) theorem. MapReduce libraries support different programming languages and platforms, allowing for portable big data processing. The document outlines how MapReduce connects to Big Query to allow SQL queries to efficiently extract and analyze large datasets stored in the cloud. It also discusses data cleaning, sampling, and normalization as part of the big data mining process.
This document proposes CATCH, a cloud-based system to improve data transfer efficiency for high-performance computing (HPC) workloads. CATCH uses cloud storage to stage input data for HPC jobs and offload output data, in order to reduce storage usage at HPC centers and improve data transfer times. Evaluation of CATCH using a real cloud platform and HPC workload logs showed it could reduce average transfer times by up to 81.1% and decrease wait times and storage usage at HPC centers.
Data-Intensive Technologies for CloudComputinghuda2018
This document provides an overview of data-intensive computing technologies for cloud computing. It discusses key concepts like data-parallelism and MapReduce architectures. It also summarizes several data-intensive computing systems including Google MapReduce, Hadoop, and LexisNexis HPCC. Hadoop is an open source implementation of MapReduce while HPCC provides distinct processing environments for batch and online query processing using its proprietary ECL programming language.
Data Partitioning in Mongo DB with CloudIJAAS Team
Cloud computing offers various and useful services like IAAS, PAAS SAAS for deploying the applications at low cost. Making it available anytime anywhere with the expectation to be it scalable and consistent. One of the technique to improve the scalability is Data partitioning. The alive techniques which are used are not that capable to track the data access pattern. This paper implements the scalable workload-driven technique for polishing the scalability of web applications. The experiments are carried out over cloud using NoSQL data store MongoDB to scale out. This approach offers low response time, high throughput and less number of distributed transaction. The result of partitioning technique is conducted and evaluated using TPC-C benchmark.
This document describes the design and implementation of a service monitoring console within a service oriented architecture framework. It discusses using Cassandra for data collection and storage, implementing message queues for data collection, and using OpenTSDB as a time series database. It also provides an implementation schedule and references several technologies including MapReduce, Cassandra, Turmeric, and Google Web Toolkit.
This document presents a comparative study on parallel data processing for resource allocation in cloud computing. It discusses Nephele, an open source framework for parallel data processing in the cloud. The study analyzes Nephele's performance compared to Hadoop and how its ability to dynamically allocate virtual machine resources based on task requirements can improve efficiency. Experimental results show how Nephele can leverage heterogeneous cloud resources and automatic scaling to reduce processing costs compared to static allocation and Hadoop. The paper concludes Nephele is an efficient framework for parallel data processing in cloud computing.
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...IOSR Journals
This document summarizes a research paper that proposes using Google File System (GFS) and MapReduce in a cloud computing environment to improve resource utilization and processing of large datasets. The paper discusses GFS architecture with a master node and chunk servers, and how MapReduce can split large files into chunks and process them in parallel across idle cloud nodes. It also proposes encrypting data for security and using a third party to audit client files. The goal is to provide fault tolerance, optimize workload processing time, and maximize utilization of cloud resources for data-intensive applications.
The integration of data from multiple distributed and heterogeneous sources has long been an important issue in information systems research. In this study, we considered query access and its optimization in such an integration scenario in the context of energy management using SPARQL. Specifically, we provided a federated approach, a mediator server, that allows users to query multiple heterogeneous data sources, including four typical types of databases used for energy data: relational databases, triplestores, NoSQL databases, and XML. A MUSYOP architecture based on this approach is then presented; our solution can realize data acquisition and integration without the need to rewrite or transform the local data into a unified format.
COMBINING EFFICIENCY, FIDELITY, AND FLEXIBILITY IN RESOURCE INFORMATION SERV...Nexgen Technology
This document discusses a resource information service that aims to provide high efficiency, fidelity, and flexibility for resource discovery in large-scale distributed systems like cloud computing and grids. It proposes using Locality-Sensitive Hashing (LSH) techniques to map resource descriptions to IDs in a way that preserves similarity, allowing efficient discovery of similar resources. The system is built on a Distributed Hash Table (DHT) for scalable storage and querying of resource information. Simulation and experimental results show the proposed LSH-based service outperforms other approaches in terms of efficiency, fidelity, and flexibility.
BIO IT 15 - Are Your Researchers Paying Too Much for Their Cloud-Based Data B...Dirk Petersen
Dirk Petersen, Scientific Computing Manager, Fred Hutchinson Cancer Research Center (FHCRC)
Joe Arnold, President and Chief Product Officer, SwiftStack
Considering deploying a multi-petabyte storage-as-a-service offering in your research environment? Learn how an industry-leading software-defined object storage solution, architected by SwiftStack and Silicon Mechanics, helped shift hundreds of users to an object-based workflow for their archival data. With an emphasis on cost efficiencies, scalability, and manageability, see how this implementation at Fred Hutchinson Cancer Research Center (FHCRC) is continually evolving across new use cases and access methods.
This document discusses Fred Hutchinson Cancer Research Center's implementation of the OpenStack Swift object storage system managed by SwiftStack to build a large-scale active archive. Researchers were concerned about rising storage costs and the need for cheaper storage for large, old files. Finance was concerned about predictable costs as data growth increased costs by $1M per year. SwiftStack was implemented in 2014 and saw strong growth as researchers sought to avoid expensive enterprise storage costs after chargebacks were introduced. SwiftStack provides automated management of Swift clusters and supports scientific computing workloads through tools like Swift Commander and integration with HPC systems and Galaxy for scientific workflows.
BIOIT14: Deploying very low cost cloud storage technology in a traditional re...Dirk Petersen
When implementing storage charge backs we wanted to offer researchers an alternative storage solution that would not cost more than AWS Glacier. We also wanted it to be long term durable, self-protecting, easy to manage, store petabytes, survive the loss of an entire data center and deliver predictable performance. Learn how to avoid pitfalls and be able to determine if a solution like this makes sense for your organization.
Those who out-compute can many times out-compete. The cloud gives you access to a massive amount of compute power when you need it. This talk will present an introduction to HPC in the cloud, including, the benefits of HPC in the cloud, how to get started, some tools to use, and how you can manage data. We will showcase several examples of HPC in the cloud by a number of public sector and commercial customers.
Created by: Dr. Jeff Layton, Principal, Solutions Architect
OpenStack Swift is a very powerful object store that is used in several of the largest object storage deployments around the globe. It ensures a very high level of data durability and can withstand epic disasters if set up in the right way.
The Six Highest Performing B2B Blog Post FormatsBarry Feldman
If your B2B blogging goals include earning social media shares and backlinks to boost your search rankings, this infographic lists the six best approaches.
1) The document discusses the opportunity for technology to improve organizational efficiency and transition economies into a "smart and clean world."
2) It argues that aggregate efficiency has stalled at around 22% for 30 years due to limitations of the Second Industrial Revolution, but that digitizing transport, energy, and communication through technologies like blockchain can help manage resources and increase efficiency.
3) Technologies like precision agriculture, cloud computing, robotics, and autonomous vehicles may allow for "dematerialization" and do more with fewer physical resources through effects like reduced waste and need for transportation/logistics infrastructure.
Inroduction to grid computing by gargi shankar vermagargishankar1981
Grid computing allows for sharing and coordination of distributed computer resources to address large-scale computation problems. It enables dynamic, scalable, and inexpensive access to computing power by connecting computers and other resources together with open standards. Key aspects of grid computing include dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities through coordination of distributed and often heterogeneous resources not subject to centralized control.
The Open Grid Services Architecture (OGSA) defines a set of standards for building grid systems. It has four main layers:
1) The application layer which includes physical resources like servers and storage, and logical resources like database and workflow managers.
2) A web services layer which defines how resources and services can interact using Open Grid Services Infrastructure (OGSI) and grid services.
3) OGSI specifies five interfaces for grid services: Factory, Life Cycle, State Management, Service Groups, and Notification.
4) Together these layers define a standardized architecture for building grid systems using web services and interfaces to manage resources and their interactions.
This document provides an overview of grid computing. It defines a grid as a collection of distributed heterogeneous computing and data resources available through network tools and protocols. It discusses several examples of grid computing projects like SETI@home, Distributed.net, and virtual organizations. It also covers types of grids based on shared resources, topology, and behavior. The document outlines the layered structure of a grid and standards like OGSA, OGSI, and GSI that enable interoperability. It provides descriptions of key grid components like resource brokers, information services, security, data transfer, job submission, and problem solving environments.
An Exploration of Grid Computing to be Utilized in Teaching and Research at TUEswar Publications
This document describes building a simple grid computing environment from existing computing resources at Taiz University in Yemen. It outlines:
1) Installing and configuring software like Globus Toolkit, Tomcat, and OGCE portal on three machines to set up basic grid services like a certificate authority server, MyProxy server, and portal server.
2) Configuring the hardware nodes, installing the portal server, setting up the certificate authority server, and MyProxy server.
3) Testing basic grid services like credential delegation to MyProxy, retrieval from MyProxy, and GridFTP file transfers.
The results indicate the proposed grid model is promising for teaching and research at Taiz University and could serve as a
The document discusses Grid Computing, which uses distributed computing resources like computer clusters connected via high-speed networks to provide high computational power. It describes the Globus Toolkit, an open-source software toolkit that provides basic services for building Grids. Key components of the Globus Toolkit allow for resource management, security, data management, and communication. The document also discusses parallel programming using MPI (Message Passing Interface) and potential applications of Grid Computing such as distributed supercomputing, real-time systems, and data-intensive processing.
The document discusses grid computing in remote sensing data processing. It describes how grid computing can help process huge amounts of remote sensing data in real-time by distributing processing across networked computers. Key requirements for a grid environment for remote sensing include sharing computational and software resources, managing resources, and supporting different data formats. Case studies demonstrate how grid middleware can improve the efficiency of tasks like image deblurring and generating lookup tables for aerosol analysis.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
cuSTINGER: Supporting Dynamic Graph Algorithms for GPUs : NOTESSubhajit Sahu
Highlighted notes on cuSTINGER: Supporting Dynamic Graph Algorithms for GPUs (compressed), made while doing research work under Prof. Kishore Kothapalli.
These authors wrote the first data structure for maintaining dynamic graphs on NVIDIA CUDA GPUs. Unlike STINGER, which uses a blocked linked list, and GT-STINGER, which uses an Array of Structures (AoS), they instead used a Structure of Arrays (SoA), which is not only better suited to GPU memory access but also allows smaller allocation modes (memory is at a premium on GPUs). They use a custom memory manager that speeds up memory management (instead of using the system memory manager). The data structure used is the standard adjacency list (CSR), and pointers are updated when the edge list has to grow. It could handle up to 10 million updates per second (at large batch sizes), and ran a static graph algorithm (triangle counting) only 1-10% slower. This shows that cuSTINGER can also be used with static graph algorithms; dynamic graph algorithms should run much faster.
A CLOUD BASED ARCHITECTURE FOR WORKING ON BIG DATA WITH WORKFLOW MANAGEMENTIJwest
In real environments there are collections of noisy and vague data, called Big Data. Middleware has been developed to work on such data and is now very widely used. The challenge of working with Big Data lies in its processing and management. An integrated management system is required to provide a solution for integrating data from multiple sensors and maximizing target success, in a situation where the system has hard time constraints for processing and real-time decision-making. A reliable data fusion model must meet this requirement and steadily let the user monitor the data stream. With the widespread use of workflow interfaces, this requirement can be addressed, but working with Big Data remains challenging. We provide a multi-agent cloud-based architecture to address this problem at a higher level. This architecture provides the ability to perform Big Data fusion through a workflow management interface. The proposed system is capable of self-repair in the presence of risks, and its risk is low.
The document discusses the Open Grid Services Architecture (OGSA) and Open Grid Services Infrastructure (OGSI) standards for grid computing. OGSA defines the overall structure and services for grid environments using a distributed computing model. OGSI specifies a set of service primitives and behaviors for grid services. These standards leverage existing web service standards like WSDL to provide interfaces for grid services.
This document discusses architectural and security management for grid computing. It begins by defining grid computing as an environment that enables sharing of distributed resources across organizations to achieve common goals. It then describes the key components of a grid, including computation resources, storage, communications, software/licenses, and special equipment. The document outlines a four-level grid architecture including a fabric level, core middleware level, user middleware level, and application level. It also discusses important aspects of grid computing such as resource balancing, reliability through distribution, parallel CPU capacity, and management of different projects. Finally, it emphasizes that security is a major concern for grid computing due to the open nature of sharing resources across organizational boundaries.
Grid computing is the sharing of computer resources from multiple administrative domains to achieve common goals. It allows for independent, inexpensive access to high-end computational capabilities. Grid computing federates resources like computers, data, software and other devices. It provides a single login for users to access distributed resources for tasks like drug discovery, climate modeling and other data-intensive applications. Current grids are used for distributed supercomputing, high-throughput computing, on-demand computing and other methods. Grids benefit scientists, engineers and other users who need to solve large problems or collaborate globally.
Grid computing is a model of distributed computing that uses geographically and administratively disparate resources to solve large problems. It involves sharing computing power, data, and other resources across organizational boundaries. Key aspects include applying resources from many computers to a single problem, combining resources from multiple administrative domains for tasks requiring large processing power or data, and using middleware to coordinate resources as a virtual system. The document then discusses definitions of grid computing from various organizations and the core functional requirements and characteristics needed for grid applications and users.
The document provides an overview of grid computing, including:
1) Grid computing involves sharing distributed computational resources over a network and providing single login access for users. Resources may be owned by different organizations.
2) Examples of current grids discussed include the NSF PACI/NCSA Alliance Grid, the NSF PACI/SDSC NPACI Grid, and the NASA Information Power Grid.
3) The document also discusses various grid middleware tools and projects for using grid resources, such as Globus, Condor, Legion, Harness, and the Internet Backplane Protocol.
This document discusses grid computing and provides an overview of the topic. It begins with an introduction to grid computing, explaining that it utilizes distributed resources over a network to solve large computational problems. It then covers aspects of grid computing such as data, computation, types of grids, how grid computing works, and the grid architecture with different layers. The document also discusses applications of grid computing, advantages, limitations, and provides a case study on using a grid-like approach for weather prediction.
Grid computing enables sharing of geographically distributed computing resources through a network. It allows for virtual organizations to collaborate on common goals without central control. The document discusses the types of grid computing including computational, data, and scavenging grids. It also outlines the key components of a grid including protocols, architecture, security, and resource management. Examples of existing grid projects are provided such as SETI@Home, EGEE, and BeINGrid.
Various embodiments allow Grid applications to access resources shared in communication network domains. Grid Proxy Architecture for Network Resources (GPAN) bridges Grid services serving user applications and network services controlling network devices through proxy functions. At times, GPAN employs distributed network service peers (NSP) in network domains to discover, negotiate and allocate network resources for Grid applications. An elected master NSP is the unique Grid node that runs GPAN and represents the whole network to share network resources to Grids without Grid involvement of network devices. GPAN provides the Grid Proxy service (GPS) to interface with Grid services and applications, and the Grid Delegation service (GDS) to interface with network services to utilize network resources. In some cases, resource-based XML messaging can be employed for the GPAN proxy communication.
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
This document presents an implementation of the p-PIC clustering algorithm using the MapReduce framework to handle big data. P-PIC is a parallel version of the Power Iteration Clustering (PIC) algorithm that is able to cluster large datasets in a distributed environment. The document first provides background on PIC and challenges with scaling to big data. It then describes how p-PIC addresses these challenges using MPI for parallelization. The design of implementing p-PIC within MapReduce is presented, including the map and reduce functions. Experimental results on synthetic datasets up to 100,000 records show that p-PIC using MapReduce has increased performance and scalability compared to the original p-PIC implementation using MPI.
IRJET - Evaluating and Comparing the Two Variation with Current Scheduling Al...IRJET Journal
This document presents two variations of a job-driven scheduling scheme called JOSS for efficiently executing MapReduce jobs on remote outsourced data across multiple data centers. The goal of JOSS is to improve data locality for map and reduce tasks, avoid job starvation, and improve job performance. Extensive experiments show that the two JOSS variations, called JOSS-T and JOSS-J, outperform other scheduling algorithms in terms of data locality and network overhead without significant overhead. JOSS-T performs best for workloads of small jobs, while JOSS-J provides the shortest workload time for jobs of varying sizes distributed across data centers.
ANG-GridWay-Poster-Final-Colorful-Bright-Final0
1. Metascheduling on the Grid
In the grid environment, a metascheduler (also known as a global scheduler) coordinates communication between multiple heterogeneous local schedulers that operate at the local or cluster level.
By cooperating with Grid Information Services, which abstract the information of all the available resources on the grid into a single resource perspective, the grid metascheduler presents the end user with a single virtual resource pool, which hides all the details of scheduling and monitoring jobs in this dynamic and heterogeneous environment.
From the end user's viewpoint, a fully virtualized grid is like a single supercomputer, and as a proxy or entry point to the grid, the metascheduler should hide grid-specific technical details as far as possible. A good metascheduler should ask the end user (who need not have any grid-specific knowledge) only three questions (see Figure 3 for an example description):
WHAT DO YOU NEED? (Requirements) This usually involves requirement matchmaking between the known resources on the grid and the resources required by the end user, such as a particular piece of software, disk space, and so forth.
WHAT DO YOU HAVE? (Input) This usually requires the end user to specify the input of the job, such as input files on the local or remote machine, the arguments of the executable, or both.
WHO ARE YOU? (VO Identity*) This usually requires the user to declare his or her VO identity, which enables the scheduler to analyse resource information from the user's viewpoint.
* A grid user's VO identity is not used for authentication or authorization; instead, by providing the VO identity, the scheduler can retrieve user-specific resource information such as the accessible free CPUs, disk space and software tools.
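In GridWay, the answers to these three questions are typically gathered in a plain-text job template. The sketch below is purely illustrative: the SOFTWARE and VO attributes stand in for the ANG-specific extensions discussed on this poster rather than stock GridWay keywords, and the executable and file names are made up.

```
# WHAT DO YOU NEED?  -- matched against resources published via MDS
REQUIREMENTS = SOFTWARE = "namd-2.6"      # hypothetical ANG extension

# WHAT DO YOU HAVE?  -- the job's executable and input
EXECUTABLE   = namd2
ARGUMENTS    = apoa1.conf
INPUT_FILES  = apoa1.conf, apoa1.pdb

# WHO ARE YOU?       -- optional VO identity for a personalized view
VO = chem                                 # hypothetical ANG extension
```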
Improving the Functionality and Customization of Scheduling
Services for Grid Computing
Jingjing Sun, Supervised by
Dr. Paul Coddington, Dr. Andrew Wendelborn
School of Computer Science, University of Adelaide
Figure 1. Computing resources across the ANG Grid, discovered by the customized metascheduler.
Figure 3. A typical customized job template including all kinds of user requirements.
Acknowledgements
This project would not have been possible without Dr. Paul Coddington and Dr. Andrew
Wendelborn's wise and patient guidance. Special thanks to the excellent SAPAC staff:
Daniel Cox, Gerson Galang and Shunde Zhang, who made the whole thing happen.
For more information about our project, please refer to the ANG-GridWay wiki:
http://www.grid.apac.edu.au/repository/trac/APACGridway/wiki
Overview and Aims
By aggregating computing power, software tools, data storage systems and scientific
instruments that are distributed in heterogeneous systems across multiple locations,
Grid Computing promises a global virtual supercomputer where users at different
physical locations can cooperate on a specific problem in a high-performance, secure,
reliable and cost-effective way.
Via grid resource virtualization technologies, the Grid hides the large-scale, distributed
and dynamic nature of the grid computing environment, thus creating a single system
image, i.e. a single yet powerful virtual computer, and enabling end-users (i.e. domain
experts) to focus fully on problem solving rather than the underlying technical details.
As the underlying infrastructure of the Australian National Grid (ANG) is becoming
increasingly complex and dynamic, it is no longer practical or efficient for its users to
manually perform computational tasks on the large-scale and heterogeneous ANG Grid;
a metascheduling system is therefore needed to provide ANG end-users with an
easy-to-use and automatic job execution environment. However, none of the current
major metaschedulers (such as Condor-G, GridBus Broker and GridWay) supports the
latest grid information standard, Globus MDS 4 (Monitoring and Discovery Services)
with GLUE Schema Specification 1.2, which is used by the ANG to describe and publish
resource information. Moreover, none of these metaschedulers offers the functionality
that lets end-users specify software requirements and other desired resources that they
are allowed to use according to local domain policies (which can be considered a quota
system), or the ability to help the grid administrator appropriately allocate resources,
such as software and storage usage priority, to different ANG user groups. These
functionalities are highly desirable for building a completely virtualized grid
environment in which end-users can focus fully on problem solving rather than
grid-specific technical details.
This project aims to customize and deploy a metascheduling system (GridWay) for the
ANG Grid, cooperating with the ANG Grid resource information service, thus
virtualizing the high-performance computing resources across the ANG Grid and
providing its users with an automatic and intelligent job execution environment. Based
on GridWay's basic scheduling framework, we added important features to achieve real
virtualization of the ANG Grid resources (see the Implementation section for details).
Using the GridWay modified and customized for the ANG (i.e. ANG-GridWay), the end
user just needs to specify the input data (what I have), the required resources (what I
need) and, optionally, the user's user group, i.e. Virtual Organization (VO*), identity
(who I am); the scheduler then appropriately schedules the tasks across the ANG Grid
and hides all the technical details such as requirements matchmaking, VO View
checking, and failure handling. Our scheduling policies also allow the scheduler to
choose a "preferred" queue for specific software.
* VO: An administrative domain with a separate and distinct set of administrative
policies, such as access control and resource usage quota allocation.
Implementation
As mentioned in the first section, our implementation focused on adapting the original GridWay scheduling framework to
the current ANG Grid infrastructure, and on virtualizing all computing resources across the ANG Grid into a unified resource
pool. Based on GridWay's original scheduling framework, we developed advanced functionality to provide our end-users with
an easy-to-use and intelligent job execution environment. The work done so far falls into the following six categories:
Information Model: Work in this part mainly focused on adapting GridWay to cooperate smoothly with the information
published by the ANG MDS information service. ANG-GridWay was customized to work with a version of the MDS4 (GLUE 1.2)
schema extended according to the real experience of ANG users (see Figure 4 for the information model); the extended
MDS4 (GLUE 1.2) resource information model allows our scheduler to accurately locate the target resource according to user
requirements and grid domain policies. Figure 1 and Figure 2 show all the ANG computing resources discovered by the
customized scheduler and a complete view of a particular resource, respectively.
Software Requirements: One of the most important functionalities ANG-GridWay provides is enabling the end
user to specify required software (generating the corresponding module extension in the corresponding RSL file).
VOView Requirements: The VOView entity of the GLUE Schema describes resource information from a specific grid
user's viewpoint; by making use of the VOView information, our scheduler presents the end user with a "personalized"
resource perspective, enabling more precise metascheduling.
Requirement Matchmaking Algorithm: Based on the above work, an advanced matchmaking algorithm was
developed to enable the scheduler to automatically search for resources that accurately satisfy user requirements. Our
matchmaking algorithm allows the user to specify multiple software requirements in a single task, a highly desirable
feature that enables users to design simple workflows requiring multiple software packages without needing to know
where these packages are located on the grid.
Software Priority on the Resource: Reflecting real cases of resource allocation and software usage on the
ANG Grid, ANG-GridWay introduces a special functionality that enables software to run on a "preferred" resource according
to a specified priority assigned to the given grid user.
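The matchmaking and software-priority behaviour described above can be sketched in a few lines of Python. The SubCluster records, field names and priority scheme below are illustrative assumptions for this sketch, not ANG-GridWay's actual data model:

```python
# Sketch of matchmaking plus VO-specific software priority.
# All names (SubCluster fields, "vo_priority") are assumptions.

def match_subclusters(subclusters, required_software, vo):
    """Return names of SubClusters offering *all* required software,
    ordered so the resource "preferred" for this VO comes first."""
    candidates = []
    for sc in subclusters:
        if set(required_software) <= set(sc["software"]):    # every requirement met
            priority = sc.get("vo_priority", {}).get(vo, 0)  # higher = preferred
            candidates.append((priority, sc["name"]))
    # sort by descending priority; ties broken by name for determinism
    candidates.sort(key=lambda p: (-p[0], p[1]))
    return [name for _, name in candidates]

subclusters = [
    {"name": "hydra",  "software": ["gaussian-03", "namd-2.6"],
     "vo_priority": {"chem": 10}},
    {"name": "aquila", "software": ["namd-2.6"],
     "vo_priority": {"chem": 5}},
]
print(match_subclusters(subclusters, ["namd-2.6"], "chem"))
# -> ['hydra', 'aquila']
print(match_subclusters(subclusters, ["gaussian-03", "namd-2.6"], "chem"))
# -> ['hydra']
```

Filtering keeps only SubClusters that offer every required package; the VO-specific priority then ranks the survivors, so the "preferred" resource for that user group is tried first.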
Resource Failure Handling: Our new failure handling policy operates at the queue level for a given user, which avoids
wasting resources and enables fine-grained failure handling control on the resources.
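A minimal sketch of this queue-level idea, assuming a simple failure counter per (user, queue) pair; the class, threshold and queue names are hypothetical:

```python
# Repeated failures ban only the (user, queue) pair, leaving other
# queues on the same resource, and other users, unaffected.

class QueueFailurePolicy:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = {}                       # (user, queue) -> failure count

    def record_failure(self, user, queue):
        key = (user, queue)
        self.failures[key] = self.failures.get(key, 0) + 1

    def usable(self, user, queue):
        return self.failures.get((user, queue), 0) < self.max_failures

policy = QueueFailurePolicy(max_failures=2)
policy.record_failure("alice", "hydra/express")
policy.record_failure("alice", "hydra/express")
print(policy.usable("alice", "hydra/express"))   # False: banned for alice only
print(policy.usable("alice", "hydra/batch"))     # True: sibling queue still usable
print(policy.usable("bob",   "hydra/express"))   # True: other users unaffected
```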
The Grid and Grid Information Service
Grid Computing is most commonly described by comparison with an electric power
grid, through which we consume electrical power on demand, without knowing where
or how the energy is generated. Similarly, Grid Computing technologies hide the
details of the underlying computing resources and the complexity of how these
resources are organized and how computation jobs are scheduled, creating a single,
unified system image; as a result, end users can perform computational tasks on the
grid as if they were using a single yet powerful virtual computer.
The Globus MDS services provide a standardized approach to grid resource
discovery and monitoring, and thus play a key role in computing resource
virtualization. In more detail, the Grid Laboratory Uniform Environment (GLUE)
schema is used by the MDS service to describe grid resources in a precise and
systematic manner, enabling them to be discovered for subsequent management
or use, such as task scheduling. By defining an information model at the conceptual
level, the GLUE schema specification abstracts real-world computing resources into
constructs that can be represented in computer systems.
Our metascheduler was customized to cooperate with the MDS4 (GLUE 1.2) schema
extended by the ANG Grid according to real user experience. The extended model of
resource information is illustrated in Figure 4; the minimum discovery unit is the
SubCluster (a homogeneous computing environment), which helps the metascheduler
precisely locate the target resource according to user requirements. A complete view
of a SubCluster (from ANG-GridWay's viewpoint) is also shown in Figure 4.
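To make the SubCluster idea concrete, a GLUE-style resource entry has roughly the following shape. The element names below are loosely modelled on the GLUE schema; the exact spelling, structure and values are illustrative assumptions, not copied from the ANG deployment:

```xml
<SubCluster Name="hydra" UniqueID="hydra.example.edu.au">
  <Host>
    <ArchitectureType>x86_64</ArchitectureType>
    <MainMemoryRAMSize>8192</MainMemoryRAMSize>
    <OperatingSystemName>Linux</OperatingSystemName>
  </Host>
  <ApplicationSoftware>
    <Name>namd</Name>
    <Version>2.6</Version>
  </ApplicationSoftware>
</SubCluster>
```

Because a SubCluster is homogeneous by definition, a single entry like this is enough for the metascheduler to decide whether every node behind it satisfies a user's requirements.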
Conclusion and Future Work
We successfully adapted GridWay to the current ANG Grid infrastructure; our customizations
have enabled GridWay to cooperate smoothly with the ANG MDS information service. The
customized scheduler has important and useful features that are not provided by the
original one but are needed by the ANG Grid, such as support for the latest information
standards and the heterogeneous information model, multiple software requirements, VO
policies, software priority and fine-grained failure handling.
The result of our work is a metascheduling system that fully virtualizes resources
across the ANG Grid. All the details of resource discovery, requirements matchmaking, job
submission and execution, usage and scheduling policy control, and failure handling are
successfully hidden from ANG users. At a minimum, a user needs only to declare the
required software and input data through the grid portal. Through the resource
virtualization of our metascheduling system, the ANG Grid presents its end users with
a unified virtual supercomputer behind which sit various dynamic, heterogeneous and
geographically distributed grid sites.
As the underlying grid infrastructure becomes increasingly complex and the MDS GLUE
schema evolves towards a mature and flexible resource description model, there are many
potential improvements to the current ANG-GridWay, such as adaptive scheduling using
the VOView information, advanced resource reservation, and scientific workflow
planning and scheduling.
Figure 2. The complete view of a homogeneous cluster presented by ANG-GridWay, including environment information, queues and user VO views, available software tools and storage information.
Figure 4. The whole picture: the information model used by the ANG Grid and the resource virtualization via ANG-GridWay.