MataNui – Building a Grid Data Infrastructure that “doesn’t suck!”
G. K. Kloss and M. J. Johnson
{G.Kloss | M.J.Johnson}@massey.ac.nz
Institute of Information and Mathematical Sciences, Massey University, Auckland
Introduction
MOA (Microlensing Observations in Astrophysics) [1]
is a Japan/New Zealand collaboration project. It
makes observations on dark matter, extra-solar planets
and stellar atmospheres using gravitational
microlensing, one of the few techniques capable of
detecting low-mass extra-solar planets. The
technique works by analysing large quantities of
imagery from optical telescopes.
The Problem
Astronomers world-wide are producing telescopic im-
ages. Teams across New Zealand and Japan are ac-
cessing these for their research, creating higher level
data products. All these data files need to be stored
and retrieved. Currently they are either stored on re-
motely accessible servers, or they are transferred via
removable offline media. Additionally, every stored
item is annotated with potentially extensive sets of
meta-data. Researchers commonly keep essential
parts of this meta-data separately on their system,
so that they can identify particular items to retrieve
for their work.
This process is tedious and cumbersome, especially
because not all data files are available online; some
require access to various forms of offline media.
Files that are available online have to be retrieved from
remote systems through potentially slow connections.
Direct and homogeneous access patterns for all data
files and their associated meta-data do not exist.
Data management is not a new topic, and many
solutions for it are already available. Many of them
are hand-knit, and many are commercial and potentially
very expensive. More importantly, they
usually do not work well together with current Grid
infrastructures, as they were not designed to be “Grid
ready.” They are often complicated and require altering
the research workflow to suit the needs of the
system. Lastly, the ones meeting most of the requirements
commonly do not provide graphical end-user
tools to support data intensive research.
Requirements
The envisioned Grid enabled data management sys-
tem has to meet a few requirements. But most of all,
it should be implementable without having to “re-
invent the wheel.” It should be possible to source
large portions of its essential components from exist-
ing (free) tools, and “just” require some “plumbing”
to join them to meet these requirements:
• Handle large amounts of data
• Support arbitrary amounts of meta-data
• Manage storage/access from remote locations
• Use local access/storage through replication
• Perform (server side) queries on the meta-data
• Be robust, easy to deploy and easy to use
• Perform well on larger data collections
• Use Grid Computing standards/practices
Abstract
In science and engineering, data management problems are quite common, particularly if partners within a project are
geographically distributed and require fast access to data. These partners would ideally like to access or store data on local servers only, while still retaining
access for remote partners without manual intervention. This project attempts to solve such data management problems in an international
research collaboration (in astrophysics) with the participation of several New Zealand universities. Data is to be accessed and managed along with its
meta-data in several distributed locations, and the system has to integrate with the infrastructure provided by the BeSTGRID project. Researchers also need
to be able to use a simple but powerful graphical user interface for data management. This poster outlines the requirements, implementation and tools
involved for such a Grid data infrastructure.
Keywords: Grid Computing; Data Fabric; distributed; meta-data; data replication; GUI client; DataFinder.
Front Ends
GridFTP is the most common way to integrate data
services into a Grid environment. It is commonly used
for scripts and automation. GridFTP is the lowest
common denominator for compatibility with the Grid,
and it features (among others) Grid certificate based
authentication and third-party
transfers.
File System Mounts are a common way to inte-
grate externally stored file systems directly into the
host system of compute resources (e. g. compute clus-
ter, high performance computing servers). This en-
ables scripts and applications to use the data simply
and directly without an additional retrieval or upload
step.
Figure 1: Concept DataFinder:
Data modelling and storage.
The DataFinder GUI Client [2] is an easy-to-use
end-user application that supports researchers'
data intensive needs (Fig. 2 and 4).
It has been developed as open source software by the
German Aerospace Centre to support internal projects
and external partners. The application allows easy
and flexible access to remote data repositories with
associated meta-data. The DataFinder is designed
for scientific and engineering purposes, and it assists
in this through the following:
• Handles access/transfer to/from data server(s)
• Retrieval and modification of meta-data
• Extensive (server-side) queries on all meta-data
• Support for project specific policies:
– Data hierarchy definition
– Enforcement of workflows
– Meta-data specification
• Scripting used to automate recurring tasks
• Can integrate 3rd party (GUI) tools
The DataFinder can act as a universal Grid/storage
system client [3] (Fig. 1), as it is easily extensible
to connect to further storage sub-systems (beyond
those already available).
Figure 2: Integration of GUI
applications with DataFinder.
Implementation
See Fig. 3.
Storage back-end (GridFS on MongoDB) –
For a straightforward implementation, a suitable data
storage server was sought. We chose the “NoSQL”
database MongoDB [4]. It features the “GridFS”
storage mode, capable of storing file-like data (in
large numbers and sizes) along with its meta-data.
MongoDB can work in federation with distributed
servers, with data being automatically replicated to
the other instances. Therefore, every site can operate
its own local MongoDB server, keeping data
access latencies low and performance high.
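To illustrate how a file and its meta-data can live side by side in such a store, the following is a minimal sketch of the GridFS layout in plain Python, without a running MongoDB server. It assumes the usual GridFS arrangement of one "files" document per stored item plus a series of numbered "chunks" documents; the chunk size, the helper names `to_gridfs_documents` and `reassemble`, and the example meta-data are hypothetical and chosen for illustration only.

```python
# Assumption: a fixed chunk size; real deployments may use a different value.
CHUNK_SIZE = 255 * 1024


def to_gridfs_documents(file_id, filename, payload, metadata, chunk_size=CHUNK_SIZE):
    """Split a byte payload into one 'files' document and its 'chunks' documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(payload),
        "chunkSize": chunk_size,
        "metadata": dict(metadata),  # arbitrary, user-defined annotations
    }
    chunks = [
        {"files_id": file_id, "n": n, "data": payload[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(payload), chunk_size))
    ]
    return files_doc, chunks


def reassemble(files_doc, chunks):
    """Rebuild the payload from chunks, regardless of the order they arrive in."""
    ordered = sorted(
        (c for c in chunks if c["files_id"] == files_doc["_id"]),
        key=lambda c: c["n"],
    )
    payload = b"".join(c["data"] for c in ordered)
    if len(payload) != files_doc["length"]:
        raise ValueError("chunk data does not match the recorded length")
    return payload
```

Because the meta-data sits in the same document as the file's bookkeeping fields, a query on the annotations can locate a file without touching any of its chunks.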
Native file system mount (GridFS FUSE) –
A GridFS FUSE driver [5] is already available, so
a remote GridFS can be mounted into a local Linux
system.
Grid front-end (GridFTP) – To provide access
through Grid means, the Griffin GridFTP server [6]
by the Australian Research Collaboration Service
(ARCS) is equipped with a GridFS storage back-end.
Through this, every Grid capable tool can be used
to store/retrieve files with any of the MongoDB instances
interfaced by a Griffin server. This access
method also allows Grid applications to access the
storage server using the commonly used certificates.
Figure 3: Overview Grid data infrastructure.
Figure 4: Turbine simulation
workflow with DataFinder
(with custom GUI dialogues).
GUI front-end (DataFinder) – The DataFinder
is to be interfaced with the GridFS storage back-end.
To avoid giving a remote end user client full
access to the MongoDB server, a server interface
layer is introduced. For this, a RESTful web service
authenticating against a Grid certificate is implemented.
The implementation is based on the Apache
web server through the WSGI interface layer [7]. On
the client side, the DataFinder is equipped with a
storage back-end accessing this web service. The
DataFinder is currently the only client fully capable
of making use of the available meta-data (creating,
modifying and accessing meta-data, as well as performing
efficient server-side queries on it). Server-side
queries in particular reduce data access latencies
significantly and improve query performance.
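A server interface layer of this kind could look roughly like the WSGI sketch below. It is not the project's actual service: the `/files` resource, the `FILES` list standing in for the storage back-end, and the meta-data fields are hypothetical. It also assumes that the web server has already verified the client certificate and exposes the outcome through the `SSL_CLIENT_VERIFY` and `SSL_CLIENT_S_DN` variables, as Apache's mod_ssl can be configured to do.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory stand-in for the meta-data held in the storage back-end.
FILES = [
    {"filename": "a.fits", "metadata": {"telescope": "T1", "field": "gb1"}},
    {"filename": "b.fits", "metadata": {"telescope": "T1", "field": "gb2"}},
    {"filename": "c.fits", "metadata": {"telescope": "T2", "field": "gb1"}},
]


def _respond(start_response, status, body):
    """Serialise a JSON body and send it with the given HTTP status."""
    data = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(data))),
    ])
    return [data]


def application(environ, start_response):
    """Answer GET /files?key=value with the entries whose meta-data match."""
    # Assumption: the front-end web server performed the certificate check.
    if environ.get("SSL_CLIENT_VERIFY") != "SUCCESS":
        return _respond(start_response, "403 Forbidden",
                        {"error": "valid client certificate required"})
    if environ.get("REQUEST_METHOD") != "GET" or environ.get("PATH_INFO") != "/files":
        return _respond(start_response, "404 Not Found", {"error": "unknown resource"})

    # The filtering happens on the server; only matching entries are returned.
    wanted = {k: v[0] for k, v in parse_qs(environ.get("QUERY_STRING", "")).items()}
    matches = [
        entry for entry in FILES
        if all(entry["metadata"].get(k) == v for k, v in wanted.items())
    ]
    return _respond(start_response, "200 OK",
                    {"user": environ.get("SSL_CLIENT_S_DN"), "files": matches})
```

The client never talks to the database directly; it only sees the resources the service chooses to expose, and each request carries the certificate identity that the service can use for access decisions.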
WebDAV front-end (Catacomb) – A potential
future pathway to access GridFS content is the Cat-
acomb WebDAV server [8]. It can be modified to use
GridFS/MongoDB as a storage back-end instead of
the currently used MySQL relational database.
Results
By choosing suitable existing building blocks, it becomes
comparatively simple to implement a consistent
Grid data infrastructure with the desired features.
The implementation is currently making good progress,
and is expected to be simple to deploy and configure,
as well as to integrate seamlessly into the BeSTGRID or
other projects’ infrastructures. In particular, the problems
of operating on large amounts of annotated data
from astrophysics research seem to benefit significantly
from this work. Data can be stored and accessed
by geographically remote partners equally
fast, and processing on the data can be performed
locally. Data processing can easily be conducted on
sets returned as the results of queries (e. g. of particular
spatial regions, of specific phenomena
indicated in the meta-data, produced by certain telescopes,
in given time frames, etc.).
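A query combining such criteria might be expressed as a filter document in the MongoDB style. The sketch below evaluates one against a small in-memory catalogue, so it runs without a database. The comparison operators `$gte`, `$lte`, `$gt` and `$lt` are MongoDB's; the field names (`telescope`, `ra`, `dec`, `observed`), the catalogue entries and the helpers `matches` and `find` are hypothetical.

```python
from datetime import datetime

# Small subset of MongoDB-style comparison operators.
OPERATORS = {
    "$gte": lambda value, bound: value >= bound,
    "$lte": lambda value, bound: value <= bound,
    "$gt": lambda value, bound: value > bound,
    "$lt": lambda value, bound: value < bound,
}


def matches(metadata, query):
    """Return True if the meta-data satisfy every condition in the query."""
    for field, condition in query.items():
        if field not in metadata:
            return False
        value = metadata[field]
        if isinstance(condition, dict):
            # Range condition, e.g. {"$gte": 268.0, "$lte": 269.0}
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(catalogue, query):
    """Return the file names of all catalogue entries matching the query."""
    return [e["filename"] for e in catalogue if matches(e["metadata"], query)]


# Hypothetical catalogue entries: telescope, sky position (degrees), observation time.
CATALOGUE = [
    {"filename": "img_001.fits", "metadata": {
        "telescope": "T1", "ra": 268.5, "dec": -29.5, "observed": datetime(2010, 6, 1)}},
    {"filename": "img_002.fits", "metadata": {
        "telescope": "T1", "ra": 120.0, "dec": 10.0, "observed": datetime(2010, 6, 2)}},
    {"filename": "img_003.fits", "metadata": {
        "telescope": "T2", "ra": 268.9, "dec": -29.1, "observed": datetime(2010, 6, 3)}},
    {"filename": "img_004.fits", "metadata": {
        "telescope": "T1", "ra": 268.7, "dec": -29.9, "observed": datetime(2009, 1, 15)}},
]

# One telescope, one region of the sky, one calendar year.
QUERY = {
    "telescope": "T1",
    "ra": {"$gte": 268.0, "$lte": 269.0},
    "dec": {"$gte": -30.0, "$lte": -29.0},
    "observed": {"$gte": datetime(2010, 1, 1), "$lt": datetime(2011, 1, 1)},
}

selected = find(CATALOGUE, QUERY)
```

When the filter is evaluated on the server, only the names of the matching files travel over the network; the files themselves are fetched afterwards, ideally from the local replica.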
References
[1] I. A. Bond, F. Abe, R. Dodd, et al., “Real-time difference imaging analysis of MOA Galactic bulge observations during 2000,” Monthly
Notices of the Royal Astronomical Society, vol. 327, pp. 868–880, 2001.
[2] T. Schlauch and A. Schreiber, “DataFinder – A Scientific Data Management Solution,” in Proceedings of Symposium for Ensuring
Long-Term Preservation and Adding Value to Scientific and Technical Data 2007 (PV 2007), Munich, Germany, October 2007.
[3] T. Schlauch, A. Eifer, T. Soddemann, and A. Schreiber, “A Data Management System for UNICORE 6,” in Proceedings of EuroPar
Workshops – UNICORE Summit, ser. Lecture Notes in Computer Science (LNCS). Delft, Netherlands: Springer, August 2009.
[4] “MongoDB Project,” http://www.mongodb.org/.
[5] M. Stephens, “GridFS FUSE Project,” http://github.com/mikejs/gridfs-fuse.
[6] S. Zhang, P. Coddington, and A. Wendelborn, “Connecting arbitrary data resources to the Grid,” in Proceedings of the 11th International
Conference on Grid Computing (Grid 2010). Brussels, Belgium: ACM/IEEE, October 2010.
[7] N. Piël, “Benchmark of Python WSGI Servers,” http://nichol.as/benchmark-of-python-web-servers, March 2010.
[8] M. Litz, “Catacomb WebDAV Server,” in UpTimes – German Unix User Group (GUUG) Members’ Magazine, April 2006, pp. 16–19.

More Related Content

What's hot

Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
Robert Grossman
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
Editor IJCATR
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
IOSR Journals
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for Mapreduce
AIRCC Publishing Corporation
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with Cloud
IJAAS Team
 
A cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storageA cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storage
Made Artha
 
Towards a low cost etl system
Towards a low cost etl systemTowards a low cost etl system
Towards a low cost etl system
ijdms
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
Robert Grossman
 
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
ijcses
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)
Robert Grossman
 
11. grid scheduling and resource managament
11. grid scheduling and resource managament11. grid scheduling and resource managament
11. grid scheduling and resource managament
Dr Sandeep Kumar Poonia
 
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTINGDISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
ijgca
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
neirew J
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
Geoffrey Fox
 
ANG-GridWay-Poster-Final-Colorful-Bright-Final0
ANG-GridWay-Poster-Final-Colorful-Bright-Final0ANG-GridWay-Poster-Final-Colorful-Bright-Final0
ANG-GridWay-Poster-Final-Colorful-Bright-Final0
Jingjing Sun
 

What's hot (15)

Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
 
Ijcatr04071003
Ijcatr04071003Ijcatr04071003
Ijcatr04071003
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for Mapreduce
 
Data Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with CloudData Partitioning in Mongo DB with Cloud
Data Partitioning in Mongo DB with Cloud
 
A cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storageA cyber physical stream algorithm for intelligent software defined storage
A cyber physical stream algorithm for intelligent software defined storage
 
Towards a low cost etl system
Towards a low cost etl systemTowards a low cost etl system
Towards a low cost etl system
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKSMULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
MULTIDIMENSIONAL ANALYSIS FOR QOS IN WIRELESS SENSOR NETWORKS
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)
 
11. grid scheduling and resource managament
11. grid scheduling and resource managament11. grid scheduling and resource managament
11. grid scheduling and resource managament
 
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTINGDISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
DISTRIBUTED AND BIG DATA STORAGE MANAGEMENT IN GRID COMPUTING
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
ANG-GridWay-Poster-Final-Colorful-Bright-Final0
ANG-GridWay-Poster-Final-Colorful-Bright-Final0ANG-GridWay-Poster-Final-Colorful-Bright-Final0
ANG-GridWay-Poster-Final-Colorful-Bright-Final0
 

Viewers also liked

Chemistry is not zero exposure
Chemistry is not zero exposureChemistry is not zero exposure
Chemistry is not zero exposure
DIv CHAS
 
Research operations at ornl
Research operations at ornlResearch operations at ornl
Research operations at ornl
DIv CHAS
 
Klingner
KlingnerKlingner
Klingner
DIv CHAS
 
DIVCHAS, our early history
DIVCHAS, our early historyDIVCHAS, our early history
DIVCHAS, our early history
DIv CHAS
 
Building a (Really) Secure Cloud Product
Building a (Really) Secure Cloud ProductBuilding a (Really) Secure Cloud Product
Building a (Really) Secure Cloud Product
Guy K. Kloss
 
WTF is Blockchain???
WTF is Blockchain???WTF is Blockchain???
WTF is Blockchain???
Guy K. Kloss
 

Viewers also liked (6)

Chemistry is not zero exposure
Chemistry is not zero exposureChemistry is not zero exposure
Chemistry is not zero exposure
 
Research operations at ornl
Research operations at ornlResearch operations at ornl
Research operations at ornl
 
Klingner
KlingnerKlingner
Klingner
 
DIVCHAS, our early history
DIVCHAS, our early historyDIVCHAS, our early history
DIVCHAS, our early history
 
Building a (Really) Secure Cloud Product
Building a (Really) Secure Cloud ProductBuilding a (Really) Secure Cloud Product
Building a (Really) Secure Cloud Product
 
WTF is Blockchain???
WTF is Blockchain???WTF is Blockchain???
WTF is Blockchain???
 

Similar to MataNui - Building a Grid Data Infrastructure that "doesn't suck!"

Grid Computing
Grid ComputingGrid Computing
Grid Computing
sharmili priyadarsini
 
A Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentA Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving Environment
Sheila Sinclair
 
Dq36708711
Dq36708711Dq36708711
Dq36708711
IJERA Editor
 
1771 1775
1771 17751771 1775
1771 1775
Editor IJARCET
 
1771 1775
1771 17751771 1775
1771 1775
Editor IJARCET
 
H017144148
H017144148H017144148
H017144148
IOSR Journals
 
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET-  	  Cost Effective Workflow Scheduling in BigdataIRJET-  	  Cost Effective Workflow Scheduling in Bigdata
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET Journal
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
Editor IJCATR
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
Editor IJCATR
 
Unit i introduction to grid computing
Unit i   introduction to grid computingUnit i   introduction to grid computing
Unit i introduction to grid computing
sudha kar
 
E018142329
E018142329E018142329
E018142329
IOSR Journals
 
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
Tal Lavian Ph.D.
 
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET Journal
 
Journals analysis ppt
Journals analysis pptJournals analysis ppt
Journals analysis ppt
Muhammad Heikal
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
Ridwan Fadjar
 
Anonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloudAnonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloud
eSAT Journals
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.ppt
NileshkuGiri
 
Analysis of SOFTWARE DEFINED STORAGE (SDS)
Analysis of SOFTWARE DEFINED STORAGE (SDS)Analysis of SOFTWARE DEFINED STORAGE (SDS)
Analysis of SOFTWARE DEFINED STORAGE (SDS)
Kaushik Rajan
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
acijjournal
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb system
ijitjournal
 

Similar to MataNui - Building a Grid Data Infrastructure that "doesn't suck!" (20)

Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
A Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving EnvironmentA Reconfigurable Component-Based Problem Solving Environment
A Reconfigurable Component-Based Problem Solving Environment
 
Dq36708711
Dq36708711Dq36708711
Dq36708711
 
1771 1775
1771 17751771 1775
1771 1775
 
1771 1775
1771 17751771 1775
1771 1775
 
H017144148
H017144148H017144148
H017144148
 
IRJET- Cost Effective Workflow Scheduling in Bigdata
IRJET-  	  Cost Effective Workflow Scheduling in BigdataIRJET-  	  Cost Effective Workflow Scheduling in Bigdata
IRJET- Cost Effective Workflow Scheduling in Bigdata
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
A Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid SystemsA Survey of File Replication Techniques In Grid Systems
A Survey of File Replication Techniques In Grid Systems
 
Unit i introduction to grid computing
Unit i   introduction to grid computingUnit i   introduction to grid computing
Unit i introduction to grid computing
 
E018142329
E018142329E018142329
E018142329
 
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
A Platform for Large-Scale Grid Data Service on Dynamic High-Performance Netw...
 
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...IRJET-  	  Improving Data Availability by using VPC Strategy in Cloud Environ...
IRJET- Improving Data Availability by using VPC Strategy in Cloud Environ...
 
Journals analysis ppt
Journals analysis pptJournals analysis ppt
Journals analysis ppt
 
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium EnterpriseA Study Review of Common Big Data Architecture for Small-Medium Enterprise
A Study Review of Common Big Data Architecture for Small-Medium Enterprise
 
Anonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloudAnonymization of data using mapreduce on cloud
Anonymization of data using mapreduce on cloud
 
GridComputing-an introduction.ppt
GridComputing-an introduction.pptGridComputing-an introduction.ppt
GridComputing-an introduction.ppt
 
Analysis of SOFTWARE DEFINED STORAGE (SDS)
Analysis of SOFTWARE DEFINED STORAGE (SDS)Analysis of SOFTWARE DEFINED STORAGE (SDS)
Analysis of SOFTWARE DEFINED STORAGE (SDS)
 
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORIALGORITHM FOR HANDLING VOLUMIN...
 
Study on potential capabilities of a nodb system
Study on potential capabilities of a nodb systemStudy on potential capabilities of a nodb system
Study on potential capabilities of a nodb system
 

More from Guy K. Kloss

Kauri ID - A Self-Sovereign, Blockchain-based Identity System
Kauri ID - A Self-Sovereign, Blockchain-based Identity SystemKauri ID - A Self-Sovereign, Blockchain-based Identity System
Kauri ID - A Self-Sovereign, Blockchain-based Identity System
Guy K. Kloss
 
Qrious about Insights -- Big Data in the Real World
Qrious about Insights -- Big Data in the Real WorldQrious about Insights -- Big Data in the Real World
Qrious about Insights -- Big Data in the Real World
Guy K. Kloss
 
Representational State Transfer (REST) and HATEOAS
Representational State Transfer (REST) and HATEOASRepresentational State Transfer (REST) and HATEOAS
Representational State Transfer (REST) and HATEOAS
Guy K. Kloss
 
Introduction to LaTeX (For Word users)
 Introduction to LaTeX (For Word users) Introduction to LaTeX (For Word users)
Introduction to LaTeX (For Word users)
Guy K. Kloss
 
Operations Research and Optimization in Python using PuLP
Operations Research and Optimization in Python using PuLPOperations Research and Optimization in Python using PuLP
Operations Research and Optimization in Python using PuLP
Guy K. Kloss
 
Python Data Plotting and Visualisation Extravaganza
Python Data Plotting and Visualisation ExtravaganzaPython Data Plotting and Visualisation Extravaganza
Python Data Plotting and Visualisation Extravaganza
Guy K. Kloss
 
Lecture "Open Source and Open Content"
Lecture "Open Source and Open Content"Lecture "Open Source and Open Content"
Lecture "Open Source and Open Content"
Guy K. Kloss
 
Version Control with Subversion
Version Control with SubversionVersion Control with Subversion
Version Control with Subversion
Guy K. Kloss
 
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. MultiprocessingBeating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Guy K. Kloss
 
Thinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ IntegrationThinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ Integration
Guy K. Kloss
 
Thinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ IntegrationThinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ Integration
Guy K. Kloss
 
Gaining Colour Stability in Live Image Capturing
Gaining Colour Stability in Live Image CapturingGaining Colour Stability in Live Image Capturing
Gaining Colour Stability in Live Image Capturing
Guy K. Kloss
 
LaTeX Introduction for Word Users
LaTeX Introduction for Word UsersLaTeX Introduction for Word Users
LaTeX Introduction for Word Users
Guy K. Kloss
 
Thinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ IntegrationThinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ Integration
Guy K. Kloss
 

More from Guy K. Kloss (14)

Kauri ID - A Self-Sovereign, Blockchain-based Identity System
Kauri ID - A Self-Sovereign, Blockchain-based Identity SystemKauri ID - A Self-Sovereign, Blockchain-based Identity System
Kauri ID - A Self-Sovereign, Blockchain-based Identity System
 
Qrious about Insights -- Big Data in the Real World
Qrious about Insights -- Big Data in the Real WorldQrious about Insights -- Big Data in the Real World
Qrious about Insights -- Big Data in the Real World
 
Representational State Transfer (REST) and HATEOAS
Representational State Transfer (REST) and HATEOASRepresentational State Transfer (REST) and HATEOAS
Representational State Transfer (REST) and HATEOAS
 
Introduction to LaTeX (For Word users)
 Introduction to LaTeX (For Word users) Introduction to LaTeX (For Word users)
Introduction to LaTeX (For Word users)
 
Operations Research and Optimization in Python using PuLP
Operations Research and Optimization in Python using PuLPOperations Research and Optimization in Python using PuLP
Operations Research and Optimization in Python using PuLP
 
Python Data Plotting and Visualisation Extravaganza
Python Data Plotting and Visualisation ExtravaganzaPython Data Plotting and Visualisation Extravaganza
Python Data Plotting and Visualisation Extravaganza
 
Lecture "Open Source and Open Content"
Lecture "Open Source and Open Content"Lecture "Open Source and Open Content"
Lecture "Open Source and Open Content"
 
Version Control with Subversion
Version Control with SubversionVersion Control with Subversion
Version Control with Subversion
 
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. MultiprocessingBeating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
Beating the (sh** out of the) GIL - Multithreading vs. Multiprocessing
 
Thinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ IntegrationThinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ Integration
 
Thinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ IntegrationThinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ Integration
 
Gaining Colour Stability in Live Image Capturing
Gaining Colour Stability in Live Image CapturingGaining Colour Stability in Live Image Capturing
Gaining Colour Stability in Live Image Capturing
 
LaTeX Introduction for Word Users
LaTeX Introduction for Word UsersLaTeX Introduction for Word Users
LaTeX Introduction for Word Users
 
Thinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ IntegrationThinking Hybrid - Python/C++ Integration
Thinking Hybrid - Python/C++ Integration
 

MataNui – Building a Grid Data Infrastructure that “doesn’t suck!”
G. K. Kloss and M. J. Johnson
{G.Kloss | M.J.Johnson}@massey.ac.nz
Institute of Information and Mathematical Sciences, Massey University, Auckland

Introduction
MOA (Microlensing Observations in Astrophysics) [1] is a Japan/New Zealand collaboration project. It makes observations on dark matter, extra-solar planets and stellar atmospheres using the gravitational microlensing technique, which is one of the few techniques able to detect low-mass extra-solar planets. The technique works by analysing large quantities of imagery from optical telescopes.

The Problem
Astronomers world-wide are producing telescopic images. Teams across New Zealand and Japan access these for their research, creating higher-level data products. All these data files need to be stored and retrieved. Currently they are either stored on remotely accessible servers or transferred via removable offline media. Additionally, every stored item is annotated with potentially extensive sets of meta-data. Researchers commonly keep essential parts of this meta-data separately on their own systems, so that they can identify particular items to retrieve for their work.

This process is tedious and cumbersome, especially as not all data files are available online and some require access to various forms of offline media. Files that are available online have to be retrieved from remote systems through potentially slow connections. Direct and homogeneous access patterns for all data files and their associated meta-data do not exist.

Data management is not a new topic, and many solutions for it are already available. Many of them are hand-knit, and many are commercial and potentially very expensive.
More importantly, they usually do not work well with current Grid infrastructures, as they were not designed to be “Grid ready.” They are often complicated and require altering the research workflow to suit the needs of the system. Lastly, the ones meeting most of the requirements commonly do not provide graphical end-user tools to support data intensive research.

Requirements
The envisioned Grid-enabled data management system has to meet a few requirements. Most of all, it should be implementable without having to “re-invent the wheel.” It should be possible to source large portions of its essential components from existing (free) tools, and “just” require some “plumbing” to join them to meet these requirements:
• Handle large amounts of data
• Handle arbitrary amounts of meta-data
• Manage storage/access from remote locations
• Use local access/storage through replication
• Perform (server-side) queries on the meta-data
• Be robust, easy to deploy and easy to use
• Perform well on larger data collections
• Use Grid Computing standards/practices

Abstract
In science and engineering, data management problems are quite common, particularly if partners within a project are geographically distributed and require fast access to data. These partners would ideally like to access or store data on local servers only, while still retaining access for remote partners without manual intervention. This project attempts to solve such data management problems in an international research collaboration (in astrophysics) with the participation of several New Zealand universities. Data is to be accessed and managed along with its meta-data in several distributed locations, and it has to integrate with the infrastructure provided by the BeSTGRID project. Researchers also need to be able to use a simple but powerful graphical user interface for data management.
This poster outlines the requirements, implementation and tools involved for such a Grid data infrastructure.

Keywords: Grid Computing; Data Fabric; distributed; meta-data; data replication; GUI client; DataFinder.

Front Ends
GridFTP is the most common way to integrate data services into a Grid environment. It is commonly used for scripts and automation. GridFTP is the common denominator for compatibility with the Grid, and it features (among others) Grid certificate based authentication and third-party transfers.

File System Mounts are a common way to integrate externally stored file systems directly into the host system of compute resources (e.g. compute clusters, high performance computing servers). This enables scripts and applications to use the data simply and directly, without an additional retrieval or upload step.

Figure 1: Concept DataFinder: Data modelling and storage.

The DataFinder GUI Client [2] is an application that researchers can use as an easy-to-use end-user tool supporting their data intensive needs (Fig. 2 and 4). It has been developed as open source software by the German Aerospace Centre to support internal projects and external partners. The application allows easy and flexible access to remote data repositories with associated meta-data. The DataFinder is designed for scientific and engineering purposes, and it assists in this through the following:
• Handles access/transfer to/from data server(s)
• Retrieval and modification of meta-data
• Extensive (server-side) queries on all meta-data
• Support for project specific policies:
  – Data hierarchy definition
  – Enforcement of workflows
  – Meta-data specification
• Scripting used to automate recurring tasks
• Can integrate 3rd party (GUI) tools

The DataFinder can act as a universal Grid/storage system client [3] (Fig. 1), as it is easily extensible to connect to further storage sub-systems (beyond those already available).
Figure 2: Integration of GUI applications with DataFinder.

Implementation
See Fig. 3.

Storage back-end (GridFS on MongoDB) – For a straightforward implementation, a suitable data storage server was sought. We chose the “NoSQL” database MongoDB [4]. It features the “GridFS” storage mode, capable of storing file-like data (in large numbers and sizes) along with its meta-data. MongoDB can work in federation with distributed servers, with data being automatically replicated to the other instances. Therefore, every site can operate its own local MongoDB server, keeping data access latencies low and performance high.

Native file system mount (GridFS FUSE) – A GridFS FUSE driver [5] is already available, so a remote GridFS can be mounted into a local Linux system.

Grid front-end (GridFTP) – To provide access through Grid means, the Griffin GridFTP server [6] by the Australian Research Collaboration Service (ARCS) is equipped with a GridFS storage back-end. Through this, every Grid capable tool can be used to store/retrieve files with any of the MongoDB instances interfaced by a Griffin server. This access method also allows Grid applications to access the storage server using the commonly used certificates.

Figure 3: Overview Grid data infrastructure.

Figure 4: Turbine simulation workflow with DataFinder (with custom GUI dialogues).

GUI front-end (DataFinder) – The DataFinder is to be interfaced with the GridFS storage back-end. To avoid giving a remote end-user client full access to the MongoDB server, a server interface layer is introduced. For this, a RESTful web service authenticating against a Grid certificate is implemented. The implementation is based on the Apache web server through the WSGI interface layer [7]. On the client side, the DataFinder is equipped with a storage back-end accessing this web service.
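The idea of a certificate-authenticated interface layer that answers meta-data queries on the server can be illustrated with a minimal WSGI sketch. It is not the MataNui implementation. It assumes that Apache's mod_ssl has verified the client certificate and exposes its subject in the `SSL_CLIENT_S_DN` variable, and it uses a small in-memory list in place of the GridFS/MongoDB back-end. The names `ALLOWED_DNS`, `CATALOGUE`, `call` and all field values are hypothetical.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical allow-list of certificate subject DNs (illustrative value only).
ALLOWED_DNS = {"/C=NZ/O=Example Grid/CN=Jane Researcher"}

# In-memory stand-in for the meta-data kept in the storage back-end.
CATALOGUE = [
    {"filename": "frame-0001.fits", "telescope": "T1", "year": 2009},
    {"filename": "frame-0002.fits", "telescope": "T2", "year": 2010},
    {"filename": "frame-0003.fits", "telescope": "T1", "year": 2010},
]


def _respond(start_response, status, payload):
    """Send a JSON response with the given HTTP status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def application(environ, start_response):
    """WSGI entry point: check the client certificate, then run the query."""
    # Assumption: the web server has already verified the certificate and
    # passes its subject DN on (mod_ssl with "SSLOptions +StdEnvVars").
    dn = environ.get("SSL_CLIENT_S_DN")
    if dn not in ALLOWED_DNS:
        return _respond(start_response, "403 Forbidden",
                        {"error": "certificate not authorised"})
    # Filter on the server so that only matching records reach the client.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    criteria = {key: values[0] for key, values in query.items()}
    results = [record for record in CATALOGUE
               if all(str(record.get(key)) == value
                      for key, value in criteria.items())]
    return _respond(start_response, "200 OK", {"results": results})


def call(query, dn=None):
    """Invoke the application directly, without running a web server."""
    environ = {"QUERY_STRING": query}
    setup_testing_defaults(environ)
    if dn is not None:
        environ["SSL_CLIENT_S_DN"] = dn
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)


if __name__ == "__main__":
    print(call("telescope=T1&year=2010",
               dn="/C=NZ/O=Example Grid/CN=Jane Researcher"))
```

Because the client only talks to this layer, the database itself never has to be exposed to end-user machines. A real deployment would replace the in-memory list with queries against the storage back-end.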
The DataFinder is currently the only client fully capable of making use of the available meta-data (creating, modifying and accessing meta-data, as well as performing efficient server-side queries on it). Server-side queries in particular reduce data access latencies significantly and improve query performance.

WebDAV front-end (Catacomb) – A potential future pathway to access GridFS content is the Catacomb WebDAV server [8]. It can be modified to use GridFS/MongoDB as a storage back-end instead of the currently used MySQL relational database.

Results
By choosing suitable existing building blocks, it becomes comparatively simple to implement a consistent Grid data infrastructure with the desired features. The implementation is currently making good progress, and is expected to be simple to deploy and configure, as well as to integrate seamlessly into the infrastructures of BeSTGRID or other projects. In particular, the problems of operating on large amounts of annotated data from astrophysics research seem to benefit significantly from this work. Data can be stored and accessed by geographically remote partners equally fast, and processing of the data can be performed locally. Data processing can easily be conducted on sets returned as the results of queries (e.g. of particular spatial regions, exhibiting specific phenomena indicated in the meta-data, produced by certain telescopes, in given time frames, etc.).

References
[1] I. A. Bond, F. Abe, R. Dodd, et al., “Real-time difference imaging analysis of MOA Galactic bulge observations during 2000,” Monthly Notices of the Royal Astronomical Society, vol. 327, pp. 868–880, 2001.
[2] T. Schlauch and A. Schreiber, “DataFinder – A Scientific Data Management Solution,” in Proceedings of Symposium for Ensuring Long-Term Preservation and Adding Value to Scientific and Technical Data 2007 (PV 2007), Munich, Germany, October 2007.
[3] T. Schlauch, A. Eifer, T. Soddemann, and A. Schreiber, “A Data Management System for UNICORE 6,” in Proceedings of EuroPar Workshops – UNICORE Summit, ser. Lecture Notes in Computer Science (LNCS). Delft, Netherlands: Springer, August 2009.
[4] “MongoDB Project,” http://www.mongodb.org/.
[5] M. Stephens, “GridFS FUSE Project,” http://github.com/mikejs/gridfs-fuse.
[6] S. Zhang, P. Coddington, and A. Wendelborn, “Connecting arbitrary data resources to the Grid,” in Proceedings of the 11th International Conference on Grid Computing (Grid 2010). Brussels, Belgium: ACM/IEEE, October 2010.
[7] N. Piël, “Benchmark of Python WSGI Servers,” http://nichol.as/benchmark-of-python-web-servers, March 2010.
[8] M. Litz, “Catacomb WebDAV Server,” in UpTimes – German Unix User Group (GUUG) Members’ Magazine, April 2006, pp. 16–19.