1. BioMAJ
WORKFLOW ENGINE DEDICATED TO BIO-DATA SYNCHRONIZATION AND PROCESSING
Sana Anam, Roll No. 3003
BS (Hons) Botany, 3rd Semester (Eve)
Submitted to: Inam ul Haq
University of Education
2. CONTENTS
INTRODUCTION
BACKGROUND OF BIOMAJ
APPLICATION
WHAT BIOMAJ PROVIDES
CONCLUSION
REFERENCES
3. INTRODUCTION
In biocomputing, analyses almost always rely on databanks.
Any biocomputing site therefore needs to manage these invaluable databanks. They hold a huge amount of information, usually several terabytes, spread over various international sites and not stored in a consistent format (several different standards still coexist).
4. BACKGROUND OF BIOMAJ
The BioMAJ project came out of the work of three teams in 2005: INRIA Rennes, INRA Toulouse and INRA Jouy-en-Josas.
At the time, no free application met users' requirements. The closest was citrina, developed by Josh Goodman (from Washington University's gmod project).
This was a promising prototype, but still quite far from the application required, and it had not been updated since 2004.
In 2006, these teams (INRIA and INRA) developed a new engine called BioMAJ1. Although it started from citrina 0.51, nearly all of the code was rewritten, and the application's architecture and functions were completely rethought and considerably extended.
During 2007, the application was tested on the three sites involved in the project to make it more robust and suitable.
5. APPLICATION
Synchronization:
Multiple remote protocols (FTP, SFTP, HTTP, rsync, local copy)
Integrity check of data transfers
Release versioning using an incremental approach
Multithreading
Data extraction (gzip, tar, bzip)
Data tree directory normalization
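The integrity check and incremental release versioning listed above can be illustrated with a minimal Python sketch. It assumes that the remote listing gives each file's name, size and modification time, and that a server may publish an MD5 checksum. `RemoteFile`, `files_to_download` and `verify_transfer` are hypothetical names and are not part of BioMAJ. The sketch selects only the files that are new or changed since the local release, and it checks each transfer before accepting it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical entry from a remote server listing."""
    name: str
    size: int
    mtime: int  # modification time, seconds since the epoch


def files_to_download(remote: List[RemoteFile],
                      local: Dict[str, Tuple[int, int]]) -> List[RemoteFile]:
    """Return remote files that are new or changed since the local release.

    `local` maps file name -> (size, mtime) for the current local release.
    Files whose size and mtime are unchanged are skipped, so only the
    differences are transferred (an incremental update).
    """
    return [f for f in remote if local.get(f.name) != (f.size, f.mtime)]


def verify_transfer(data: bytes, expected_size: int,
                    expected_md5: Optional[str] = None) -> bool:
    """Accept a downloaded file only if its size (and MD5, when known) match."""
    if len(data) != expected_size:
        return False
    if expected_md5 is not None:
        return hashlib.md5(data).hexdigest() == expected_md5
    return True
```

In this sketch, comparing size and modification time is a cheap first filter. The checksum, when a server publishes one, catches transfers that were corrupted but still have the right length.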
6. Pre- & Post-processing:
Advanced workflow description (DAG) using an easy, normalized syntax
Post-process indexing for various bioinformatics tools (BLAST, SRS, fastacmd, readseq, etc.)
Easy integration of personal scripts to automate bank post-processing
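A workflow described as a DAG (directed acyclic graph) runs each step only after all of its prerequisites have finished. The sketch below shows this ordering with Python's standard `graphlib` module (Python 3.9+). The step names and `run_workflow` are hypothetical, and the code does not reproduce BioMAJ's own workflow syntax or scheduler.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical post-processing workflow for one bank: each step maps to the
# set of steps that must finish before it may start.
WORKFLOW: Dict[str, Set[str]] = {
    "extract": set(),
    "blast_index": {"extract"},
    "srs_index": {"extract"},
    "readseq_convert": {"extract"},
    "publish": {"blast_index", "srs_index", "readseq_convert"},
}


def run_workflow(dag: Dict[str, Set[str]], run: Callable[[str], None]) -> List[str]:
    """Run every step of a DAG workflow in dependency order.

    Raises CycleError if the description contains a cycle (it is then not a DAG).
    Steps that become ready together are independent; this sketch runs them
    one after another, sorted by name so the order is deterministic.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # validates the graph; raises CycleError on a cycle
    order: List[str] = []
    while sorter.is_active():
        for step in sorted(sorter.get_ready()):
            run(step)
            order.append(step)
            sorter.done(step)
    return order
```

For example, `run_workflow(WORKFLOW, print)` prints `extract`, then `blast_index`, `readseq_convert` and `srs_index`, and finally `publish`. The three indexing steps are returned together by `get_ready()`. A real engine could run them in parallel at that point.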
7. Supervision:
Administration web interface
Repository statistics
Mail alerts for supervising the update cycle
8. WHAT BIOMAJ PROVIDES
A reliable workflow engine that can download remote data automatically and intelligently (error correction, synchronization of local and remote data), apply formatting to this data and put it into production (make the data available to all users and/or applications).
A group of predefined workflows for the main biological banks.
A library of indexing scripts (formatting of biological data).
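Users may still be reading the old release while a new one is put into production. One common way to handle this is to switch a stable link to the new release directory in a single step. The sketch below shows that technique on a POSIX file system. It is an assumption for illustration only and does not describe how BioMAJ itself publishes releases. `publish_release`, the `current` link name and the directory layout are all hypothetical.

```python
import os


def publish_release(bank_dir: str, release: str) -> None:
    """Point bank_dir/current at bank_dir/<release> (hypothetical layout).

    A temporary symlink is created first and then renamed over the old one,
    so on POSIX systems readers see either the previous or the new release,
    never a missing link.
    """
    target = os.path.join(bank_dir, release)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"release directory not found: {target}")
    link = os.path.join(bank_dir, "current")
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):  # leftover from an interrupted run
        os.remove(tmp_link)
    os.symlink(release, tmp_link)  # relative target, resolved inside bank_dir
    os.replace(tmp_link, link)     # atomic rename over the previous link
```

Applications that always open `current/...` pick up the new release on their next access. The previous release directory stays on disk until it is removed separately.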
9. CONCLUSION
BioMAJ provides flexibility in managing sequence banks on a site. It also allows new workflows to be implemented rapidly, simply by creating a bank description file.
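The idea is that a new workflow is declared in a description file rather than written as code. Here is a minimal sketch of that idea: it reads a key=value bank description and checks that the required entries are present. The file content, the key names (`db.name`, `protocol`, `server`, `remote.dir`, `remote.files`, `post.processes`) and the helper functions are all hypothetical. They do not reproduce BioMAJ's actual bank file format.

```python
from typing import Dict, List

# Hypothetical bank description; key names are illustrative only and are
# not BioMAJ's actual property names.
BANK_FILE = r"""
# Illustrative bank description
db.name=example_bank
protocol=ftp
server=ftp.example.org
remote.dir=/pub/example/
remote.files=.*\.fasta\.gz
post.processes=blast_index,srs_index
"""

# Hypothetical set of entries every description must define.
REQUIRED_KEYS = ("db.name", "protocol", "server", "remote.dir")


def parse_bank_description(text: str) -> Dict[str, str]:
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: Dict[str, str]) -> List[str]:
    """List required entries absent from a parsed description."""
    return [k for k in REQUIRED_KEYS if k not in props]
```

With this approach, adding a bank means writing a new text file. The engine's code stays unchanged, and the post-processing steps named in the file (here `blast_index,srs_index`) can be fed to a workflow runner.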
10. REFERENCES
Website: http://biomaj.genouest.org/
Authors: David Allouche, Olivier Filangi, Romaric Sabas, Olivier Sallou (olivier.sallou@irisa.fr)