BIOMAJ
WORKFLOW ENGINE DEDICATED TO BIO-DATA SYNCHRONIZATION AND PROCESSING
Sana Anam, Roll # 3003
BS (Hons) Botany, 3rd semester (Evening)
Submitted to Inam ul Haq
University of Education
CONTENT
• INTRODUCTION
• BACKGROUND OF BIOMAJ
• APPLICATION
• BIOMAJ PROVIDES
• CONCLUSION
• REFERENCES
INTRODUCTION
• In biocomputing, analyses almost always rely on databanks.
• Any biocomputing site therefore needs to manage these invaluable databanks, which hold a huge amount of information (usually several terabytes), are spread over various international sites, and do not share a consistent format (several different standards are still in use).
BACKGROUND OF BIOMAJ
• The BioMAJ project grew out of the work of three teams in 2005: INRIA Rennes, INRA Toulouse, and INRA Jouy-en-Josas.
• At the time, no free application met users' requirements. The closest was citrina, developed by Josh Goodman (from Washington University's GMOD project).
• Citrina was a promising prototype, though still quite far from the application required, and it had not been updated since 2004.
• In 2006, these teams (INRIA and INRA) developed a new engine called BioMAJ. Although it was based on citrina 0.51, nearly all the code was rewritten, and the application's architecture and functions were completely rethought and considerably extended.
• During 2007, the application was tested on the three sites involved in the project to make it more robust and suitable
APPLICATION
• Synchronization:
  • Multiple remote protocols (ftp, sftp, http, rsync, local copy)
  • Integrity checks on data transfers
  • Release versioning using an incremental approach
  • Multithreading
  • Data extraction (gzip, tar, bzip)
  • Normalization of the data directory tree
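As an illustration of what a transfer integrity check can look like, here is a minimal Python sketch. It assumes the remote site publishes a checksum for each file; the function names and the choice of MD5 are assumptions made for this example and are not taken from BioMAJ itself.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(path: Path, expected_hex: str, algorithm: str = "md5") -> bool:
    """Compare the local digest with the checksum published by the remote site."""
    return file_checksum(path, algorithm) == expected_hex.strip().lower()
```

A downloaded file whose digest does not match would typically be fetched again before the bank is put into production.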
• Pre- and post-processing:
  • Advanced workflow description as a directed acyclic graph (DAG), written in an easy, normalized syntax
  • Post-process indexing for various bioinformatics software (blast, srs, fastacmd, readseq, etc.)
  • Easy integration of personal scripts to automate bank post-processing
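To show what describing a workflow as a DAG buys you, the following Python sketch orders a set of post-processing steps by their dependencies. The step names are hypothetical, and the sketch uses the standard-library `graphlib` module (Python 3.9+) rather than BioMAJ's own workflow syntax.

```python
from graphlib import TopologicalSorter

# Hypothetical post-processing steps: each key depends on the steps in its set.
workflow = {
    "extract": set(),
    "index_blast": {"extract"},
    "index_srs": {"extract"},
    "publish": {"index_blast", "index_srs"},
}


def execution_order(dag):
    """Return one valid sequential order in which every step follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag):
    """Group steps into batches; steps in the same batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()
        batches.append(sorted(ready))
        sorter.done(*ready)
    return batches


order = execution_order(workflow)
batches = parallel_batches(workflow)
```

In this example the two indexing steps land in the same batch, so they could run concurrently once extraction has finished.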
• Supervision:
  • Web interface for administration
  • Repository statistics
  • Mail alerts for supervising the update cycle
BIOMAJ PROVIDES
• A reliable workflow engine that can download remote data automatically and intelligently (error correction, synchronization of local and remote data), apply formatting to this data, and put it into production (make the data available to all users and/or applications).
• A group of predefined workflows for the main biological banks.
• A library of indexing scripts (formatting for biological data).
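Synchronizing local and remote data comes down to deciding which files to fetch and which to discard. The sketch below is one simple way to make that decision by comparing file size and modification time; the `FileInfo` type and `plan_sync` function are hypothetical names for this example and do not describe BioMAJ's internal logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int      # size in bytes
    mtime: float   # last modification time (seconds since the epoch)


def plan_sync(remote, local):
    """Return (to_download, to_delete) given {name: FileInfo} listings.

    A file is downloaded when it is missing locally or its metadata differs;
    a local file is deleted when it no longer exists on the remote site.
    """
    to_download = sorted(name for name, info in remote.items() if local.get(name) != info)
    to_delete = sorted(name for name in local if name not in remote)
    return to_download, to_delete


remote = {"a.gz": FileInfo(10, 2.0), "b.gz": FileInfo(20, 1.0), "c.gz": FileInfo(5, 3.0)}
local = {"a.gz": FileInfo(10, 1.0), "b.gz": FileInfo(20, 1.0), "old.gz": FileInfo(7, 0.5)}
to_download, to_delete = plan_sync(remote, local)
```

Here only the changed file `a.gz` and the new file `c.gz` are downloaded, while the unchanged `b.gz` is left alone.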
CONCLUSION
• BioMAJ provides flexibility in managing sequence banks on a site, and a new workflow can be implemented quickly simply by creating a bank description file.
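To give a feel for what a declarative bank description might contain, here is a small Python sketch that parses one. The file layout and every key name in it are invented for illustration and are not BioMAJ's actual property names; consult the BioMAJ documentation for the real format.

```python
import configparser

# Hypothetical bank description; the keys are illustrative, not BioMAJ's real ones.
BANK_DESCRIPTION = """
[bank]
name = example_bank
protocol = ftp
server = ftp.example.org
remote_dir = /pub/example/
remote_files = *.gz
post_processes = extract, index
"""

REQUIRED_KEYS = ("name", "protocol", "server", "remote_dir", "remote_files")


def load_bank(text):
    """Parse a bank description and return its settings as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["bank"]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError(f"bank description is missing: {', '.join(missing)}")
    bank = dict(section)
    # Turn the comma-separated list of post-processing steps into a Python list.
    steps = section.get("post_processes", "")
    bank["post_processes"] = [step.strip() for step in steps.split(",") if step.strip()]
    return bank


bank = load_bank(BANK_DESCRIPTION)
```

The point of such a file is that adding a new bank requires no new code, only a new description.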
REFERENCES
• Website: http://biomaj.genouest.org/
• Authors: David Allouche, Olivier Filangi, Romaric Sabas, Olivier Sallou (olivier.sallou@irisa.fr)
