BioStatFlow –© INRA DJ 2014
PMFB –UMR 1332, INRA, F-33140 Villenave d’Ornon
djacob@bordeaux.inra.fr
http://biostatflow.org
BioStatFlow is a web application for the statistical analysis of "omics" data, including
metabolomics data. It handles data sets generated from experiments.
Omics experiments yield large amounts of data, too much to be interpreted by the human
eye. A combination of multivariate and univariate data analyses is therefore essential to
extract and visualize the information of interest. Biologists need to gain basic knowledge
about the statistics employed to critically contribute to and evaluate their experimental
design, protocols, and results.
Nevertheless, there is still a lack of useful, fast, and easy online statistical tools for those
who are not experts in statistics. BioStatFlow has been developed to meet this need.
A web-based tool for Statistical Analysis
Motivation of the design of BioStatFlow
1. The main goal of BioStatFlow is to make statistical tools accessible to biologists who are
not statistics specialists. It is designed to execute statistical analyses sequentially, i.e. as a linear chain
of statistical processing steps, called a workflow in BioStatFlow. Based on a set of identified use cases
(mainly around omics data), BioStatFlow is built around the following typical workflow:
A set of analyses is first proposed as a static sequence in order to normalize the
dataset. At this stage, users have to follow the order of the sequence. Missing
value estimation and data scaling are helpful pre-processing steps when, because
of experimental issues with the technical equipment, the levels of some
analytical variables (features) cannot be determined, or when different
experiments need to be compared. This is the default use case (default
workflow). Users can then choose any additional methods, depending on
the dataset and the corresponding experimental design (i.e. factors), in order to i)
visualize the whole dataset, ii) reveal biomarkers, iii) analyse interactions
between factors, iv) discriminate groups, and so on.
The input of each treatment is the output of the previous treatment.
If a treatment generates a data table (matrix) as an output, that table is used as
input to the next step. Otherwise, if the treatment only generates results (texts
and images) and does not change the input array, the input array is passed on
unchanged as the output.
Each treatment can be written as an R script (most common) or as a Perl
script, possibly embedding binary tools (such as compiled Matlab scripts).
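The chaining rule described above can be sketched as follows. This is only an illustration in Python, not BioStatFlow's own code (its treatments are R or Perl scripts). The function names, the use of None for a missing value, and the choice of column-mean imputation and mean-centering as stand-ins for "missing value estimation" and "data scaling" are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Matrix = List[List[Optional[float]]]
# Hypothetical treatment signature: returns (new matrix or None, result texts).
Treatment = Callable[[Matrix], Tuple[Optional[Matrix], List[str]]]


def impute_column_means(data: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    """Replace missing values (None) by the mean of the observed values of the column.

    Assumes every column has at least one observed value.
    """
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    filled = [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]
    return filled, []


def center_columns(data: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    """Subtract the column mean from every value (a very simple form of scaling)."""
    n_rows = len(data)
    means = [sum(row[j] for row in data) / n_rows for j in range(len(data[0]))]
    return [[v - means[j] for j, v in enumerate(row)] for row in data], []


def summarize(data: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    """Report-only treatment: produces a text result and no new matrix."""
    return None, [f"{len(data)} samples x {len(data[0])} features"]


def run_workflow(data: Matrix, treatments: List[Treatment]) -> Tuple[Matrix, List[str]]:
    """Run treatments in order, feeding each one the output of the previous one."""
    current = data
    results: List[str] = []
    for treatment in treatments:
        output, produced = treatment(current)
        results.extend(produced)
        if output is not None:
            # The treatment generated a data table: it becomes the next input.
            current = output
        # Otherwise the input is passed on unchanged.
    return current, results
```

For instance, run_workflow(data, [impute_column_means, summarize, center_columns]) hands the imputed matrix straight to center_columns, because summarize returns no matrix of its own.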
Overview of how to use BioStatFlow
Tutorial: http://biostatflow.org/doc/pg?id=tutorial:start
STEP1: Input Dataset
The user uploads a correctly formatted dataset file,
then selects "Next Step"
STEP2: Workflow selection
Modify parameters and/or add another
analysis, then "Launch"
STEP3: Visualization of Results
Select a result, Zoom In/Out, or Download
2. BioStatFlow allows bioinformaticians to easily integrate a new method of statistical
analysis into a workflow, or even to create their own workflows. To this end, the analysis scripts and
the workflow definition files are stored in separate catalogs of the application, and
configuration files enable integration without modifying the application source code.
Motivation of the design of BioStatFlow
The BioStatFlow software components consist of:
1. The BioStatFlow core, which is responsible for:
• managing the input-output through the GUI (datasets, workflows, parameters of each analysis, and results),
• creating batch scripts, from the workflow definition files,
• launching the analysis scripts,
• managing the persistent sessions (including access management)
2. The workflow and statistical analysis catalogs. These catalogs may be enriched at any time by adding either some statistical
analyses or even a new workflow.
3. The repository of persistent sessions. To save your work in a persistent session, you must register first.
Architecture
Workflow and Statistical Analysis catalogs
Catalog's Root
    Workflow 1
    Workflow 2
    …
    Workflow n
Each workflow directory contains:
    def            definition files (PCA.def, …)
    doc            documentation files (PCA.xml, …)
    scripts        script files (PCA.R, …)
    workflow.def   workflow definition file
•A workflow is implemented as a directory containing three sub-directories plus one definition file.
•The 'def' sub-directory:
  •contains the analysis definition files, which are used to automatically build both the GUI input masks
  for the analysis parameters (with some default values) and the header of the R scripts, which
  initializes the parameters with the values given by the user.
•The 'doc' sub-directory:
  •contains the analysis documentation files, which describe the analysis parameters within the
  input mask.
•The 'scripts' sub-directory:
  •contains the analysis scripts themselves (without the initialisation of their
  parameters, since that part is handled by the automatically generated header of each
  script).
•The 'workflow.def' file:
  •contains the list of all analyses within the workflow.
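The layout described above can be checked with a few lines of code. The sketch below is a hypothetical helper, not part of BioStatFlow. It assumes that every analysis follows the naming seen in the PCA example (def/<name>.def, doc/<name>.xml, scripts/<name>.R), which the slides show for PCA only.

```python
from pathlib import Path
from typing import List

REQUIRED_SUBDIRS = ("def", "doc", "scripts")
WORKFLOW_FILE = "workflow.def"


def check_workflow_dir(workflow_dir: Path) -> List[str]:
    """Return a list of problems found in the top-level layout of a workflow directory."""
    problems = []
    for name in REQUIRED_SUBDIRS:
        if not (workflow_dir / name).is_dir():
            problems.append(f"missing sub-directory: {name}")
    if not (workflow_dir / WORKFLOW_FILE).is_file():
        problems.append(f"missing file: {WORKFLOW_FILE}")
    return problems


def check_analysis_files(workflow_dir: Path, analysis: str) -> List[str]:
    """Check the three files of one analysis (naming assumed from the PCA example)."""
    expected = [
        Path("def") / f"{analysis}.def",      # definition file
        Path("doc") / f"{analysis}.xml",      # documentation file
        Path("scripts") / f"{analysis}.R",    # analysis script
    ]
    return [
        f"missing file: {path.as_posix()}"
        for path in expected
        if not (workflow_dir / path).is_file()
    ]
```

An empty list from both functions means the directory matches the described layout; the content of workflow.def itself is not parsed here, since its format is not given in the slides.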
An example: PCA
Overview of the interaction mechanism of the different file types
[Figure labels: PCA.def; PCA.xml; PCA.R; header of the R script (automatically generated); the R script (written by the provider); Params; Results; dataInMat, dataInFact; dataOutMat, dataOutFact]
[Figure: PCA.def, with the GUI (automatically generated) and the header of the R code (automatically generated)]
[Figure: PCA.R, the R code written by the provider]
[Figure: Results]
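The idea of an automatically generated script header can be sketched as follows: default parameter values are overridden by the values entered by the user, written out as R assignments, and placed in front of the provider's script. This is an illustration only. The real format of a .def file is not shown in the slides, so the defaults are represented here by a plain Python dictionary, and the parameter names in the usage note are hypothetical.

```python
from typing import Dict, Union

ParamValue = Union[bool, int, float, str]


def to_r_literal(value: ParamValue) -> str:
    """Render a Python value as an R literal (finite numbers, booleans, strings)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_header(defaults: Dict[str, ParamValue], user_values: Dict[str, ParamValue]) -> str:
    """Build the R assignments that initialise the parameters of an analysis."""
    unknown = set(user_values) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    merged = {**defaults, **user_values}  # user values override the defaults
    return "\n".join(f"{name} <- {to_r_literal(value)}" for name, value in merged.items())


def assemble_script(header: str, body: str) -> str:
    """Put the generated header in front of the script written by the provider."""
    return header + "\n\n" + body
```

With hypothetical defaults {"ncomp": 2, "scale": True} and a user value {"ncomp": 3}, build_header produces the two lines `ncomp <- 3` and `scale <- TRUE`, so the provider's script can use both names without initialising them itself.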
Repository of persistent sessions
[Figure: directory tree. The Repository's Root contains Session 1, Session 2, …, Session n. A session contains a sub-directory of input data, a sub-directory of the analysis results, and sessparams (session parameters). Labels shown: query, bswf, imported_matrix_file.csv, p0 : Data Formatting, p1 : Split names, …, p5 : Scaling.]
3. BioStatFlow helps disseminate the results of statistical analyses by saving them in a
persistent session so that they can be fully restored. One can thus provide the session
identifier when publishing results (see the tutorial).
To disseminate your data and the associated statistical analyses, share the URL formed as:
http://biostatflow.org/view/<SESSION ID>
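Forming the shareable URL from a session identifier is a simple concatenation, sketched below. The function name is hypothetical, and the percent-encoding of the identifier is a defensive assumption, since the slides do not say which characters an identifier may contain.

```python
from urllib.parse import quote

BASE_VIEW_URL = "http://biostatflow.org/view/"


def session_url(session_id: str) -> str:
    """Return the URL under which a persistent session can be viewed."""
    session_id = session_id.strip()
    if not session_id:
        raise ValueError("session identifier must not be empty")
    # Percent-encode the identifier so that it is always a valid path segment.
    return BASE_VIEW_URL + quote(session_id, safe="")
```

The resulting string is what would be cited in a publication alongside the results.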
Motivation of the design of BioStatFlow
Motivation of the design of BioStatFlow: Dissemination
Example of a session (ID G633): http://biostatflow.org/view/G633
[Figure labels: Datasets; Results of statistical analyses; R code]
Some Links
A Spotlight on BioStatFlow in MetaboNews
http://www.metabonews.ca/Feb2015/MetaboNews_Feb2015.htm#spotlight
BioStatFlow is available online:
http://biostatflow.org
A Tutorial on BioStatFlow
http://biostatflow.org/doc/pg?id=tutorial:start
