BioStatFlow –© INRA DJ 2014
PMFB –UMR 1332, INRA, F-33140 Villenave d’Ornon
djacob@bordeaux.inra.fr
http://biostatflow.org
BioStatFlow is a web application for the statistical analysis of "omics" data, including
metabolomics data. It handles data sets generated from experiments.
Omics experiments yield large amounts of data, too much to be interpreted by the human
eye. A combination of multivariate and univariate data analyses is therefore essential to
extract and visualize the information of interest. Biologists need basic knowledge
of the statistics employed in order to contribute critically to, and evaluate, their experimental
design, protocols, and results.
Nevertheless, there is still a lack of useful, fast, and easy online statistical tools for those
who are not experts in statistics. BioStatFlow has been developed to meet this need.
A web-based tool for Statistical Analysis
Motivation of the design of BioStatFlow
1. The main goal of BioStatFlow is to give biologists who are not statistics specialists easier
access to statistical tools. It is designed to execute statistical analyses sequentially, i.e. as a linear chain
of statistical processing steps, called a workflow in BioStatFlow. Based on a set of identified use cases
(mainly around omics data), BioStatFlow follows the typical workflow described below:
A set of analyses is first proposed as a static sequence in order to normalize the
dataset. At this stage, users have to follow the order of the sequence. Missing
value estimation and data scaling are helpful pre-processing steps, because
technical issues with the equipment can leave the levels of some analytical
variables (features) undetermined, and because different experiments may
need to be compared. This is the default use case (default
workflow). Users can then choose any additional methods, depending on
the dataset and the corresponding experimental design (i.e. factors), in order to i)
visualize the whole dataset, ii) reveal biomarkers, iii) analyse interactions
between factors, iv) discriminate between groups, and so on.
The input of each treatment is the output of the previous treatment.
If a treatment generates a data table (matrix) as output, that table is used as
input to the next step. Otherwise, if the treatment only generates results (texts
and images) and does not change the input matrix, the input is passed on
unchanged as the output.
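The chaining rule above can be sketched in a few lines of Python. This is only an illustration, not the BioStatFlow implementation: the step signature (a function returning an optional new matrix plus a list of result messages) and the names `run_workflow`, `double_values` and `describe` are assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
# Hypothetical step signature: takes a matrix and returns
# (new matrix, or None if the data is unchanged; list of result messages).
Step = Callable[[Matrix], Tuple[Optional[Matrix], List[str]]]


def run_workflow(data: Matrix, steps: List[Step]) -> Tuple[Matrix, List[str]]:
    """Run the steps in order, handing each step the current data table."""
    results: List[str] = []
    current = data
    for step in steps:
        out_matrix, step_results = step(current)
        results.extend(step_results)
        # A step that returns no matrix leaves the data unchanged,
        # so the same table is handed to the next step.
        if out_matrix is not None:
            current = out_matrix
    return current, results


def double_values(m: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    # Transforming step: produces a new data table.
    return [[2 * x for x in row] for row in m], ["scaled by 2"]


def describe(m: Matrix) -> Tuple[Optional[Matrix], List[str]]:
    # Reporting step: produces results only, no new data table.
    return None, [f"{len(m)} rows"]


final, log = run_workflow(
    [[1.0, 2.0], [3.0, 4.0]], [double_values, describe, double_values]
)
print(final)  # [[4.0, 8.0], [12.0, 16.0]]
print(log)    # ['scaled by 2', '2 rows', 'scaled by 2']
```

The reporting step in the middle does not interrupt the chain: the second transforming step still receives the table produced by the first one.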
Each treatment can be written as an R script (most common) or as a Perl
script, which can embed binary tools (such as compiled Matlab scripts).
Overview of how to use BioStatFlow
Tutorial: http://biostatflow.org/doc/pg?id=tutorial:start
STEP1: Input Dataset
Provided by the user, who uploads a correctly formatted
dataset file and then clicks « Next Step »
STEP2: Workflow selection
Modify parameters and/or add another
analysis, then "Launch"
STEP3: Visualization of Results
Select a result, Zoom In/Out, or Download
2. BioStatFlow allows bioinformaticians to easily integrate a new statistical
analysis method into a workflow, or even to create their own workflows. To this end, the analysis scripts and
the workflow definition files are stored in separate catalogs of the application, and
configuration files enable integration without modifying the application source code.
Motivation of the design of BioStatFlow
The BioStatFlow software components consist of:
1. The BioStatFlow core, which is responsible for:
• managing the input-output through the GUI (datasets, workflows, parameters of each analysis, and results),
• creating batch scripts, from the workflow definition files,
• launching the analysis scripts,
• managing the persistent sessions (including access management)
2. The workflow and statistical analysis catalogs. These catalogs may be enriched at any time by adding either some statistical
analyses or even a new workflow.
3. The repository of persistent sessions. To save your work in a persistent session, you must register first.
Architecture
[Diagram: the three components listed above, labelled 1, 2 and 3]
Workflow and Statistical Analysis catalogs
Catalog’s Root
    Workflow 1
        def             Definition files (e.g. PCA.def, …)
        doc             Documentation files (e.g. PCA.xml, …)
        scripts         Script files (e.g. PCA.R, …)
        workflow.def    Workflow definition file
    Workflow 2
    …
    Workflow n
•A workflow is implemented as a directory containing three sub-directories plus one definition file.
•the 'def' sub-directory:
  •contains the analysis definition files, which are used to automatically build the GUI input masks
  for the analysis parameters (with default values) and the header of the R scripts, which
  initializes the parameters with the values given by the user.
•the 'doc' sub-directory:
  •contains the analysis documentation files, which describe the analysis parameters shown in the
  input mask.
•the 'scripts' sub-directory:
  •contains the analysis scripts themselves (without the initialization of their
  parameters, which is handled by the automatically generated header of each
  script)
•the 'workflow.def' file:
  •contains the list of all analyses within the workflow
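As an illustration of this layout, the following Python sketch checks whether a workflow directory contains the files expected for a given list of analyses. It is a hedged example rather than part of BioStatFlow: the function name `missing_files` is hypothetical, the file extensions are taken from the PCA example (PCA.def, PCA.xml, PCA.R) and may differ for other analyses (Perl scripts, for instance), and the analysis names are passed in directly because the format of workflow.def is not described in the slides.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Sub-directory name -> file extension, following the PCA example.
# Assumption: every analysis uses the same extensions as PCA.
LAYOUT: Dict[str, str] = {"def": ".def", "doc": ".xml", "scripts": ".R"}


def missing_files(workflow_dir: Path, analyses: List[str]) -> List[Path]:
    """Return the expected files that are absent from a workflow directory."""
    expected = [workflow_dir / "workflow.def"]
    for name in analyses:
        for subdir, ext in LAYOUT.items():
            expected.append(workflow_dir / subdir / f"{name}{ext}")
    return [p for p in expected if not p.is_file()]


# Usage: build a throwaway workflow directory that lacks its doc file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "workflow1"
    for subdir in LAYOUT:
        (root / subdir).mkdir(parents=True)
    (root / "workflow.def").touch()
    (root / "def" / "PCA.def").touch()
    (root / "scripts" / "PCA.R").touch()
    missing = [p.relative_to(root).as_posix() for p in missing_files(root, ["PCA"])]

print(missing)  # ['doc/PCA.xml']
```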
An example: PCA
Overview of the interaction mechanism of the different file types
[Diagram linking PCA.def, the automatically generated header of the R script, the R script PCA.R written by the provider, and PCA.xml. Data labels: dataInMat and dataInFact (inputs), dataOutMat and dataOutFact (outputs). Other labels: Params, Results.]
An example: PCA
From PCA.def: the GUI (automatically generated) and the header of the R code (automatically generated)
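How a definition file could lead to a generated script header can be sketched as follows. The slides do not show the contents of PCA.def or of the generated header, so everything here is an assumption made for illustration: the parameter names (`ncomp`, `scale`, `title`), the helper names, and the choice to emit one R assignment per parameter, with user values overriding defaults.

```python
from typing import Dict, Union

ParamValue = Union[bool, int, float, str]


def r_literal(value: ParamValue) -> str:
    """Format a Python value as an R literal."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_header(defaults: Dict[str, ParamValue], user: Dict[str, ParamValue]) -> str:
    """Merge user values over defaults and emit one R assignment per parameter."""
    merged = {**defaults, **user}
    return "\n".join(f"{name} <- {r_literal(value)}" for name, value in merged.items())


# Hypothetical PCA parameters; the real PCA.def contents are not shown in the slides.
header = build_header({"ncomp": 2, "scale": True, "title": "PCA"}, {"ncomp": 3})
print(header)
# ncomp <- 3
# scale <- TRUE
# title <- "PCA"
```

Keeping the initialization in a generated header means the provider's script can refer to its parameters by name without containing any GUI-related code.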
An example: PCA
PCA.R: R code written by the provider
An example: PCA
Results
Repository of persistent sessions
Repository’s Root
    Session 1
    Session 2
    …
    Session n
Within a session (labels from the diagram):
    Sub-directory of input data: query, bswf, imported_matrix_file.csv
    Sub-directories of the analysis results: p0 : Data Formatting, p1 : Split names, …, p5 : Scaling
    sessparams : session parameters
3. BioStatFlow helps disseminate the results of statistical analyses by saving them in a
persistent session so that they can be fully restored. The session
identifier can thus be provided when publishing results (see the tutorial).
To disseminate your data and the associated statistical analyses, share the URL formed as:
http://biostatflow.org/view/<SESSION ID>
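Forming the URL is simple enough to do by hand, but a small helper makes the pattern explicit. This is a minimal sketch: the function name `session_url` is hypothetical, and the percent-encoding step is a defensive assumption, since the slides do not say which characters a session ID may contain.

```python
from urllib.parse import quote

BASE_URL = "http://biostatflow.org/view/"


def session_url(session_id: str) -> str:
    """Build the public URL of a persistent session from its identifier."""
    session_id = session_id.strip()
    if not session_id:
        raise ValueError("session ID must not be empty")
    # Percent-encode defensively; purely alphanumeric IDs pass through unchanged.
    return BASE_URL + quote(session_id, safe="")


print(session_url("G633"))  # http://biostatflow.org/view/G633
```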
Motivation of the design of BioStatFlow
Example with session ID G633: http://biostatflow.org/view/G633
Results of statistical analyses
Motivation of the design of BioStatFlow: Dissemination
Datasets
R code
Some Links
A Spotlight on BioStatFlow in MetaboNews
http://www.metabonews.ca/Feb2015/MetaboNews_Feb2015.htm#spotlight
BioStatFlow is available online:
http://biostatflow.org
A Tutorial on BioStatFlow
http://biostatflow.org/doc/pg?id=tutorial:start