SlideShare a Scribd company logo
Non-Cognitive Predictors of Student Success:
A Predictive Validity Comparison Between Domestic and International Students
Creating an in-silico dilution series
for software testing
Combine either a Water or E.coli
background, with peptides extracted from
HEK293 foreground
and analyse with OpenMSToffee
Peptide-centric deep learning
pipeline
C++ RTree enables efficient peptide-
centric access to raw data.
Constructing large training sets of images
for deep learning is trivial.
A tool for manual annotation of peaks is
included in python library.
Target
Decoy
Lossless, open, efficient DIA-MS
file format, same size as vendor
files.
OpenMSToffee direct
replacement for OpenSWATH
Enables Deep Learning pipeline
Non-Cognitive Predictors of Student Success:
A Predictive Validity Comparison Between Domestic and International Students
INTRO
• Yet another file format!
• Optimised for storage and random IO.
• HDF5 backed. Lossless compression.
• RTree for spatial access queries.
• Python & C++. MIT licensed.
conda install –c cmriprocan –c plotly toffee
docker pull cmriprocan/toffee:<version>
METHODS
1. Convert m/z to uint32 using Maxwell eqn
2. Each measurement is a uint32 triplet of
retention time, m/z, intensity
3. Store data as compressed row storage
sparse matrix in HDF5
4. OpenMSToffee gives drop-in
replacement for OpenSWATH and
retains PyProphet workflow
RESULTS
• Same size as wiff file.
• No difference in peptide matrix
• Create in-silico dilution series by
selectively combining two toffee files
• Analyse data as image:
• Crane – noise removal (see poster)
• Deep Learning for peptide
identification
Toffee: A highly compressed,
efficient file format for DIA
David Clarke, Akila Seneviratne,
Brett Tully
Children’s Medical Research Institute
0 1000 2000 3000 4000 5000
0
5k
10k
Background E.coli: SKPGAAMVEMADGYAVDR_3
Retention Time (s)
XICMassovercharge
MS1: 623.29
MS2: 680.34
MS2: 795.36
MS2: 460.25
MS2: 866.40
MS2: 871.43
MS2: 997.44
0 1000 2000 3000 4000 5000
0
1000
2000
3000
4000
Foreground HEK293: SKPGAAMVEMADGYAVDR_3
Retention Time (s)
XIC
Massovercharge
MS1: 623.29
MS2: 680.34
MS2: 795.36
MS2: 460.25
MS2: 866.40
MS2: 871.43
MS2: 997.44
0 1000 2000 3000 4000 5000
0
5k
10k
E.coli + HEK293: SKPGAAMVEMADGYAVDR_3
Retention Time (s)
XICMassovercharge
MS1: 623.29
MS2: 680.34
MS2: 795.36
MS2: 460.25
MS2: 866.40
MS2: 871.43
MS2: 997.44

More Related Content

Similar to Toffee – A highly efficient, lossless file format for DIA-MS

Predictive churn h20_dsx
Predictive churn h20_dsxPredictive churn h20_dsx
Predictive churn h20_dsx
Ndjido Ardo BAR
 
Researh toolbox - Data analysis with python
Researh toolbox  - Data analysis with pythonResearh toolbox  - Data analysis with python
Researh toolbox - Data analysis with python
Umair ul Hassan
 
Researh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-pythonResearh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-python
Waternomics
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Delft University of Technology
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2O
Sri Ambati
 
Data platforms 2017
Data platforms 2017Data platforms 2017
Data platforms 2017
Kellyn Pot'Vin-Gorman
 
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...
Matthew J Collins
 
Anand madhab c linux
Anand madhab c linuxAnand madhab c linux
Anand madhab c linux
Anand Madhab
 
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...
EUDAT
 
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseContinuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
VMware Tanzu
 
EclipseCon France 2015 - Science Track
EclipseCon France 2015 - Science TrackEclipseCon France 2015 - Science Track
EclipseCon France 2015 - Science Track
Boris Adryan
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
Bonnie Hurwitz
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Reproducibility in artificial intelligence
Reproducibility in artificial intelligenceReproducibility in artificial intelligence
Reproducibility in artificial intelligence
Carlos Toxtli
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th june
Edureka!
 
ownR platform extended technical introduction
ownR platform extended technical introductionownR platform extended technical introduction
ownR platform extended technical introduction
Functional Analytics
 
ownR extended technical introduction
ownR extended technical introductionownR extended technical introduction
ownR extended technical introduction
Functional Analytics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
Florian Wilhelm
 

Similar to Toffee – A highly efficient, lossless file format for DIA-MS (20)

Predictive churn h20_dsx
Predictive churn h20_dsxPredictive churn h20_dsx
Predictive churn h20_dsx
 
Researh toolbox - Data analysis with python
Researh toolbox  - Data analysis with pythonResearh toolbox  - Data analysis with python
Researh toolbox - Data analysis with python
 
Researh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-pythonResearh toolbox-data-analysis-with-python
Researh toolbox-data-analysis-with-python
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2O
 
Data platforms 2017
Data platforms 2017Data platforms 2017
Data platforms 2017
 
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...
GUODA: A Unified Platform for Large-Scale Computational Research on Open-Acce...
 
Anand madhab c linux
Anand madhab c linuxAnand madhab c linux
Anand madhab c linux
 
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidi...
 
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with ConcourseContinuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
Continuous Delivery: Fly the Friendly CI in Pivotal Cloud Foundry with Concourse
 
Mannu_Kumar_CV
Mannu_Kumar_CVMannu_Kumar_CV
Mannu_Kumar_CV
 
EclipseCon France 2015 - Science Track
EclipseCon France 2015 - Science TrackEclipseCon France 2015 - Science Track
EclipseCon France 2015 - Science Track
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Reproducibility in artificial intelligence
Reproducibility in artificial intelligenceReproducibility in artificial intelligence
Reproducibility in artificial intelligence
 
Python webinar 4th june
Python webinar 4th junePython webinar 4th june
Python webinar 4th june
 
Pine education-platform
Pine education-platformPine education-platform
Pine education-platform
 
ownR platform extended technical introduction
ownR platform extended technical introductionownR platform extended technical introduction
ownR platform extended technical introduction
 
ownR extended technical introduction
ownR extended technical introductionownR extended technical introduction
ownR extended technical introduction
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Recently uploaded

GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
alishadewangan1
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 

Recently uploaded (20)

GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 

Toffee – A highly efficient, lossless file format for DIA-MS

  • 1. Non-Cognitive Predictors of Student Success: A Predictive Validity Comparison Between Domestic and International Students Creating an in-silico dilution series for software testing Combine either a Water or E.coli background, with peptides extracted from HEK293 foreground and analyse with OpenMSToffee Peptide-centric deep learning pipeline C++ RTree enables efficient peptide- centric access to raw data. Constructing large training sets of images for deep learning is trivial. A tool for manual annotation of peaks is included in python library. Target Decoy Lossless, open, efficient DIA-MS file format, same size as vendor files. OpenMSToffee direct replacement for OpenSWATH Enables Deep Learning pipeline Non-Cognitive Predictors of Student Success: A Predictive Validity Comparison Between Domestic and International Students INTRO • Yet another file format! • Optimised for storage and random IO. • HDF5 backed. Lossless compression. • RTree for spatial access queries. • Python & C++. MIT licensed. conda install –c cmriprocan –c plotly toffee docker pull cmriprocan/toffee:<version> METHODS 1. Convert m/z to uint32 using Maxwell eqn 2. Each measurement is a uint32 triplet of retention time, m/z, intensity 3. Store data as compressed row storage sparse matrix in HDF5 4. OpenMSToffee gives drop-in replacement for OpenSWATH and retains PyProphet workflow RESULTS • Same size as wiff file. • No difference in peptide matrix • Create in-silico dilution series by selectively combining two toffee files • Analyse data as image: • Crane – noise removal (see poster) • Deep Learning for peptide identification Toffee: A highly compressed, efficient file format for DIA David Clarke, Akila Seneviratne, Brett Tully Children’s Medical Research Institute 0 1000 2000 3000 4000 5000 0 5k 10k Background E.coli: SKPGAAMVEMADGYAVDR_3 Retention Time (s) XICMassovercharge MS1: 623.29 MS2: 680.34 MS2: 795.36 MS2: 460.25 MS2: 866.40 MS2: 871.43 MS2: 997.44 0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 Foreground HEK293: SKPGAAMVEMADGYAVDR_3 Retention Time (s) XIC Massovercharge MS1: 623.29 MS2: 680.34 MS2: 795.36 MS2: 460.25 MS2: 866.40 MS2: 871.43 MS2: 997.44 0 1000 2000 3000 4000 5000 0 5k 10k E.coli + HEK293: SKPGAAMVEMADGYAVDR_3 Retention Time (s) XICMassovercharge MS1: 623.29 MS2: 680.34 MS2: 795.36 MS2: 460.25 MS2: 866.40 MS2: 871.43 MS2: 997.44