Typical Mass-use Pipelines
NGS (Next Generation Sequencing)
1. Total-RNA Analysis (RNA-seq, Non-Coding RNA, Repeats)
2. Epigenetics (ChIP-seq and Bisulfite-seq)
3. Variant Calling
4. Microbiome (Metagenomics)
Mass Spec
1. Proteomics
2. Metabolomics
Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)
Machine Learning
1. Phenotypic Analysis and Modeling
2. Analysis of visual data
3. Standard statistical methods
4. Integration of heterogeneous data sets
Complex Challenges and Workflows
CirSeq Mutation Analysis
1. Analysis of viral CirSeq data for precise mutation identification
2. Fitness of mutations reflecting viral adaptation
3. Identification of viral quasi-species
Mass Spec
1. Protein-protein interactions between host and viral proteins
2. Post-translational modifications of host proteins
Structural Biology
1. Libraries of Small Molecules (Query, Clustering)
2. Docking (Including large molecules)
NGS host data
1. Host gene expression variations in response to infectious quasi-species
T-BioInfo is a user-friendly computational platform that enables analysis and integration of big data. As sequencing becomes cheaper and more
precise, the challenge is to mine -omics data for meaningful patterns that can be applied in biomedical and agricultural research. At the same time,
the complex networks of dependencies that define many conditions tend to require integration of huge heterogeneous data sets: SNPs, gene expression,
epigenetic markers, proteomic and metabolomic profiles, and even structural biology data. Our company has developed innovative and user-friendly
workflows for analysis and integration of these different datasets. We are now looking to test and commercialize web access to the platform.
Simple, Flexible and Consistent Interface Across All Sections
Integration of analysis types
One environment for all types of data and analysis
“One-button” approach to most areas of analysis
• Flexible analysis pipelines in the platform sections and easy-to-perform data input
• A user is assisted by the platform in constructing meaningful algorithmic pipelines for processing data:
modules for pipeline continuation are highlighted by a black background and a yellow title.
Analysis of Total RNA
Concept: raw total transcriptome reads contain information not only about expressed splice variants (isoforms) of genes,
but also about expressed transposons and regulatory non-coding RNAs. The complete analysis consists of three steps.
First, the reads are mapped on isoforms in order to get isoform expression levels. Second, previously unmapped reads are
mapped on known repetitive elements (RE) and non-coding RNAs in order to get their expression levels. Third, the remaining
reads are processed by special clustering (BiClustering) in order to get new expressed RE and non-coding RNAs as well as
their expression levels under the applied biological conditions. At the next stage, data integration can be performed:
interplay between expressed isoforms, transposons, and regulatory RNAs.
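The three steps above can be sketched as a tiered read assignment. This is only a minimal illustration: exact substring matching stands in for a real read aligner, and the function name `assign_reads` and the toy sequences are hypothetical, not part of T-BioInfo.

```python
from collections import Counter


def assign_reads(reads, isoforms, repeats):
    """Tiered assignment: isoforms first, then known repeats/ncRNAs.

    Exact substring matching is a stand-in for a real aligner (assumption).
    Returns per-isoform counts, per-repeat counts, and the leftover reads
    that would go on to the clustering step.
    """
    isoform_counts, repeat_counts, leftover = Counter(), Counter(), []
    for read in reads:
        # Step 1: try the isoform references.
        hit = next((name for name, seq in isoforms.items() if read in seq), None)
        if hit is not None:
            isoform_counts[hit] += 1
            continue
        # Step 2: previously unmapped reads go to known repeats and ncRNAs.
        hit = next((name for name, seq in repeats.items() if read in seq), None)
        if hit is not None:
            repeat_counts[hit] += 1
        else:
            # Step 3 input: reads that match nothing known.
            leftover.append(read)
    return isoform_counts, repeat_counts, leftover


if __name__ == "__main__":
    isoforms = {"iso1": "ACGTACGTTT"}
    repeats = {"rep1": "GGGGCCCC"}
    reads = ["ACGT", "GGCC", "TTTTAA", "CGTT"]
    print(assign_reads(reads, isoforms, repeats))
```

Keeping the tiers in a fixed order means a read that matches both an isoform and a repeat is counted only for the isoform, which mirrors the "previously unmapped reads" wording of the second step.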
1. Detection of expressed isoforms and their expression levels by mapping the reads on constructed transcripts √
2. For unmapped reads: √
3. Detection of most expressed repeats and regulatory RNA from databases √
4. BiClustering: associations of k-mers and reads as a bicluster, and generation of Kchains of biclusters √
5. Extensions of Kchains ±
6. Mapping of NGS reads on found Kchains: detection of most expressed novel transposons and regulatory RNAs √
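To give a feel for the k-mer/read association in step 4, the sketch below groups reads that share at least one k-mer. It is a deliberately simplified stand-in: the platform's BiClustering and Kchain construction are not described in enough detail here to reproduce, so the connected-component grouping and the names `kmer_index` and `group_reads_by_shared_kmers` are assumptions made for illustration.

```python
from collections import defaultdict


def kmer_index(reads, k):
    """Map every k-mer to the set of read indices that contain it."""
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for j in range(len(read) - k + 1):
            index[read[j:j + k]].add(i)
    return index


def group_reads_by_shared_kmers(reads, k, min_reads=2):
    """Group reads connected through k-mers shared by >= min_reads reads.

    Uses a small union-find; this is NOT the platform's BiClustering,
    only a toy association of k-mers and reads.
    """
    index = kmer_index(reads, k)
    parent = list(range(len(reads)))

    def find(x):
        # Find the group representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in index.values():
        if len(members) >= min_reads:
            ordered = sorted(members)
            for other in ordered[1:]:
                parent[find(other)] = find(ordered[0])

    groups = defaultdict(list)
    for i in range(len(reads)):
        groups[find(i)].append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTTT"]
    print(group_reads_by_shared_kmers(reads, k=4))
```

In this toy input the first two reads share the k-mer GTAC and end up in one group, while the third read stays alone.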
T-Bioinfo RNA-seq/chip section
Example: Expression of Repeats
Algorithmic Approaches: Analysis of “Junk” RNA
Epigenetic Analysis: Bisulfite DNA Methylation and ChIP-Seq
Bisulfite Concept: after bisulfite treatment, a read shows T instead of C where the C of a genomic site (such as CpG) is
unmethylated; a methylated C is protected from conversion and is still read as C. Thus, detection of methylated sites and of
genome fragments enriched in or depleted of methylation is based on a special type of read mapping and on segmentation of the
whole-genome methylation profile. The analysis objectives include special mapping algorithms that tolerate the
T(read)-to-C(genome) mismatch, statistical estimation of the per-site methylation level, allele specificity of DNA
methylation, and detection of over-methylated and under-methylated genomic regions.
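The mismatch tolerance used in bisulfite mapping can be illustrated with a small scoring function. This is a sketch under stated assumptions: it scans only the forward strand by brute force, and the names `bisulfite_mismatches` and `best_bisulfite_hit` are hypothetical, not the platform's mapping algorithm.

```python
def bisulfite_mismatches(read, reference):
    """Count mismatches, with no penalty for T in the read over C in the genome."""
    if len(read) != len(reference):
        raise ValueError("read and reference window must have equal length")
    return sum(
        1
        for r, g in zip(read, reference)
        if r != g and not (r == "T" and g == "C")
    )


def best_bisulfite_hit(read, genome, max_mismatches=0):
    """Return (position, mismatches) of the best forward-strand hit, or None.

    Ties are resolved in favour of the leftmost position.
    """
    best = None
    for pos in range(len(genome) - len(read) + 1):
        mm = bisulfite_mismatches(read, genome[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


if __name__ == "__main__":
    genome = "AACCGGTTACGA"
    # Both genomic Cs were read as T, so this read still maps at position 2.
    print(best_bisulfite_hit("TTGG", genome))
```

The tolerance is one-directional: a C in the read over a T in the genome is still counted as a real mismatch.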
ChIP-Seq Concept: detection of epigenetic signals such as histone modifications of different types and DNA methylation events,
as well as determination of protein/DNA binding sites (TF binding sites), is performed by ChIP-seq and ChIP-chip experiments.
Analysis of profiles of these whole-genome signals is performed by genome segmentation algorithms. The analysis objectives
include identifying signal-enriched genome fragments as putative epigenetic events, and a combination of enriched fragments on
the positive and negative strands with a certain distance between them as a TF binding event. At the next analysis stage, data
integration can be performed: interplay between genome mutations and epigenetic signals on one side and expressed isoforms,
transposons, and regulatory RNAs on the other side. The network of gene regulation by a transcription factor can be
reconstructed from the whole-genome TF binding positions and the expression of the downstream genes. Microarray datasets are
transformed into pseudo NGS reads and are analyzed by the same ChIP-seq pipelines.
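The TF binding rule described above, an enriched fragment on the positive strand paired with one on the negative strand at a certain distance, can be sketched as follows. Peak positions are assumed to be already called; the function name `pair_strand_peaks`, the distance window, and the use of the midpoint as the putative binding position are illustrative assumptions.

```python
def pair_strand_peaks(plus_peaks, minus_peaks, min_dist, max_dist):
    """Pair each plus-strand peak with the nearest unused downstream minus-strand peak.

    A pair is reported when the minus peak lies between min_dist and max_dist
    downstream of the plus peak. Returns (plus, minus, midpoint) tuples.
    """
    minus_sorted = sorted(minus_peaks)
    used = set()
    events = []
    for p in sorted(plus_peaks):
        for i, m in enumerate(minus_sorted):
            if i in used:
                continue
            dist = m - p
            if dist > max_dist:
                break  # all later minus peaks are even further away
            if dist >= min_dist:
                events.append((p, m, (p + m) // 2))
                used.add(i)
                break
    return events


if __name__ == "__main__":
    plus = [100, 500, 900]
    minus = [180, 1500, 560]
    print(pair_strand_peaks(plus, minus, min_dist=50, max_dist=150))
```

The plus-strand peak at 900 stays unpaired because its only remaining minus-strand candidate is outside the distance window.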
T-Bioinfo ChIP-seq section
1. Preprocessing of raw data √
2. Mapping of NGS reads by bisulfite mapping algorithms: no penalty for T(read)-to-C(genome) mismatches √
3. Detection of the DNA methylated positions and their scores by the confidence interval method √
4. Allele specificity of the methylation in a position -
5. Detection of over-methylated and under-methylated genomic intervals by the segmentation algorithms ±
6. Detection of differential DNA methylation (individual positions and intervals) between contrasting conditions ±
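Step 3 scores methylated positions with a confidence interval. The source does not say which interval is used, so the sketch below assumes a Wilson score interval on the fraction of reads that show C at a site; the function name `methylation_interval` and the default `z = 1.96` (about 95% confidence) are assumptions.

```python
import math


def methylation_interval(methylated, unmethylated, z=1.96):
    """Per-site methylation level with a Wilson score interval (assumed method).

    methylated   -- reads showing C at the site
    unmethylated -- reads showing T at the site
    Returns (level, lower, upper).
    """
    n = methylated + unmethylated
    if n == 0:
        raise ValueError("no reads cover this site")
    p = methylated / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Same 80% level, but deeper coverage gives a tighter interval.
    print(methylation_interval(8, 2))
    print(methylation_interval(80, 20))
```

Reporting the lower bound rather than the raw level is one way to avoid calling a site methylated on the strength of only a few reads.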
Virology Pipeline
Mutation Fitness
Genome-wide fitness calculations enabled by CirSeq,
combined with structural information, can provide
high-definition, bias-free insights into structure-function
relationships, potentially revealing novel functions for
viral proteins and RNA structures, as well as nuanced
insights into a viral genome’s phenotypic space. Such
analyses have the power to reveal protein residues or
domains that directly correspond to viral functional
plasticity and may significantly inform our structural
and mechanistic understanding of host–pathogen
interactions.
Integration of Heterogeneous Data Sets
Concept: mutual association of features across biological datasets is the most substantial part of integrating several
analyses of a biological project into one story. We suggest several techniques for such associations.
Matching of metabolite and SNP profiles according to LB’s selection of SNPs
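One simple way to associate features from two datasets is to correlate their profiles across the same samples. The sketch below matches metabolite abundance profiles to SNP genotype profiles by Pearson correlation. It is an assumption-laden illustration, not the platform's technique: the SNP selection step mentioned above is not reproduced, genotypes are assumed to be coded 0/1/2, and the names `pearson` and `match_metabolites_to_snps` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no association signal
    return sxy / math.sqrt(sxx * syy)


def match_metabolites_to_snps(metabolites, snps, min_abs_r=0.8):
    """Return (metabolite, SNP, r) triples with |r| >= min_abs_r, strongest first."""
    matches = []
    for m_name, m_profile in metabolites.items():
        for s_name, genotypes in snps.items():
            r = pearson(m_profile, genotypes)
            if abs(r) >= min_abs_r:
                matches.append((m_name, s_name, round(r, 3)))
    return sorted(matches, key=lambda t: -abs(t[2]))


if __name__ == "__main__":
    metabolites = {"met1": [1.0, 2.0, 3.0, 2.0]}      # one value per sample
    snps = {"snpA": [0, 1, 2, 1], "snpB": [2, 0, 1, 1]}  # genotype per sample
    print(match_metabolites_to_snps(metabolites, snps))
```

With many metabolites and SNPs, a real analysis would also need multiple-testing correction, which this sketch leaves out.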
Patent Pending Technology
for Drug Discovery
Fast screening and clustering of small molecules based on
physico-chemical similarity (70-100 times faster than industry
standard)
Small Molecule Candidate
Identifying a biologically active molecule
(Polio)
Patent Pending: Ref. P-78368-US | App. No. 14/625,785 entitled
SYSTEMS AND METHODS OF IMPROVED MOLECULE SCREENING
Computational analysis of small molecules can be roughly divided into three sections: pre-processing analysis, virtual
screening methods, and clustering. The aim of the conformer generation process is to build a set of representative
conformers that covers the conformational space of a given molecule. There are two main classes of virtual screening methods:
similarity-based methods (descriptor-based screening; geometric querying; shape-based querying; fingerprints) and
receptor-based methods (docking). One of the greatest challenges of docking software is to consider protein flexibility.
These macromolecules are not static objects, and conformational changes are often key elements in ligand binding. T-Bioinfo
provides a number of proprietary methods that can be combined into pipelines for drug discovery.
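Fingerprint-based similarity screening and clustering, one of the similarity-based classes named above, can be sketched with the Tanimoto coefficient on sets of "on" bits. This is a generic textbook illustration, not the patent-pending method: the fingerprints are made-up bit sets, and `tanimoto`, `screen`, and `leader_cluster` are hypothetical names.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def screen(query, library, threshold):
    """Return (name, similarity) for library molecules similar to the query."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    return sorted((hit for hit in scored if hit[1] >= threshold), key=lambda t: -t[1])


def leader_cluster(library, threshold):
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # list of (leader_fingerprint, member_names)
    for name, fp in library.items():
        for leader_fp, members in clusters:
            if tanimoto(fp, leader_fp) >= threshold:
                members.append(name)
                break
        else:
            # No existing leader is close enough: this molecule starts a cluster.
            clusters.append((fp, [name]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    library = {"a": {1, 2, 3, 4}, "b": {1, 2, 3, 5}, "c": {7, 8, 9}}
    print(screen({1, 2, 3, 4}, library, threshold=0.5))
    print(leader_cluster(library, threshold=0.5))
```

Leader clustering depends on input order, which keeps it to a single pass over the library; that trade-off is common when speed matters more than an optimal partition.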
Tauber Bioinformatics Research Center
The Tauber Bioinformatics Research Center (TBRC) at the University of Haifa
has a proven track record in bioinformatics, with scientific
collaborations with hospitals and top US universities, involvement in
government-funded projects, and multiple publications in
leading journals such as Science and Nature.
Pine Biotech holds an exclusive license for commercialization of
tools developed at the TBRC for research, industry applications
and education. The startup is located at the BioInnovation Center
in New Orleans, LA. In collaboration with TBRC staff, Pine Biotech
is completing several pilot projects to validate our approach.
Early Adopters and Collaborators:
Aleph Therapeutics

T-bioinfo overview
