Market Report: The Bioinformatics Services
Industry
October 15th
2016
Table of Contents
I. Introduction
i. General overview
II. Industry Overview
i. Table of bioinformatics companies
III. Categories of Bioinformatics Companies
i. Drug Discovery
ii. Personalized Medicine
iii. Preventative Medicine
iv. Data Management
IV. Macroscopic Trends
i. Summary of market reports
V. Current Adhesive Customer Profile
i. Overview of customer types
VI. Top Company Overviews
i. Illumina
ii. Accelrys
iii. Schrödinger
VII. Preliminary Recommendations
i. Summary
ii. Recommendation on how resources should be distributed
I. Introduction
The world economy has seen an increase in the number of bioinformatics companies. The most
obvious causes are the need for improved research methods, more efficient drug development
and, more generally, the advancement of human knowledge. Part of this growth is due to the
ability to sequence the human genome using Next Generation Sequencing (NGS). With an
abundance of information on the scale of billions of base pairs, it is no wonder that
bioinformatics is a growing industry in both size and complexity. Base pair sequences are not
the only input to these algorithms: we have recently seen a shift from genomics to proteomics.
This is exemplified by the ENCODE project, which seeks to find the regulatory proteins that
exist in a living organism, a goal of equal or greater importance than understanding simple
sequences. There are of course countless other inputs used in research today that yield
informational outputs, for example: protein sequence and conformation; chemical formulas and
structure; regulatory proteins and binding sites; and differential genomes and differentially
expressed genes. In short, the complexity of the bioinformatics sector is increasing. The
purpose of this paper is therefore to give an overview of the highly diverse bioinformatics
sector, in the hope that some patterns can be elucidated and current issues can be addressed.
In effect, this report should give someone with relatively little knowledge of the industry
enough information to construct a good picture of the past, current, and future face of
bioinformatics.
First of all, let us define bioinformatics. The most basic definition is the use of computers
to manage or manipulate biological information. We are not talking about simple calculations in
this context, but about computers running specialized algorithms, with specific inputs and
outputs, that typically deal with information too large to be handled by a single person. This
definition does not bound bioinformatics well: countless processes fall under it. We therefore
tend to talk about bioinformatics in a general sense, not in the particular. At the same time,
the particular matters, because it is where innovation comes from. This paper cannot discuss
all particulars, but it will mention those that are important. The general trends, although
connected to the particular, are what this paper will focus on.
The general groups of bioinformatics classifications are personalized medicine, drug
discovery, preventative medicine, and agronomics. Within these categories are sequence
analysis, genome analysis, pharmacodynamics, structure activity analysis, clinical trials
management, and diagnostics. Below is a diagram of the major categories of the bioinformatics
sector, adapted from Markets and Markets.
Figure 1: Bioinformatics Industry Breakup
The obvious benefit of bioinformatics is a decrease in the cost of research and
development. By optimizing processes and the algorithms that form the basis of bioinformatics,
the cost of research and development can be greatly reduced, and many processes can be
streamlined. The degree of efficiency obtained, however, depends entirely on the validity of the
algorithms. This places a heavy dependency on new, efficient, and accurate algorithms.
There has been a steady increase in both the types and the number of algorithms in the
bioinformatics sector, as evidenced by the increase in the number of products and in the
number of tasks these products can perform.
In the context of the drug development industry, we see massive research and development
costs for a single drug. This is because a large number of variables must be eliminated to
find the specific small molecule that performs a given function. Money is spent on target
identification and target validation. After that, a molecule needs to be found that binds
directly to the target, and that drug needs to be safe and effective. The main function of
the drug discovery process is thus to cut down the variation that exists in an organism: to
take the discrete out of the continuous. This is very much done by trial and error and by
empirical methods. There really is no fully rational method of drug design. Even with the
current bioinformatics tools for structure analysis, the algorithms are not highly accurate,
so once again the effectiveness depends on the validity of the algorithms. More accurate
algorithms would further increase efficiency.
[Figure 1 categories: NGS, Cheminformatics, Scientific Software, MicroArray, Biological Data, Others]
Another issue that comes up in drug companies is the development of novel
therapeutics that bring substantial value to the market. It seems as though many of the drugs
currently approved are for diseases that are already well managed, and that the “new” in the
term “new drugs” refers merely to the drug itself, not to the effect of the drug on the patient.
There must be a way to use bioinformatics to help with this dilemma, so that new, novel, and
revolutionary drugs are discovered, versus the traditional methods. The answer may depend on
organizing signaling cascades, finding new regulatory proteins, or using bioinformatics to
obtain information from raw sequences. In these contexts, accomplishing these goals comes down
to having the correct input data or to the efficiency of the algorithms. Below is a diagram
from ResearchGate.
There is also a bubble in the market: an astonishing overvaluation of some
companies that have little to no value. Multiplied across a variety of companies, this
gives the illusion that the market is growing. This overvaluation is quite necessary for
development, though, as investors would not put their money down if they were not optimistic
about a company’s ability, and progress would not be made. A simple way to measure a
pharmaceutical or medical device company’s ability to perform would be a year-over-year
look at the ratio of money invested to revenue generated. As we see below, this ratio tends
to remain constant. There is therefore little growth of the industry in this context. The
diagram is adapted from Statista.com.
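The year-over-year ratio described above can be sketched in a few lines of Python. The figures used here are hypothetical placeholders, not values read from the Statista chart, and the function name `rd_to_revenue_ratio` is introduced purely for illustration.

```python
# Hypothetical yearly figures (illustrative only, not taken from any chart).
revenue = {2012: 100.0, 2013: 110.0, 2014: 121.0}
rd_spend = {2012: 20.0, 2013: 22.0, 2014: 24.2}


def rd_to_revenue_ratio(revenue, rd_spend):
    """Return {year: R&D spend / revenue} for years present in both inputs."""
    return {
        year: rd_spend[year] / revenue[year]
        for year in sorted(revenue)
        if year in rd_spend and revenue[year] != 0
    }


ratios = rd_to_revenue_ratio(revenue, rd_spend)
for year, ratio in ratios.items():
    print(year, round(ratio, 3))
```

A ratio that stays flat from year to year, as in this made-up data, is what the text means by little growth in this context: more money goes in, but revenue rises only in proportion.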
The objective of this article is, once again, to give an overview of the bioinformatics
industry and to pinpoint certain pressure points that need to be considered. Alleviating stress
at these pressure points should make the system more efficient, or at least give a foundation
for new innovation. The industry is like an artist who paints a painting. In order to sell the
painting, certain requirements must be met, such as sufficient quality and innovation. The only
way the paintings can reach this quality is through either experience or knowledge. This paper
seeks to give the painter knowledge so that he knows what to paint and how to do it. This
way, the artist fulfills his role as an artist and a valid painting is created. Similarly, the
bioinformatics industry has many projects that need to be completed, and certain projects hold
more weight than others.
II. The bioinformatics industry overview
There are a variety of bioinformatics companies. This paper focuses on
companies that offer at least one bioinformatics service to a customer. That means that we are
not focusing on internal bioinformatics processes funded out of a company’s own research and
development budget. Open source bioinformatics resources are not considered part of the biotech
industry here. They are, however, of importance, and the makeup of this sector will be mentioned
later in the article in the context of discussing innovations in bioinformatics. The
bioinformatics companies below are listed alphabetically. The categories that exist include
database management and maintenance, biobank maintenance, NGS diagnostics, lab management
systems, sequence analysis, work flow development, comparative genomics, pathway analysis, life
science research software, proteomics, modeling of complex systems, cheminformatics, drug
discovery, SAR, clinical trials management, and imaging software.
[Figure: Pfizer Revenue vs. R&D Costs, 2006 to 2014. Series: Revenue (M USD); Research and Development (M USD). Vertical axis: 0 to 70.]
Company Name | Year Established | Main Competencies | Estimated Revenue / Valuation | Public or Private | Notes
Accelrys | 2001 | Workflow Integration/Automation/Data Sharing/Enterprise Management | 155 m / 750 m | Private | Acquired by Dassault in 2014
Advaita | 2014 | Genetic Analysis, Pathway Analysis, Variant Analysis | Unreported / Unreported | Private | National Cancer Institute Adapts Pathway Program
Advanced Brain Monitoring | 2001 | Monitoring Brain Function | Unreported / Unreported | Private | Heavily funded by NIH
Agile Bio | 2002 | Laboratory Management and Collaboration Platform | Unreported / Unreported | Private |
Annai Systems | 2010 | NGS analysis | Unreported / Unreported | Private |
Applied Maths | 1992 | Analyses of laboratory assays | Unreported / Unreported | Private | BioNumerics Suite
ArcherDx | 2013 | NGS for cancer treatment | Unreported / Unreported | Private |
Astrid Bioscience | 2014 | Research Data Management, NGS, Banking | Unreported / Unreported | Private |
Atomwise | 2012 | AI to improve drug discovery | Unreported / Unreported | Private | Ebola treatment
BaseHealth | 2011 | Predictive analytics based on compiled scientific data | Unreported / Unreported | Private |
Berg | 2014 | Healthcare optimization through AI | Unreported / Unreported | Private |
Biobam | 2011 | Functional analysis of genomes | Unreported / Unreported | Private |
Biobase | 1997 | Molecular Biology Databases | Unreported / Unreported | Private | Knowledge process outsourcing
Biodiscovery | 1997 | Genomic Analysis | Unreported / Unreported | Private |
BioMax | 1997 | Tissue Arrays | Unreported / Unreported | Private |
Bioxing | 2013 | Library for DNA and protein Microarrays | Unreported / Unreported | Private |
Cellcion | 2016 | Flow Cytometry Software | Unreported / Unreported | Private |
Certara | 2008 | PK/PD Analysis | Unreported / Unreported | Private |
CLC BIO | 2005 | Microbial genetics and sequence analysis | Unreported / Unreported | Private |
Clondiag Chip Technologies | 2006 | Microarrays | Unreported / Unreported | Private |
Cognuse Neuroscience | 2013 | Cognitive rehab program | Unreported / Unreported | Private |
Compugen | 1993 | Drug discovery through data analytics | Unreported / Unreported | Public | Focus on cancer and autoimmunity
Cytel | 1987 | Statistical software | Unreported / Unreported | Private |
Data MATRIX | 2009 | Clinical Research Management Software | Unreported / Unreported | Private |
Definiens | 1994 | Tissue Image Analysis | Unreported / Unreported | Private |
Desktop Genetics | 2012 | CRISPR Design and Analysis | Unreported / Unreported | Private |
DNA Star | 1984 | Sequence analysis | Unreported / Unreported | Private |
Eidogen-Sertanty | 2003 | Software and database for drug discovery | Unreported / Unreported | Private |
eneData | 1997 | Life Science Research | Unreported / Unreported | Private |
Envisagenics | 2014 | Predictive Analytics | Unreported / Unreported | Private |
ePitope Informatics | 2001 | Antibody epitope discovery | Unreported / Unreported | Private |
FreeThink Technologies | 2011 | Stability Studies | Unreported / Unreported | Private |
Genamics | 2003 | Online journals | Unreported / Unreported | Private | Largest free journal database
GenCAD | 1997 | Compilation of annotated genomes | Unreported / Unreported | Private |
Gene Codes | 1988 | Sequence and microarray | Unreported / Unreported | Private |
Gene Data | 1997 | Automated workflows | Unreported / Unreported | Private |
Gene Talk | 2011 | Sequence analysis | Unreported / Unreported | Private |
Genestack | 2011 | Bioinformatics organizer | Unreported / Unreported | Private |
Geneva Bioinformatics (GeneBio) | 1997 | Life Science Software | Unreported / Unreported | Private |
GenoLogics (Illumina) | 2002 | Laboratory Management System | 20m / Unreported | Private |
Genomatica, Inc. | 2000 | Manufacturing process | Unreported / Unreported | Private | Gene BDO licensed to BASF
Genomatix Software GmbH | 1997 | Literature and genome editing | Unreported / Unreported | Private |
Genome Life Sciences | 2004 | Biomarker Discovery | Unreported / Unreported | Private |
Genostarr | 2004 | Microbial Research | Unreported / Unreported | Public-private |
Geospiza, Inc. | 1997 | Genetic Analysis Workflow | 4.5m / Unreported | Private |
goBalto | 2008 | Clinical Trials Management | Unreported / Unreported | Private |
Health Discovery Corporation | 2004 | Diagnostics via Artificial Intelligence | 10.5m / Unreported | Public |
HeartVista | 2006 | Imaging Technologies | Unreported / Unreported | Private |
Illumina | 1998 | NGS Systems | 2.2 B / 15 B | Public |
INCOGEN, Inc. | 2000 | Workflows and Data Management | Unreported / Unreported | Private |
Informagen | 1992 | Sequence Analysis | Unreported / Unreported | Private |
Ingenuity Pathways Analysis | 1998 | Pathway analysis | Unreported / Unreported | Private | IPA System
Inte:Ligand | 2003 | Drug Discovery | Unreported / Unreported | Private |
Integrated Proteomics Applications | 2008 | Analysis of proteins | Unreported / Unreported | Private |
Integromics | 2002 | Data management for proteomics or genomics | Unreported / Unreported | Private |
inviCRO | 1964 | Diagnostics through MRI | Unreported / Unreported | Public |
InVitrogen | 1987 | Drug discovery and genomics | Unreported / Unreported | Public |
IO Informatics | 2003 | Data Management | Unreported / Unreported | Private |
Kelaroo | 2000 | Inventory Management | Unreported / Unreported | Private |
Lab7 | 2012 | Enterprise System for NGS | Unreported / Unreported | Private |
LabVantage Solutions | 1981 | Laboratory Management | Unreported / Unreported | Private |
MacVector | 2007 | Sequence design and analysis | Unreported / Unreported | Private |
MasterControl | 1993 | Enterprise management | Unreported / Unreported | Private |
Metabolic Explorer | 2007 | Bacteria manufacturing | 53m / Unreported | Public |
MetaHelix Life Sciences | 2001 | Agricultural/GMO | Unreported / Unreported | Private |
Metalife AG | 2000 | Compilation of data | Unreported / Unreported | Private |
MicroDiscovery | 2000 | Personalized medicine and research | Unreported / Unreported | Private |
Molecular Connections | 2001 | Drug discovery via database | Unreported / Unreported | Private | Over 1000 employees
NextBio | 2005 | Knowledge sharing | Unreported / Unreported | Private |
NimbleGen Systems, Inc. | 1999 | Next gen optimization | Unreported / Unreported | Private |
NorayBio | 2002 | Bioscience software | Unreported / Unreported | Private |
Novocraft | 2008 | NGS Analysis | Unreported / Unreported | Private |
Ocimum Bio Solutions | 2000 | Genomics and laboratory management | Unreported / Unreported | Private |
Optibrium | 2009 | Small Molecule Design | Unreported / Unreported | Private |
Origent Data Sciences | 2013 | Data Analytics | Unreported / Unreported | Private |
Premier Biosoft International | 1994 | Life science research solutions | Unreported / Unreported | Private |
Qlucore | 2007 | Data Analysis | Unreported / Unreported | Private |
Quaigen | 1984 | Molecular Diagnostics | 210m / Unreported | Private |
Quantiom | 2003 | Statistical data analysis | Unreported / Unreported | Private |
QuantumBio | 2004 | Data analysis | Unreported / Unreported | Private |
Quartzy | 2009 | Lab Inventory | Unreported / Unreported | Private |
QuesGen Systems, Inc. | 2004 | Clinical Trials Management | Unreported / Unreported | Private |
Sapio Sciences | 2004 | Laboratory Management System | Unreported / Unreported | Private |
Schrödinger | 1990 | Molecular Simulation | Unreported / Unreported | Private |
Seascape Learning | 1997 | Molecular modeling | Unreported / Unreported | Private |
Sequencing.com | 2014 | Storage of genetic data and analysis | Unreported / Unreported | Private |
SilcsBio | 2012 | Protein Modeling and Drug Discovery | Unreported / Unreported | Private |
SimBioSys | 2001 | 3D Modeling | Unreported / Unreported | Private |
SoftGenetics | 2001 | Genetic Analysis | Unreported / Unreported | Private |
SOLABS | 1999 | Quality Management System | Unreported / Unreported | Private |
SolveBio | 2013 | Precision genetics, personalized medicine | Unreported / Unreported | Private |
SRA International | 1978 | IT for government | Unreported / Unreported | Public |
Station X | 2010 | Enterprise software | Unreported / Unreported | Private |
Strand Life Sciences | 2000 | Sequence analysis | Unreported / Unreported | Private |
Synamatix Sdn Bhd | 2001 | Sequence analysis | Unreported / Unreported | Private |
Synergistix | 1997 | Customer relationship management | Unreported / Unreported | Private |
TimeLogic | 1981 | Sequence Analysis | Unreported / Unreported | Private |
twoXAR | 2014 | Drug Discovery | Unreported / Unreported | Private |
Veeva Systems | 2007 | Cloud computing | Unreported / 5B | Public |
VeraChem | 2000 | Drug Discovery Via Modeling | Unreported / Unreported | Private |
Verily (Google) | 2015 | Big Data for Life Sciences | Unreported / Unreported | Public |
Veritomyx | 2009 | Spectral Analysis | Unreported / Unreported | Private |
The majority of bioinformatics companies are privately held, have fewer than twenty
employees, and offer consulting services. There are a couple of standout companies that are
substantially larger, such as Certara or Schrödinger. The larger companies offer much more
comprehensive software options, and their software is considered an industry standard. The
smaller companies tend to deal with small niche product lines rather than general product lines.
There is therefore a very real chance of acquisition for the smaller companies, depending on how
their products relate to a larger company’s product lines.
In summary, the bioinformatics industry is moderately difficult to navigate. There have
been considerable shifts in the industry in recent years due to the decreased cost of NGS. This
has allowed companies to form that deal solely with NGS analysis and the storage of NGS data.
An additional trend is the use of artificial intelligence or machine learning in a
multitude of contexts. In these types of systems, the quality of results can be improved through
a positive or negative feedback mechanism. Atomwise, a recent company that uses this process,
has rapidly found a possible treatment for Ebola. There are also macroscopic trends, such as
the transition to more personalized medicine or preventive medicine. Bioinformatics and the use
of these technologies are applicable to these fields as well.
For any company that enters this race to find a valuable product, its competitiveness
is only as good as its algorithms or minimum viable product. As such, many companies that
obtain funding by selling an idea are ultimately dependent on developing their technology. Let
us, in the next section, examine what barriers these companies face.
III. Bioinformatics Company Categories
i. Drug Discovery
The drug discovery industry faces many barriers to discovering viable drugs. The
traditional method of rational drug discovery depends on how well a target can be validated and
how well a small molecule can be used to inhibit the target. In the lab, this is extremely costly
in terms of money and time. The assays range from reporter assays, to microarrays, to comparative
genomics, to gene knockouts or knock-ins. High Throughput Screening (HTS) is also very costly,
but it can bypass target identification to find an effective drug. There has been a substantial
increase in the use of in silico methods to find targets. This, however, is not an easy
undertaking. Comparative genomics largely fails to find targets; comparative proteomics is much
more effective. The reason is that comparative genomics lacks crucial information about how the
genome is expressed. Gene expression is one of the most complex and highly regulated processes
in biology. Differential proteomics is better because it does not require prior knowledge of
which genes are expressed. The only way for comparative genomics to surpass comparative
proteomics would be to accurately model the intense regulatory processes for genes, and these
processes may vary due to the genetic diversity that exists between humans. For this barrier to
be overcome, there would need to be a model or formula that could readily express which genes
are transcribed. On top of this, the interactions between proteins and signaling cascades would
have to be modeled. This is currently being done, not by rational methods but by empirical
methods. For example, Advaita uses compiled research articles to form models of important
biological pathways.
After a target has been discovered, finding the most effective
drug that binds to it is a process in itself. In traditional methods, this is done via
reporter assays or some variation of high throughput screening. In the context of in silico
methods, this process is difficult because of the sheer number of combinations of active
compounds. A variety will theoretically bind. The difficulty lies in finding the
select few that have the proper effectiveness, ADME profile, and cytotoxicity level
when applied to in vivo systems. These processes can also be modeled in silico. The
inefficiency seems to come from the fact that even the most technologically advanced molecular
modeling software does not necessarily find the right molecule; it only narrows down the
candidates for the best molecule.
Validation reports should exist to show how well the software selects a potential drug
candidate, but they are quite hard to find. Theoretically, an approved compound should be
traceable back to its structure activity relationship (SAR) with the target and its lack of SAR
with other important targets. The combination of these two relationships would create a compound
that is both effective and has proper ADME and cytotoxicity profiles. Currently, there is no
software on the market that can accurately predict the SAR with the essential proteins encoded
by the human genome. The data required to do this would be immense, and a mathematical or
logical approach would be needed to overcome this barrier in the future.
Current drug modeling software integrates Newtonian dynamics with
quantum mechanics. Ultimately, what is mapped out in molecular modeling
software is the energy states and energy profiles of a molecule. The equations used here,
however, are not accurate. They are, at best, a “best guess” and require significant modification
to achieve accuracy. This suggests that the code in this software has
an empirical aspect to it, in that it is not completely rational. The proper
question to ask in order to overcome this discrepancy is: why do the equations
not work? The reason is that the mathematical basis of the equations is not accurate.
There would need to be a technical advancement in how molecules are modeled.
This is very much the barrier to technical
innovation that many drug discovery companies face.
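To illustrate what an empirical energy term looks like, here is a minimal sketch of the Lennard-Jones 12-6 pair potential, a common textbook approximation for the non-bonded interaction between two atoms. It is offered only as an example of the kind of fitted, approximate equation discussed above, not as the method used by any particular vendor. The default values of `epsilon` and `sigma` are placeholders in reduced units.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms separated by distance r.

    epsilon is the depth of the energy well and sigma is the distance at
    which the energy crosses zero. Both are fitted parameters, which is
    where the empirical character of the model comes from.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Energy at a few separations (reduced units, illustrative only).
for distance in (0.95, 1.0, 2 ** (1 / 6), 1.5, 2.5):
    print(round(distance, 3), round(lennard_jones(distance), 4))
```

The form is simple and cheap to evaluate, but its parameters must be tuned against experiment or higher-level calculations, which is one concrete sense in which such models are a best guess.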
There are extremely clever ways in which companies try to adjust their models to be
more accurate. For example, Atomwise uses artificial intelligence to narrow down SARs.
We can assume that the company’s software has a built-in trial and error
system that allows the algorithm to be continuously modified and improved as different
compounds are tested. This is among the cleverest ways that drug development can be improved,
and it may explain why the company is backed by such solid investors.
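The kind of trial and error loop assumed above can be sketched as an online learning update, in which a model is adjusted after each tested compound. The example uses a plain logistic regression trained with stochastic gradient steps. This is a generic technique chosen for illustration and is not claimed to be Atomwise’s actual method; the descriptor vectors and assay outcomes are hypothetical.

```python
import math


def predict(weights, features):
    """Predicted probability that a compound binds the target."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update(weights, features, observed, learning_rate=0.5):
    """One feedback step: nudge the weights toward the observed assay result."""
    error = observed - predict(weights, features)
    return [w + learning_rate * error * x for w, x in zip(weights, features)]


# Hypothetical assay results: (bias term plus two descriptors, 1 if bound else 0).
assays = [
    ([1.0, 0.9, 0.1], 1),
    ([1.0, 0.2, 0.8], 0),
    ([1.0, 0.8, 0.3], 1),
    ([1.0, 0.1, 0.9], 0),
]

weights = [0.0, 0.0, 0.0]
for _ in range(200):  # repeated passes stand in for new rounds of testing
    for features, observed in assays:
        weights = update(weights, features, observed)

print(round(predict(weights, [1.0, 0.9, 0.1]), 3))
print(round(predict(weights, [1.0, 0.1, 0.9]), 3))
```

Each experimental result, positive or negative, shifts the model slightly, so predictions improve as more compounds are tested.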
In summary, the limitations that companies could overcome through technological
advances are: 1) understanding and modeling the regulation of gene expression; 2) using a
molecular dynamics theory that works with a high degree of efficiency in silico; 3) being able
to understand and create novel pathways; and 4) modeling the interactions between drugs and
proteins to accurately predict cytotoxic effects and ADME profiles. Ultimately, all of these
points come down to the fact that molecular dynamics needs to be revolutionized. Doing so would
make it possible to accurately predict the microscopic interactions in the cell and allow
biologists to understand what lies between genotype and phenotype that is not currently known.
A possible way to overcome this barrier is to reevaluate the model of the atom, for that is the
general input to molecular dynamics theory. If these barriers are overcome, there is potential
for enormous value, not just in reducing the pharmaceutical industry’s massive inefficiencies,
but in improving the efficacy of healthcare in treating disease.
ii. Personalized Medicine
The concept of personalized medicine has existed for many years, but only recently has
it begun to merge with the field of bioinformatics. This is of course due to the existence of
NGS and the huge amount of data that follows. Personalized medicine is therefore very much
based on sequence analyses of whole genomes. By comparing and contrasting the genomes of
different people, each patient can be placed into a group that requires a certain treatment.
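As a minimal sketch of the grouping idea, the example below assigns a patient to whichever reference group has the most similar alleles at a handful of marker positions, using Hamming distance. The group names, marker strings, and function names are all hypothetical, and real stratification would use far more markers and more careful statistics.

```python
def hamming_distance(a, b):
    """Count positions at which two equal-length marker strings differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same marker positions")
    return sum(x != y for x, y in zip(a, b))


def assign_group(patient_profile, reference_profiles):
    """Return the name of the reference group closest to the patient."""
    return min(
        reference_profiles,
        key=lambda name: hamming_distance(patient_profile, reference_profiles[name]),
    )


# Hypothetical alleles at five marker positions (illustrative only).
reference_profiles = {"group_A": "AGTCA", "group_B": "ACTTG"}
patient_profile = "AGTCG"

assigned = assign_group(patient_profile, reference_profiles)
print(assigned)
```

Each group would then be associated with a treatment, so the assignment step is what turns a raw sequence comparison into a clinical suggestion.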
The idea of personalized medicine involves numerous applications. For example, in
cancer, a patient’s genome or even the cancer’s genome can be analyzed to optimize treatment.
But even in this context, the benefit of personalized medicine is not completely clear. It is
hard to prove that it works in all contexts. Personalized medicine faces considerable challenges
due to the inability to organize all the genetic variations in humans and the
treatments that these variations suggest.
And so there is the technique of using genetic analysis on a case-by-case basis,
dependent on disease type. Over time, the compilation of this data should theoretically
lead to personalized medicine that is highly accurate and effective.
However, let us reiterate that personalized medicine is hard to create. The
variation that exists between genomes is quite difficult to understand. One might assume that
one genetic marker would be all that is required to have an implication. This is not necessarily
the case, as some diseases involve multiple markers. In addition, there are environmental
factors that cause disease and have to be integrated for proper diagnosis. And so personalized
medicine becomes convoluted: its output is merely a statistic with some probability of being
correct. Despite its shortcomings, personalized medicine has progressed to a point where it does
offer significant value to patients. It could be optimized to a different level, however, with
new algorithms and more data.
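The point that several markers and environmental factors combine into a probability, rather than a certainty, can be sketched with a simple logistic risk score. The marker names, weights, and intercept below are invented for illustration and carry no clinical meaning.

```python
import math


def risk_probability(factors, weights, intercept):
    """Combine weighted risk factors into a probability with a logistic function.

    factors maps a factor name to its value (for example, a risk allele count
    of 0, 1 or 2, or 1 for an environmental exposure). Missing factors count
    as zero.
    """
    score = intercept + sum(w * factors.get(name, 0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights for two genetic markers and one environmental factor.
weights = {"marker_1": 0.8, "marker_2": 0.5, "smoker": 1.0}
intercept = -2.0

baseline = risk_probability({}, weights, intercept)
elevated = risk_probability({"marker_1": 2, "smoker": 1}, weights, intercept)
print(round(baseline, 3), round(elevated, 3))
```

The output is a probability for each patient, which matches the description above: a statistic that may or may not be correct for any one individual, and one that improves only as better weights are learned from more data.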
iii. Preventative Medicine
Preventative medicine has also existed for a very long time, even before bioinformatics
became relevant. Currently, preventative medicine seeks to identify patients who are at risk and
alter their lifestyle to mitigate that risk. In modern-day medicine, at-risk patients tend to be
at risk for cardiovascular disease or heart attack. Obesity is one of the factors that can cause
this complication and many other health issues. Therefore, changing diet and increasing exercise
are the most common prescriptions in preventative medicine. Bioinformatics can and does apply to
this regimen of weight loss, but bioinformatics seeks to go further.
Cancer is much easier to treat in its early stages. One of
the risk factors is of course age, but there are a variety of genetic and ontological risks that
can be observed. Bioinformatics can be used to home in on these slight changes and increase the
chance of diagnosing cancer in the early stages. The traditional methods used in this field for
preventative medicine are annual checkups, and they work quite well. However, current
bioinformatics companies are trying to use sequence analysis and biosensors to detect cancer
even earlier. One example of these companies is Verily. The barriers to having this type of
technology implemented are obtaining the hard data necessary to make a prediction and
the difficulty of developing biosensors that can continuously monitor patients. As such,
the idea of this type of preventative medicine is very similar to personalized medicine in
that there seems to be a gap between the known data and the data required to make a significant
contribution to science or healthcare.
iv. Data Management
Data management is a general term that can apply to many industries. In the context of the
biological sciences, there is a need to accurately store and communicate data between people.
Science is becoming increasingly collaborative, and this certainly plays a role in the need for
bioinformatics software that is easy to use and allows for easy communication. The large amount
of data that can be obtained from new assays such as NGS poses problems for some companies.
There are, accordingly, bioinformatics companies that specialize in these aspects of the
industry. Some companies store large amounts of data in cloud-based form, and
others offer a platform for communicating both private and public data.
The competitive barriers to improving data management are, in line with all aspects of
the bioinformatics industry, novel algorithms, adherence to customer needs, and optimization
of architecture. Data management is such a general term that this form of
bioinformatics necessarily takes many forms. In fact, there are many bioinformatics companies in
this niche competing with each other. In order to gain a competitive advantage, these
companies must overcome the competitive barriers mentioned above.
V. Customer profile
The customer profile of the bioinformatics industry is heterogeneous. The largest customers are,
of course, large biotech companies whose mission is drug discovery. Second, there is a large
customer base in the academic research arena. Third, there is the healthcare sector, in
which products are sold directly to clinicians by bioinformatics companies. There are also
lesser-known customers in the private sector. For example, in the agricultural
context of biotechnology, there is a great need for bioinformatics to produce genetically
modified organisms. Consequently, Monsanto would be a good customer for the bioinformatics
industry. However, many of its bioinformatics processes are done internally in a research and
development context. For this particular customer, value could be found if it
outsourced its informatics to a professional company with strategic industry knowledge.
Below is a graph of the varied customer profiles in the industry, from an industry report by
Grand View Research.
In summary, the bioinformatics industry is quite heterogeneous and highly specialized.
There are not companies with highly distinguished products, but rather a high number of
companies with similar products. The size of a company is not necessarily due to the
effectiveness of its product, but to the relationships that exist between the customer and the
company. What forms the basis of this relationship is tough to describe in particular, but there
must be some social aspect that gives a customer a high affinity for a given bioinformatics
company. In the largest companies, a year-after-year rapport must develop. In addition,
companies that are larger and have more services also tend to have more customers and larger
revenue, which can explain the differences in company size.
VII. Growth data and methodology
There is limited published data on the growth of the bioinformatics industry. Most of the
data deals with the macroscopic aspects of the industry and does not go into detail about its
minutiae. As a result, small niche markets are overlooked and the larger markets for
well-known processes receive the focus. For example, the growth of NGS as seen at Illumina
appears to be the basis of most articles, while other developments, such as the use of
artificial intelligence, are overlooked.
The consensus is that bioinformatics is set to grow rapidly in the coming years. Growth is
expected to be stronger in genomics than in proteomics or other industry services. The market
report, though, is very vague. There is too much heterogeneity in the industry to discuss each
niche market in detail. In terms of methodology, it would be impractical to collect all of this
data on privately held companies due to their contracts with vendors and with the government.
Therefore, as expected, we only have categorical data that is somewhat hard to interpret. We
will therefore go into specific detail about the industry leaders in the categories mentioned
in the industry report.
IV.
Illumina
Illumina is the industry leader in Next Generation Sequencing and in the development of NGS
machines. HiSeq is the most technologically advanced line of sequencing machines, allowing a
human genome to be sequenced in one to two days for about USD $1,000. This technological
development has occurred due to improvements in computation, the development of better
algorithms, and the optimization of biological processes. It can be assumed that the biggest
advancements have been in computation. In the context of bioinformatics, many calculations are
necessary to construct a genome from the information obtained from the machines. The full
genome can be constructed and stored after the appropriate statistics are performed.
As of 2015, Illumina’s revenue was 2.2 billion dollars. The price of a HiSeq machine is around
700,000 dollars, which suggests that at most about 3,000 machines are sold annually, even if
all revenue came from machine sales.
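The estimate above is simple division, and a short sketch makes the underlying assumption explicit. The revenue and price figures are the ones quoted in the text. The function name `max_units_sold` and the `instrument_share` parameter are hypothetical, introduced here only for illustration. The 50% share in the second call is an arbitrary example value, not a reported figure.

```python
# Back-of-the-envelope check of the unit-sales estimate.
# Revenue and price are the figures quoted in the report; treating all
# revenue as machine sales is an assumption that yields an upper bound.

ANNUAL_REVENUE_USD = 2.2e9    # Illumina revenue, 2015 (from the text)
MACHINE_PRICE_USD = 700_000   # approximate HiSeq price (from the text)


def max_units_sold(revenue: float, unit_price: float,
                   instrument_share: float = 1.0) -> float:
    """Upper bound on machines sold per year.

    `instrument_share` is the (hypothetical) fraction of revenue that
    comes from machine sales; 1.0 means all of it.
    """
    if unit_price <= 0:
        raise ValueError("unit_price must be positive")
    if not 0.0 <= instrument_share <= 1.0:
        raise ValueError("instrument_share must be between 0 and 1")
    return revenue * instrument_share / unit_price


upper_bound = max_units_sold(ANNUAL_REVENUE_USD, MACHINE_PRICE_USD)
print(f"Upper bound: about {upper_bound:,.0f} machines per year")

# Illustrative only: if half of revenue came from machines.
half = max_units_sold(ANNUAL_REVENUE_USD, MACHINE_PRICE_USD, instrument_share=0.5)
print(f"At a 50% share: about {half:,.0f} machines per year")
```

Under the all-revenue assumption the bound comes out slightly above 3,100 machines, consistent with the rounded figure of 3,000. Any revenue from consumables or services would lower the implied number of machines.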
Accelrys (BIOVIA)
Accelrys has a variety of products that provide value to a wide range of customers. Its main
product is an enterprise platform. Other products include software that assists in research
analyses and software that assists in drug discovery and modeling. In 2015, its revenue was 150
million dollars.
Schrödinger
Schrödinger offers molecular modeling for drug discovery and materials science. Its models are
based on a combination of Newtonian mechanics and quantum mechanics. Schrödinger has a variety
of competitors, and it is unclear what makes its software unique. A validation report against
other products does not exist. However, it is estimated that its revenue is 25 million dollars
a year.
X. Recommendations and summary
The bioinformatics industry is a fascinating one. There is a lot of projected growth and a
lot of potential for technological disruption. The diversity among companies is astonishing.
As a result, the industry appears to be less competitive than other industries. The barriers
to entry are somewhat obscure. Technological expertise and a relatively small amount of
financial backing are necessary to start a bioinformatics company.
There seems to be a bias toward the human health sector in bioinformatics. Relatively few
bioinformatics companies assist research in the agricultural or environmental science
industries. These industries are the least competitive segments for bioinformatics companies,
yet the revenues of both sectors are comparable.
Looking ahead, we expect considerable emphasis on the use of NGS to gather more information
on human health. In the long term, there will be companies that deal with protein expression
as well. Molecular drug discovery will also become more technologically advanced with the
development of novel algorithms based on artificial intelligence. Ultimately, the industry
will continue to change and to push further into the development of powerful technologies.
References
http://www.marketsandmarkets.com/PressReleases/bioinformatics-market.asp
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702932/
https://www.researchgate.net/figure/267634664_fig1_New-drugs-approved-by-the-FDA-BLAstherapeutic-biologics-filed-under-Original
https://www.statista.com/statistics/266171/revenue-of-pfizer-since-2006/
https://www.glassdoor.com/Overview/Working-at-Schrodinger-EI_IE428690.11,22.htm
https://www.grandviewresearch.com/press-release/global-bioinformatics-market

BioniformRep

  • 1. Market Report: The Bioinformatics Services Industry October 15th 2016
  • 2. Table of Contents I. Introduction i. General overview II. Industry Overview i. Table of bioinformatics companies III. Categories of Bioinformatics Companies i. Drug Discovery ii. Personalized Medicine iii. Preventative Medicine iv. Data Management IV. Macroscopic Trends i. Summary of market reports V. Current Adhesive Customer Profile i. Overview of customer types VI. Top Company Overviews i. Illumina ii. Accelrys iii Schrödinger VII. Preliminary Recommendations i. Summary ii. Recommendation on how resources should be distributed
  • 3. I. Introduction: In the world economy, there has been an increase in bioinformatics companies. This has most obviously occurred due to the need for improve research methods, improving drug development efficiency and, generally, the advancement human knowledge. Part of this growth is due to the ability for the human genome to be sequenced using Next Generation Sequencing (NGS). With the abundance of information such as billions of base pairs, it is no wonder that bioinformatics is a growing industry both in size and complexity. And base pair sequences are not just the input to this algorithm— we have recently seen a shift from genomics to proteomics. This is exemplified by the ENCODE project which seems to find out the regulatory proteins that exist in a living organism, which is equal or of greater importance than just coming to understand simple sequences. And there are of course countless other inputs that are used in research today that have informational outputs. For example: protein sequence and conformation, chemical formulas and structure; regulatory proteins and binding sites, and differential genomes and differently expressed genes. In short, there is an increase in the complexity in this bioinformatics sector. It is therefore the purpose of this paper to give an overview of the highly diverse bioinformatics sector in the hopes that some patterns can be elucidated and suggestions to current issues can be addressed. In effect, this report should give someone who has relatively no knowledge of the industry enough information for him or her to construct a very good picture of the past, current, a future face of bioinformatics. First of all, let us define bioinformatics. The most basic definition is the use of computers to manage or manipulate biological information. 
We are not talking about simple calculations in this context, but the use of computers that have unique algorithms with unique inputs and unique outputs to typically deal with information that cannot be handled by a single person. And so we see that bioinformatics is not well defined by this definition— there are infinite processes that fall under the described categories. And so, we tend to talk about bioinformatics in a general sense, not in the particular. But we must remember at the same time the particular is important as well. The particular is where innovation comes from and it is therefore very important. This paper cannot talk about all particulars, but it will mention those that are important. The general trends, although connected to the particular, are what this paper will focus on. The different general group of bioinformatics classifications are: personalized medicine, drug discovery, preventative medicine, and agronomics. Within these categories are sequence analysis, genome analysis, pharmacodynamics, structure activity analysis, clinical trials management, and diagnostics. There are a variety of categories that exist. Below is a diagram of the major categories to the biotechnology sector adapted from Markets and Markets. Figure 1: Bioinformatics Industry Breakup
  • 4. The obvious benefits to bioinformatics is a decrease in the cost to do research and development. By optimizing processes and the algorithms that form he basis of bioinformatics, cost to research and development can be greatly reduced. Many processes can be streamlined. Though, the degree of efficiency that is obtained is completely dependent on the validity of the algorithms. This puts a high amount of dependency on new, efficient, and accurate algorithms. There has been a steady increase in the type of algorithms and the number of algorithms in the bioinformatics sector, as evidences by the increase of the number of products and the increase in the number of tracks these products can perform. In the context of the drug development industry, we see that there are massive research and development costs for a single drug. This is due to the elimination of a high degree of variables to find the specific small molecule that exists to perform a function. Money is spent on target identification and target validation. After that, a molecule needs to be found that binds directly to the target. And the that drug need to be safe and effective. And so we see that the main functions of the drug discovery process is to cut down the variation that exist in an organism— to take the discrete out of the continuous. And this is very much done via trail and error and via empirical methods. There really is no rational method to drug design. Even in using the current bioinformatics tools for structure analysis, the algorithms are not accurate to a high degree. And so, once again the effectiveness is dependent on the validity of the algorithms. And so there must be a more accurate algorithm that could furthermore increase the efficiency. Another issue that comes up in the drug companies is the development of novel therapeutics that bring a substantial value to the market. 
It seems at though the number of drugs that are approved currently are for diseases that are already well managed. And it seems that the “new” meaning in the term “new drugs” referred merely to the drug itself, not the effect of the Bioinformatics Industry Breakup NGS Cheminformatics Scientific Software MicroArray Biological Data Others
  • 5. drug on the patient. And so there must be a method in which we can use bioinformatics to help this dilemma. There must be a way that new, novel, and revolutionary drugs are discovered verses the traditional methods. The answer may depend on organizing signaling cascades, finding new regulatory proteins, or using bioinformatics to obtain information from raw sequences. In these contexts, accomplishing these goal just comes down to the input or the correct data efficiency of the algorithms. Below is a diagram from ResearchGate. There is also a bubble that does exist in the market— an astonishing overvalue of some companies that have little to know value. This in turn, multiplied by a variety of companies, gives the illusion that the market is growing. This overvaluation is quite necessary though for development as investors would not be able to put their money down if there were not optimistic about the company’s ability. And progress would not be made. A simple way to measure the ability for pharmaceutical and medical device company’s ability to perform would be a year after year look at the fraction of money invested and the revenue generated. In so much we can do this and as we see below, that this ratio tends to remain constant. There is therefore little growth of the industry in this context. The diagram is adapted from Statista.com.
  • 6. The objective of this article is to once again to give an overview of the bioinformatics industry and pinpoint certain pressure points that needs to be considered. The alleviation of stress at these pressure point should make the system more efficient, or at least give a foundation for new innovation. The industry is like an artist who paints a painting. In order to sell the painting certain requirements must be made such as sufficient quality an innovation. The only way that the paintings can reach this quality is through either experience or knowledge. And so this paper seeks to give the painter knowledge so he can know what to paint and now how to do it. This way, the artist fulfills his role as an artist and a valid painting is created. Similarly, the bioinformatics industry has many projects that need to be completed and certain projects hold more weight than others. II. The bioinformatics industry overview There are a variety of bioinformatics companies that exist. This paper focuses on companies who offer at lead on bioinformatics service to a customer. That means that we are not focusing on internal bioinformatics processes that occur out of a company’s own research and development budget. Open source bioinformatics resources are not considered part of the biotech industry. They are, however, of importance. The makeup of this sector will be mentioned later in the article in the context of discussion innovations in bioinformatics. The bioinformatics companies are organized alphabetically. Certain categories that exist are: database management and maintenance, biobank maintenance, NGS diagnostics, lab management systems, sequence analysis, work flow development, comparative genomics, pathway analysis, life science research software, proteomics, modeling of complex systems, cheminformatics, drug discovery, SAR, clinical trials management, and imaging software. 
Company Name Year Established Main Competencies Estimated Revenue/ Public or Private Notes 0 10 20 30 40 50 60 70 Year 2006 2007 2008 2009 2010 2011 2012 2013 2014 Phizer Revenue vs. R & D Costs Revenue M USD Research and Development in M USD
  • 7. Valuation Accelrys 2001 Workflow Integration/ Automation/ Data Sharing/Enterprise Management 155 m/ 750 m Private Acquired by Dassault in 2014 Advaita 2014 Genetic Analysis, Pathway Analysis, Variant Analysis Unreported/ Unreported Private National Cancer Institute Adapts Pathway Program Advanced Brain Monitoring 2001 Monitoring Brain Function Unreported/ Unreported Private Heavily funded by NIH Agile Bio 2002 Laboratory Management and Collaboration Platform Unreported/ Unreported Private Annai Systems 2010 NGS analysis Unreported/ Unreported Private Applied Maths 1992 Analyses of a laboratory assays Unreported/ Unreported Private BioNumerics Suite ArcherDx 2013 NGS for cancer treatment Unreported/ Unreported Private Astrid Bioscience 2014 Research Data Management, NGS, Banking Unreported/ Unreported Private Atomwise 2012 AI to improve drug discovery Unreported/ Unreported Private Ebola treatment BaseHealth 2011 Predictive analytics based on compiled scientific data Unreported/ Unreported Private Berg 2014 Healthcare optimization through AI Unreported/ Unreported Private Biobam 2011 Functional analysis of genomes Unreported/ Unreported Private Biobase 1997 Molecular Biology Databases Unreported/ Unreported Private Knowledge process outsourcing Biodiscovery 1997 Genomic Analysis Unreported/ Unreported Private BioMax 1997 Tissue Arrays Unreported/ Unreported Private Bioxing 2013 Library for DNA and protein Microarrays Unreported/ Unreported Private
  • 8. Cellcion 2016 Flo Cytometry Software Unreported/ Unreported Private Certara 2008 PK/PD Analysis Unreported/ Unreported Private CLC BIO 2005 Microbial genetics and sequence analysis Unreported/ Unreported Private Clondiag Chip Technologies 2006 Microarrys Unreported/ Unreported Private Cognuse Neuroscience 2013 Cognitive rehab program Unreported/ Unreported Private Compugen 1993 Drug discovery through data analytics Unreported/ Unreported Public Focus on cancer and autoimmunity Cytel 1987 Statistical software Unreported/ Unreported Private Data MATRIX 2009 Clinical Research Management Software Unreported/ Unreported Private Definiens 1994 Tissue Image Analysis Unreported/ Unreported Private Desktop Genetics 2012 CRISPR Design and Analysis Unreported/ Unreported Private DNA Star 1984 Sequence analysis Unreported/ Unreported Private Eidogen- Sertanty 2003 Software and database for drug discovery Unreported/ Unreported Private eneData 1997 Life Science Research Unreported/ Unreported Private Envisagenics 2014 Predictive Analytics Unreported/ Unreported Private ePitope Informatics 2001 Antibody epitope discovery Unreported/ Unreported Private FreeThink Technologies 2011 Stability Studies Unreported/ Unreported Private Genamics 2003 Online journals Unreported/ Unreported Private Largest free journal database GenCAD 1997 Compilation of annotated genomes Unreported/ Unreported Private Gene Codes 1988 Sequence and microarray Unreported/ Unreported Private Gene Data 1997 Automated workflows Unreported/ Unreported Private Gene Talk 2011 Sequence analysis Unreported/ Unreported Private Genestack 2011 Bioinformatics organizer Unreported/ Unreported Private
  • 9. Geneva Bioinformatics (GeneBio) 1997 Life Science Software Unreported/ Unreported Private GenoLogics (Illumina) 2002 Laboratory Management System 20m/Unrep orted Private Genomatica, Inc. 2000 Manufacting process Unreported/ Unreported Private Gene BDO licensed to BASF Genomatix Software GmbH 1997 Literature and genome editing Unreported/ Unreported Private Genome Life Sciences 2004 Biomarker Discovery Unreported/ Unreported Private Genostarr 2004 Microbial Research Unreported/ Unreported Public-private Geospiza, Inc. 1997 Genetic Analysis Workflow 4.5m/ Unreported Private goBalto 2008 Clinical Trials Mangemnt Unreported/ Unreported Private Health Discovery Corporation Home Page 2004 Diagnostics vis Artificial Intelligence 10.5m /Unreported Public HeartVista 2006 Imaging Technologies Unreported/ Unreported Private Illumia 1998 NGS Systems 2.2 B /15 B Public INCOGEN, Inc. 2000 Workflows and Data Management Unreported/ Unreported Private Informagen 1992 Sequence Analysis Unreported/ Unreported Private Ingenuity Pathways Analysis 1998 Pathway analysis Unreported/ Unreported Private IPA System Inte:Ligand 2003 Drug Discovery Unreported/ Unreported Private Integrated Proteomics Applications 2008 Analysis of proteins Unreported/ Unreported Private Integromics 2002 Data management for proteomics or genomics Unreported/ Unreported Private inviCRO 1964 Diagnostics through MRI Unreported/ Unreported Public InVitrogen 1987 Drug discovery and genomics Unreported/ Unreported Public IO Informatics 2003 Data Management Unreported/ Unreported Private Kelaroo 2000 Inventory Management Unreported/ Unreported Private
  • 10. Lab7 2012 Enterprise System for NGS Unreported/ Unreported Private LabVantage Solutions 1981 Laboratory Management Unreported/ Unreported Private MAcVector 2007 Sequence design and analysis Unreported/ Unreported Private MasterControl 1993 Enterprise management Unreported/ Unreported Private Metabolic Explorer 2007 Bacteria manufacturing 53m/Unrep orted Public MetaHelix Life Sciences 2001 Agricultural/GMO Unreported/ Unreported Private Metalife AG 2000 Compilation of data Unreported/ Unreported Private MicroDiscover y 2000 Personalized medicine and research Unreported/ Unreported Private Molecular Connections 2001 Drug discovery via database Unreported/ Unreported Private Over 1000 employees NextBio 2005 Knowledge sharing Unreported/ Unreported Private NimbleGen Systems, Inc. 1999 Next gen optimization Unreported/ Unreported Private NorayBio 2002 Bioscience software Unreported/ Unreported Private Novocraft 2008 NGS Analysis Unreported/ Unreported Private Ocimum Bio SOlutions 2000 Genomics and laboratory management Unreported/ Unreported Private Optibrium 2009 Small Molecule Design Unreported/ Unreported Private Origent Data Sciences 2013 Data Analytics Unreported/ Unreported Private Premeir Biosoft International 1994 Life science research solutions Unreported/ Unreported Private Qlucore 2007 Data Analysis Unreported/ Unreported Private Quaigen 1984 Molecular Diagnostics 210m/Unre ported Private Quantiom 2003 Statistical data analysis Unreported/ Unreported Private QuantumBio 2004 Data analysis Unreported/ Unreported Private Quartzy 2009 Lab Inventory Unreported/ Unreported Private QuesGen Systems, Inc. 2004 Clinical Trails Management Unreported/ Unreported Private
  • 11. Sapio Sciences 2004 Laboratory Management System Unreported/ Unreported Private Schrödinger 1990 Molecular Simulation Unreported/ Unreported Private Seascape Learning 1997 Molecular modeling Unreported/ Unreported Private Sequencing.co m 2014 Storage of genetic data and analysis Unreported/ Unreported Private SilcsBio 2012 Protein Modeling and Drug Discovery Unreported/ Unreported Private SimBioSys 2001 3D Modeling Unreported/ Unreported Private SoftGenetics 2001 Genetic Analysis Unreported/ Unreported Private SOLABS 1999 Quality Management System Unreported/ Unreported Private SolveBio 2013 Precision genetics, personalized medicine Unreported/ Unreported Private SRA International 1978 IT for government Unreported/ Unreported Public Station X 2010 Enterprise software Unreported/ Unreported Private Strand Life Sciences 2000 Sequence analysis Unreported/ Unreported Private Synamatix Sdn Bhd 2001 Sequence analysis Unreported/ Unreported Private Synergistix 1997 Customer relationship management Unreported/ Unreported Private TimeLogic 1981 Sequence Analysis Unreported/ Unreported Private twoXAR 2014 Drug Discovery Unreported/ Unreported Private Veeva Systems 2007 Cloud computing Unreported/ 5B Public VeraChem 2000 Drug Discovery Via Modeling Unreported/ Unreported Private Verily (Google) 2015 Big Data for Life Sciences Unreported/ Unreported Public Veritomyx 2009 Spectral Analysis Unreported/ Unreported Private The majority of bioinformatic companies are privately held, have less than twenty employees, and offer consulting services. There are a couple stand out companies that are substantially larger such as Certara or Schrodinger. The larger companies offer a much more
  • 12. comprehensive software options and their software is considered an industry standard. The smaller companies tend to deal with small niche product lines, versus the general product lines. There is therefore a very real chance for acquisition for the smaller companies depending how it relates to larger company’s product lines. In summary, the bioinformatics industry is moderately difficult to navigate. There have been considerable shifts in the industry in recent years due to the decrease cost of NGS. This has allowed companies to be formed that solely deal with NGS analysis and the storing of NGS data. Additionally, a trend that is occurring is the use for artificial intelligence or machine learning in a multitude of contexts. In these types of systems, quality results can be ensured by a positive or negative feedback mechanism. Atomwise, a recent company that uses this process and has rapidly found a possible treatment for Ebola. There are also macroscopic trends that exist such as the transition to more personalized medicine or preventive medicine. Bioinformatics and the use of these technologies are applicable to these fields as well. For any company that enters this race to find a valuable product, there is competitiveness is only as good as their algorithms or minimum viable product. In so much, many companies that obtain funding by selling this idea are ultimately dependent on developing their technology. Let us, in the nest section, examine what barriers these companies face. III. Bioinformatics Company Categories i. Drug Discovery The drug discovery industry faces many barriers to discovering viable drugs. The traditional method to rational drug discovery depends on how well a target can be validated and how well a small molecule can be used to inhibit the target. In the lab, this is extremely costly in terms of money and time. The assays vary from reporter assays, to microarrays, to comparative genomics, to gene knockouts or add-ins. 
High Throughput Screening (HTS) is also very costly, but it can bypass target identification to find an effective drug. There has been a substantial increase in the use of in silico methods to find targets. This, however, is not an easy undertaking. Comparative genomics largely fails to find targets; comparative proteomics is much more effective. The reason is that comparative genomics lacks crucial information about how the genome is expressed. Gene expression is one of the most complex and highly regulated processes, both in specific cases and in general. Differential proteomics is therefore much better, because it does not require prior knowledge of which genes are expressed. The only way for comparative genomics to surpass comparative proteomics would be to accurately model the intense regulatory processes that govern genes. These processes may also vary because of the genetic diversity that exists between humans. For this barrier to be overcome, there would need to be a model or formula that could readily express which genes are transcribed. On top of this, the interactions between proteins and signaling cascades would have to be modeled. This is currently being done, not by rational methods but by empirical ones. For example, Advaita uses compiled research articles to form models of important biological pathways.

After a target has been discovered, the assays that follow, which aim to find the most effective drug that binds to the target, are an involved process in their own right. In traditional methods, this is done via reporter assays or some variation of high throughput screening. In the context of in silico methods, this process is difficult because of the sheer number of combinations of active compounds. A variety of compounds will theoretically bind. The difficulty lies in finding the select few that have the proper effectiveness, ADME profile, and cytotoxicity level when applied to in vivo systems. These processes can also be modeled in silico. The inefficiency seems to arise because even the most technologically advanced molecular modeling software does not necessarily find the right molecule; it only narrows down the candidates. Validation reports should exist to show how well the software selects a potential drug candidate, but they are quite hard to find. Theoretically, an approved compound should be traceable back to its structure-activity relationship (SAR) with the target and its lack of SAR with other important targets. The combination of these two relationships would yield a compound that is both effective and has proper ADME and cytotoxicity profiles. Currently, there is no software on the market that can accurately predict the SAR with the essential proteins of the human genome. The data required to do this would be immense, and a mathematical or logical approach would be needed to overcome this barrier in the future.

Current drug modeling software integrates Newtonian dynamics with quantum mechanics. Ultimately, what is mapped out in molecular modeling software is the energy states and energy profiles of a molecule. The equations that exist here, however, are not accurate.
They are, at best, a "best guess" and require significant modification to become accurate. One would suggest that the code that exists for this software does have an empirical aspect, in that the code is not completely rational. The proper question to ask in order to overcome this discrepancy is: why do the equations not work? The reason is that the mathematical basis for the equations is not accurate. There would need to be a technical advancement in how molecules are modeled. This is very much the barrier to technical innovation that many drug discovery companies face. There are also extremely clever ways in which companies try to adjust their models to be more accurate. For example, Atomwise used artificial intelligence to narrow down SARs. In this company, we can assume that the software has a built-in trial and error system that allows the algorithm to be continuously modified and improved while different compounds are tested. This is the cleverest way that drug development can be improved, and it explains why the company is backed by such solid investors.

In summary, the limitations that companies can overcome via technological advantage are 1) understanding and modeling the regulation of gene expression, 2) using a molecular dynamics theory that works with a high degree of efficiency in silico, 3) being able to understand and create novel pathways, and 4) modeling the interactions between drugs and proteins to accurately predict cytotoxic effects and ADME profiles. Ultimately, all of these points come down to the fact that molecular dynamics needs to be revolutionized. A revolutionized theory would accurately predict all microscopic interactions in the cell and allow biologists to understand everything between genotype and phenotype that is not currently known. A possible way to overcome this barrier is to reevaluate the model of the atom, for that is the general input to molecular dynamics theory. If these barriers are overcome, there is potential for enormous value, not just in reducing the pharmaceutical industry's massive inefficiencies, but in improving the efficacy of healthcare in treating disease.

ii. Personalized Medicine

The concept of personalized medicine has existed for many years, but only recently has it begun to merge with the field of bioinformatics. This is due to the existence of NGS and the huge amount of data that follows from it. Personalized medicine is therefore very much based on sequence analyses of whole genomes. By comparing and contrasting the genomes of different people, each patient can be placed into a group that requires a certain treatment. The idea of personalized medicine involves numerous applications. For example, in cancer, a patient's genome or even the cancer's genome can be analyzed to optimize treatment. But even in this context, the benefit of personalized medicine is not completely clear. It is hard to prove that it works in all contexts. Personalized medicine faces considerable challenges due to the inability to organize all the genetic variations in humans and the treatments that these variations suggest. As a result, genetic analysis is used on a case by case basis, depending on disease type.
Over time, the compilation of this data should theoretically lead to personalized medicine that is highly accurate and effective. However, let us reiterate that personalized medicine is hard to create. The variations that exist between genomes are quite difficult to understand. One would assume that a single genetic marker would be all that is required to have an implication. This is not necessarily the case for some diseases, as there can be multiple markers. In addition, there are environmental factors that cause disease, and these have to be integrated for a proper diagnosis. Personalized medicine therefore becomes convoluted: it is merely a statistic that has a probability of being correct. Despite its shortcomings, personalized medicine has progressed to a point where it does offer significant value to patients. It can nonetheless be optimized further if new algorithms are developed and more data is collected.

iii. Preventative Medicine

Preventative medicine has also existed for a very long time, even before bioinformatics became relevant. Currently, preventative medicine seeks to identify patients who are at risk and alter their lifestyle to mitigate that risk. In modern medicine, the at-risk patients tend to be at risk for cardiovascular disease or heart attack. Obesity is one of the factors that can cause this complication and many other health issues. Therefore, changing diet and increasing exercise is the most common prescription in preventative medicine. Bioinformatics can and does apply to this regimen of weight loss, but it seeks to go further. Cancer is much easier to treat in its early stages. One of the risk factors is of course age, but there are a variety of genetic and ontological risks that can be observed. Bioinformatics can be used to home in on these slight changes and increase the chance of diagnosing cancer in the early stages. The traditional method used in this field is the annual checkup, and it works quite well. However, current bioinformatics companies are trying to use sequence analysis and biosensors to detect cancer even earlier. One example of these companies is Verily. The barriers to implementing this type of technology are obtaining the hard data necessary to make a prediction and the difficulty of developing biosensors that can continuously monitor patients. In this respect, this type of preventative medicine is very similar to personalized medicine, in that there seems to be a gap between the known data and the data required to make a significant contribution to science or healthcare.

iv. Data Management

Data management is a general term that can apply to many industries. In the context of the biological sciences, there is a need to accurately store data and communicate it between people. Science is becoming increasingly collaborative, and this certainly plays a role in the need for bioinformatics software that is easy to use and allows for easy communication. The large amount of data that can be obtained by new assays such as NGS poses problems for some companies. There are, accordingly, bioinformatics companies that specialize in these aspects of the industry.
There are companies that can store large amounts of data in cloud-based form, and there are companies that have a platform for communicating both private and public data. The competitive barriers to improving data management are, in line with all aspects of the bioinformatics industry, novel algorithms, adherence to customer needs, and optimization of architecture. Data management is such a general term that this branch of bioinformatics necessarily takes many forms. In fact, there are a lot of bioinformatics companies in this niche competing with each other. In order to gain a competitive advantage, these companies must address the competitive barriers mentioned above.

IV. Customer profile

The customer profile of the bioinformatics industry is heterogeneous. The largest customers are large biotech companies whose mission is drug discovery. Secondly, there is a large customer base in academic research. Thirdly, there is the healthcare sector, in which bioinformatics companies sell products directly to clinicians. There are of course also lesser-known customers in the private sector. For example, in agricultural biotechnology, there is a great need for bioinformatics to produce genetically modified organisms. Consequently, Monsanto would be a good customer for the bioinformatics industry. However, a variety of its bioinformatics processes are done internally in a research and development context. For this particular customer, value could be found in outsourcing its informatics to a professional company that has strategic industry knowledge.

Below is a graph of the varied customer profiles in the industry.

Below is an industry report from Grand View Research.

In summary, the bioinformatics industry is quite heterogeneous and highly specialized. There are not a few companies with highly distinguished products, but rather a high number of companies with similar products. The size of a company is not necessarily due to the effectiveness of its product, but to the relationships that exist between the customer and the company. What forms the basis of this relationship is tough to describe precisely, but there must be some social aspect that gives a customer a high affinity for a particular bioinformatics company. For the largest companies, a rapport must develop year after year. In addition, companies that are larger and offer more services also tend to have more customers and larger revenue, which can explain the differences in company size.

V. Growth data and methodology

There is limited published data on the growth of the bioinformatics industry. Most of the data deals with the macroscopic aspects of the industry and does not go into detail about the minutiae that exist within it. In this way, the small niche markets are overlooked and the larger markets for well-known processes are the focus. For example, the growth of NGS as seen at Illumina appears to be the basis of most articles, while other developments, such as the use of artificial intelligence, are overlooked. The consensus is that bioinformatics is set to grow rapidly in the coming years. Growth is stronger in genomics than in proteomics or other industry services. The market report, though, is very vague. There is too much heterogeneity in the industry to discuss each niche market in detail.
In terms of methodology, it would be impractical to collect all of this data on privately held companies, given their contracts with vendors and with the government. Therefore, as expected, we only have categorical data that is somewhat hard to interpret. We will therefore go into specific detail about the industry leaders in the categories mentioned in the industry report.

VI. Top Company Overviews

i. Illumina

Illumina is the industry leader in Next Generation Sequencing and the development of NGS machines. HiSeq is its most technologically advanced line of sequencing machines, allowing a human genome to be sequenced in one to two days at a cost of about USD $1,000. This technological development has occurred due to improvements in computation, the development of better algorithms, and the optimization of biological processes. It can be assumed that the biggest advancements have been in computation. In the context of bioinformatics, many calculations are necessary to construct a genome after information is obtained from the machines. The full genome can be constructed and stored after the appropriate statistics are performed. As of 2015, Illumina's revenue is 2.2 billion dollars. The price of a HiSeq machine is around 700,000 dollars, which suggests that only about 3,000 machines are sold annually, even if all revenue came from machine sales (2.2 billion / 700,000 ≈ 3,100).

ii. Accelrys (BIOVIA)

Accelrys has a variety of products that provide value to a wide range of customers. Its main product is an enterprise platform. Other products include software that assists in research analyses and software that assists in drug discovery and modeling. In 2015, its revenue was 150 million dollars.

iii. Schrödinger

Schrödinger offers molecular modeling for the purposes of drug discovery and materials science. The basis of its models is a combination of Newtonian mechanics and quantum mechanics. Schrödinger has a variety of competitors, and it is unclear what makes its software unique. A validation report against other products does not exist. However, it is estimated that its revenue is 25 million dollars a year.

VII. Recommendations and summary

The bioinformatics industry is quite a fascinating industry.
There is a lot of projected growth and a lot of potential for technological disruption. The diversity among companies is astonishing. As a result, the industry appears to be less competitive than other industries. The barriers to entry are somewhat obscure. Technological expertise and a relatively low amount of financial backing are necessary to start a bioinformatics company. There seems to be a bias toward the human health sector in bioinformatics. Relatively few bioinformatics companies assist in research in the agricultural or environmental science industries. These are the least competitive segments of the bioinformatics industry, yet the revenues of both sectors are comparable.

As we head into the future, considerable emphasis can be expected on the use of NGS to gather more information on human health. In the long term, there will be companies that deal with protein expression as well. Molecular drug discovery will also evolve into something more technologically advanced with the development of novel algorithms using artificial intelligence. Ultimately, the industry will continue to change and reach deeper into the development of technologies that can do very powerful things.

References

http://www.marketsandmarkets.com/PressReleases/bioinformatics-market.asp
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702932/
https://www.researchgate.net/figure/267634664_fig1_New-drugs-approved-by-the-FDA-BLAstherapeutic-biologics-filed-under-Original
https://www.statista.com/statistics/266171/revenue-of-pfizer-since-2006/
https://www.glassdoor.com/Overview/Working-at-Schrodinger-EI_IE428690.11,22.htm
https://www.grandviewresearch.com/press-release/global-bioinformatics-market