A statistical framework for multiparameter analysis at the single-cell level†
Mol. BioSyst., 2012, 8, 804–817. This journal is © The Royal Society of Chemistry 2012
Cite this: Mol. BioSyst., 2012, 8, 804–817
Wandaliz Torres-García [a,b], Shashanka Ashili [b], Laimonas Kelbauskas* [b],
Roger H. Johnson [b], Weiwen Zhang* [b], George C. Runger* [a]
and Deirdre R. Meldrum [b]
Received 17th October 2011, Accepted 2nd December 2011
DOI: 10.1039/c2mb05429a
Phenotypic characterization of individual cells provides crucial insights into intercellular
heterogeneity and enables access to information that is unavailable from ensemble averaged, bulk
cell analyses. Single-cell studies have attracted significant interest in recent years and spurred the
development of a variety of commercially available and research-grade technologies. To quantify
cell-to-cell variability of cell populations, we have developed an experimental platform for
real-time measurements of oxygen consumption (OC) kinetics at the single-cell level. Unique
challenges inherent to these single-cell measurements arise, and no existing data analysis
methodology is available to address them. Here we present a data processing and analysis method
that addresses challenges encountered with this unique type of data in order to extract
biologically relevant information. We applied the method to analyze OC profiles obtained with
single cells of two different cell lines derived from metaplastic and dysplastic human Barrett’s
esophageal epithelium. In terms of method development, three main challenges were considered
for this heterogeneous dynamic system: (i) high levels of noise, (ii) the lack of a priori knowledge
of single-cell dynamics, and (iii) the role of intercellular variability within and across cell types.
Several strategies and solutions to address each of these three challenges are presented.
Features such as slopes, intercepts, and breakpoints (change-points) were extracted from every
OC profile and compared across individual cells and cell types. The results demonstrated that the extracted
features facilitated exposition of subtle differences between individual cells and their responses to
cell–cell interactions. With minor modifications, this method can be used to process and analyze
data from other acquisition and experimental modalities at the single-cell level, providing a
valuable statistical framework for single-cell analysis.
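The features named above (slopes, intercepts and a breakpoint) can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes, purely for illustration, that an OC profile can be approximated by two straight-line segments, and it finds the breakpoint by exhaustive least-squares search. The function name `fit_two_segments`, the `min_pts` parameter, and the synthetic profile are all hypothetical.

```python
import numpy as np

def fit_two_segments(t, y, min_pts=3):
    """Fit two independent straight lines and pick the split with the lowest SSE.

    Illustrative assumption: the profile is piecewise linear with one breakpoint.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(t) < 2 * min_pts:
        raise ValueError("not enough points for two segments")
    best = None
    # k is the index of the first point that belongs to the second segment.
    for k in range(min_pts, len(t) - min_pts + 1):
        left = np.polyfit(t[:k], y[:k], 1)    # returns [slope, intercept]
        right = np.polyfit(t[k:], y[k:], 1)
        resid = np.concatenate([
            y[:k] - np.polyval(left, t[:k]),
            y[k:] - np.polyval(right, t[k:]),
        ])
        sse = float(resid @ resid)
        if best is None or sse < best["sse"]:
            best = {
                "breakpoint": float(t[k]),
                "slope1": float(left[0]), "intercept1": float(left[1]),
                "slope2": float(right[0]), "intercept2": float(right[1]),
                "sse": sse,
            }
    return best

# Synthetic profile (made-up numbers): slow decline, then faster decline after t = 30.
rng = np.random.default_rng(0)
t = np.arange(60.0)
clean = np.where(t < 30, 7.0 - 0.05 * t, 5.5 - 0.2 * (t - 30))
profile = clean + rng.normal(0.0, 0.02, t.size)

result = fit_two_segments(t, profile)
print(result)
```

The exhaustive search costs one pair of line fits per candidate split, which is acceptable for profiles of a few hundred points; the resulting dictionary is the kind of per-cell feature vector that could then be compared across cells.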
Introduction
Cell-to-cell variability has been found to play a central role in
a variety of physiological processes such as differentiation,
proliferation, stress response and pathogenesis. Due to the
stochastic nature of many intracellular processes, individual
cells can exhibit significant phenotypic differences and respond
differently to stimuli and changes in the microenvironment [1–4].
The origin of many diseases is thought to lie in a few, or
perhaps even a single, aberrant cell that acquires the capability
to evade the cues regulating normal cell function and death.
Early identification and detailed characterization of such
abnormal cells bear the potential not only to provide deep
insights into fundamental cell processes, but also to open new
avenues for treatment and management of diseases with high
morbidity and mortality, including cancer. For this reason,
single-cell studies have been gaining momentum over the last
decade, facilitated by technological advances enabling reliable
measurement of various biologically relevant parameters with
high sensitivity and precision. To study cell signaling and
metabolic pathways, one needs to be able to characterize
simultaneously as many parameters of living single cells
as possible. Multiparameter analysis could reveal the details
of intracellular mechanisms, providing novel insights into
the systems biology of cells.
Technological challenges such as extremely low amounts of
biological material, small differential changes in metabolite
concentrations and the fragility of cells have hampered
significant progress in single-cell analysis. One of the major
limitations in single-cell experiments is the low signal-to-noise
ratio. Reliable separation of meaningful data from noise
represents a formidable challenge, one that is exacerbated by
the absence of a priori knowledge of the dynamics of physiological
processes that take place in individual cells. This is
particularly true in experiments where single living cells need
to be characterized with minimal perturbation of their normal
function, further limiting the experimentalists’ choice among
available methodologies.
[a] School of Computing, Informatics, and Decision Systems
Engineering, Arizona State University, Tempe, AZ 85287-5906,
USA. E-mail: George.Runger@asu.edu
[b] Center for Biosignatures Discovery Automation, The Biodesign
Institute, Arizona State University, Tempe, AZ 85287-6501, USA.
E-mail: Laimonas.Kelbauskas@asu.edu, Weiwen.Zhang@asu.edu
† Electronic supplementary information (ESI) available. See DOI:
10.1039/c2mb05429a
Published on 05 January 2012 on http://pubs.rsc.org | doi:10.1039/C2MB05429A
There are numerous methods available to remove white noise
in dynamic data. Many of them have been suitably adapted for
application in a variety of research fields such as chemistry,
environmental studies and medicine [5–7]. Nevertheless, reducing
random perturbation in a signal is not a trivial task, since
it is usually unclear how much noise can be removed without
losing the ‘‘true’’ signal. Quality assurance problems in newly
developed technologies are common. For example, in the
1990s, when DNA microarrays started to gain interest in the
scientific community, the importance and value of the unique
information for understanding biological systems [8–13], as well as
the need for quality assessment and noise reduction [14,15], were
clearly acknowledged. In light of these challenges, many noise
reduction methods were proposed, and later developed into a
mature and unified methodology [16,17]. In a way analogous to
noise reduction, the characterization of data and signals in the
bioinformatics arena has been widely studied, especially for data
quality assessment purposes in microarray-based studies. Many
feature selection techniques are commonly used for microarray
data characterization, including selection of genes with significant
expression levels in response to changes in conditions or
experimental settings [18].
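The trade-off mentioned above, namely that it is unclear how much noise can be removed before the ‘‘true’’ signal is lost, can be made concrete with a small sketch. It uses a centered moving average, chosen here only as a familiar low-pass filter and not as the method of this paper; the function name `moving_average`, the window sizes and the synthetic signal are assumptions for illustration.

```python
import numpy as np

def moving_average(y, window):
    """Centered moving average; the ends are padded by repeating the edge values."""
    y = np.asarray(y, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    kernel = np.ones(window) / window
    # 'valid' convolution of the padded series returns len(y) samples.
    return np.convolve(padded, kernel, mode="valid")

def rmse(a, b):
    """Root-mean-square error between two equally long series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic signal (made-up numbers): a kinked line plus white Gaussian noise.
rng = np.random.default_rng(1)
t = np.arange(120.0)
true_signal = np.where(t < 60, 7.0 - 0.01 * t, 6.4 - 0.04 * (t - 60))
noisy = true_signal + rng.normal(0.0, 0.05, t.size)

# Error against the known true signal for no, mild and heavy smoothing.
errors = {w: rmse(moving_average(noisy, w), true_signal) for w in (1, 5, 61)}
for w, e in errors.items():
    print(f"window={w:3d}  RMSE={e:.4f}")
```

In this toy setting a short window lowers the error relative to the raw data, while a very long window flattens the kink and the ends of the profile and makes the error worse. With real measurements the true signal is unknown, so this comparison is not directly available, which is the difficulty the paragraph describes.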
Modeling of real-time data obtained from dynamical
systems has been explored in the literature utilizing traditional
statistical methods [19–23]. The traditional methods tend to
establish parametric assumptions which are often hard to
justify in complex biological systems. Hence, there exists a
critical need to model real-time measurements in biological
systems, including live cells, without a priori knowledge of the
nature of underlying dynamical processes. However, so far
none of these established methods have been applied to
analyze data obtained from individual living cells.
Here we present a study focused on the analysis of novel
respiration kinetics data from individual cells. The cell
metabolic analysis method entails manipulation24
and isolation
of single cells25
and determination of their oxygen consump-
tion (OC) kinetics in real-time.26–28
The data obtained from
these measurements exhibit much higher levels of noise com-
pared to bulk-cell experiments. The lack of a priori knowledge
of single-cell dynamics makes it difficult to define characteristic
features in these datasets, posing challenges in the
extraction of biologically relevant information and its proper
interpretation. The real-time nature of the measurements
contributes additional complexity to the analysis. In this
work we describe our initial efforts to develop statistical
methodologies to address the challenges of noise reduction,
data characterization through feature extraction, and biological
comparison for respiration phenotype measurements in
individual cells. We analyzed OC kinetics data of single cells
obtained from two esophageal epithelial cell lines: metaplastic
(CP-A) and early dysplastic (CP-C) Barrett’s esophageal cells.
These cell lines were derived from biopsies taken from the
corresponding regions in human esophagus and represent
different stages of pre-neoplastic progression. Because of the
clear delineation of the two cell types in terms of histopathology
and their relevance to cancer, our findings may also be of interest
to cancer biologists. Because they serve to define and extract
elements of a disease biosignature, the statistical methodologies
presented here could be used as a foundational framework for
analyzing single-cell data.
Results
Data preprocessing
The data consist of OC kinetics in single human metaplastic
(CP-A) and early dysplastic (CP-C) cells. Fig. 1 summarizes
the challenges addressed in this unique data structure. The
early exploration of OC measurements at the single-cell level
indicated the need to reduce noise and unwanted perturbations
in the signals. Reducing noise helps enhance the discovery of
Fig. 1 Statistical framework diagram. Sequential steps to process and analyze single-cell oxygen consumption data: smoothing, feature extraction
and classification. Major challenges and proposed solution strategies to address each one of them are shown.
relevant features related to the “true” signal’s behavior. In our
approach two main stages of noise reduction were performed:
(1) low-pass filtering and (2) outlier smoothing. Common
filtering techniques were applied to the OC data in two
different ways. First, a filter was applied to each of the OC
curves to estimate a curve-specific metric of variation that was
used to detect outliers in the unsmoothed data. This outlier
smoothing process invoked traditional control charts in which
any data point of the OC curves lying outside curve-specific
control limits was considered an outlier and its value was
smoothed by neighborhood averaging (see the Methods section).
This step reduced the adverse influence of artifacts caused by the
stochastic response of the microsensor or other measurement
system components. After outlier smoothing, the resulting signal
was processed through a low-pass filter (Fig. 2).
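The two noise-reduction stages described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the moving-average filter, the window width, the three-sigma control limits, and the function names are all assumptions chosen for clarity.

```python
from statistics import mean, stdev


def moving_average(values, window=5):
    """Centered moving average (a simple low-pass filter); the window shrinks at the edges."""
    half = window // 2
    return [mean(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def smooth_outliers(values, window=5, k=3.0):
    """Replace points outside curve-specific control limits by a neighborhood average."""
    trend = moving_average(values, window)
    residuals = [v - t for v, t in zip(values, trend)]
    sigma = stdev(residuals)              # curve-specific metric of variation (assumed form)
    cleaned = list(values)
    for i, r in enumerate(residuals):
        if abs(r) > k * sigma:            # point lies outside the control limits
            neighbors = [values[j] for j in (i - 2, i - 1, i + 1, i + 2)
                         if 0 <= j < len(values)]
            cleaned[i] = mean(neighbors)  # neighborhood averaging
    return cleaned


def denoise(values, window=5, k=3.0):
    """Outlier smoothing followed by low-pass filtering."""
    return moving_average(smooth_outliers(values, window, k), window)
```

Because the control limits are derived from each curve's own residuals, a noisy curve is not penalized more heavily than a quiet one; only points that are extreme relative to that curve are replaced.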
Feature extraction
After preprocessing, the data analysis procedure aimed to
characterize the OC kinetics. The feature extraction step
addresses challenge number two described in Fig. 1, which
can be divided into two separate problems: (1) removal of
redundant information characterized through the understanding
of experimental limitations, and (2) extraction of distinctive
features without a priori knowledge of the system.
Detection of the time needed to reach zero oxygen concen-
tration in the microchambers. The removal of redundant
information from further analysis is based on experimental
considerations. During measurement, individual cells are
hermetically isolated in sub-nanolitre volume chambers, which
results in a limited amount of oxygen being available for
consumption by cells.27,28
Data collected after the oxygen
concentration in the chambers reaches zero are not useful
for OC kinetics analysis and can be discarded as extraneous
or redundant. During an experiment, OC kinetics of nine
individual cells were recorded simultaneously. The time needed
for each cell to deplete the oxygen in the microchamber varied
significantly from cell to cell due to the metabolic rate hetero-
geneity. The experiment was continued until oxygen concen-
tration in all nine chambers reached zero, resulting in different
amounts of redundant data collected for each cell.28
To
address this issue, we proceeded to automatically detect the
time point where each curve reached a zero value (0% oxygen)
and to discard data collected after that time point (referred to
as zero-value tails), excluding it from further analysis (Fig. 2).
Hence, we define redundant information in the context of
experimental conditions, namely by the limited amount of
oxygen available for each cell to consume and the variable rate
at which they consume it. Further experimental details related
to the data used in this study can be found in the Methods
section. Removing the zero-value tails from each of the OC
curves facilitated robust modeling of these curves in regions of
interest and allowed for reliable feature extraction from the
kinetics. Removal of the zero-value tails would be a trivial
problem if one had to analyze a small number of curves or if
the time to reach zero was the same for all cells. For the
analysis of hundreds of cells, however, we needed to develop
an algorithm to automatically remove the tails to ensure
consistency and rapid data processing.
A cumulative sum (CUSUM) control chart is a commonly
used statistical tool to detect small changes.29
Its application
allowed us to automatically detect the time point at which each
OC curve reaches a zero oxygen value and does not change
significantly afterwards (Fig. 2). This statistical procedure
was performed to detect a change time point, a feature called
TimetoZero, for each sample. A summary of the time points
detected by the CUSUM procedure across samples is shown in
Fig. 3 as frequency plots stratified by cell line, providing a
general view of the distribution of the TimetoZero feature in
each cell line. We used this feature as a reference point and
removed the data lying beyond it as redundant. In addition, by
determining the time-to-zero reference points, we captured a
unique characteristic of the OC kinetics that helps to better
understand cell heterogeneity.
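One way to detect TimetoZero with a CUSUM chart is sketched below. The exact parameterization used in this work is not given here, so this is an assumption-laden illustration: a one-sided tabular CUSUM is scanned backward from the end of the record, the last `tail_fraction` of the data is assumed to lie in the zero-oxygen region, and the reference value `k_sigma` and decision interval `h_sigma` are conventional textbook choices rather than the authors' settings.

```python
from statistics import mean, stdev


def time_to_zero(times, oxygen, tail_fraction=0.1, k_sigma=0.5, h_sigma=5.0):
    """Estimate when a curve settles at its zero-oxygen baseline (hypothetical helper)."""
    n_tail = max(3, int(len(oxygen) * tail_fraction))
    tail = oxygen[-n_tail:]               # assumed to be already at 0% oxygen
    baseline = mean(tail)
    sigma = stdev(tail) or 1e-12          # guard against a noise-free tail
    k, h = k_sigma * sigma, h_sigma * sigma
    cusum, last_in_control = 0.0, len(oxygen) - 1
    for i in range(len(oxygen) - 1, -1, -1):          # scan backward in time
        cusum = max(0.0, cusum + (oxygen[i] - baseline - k))
        if cusum == 0.0:
            last_in_control = i           # still indistinguishable from the baseline
        elif cusum > h:
            return times[last_in_control] # signal has left the baseline
    return None                           # no departure from the baseline detected
```

Data recorded after the returned time can then be discarded as a zero-value tail. Returning the last in-control point, rather than the point at which the chart signals, follows the usual CUSUM convention for locating where a shift began.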
OC characterization and other features. Modeling OC curves
is challenging since there is no a priori knowledge of single-cell
respiration kinetics. Other than the notion that cells are
Fig. 2 Step-by-step statistical framework example. Main steps used to characterize the OC kinetics data are shown: (a) data filtering, (b) detection
of feature, TimetoZero, using CUSUM, (c) removal of zero-valued tails, (d) identification of characteristic features using a spline model.
heterogeneous, little is known about the characteristics of specific
factors such as oxygen consumption and their relationship to
different cell types and metabolic states.1,30
We addressed this
challenge by approximating OC kinetics with a constrained
piecewise linear regression model. This spline model fits two
continuous linear regressions with slopes constrained to be
negative. Based upon preliminary data analysis which revealed
this pattern, it seemed appropriate to study OC curves by means
of fitting two linear models (Fig. 2). The two continuous regres-
sions share a mutual breakpoint optimally detected through a
likelihood method across the entire time span. This model allows
us to capture features in different segments of the data.
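A minimal sketch of such a two-segment continuous fit is given below. It is an illustration under stated assumptions, not the authors' code: the breakpoint is found by a grid search over observed time points, minimizing the residual sum of squares (equivalent to maximizing a Gaussian likelihood), and candidates that violate the negative-slope constraint are simply skipped, which is a simplification of a properly constrained fit. All function and key names are hypothetical.

```python
def _solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _ols(columns, y):
    """Ordinary least squares via the normal equations; returns (coefficients, SSE)."""
    p = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][k] for j in range(p)) for k in range(len(y))]
    return beta, sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def fit_two_segment(t, y, min_points=3):
    """Fit y = b0 + b1*t + d*max(t - c, 0) and search for the best breakpoint c."""
    ones, best = [1.0] * len(t), None
    for c in t[min_points:-min_points]:           # keep a few points on each side
        hinge = [max(ti - c, 0.0) for ti in t]    # forces continuity at c
        (b0, b1, d), sse = _ols([ones, list(t), hinge], y)
        left, right = b1, b1 + d
        if left < 0 and right < 0 and (best is None or sse < best["sse"]):
            best = {"change_time": c, "change_oxygen": b0 + b1 * c,
                    "intercept": b0, "left_slope": left, "right_slope": right,
                    "sse": sse, "mse": sse / len(t)}
    return best                                   # None if no candidate is admissible
```

The hinge term `max(t - c, 0)` is what makes the two lines meet at the breakpoint, so the left slope, the right slope, and the change-point coordinates all come out of a single regression per candidate.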
The spline model was compared with the simple linear
regression model using a goodness-of-fit criterion. We
performed the comparison of these two models by using an
F test for each OC curve. These multiple comparisons raise a
commonly known problem in multiple hypothesis testing:
increased false-positives. To address this problem, we have
corrected all computed p-values using the Bonferroni correction
method.31
Through the evaluation of these tests, we found
that 99.3% and 97.7% of the OC kinetics data obtained from
CP-A and CP-C cells, respectively, could be fit better with the
constrained piecewise linear (spline) regression than with the
simple linear regression model at α = 0.001. Fig. 4 shows
the percentage of curves that were fit more accurately with the
spline model as a function of the level (α) of Type I error for the
F test. In general, more than 90% of OC curves obtained with
both cell types showed a statistically significant improvement of
the fit at different values of α when using the constrained
piecewise model as compared to simple linear regression.
A slightly higher percentage of curves measured from CP-A
compared to CP-C cells could be fit more reliably with the
constrained piecewise model.
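The model comparison described above can be written out as follows. This is the standard nested-model F statistic together with the Bonferroni adjustment; the symbols (RSS for residual sum of squares, p for the number of fitted parameters, n for the number of time points in a curve, m for the number of curves tested) are notation introduced here for illustration, and treating the statistic as exactly F-distributed when the breakpoint is itself estimated is an approximation.

```latex
% Nested-model F statistic for a single OC curve with n time points
F = \frac{(\mathrm{RSS}_{\mathrm{lin}} - \mathrm{RSS}_{\mathrm{spl}}) / (p_{\mathrm{spl}} - p_{\mathrm{lin}})}
         {\mathrm{RSS}_{\mathrm{spl}} / (n - p_{\mathrm{spl}})}

% Bonferroni adjustment of the p-value of curve j over m tested curves
\tilde{p}_j = \min\left(1,\; m \, p_j\right), \qquad
\text{spline preferred for curve } j \text{ if } \tilde{p}_j \le \alpha
```

For example, with the 154 CP-A curves analyzed here and α = 0.001, a curve counts in favor of the spline model only if its unadjusted p-value is at most 0.001/154, or roughly 6.5 × 10⁻⁶.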
The model enabled the extraction of relevant features that
were used to characterize the OC kinetics. Besides the regular
features from fitting linear regressions (intercepts and slopes),
we were able to detect several other features (Table 1), such as
time and oxygen concentration at which the first slope of the
piecewise model is replaced by the second slope. All features
were determined for each kinetics curve of both cell types
(CP-A and CP-C), and the feature distributions within and
across cell types were further analyzed. Whether or not piece-
wise linear regression represents a biologically relevant model
Fig. 3 Features histogram and significance tests between CP-A and CP-C for the TimetoZero feature. (a) Distribution histograms for single CP-A
and CP-C cells; (b) 95% confidence interval of the means of the feature for both cell types.
Fig. 4 Comparative multiple hypothesis testing between the spline
model and linear regression fit. Percentage of OC curves per cell type
that revealed a better fit with the spline model than with the linear
regression, shown as a function of different values of α (Type I error).
The Bonferroni correction was applied to the individual test p-values
to alleviate the problem of false-positives when multiple comparisons
are performed. Inset: zoom in on a range of [0, 0.05].
of OC kinetics, it provided a good empirical fit to the
experimental data with a simple structure, permitting feature
extraction for comparative studies between different cell types
and conditions.
Validation
Prior to using this statistical methodology for biological data
interpretation, it was validated to assess its accuracy and
robustness. For validation we used a model system based on
enzymatic scavenging of oxygen by Oxyrase.32,33
Oxyrase is a
preparation of membrane fragments from Escherichia coli
and contains membrane monooxygenases and dioxygenases.
When it comes in contact with lactic acid, Oxyrase removes
oxygen rapidly from aqueous environments, including cell
medium. Because of its enzymatic basis, oxygen removal
kinetics by Oxyrase can be modeled using the Michaelis–
Menten equation that describes enzymatic reaction rates as
a function of substrate concentration. To reproduce data
collection conditions as closely as possible to actual experiments,
we measured oxygen consumption kinetics of Oxyrase (no cells)
using experimental settings identical to those used for single
cells. This ensures that the signal-to-noise ratios are similar to
those of single-cell data. We used four different Oxyrase concentrations,
50 µL, 150 µL, 200 µL, and 250 µL (ranging from 0.06 to 0.2% by
volume) for more robust validation of the statistical framework.
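For reference, the Michaelis–Menten description of enzymatic oxygen removal mentioned above takes the following textbook form. The symbols are standard notation rather than quantities reported in this work: V_max is the maximal removal rate, which scales with the amount of enzyme, and K_m is the oxygen concentration at which the rate is half-maximal.

```latex
\frac{d[\mathrm{O_2}]}{dt} = -\frac{V_{\max}\,[\mathrm{O_2}]}{K_m + [\mathrm{O_2}]}

% Limiting regimes
[\mathrm{O_2}] \gg K_m: \quad \frac{d[\mathrm{O_2}]}{dt} \approx -V_{\max}
\qquad\qquad
[\mathrm{O_2}] \ll K_m: \quad \frac{d[\mathrm{O_2}]}{dt} \approx -\frac{V_{\max}}{K_m}\,[\mathrm{O_2}]
```

Under this model a higher Oxyrase concentration raises V_max, which steepens the initial, nearly linear decline and shortens the time needed to deplete the chamber. These are the kinds of differences that slope and TimetoZero features would be expected to capture.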
The features extracted from the OC kinetics data obtained
with Oxyrase utilizing the statistical framework showed signifi-
cant differences among signals measured with different
Oxyrase concentrations. The application of a Random Forest
classifier model34
to the extracted features revealed clear
discrimination among the four different concentrations with
out-of-bag error rates of 2% when all features were included
in the model and 11.1% when TimetoZero (see Feature
extraction) was removed from the data analysis. Ensemble
learners are predictive models that combine a collection of
simpler classifiers yielding better predictive performance as an
ensemble than any of the individual classifiers.35
The distinct
discrimination among the different Oxyrase concentrations
was visualized with the use of multidimensional scaling36
in
panels (a) and (b) of Fig. 5. Each panel portrays the visualization
Table 1 Extracted features and their descriptions
Feature | Description
Change-point.Time | Time value at which the change in slopes in the piecewise linear fit takes place
Change-point.Oxygen | Oxygen consumption value at which the change in slopes in the piecewise linear fit takes place
Intercept coefficient (B0) | Intercept of left linear regression
Left slope coefficient (B1) | Slope of the linear regression before the Change-Point (a)
Right slope coefficient (B1) | Slope of the linear regression after the Change-Point (a)
Kurtosis | Measure of “peakedness”. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.
Skewness | Measure of the asymmetry.
Minimum MSE | The mean squared error value for the best piecewise linear regression fit.
TimetoZero | Time at which the oxygen concentration in the chamber reaches a value of zero
Brief description of features extracted from curves after application of smoothing and filtering techniques.
(a) Slope magnitudes extracted from the spline model are divided by two for curves obtained with two cells per well.
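The two shape features in Table 1 can be computed from central moments, as in the sketch below. Two points are assumptions: the table does not state which series the moments are taken over (the smoothed curve or the fit residuals, for instance), and the kurtosis shown is the Pearson form, for which a normal distribution gives 3, rather than the excess form.

```python
from statistics import fmean


def central_moment(values, order):
    """Population central moment of the given order."""
    mu = fmean(values)
    return fmean([(v - mu) ** order for v in values])


def skewness(values):
    """Asymmetry: third standardized moment (0 for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Peakedness: fourth standardized moment (Pearson definition)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2
```

A positive skewness indicates a longer right tail, and a larger kurtosis indicates that more of the variance comes from rare, extreme deviations, matching the description in the table.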
Fig. 5 Multidimensional scaling plots for the Oxyrase enzymatic reaction used for validation. These plots visualize the scaling coordinates of the proximity matrix
obtained with a Random Forest trained to classify four distinct Oxyrase concentration values. The Oxyrase measurements were gathered
through the same semi-automated technology as the OC curves in this study. They were used for validation because the behavior of Oxyrase is well understood and
differences are expected among features extracted from Oxyrase curves at different concentrations.
patterns for the two Random Forest models discussed earlier:
(a) a classifier with all features and (b) a classifier with all
features except TimetoZero. The resulting proximity matrix
from the Random Forest classifier is used as input to
multidimensional scaling to find a suitable 2D visual configuration
that showcases the sample patterns. The axes, named scaling
dimensions, are the 2D coordinates in which these
patterns are plotted. The ability to clearly differentiate varying
reaction rates (slopes) obtained with different Oxyrase concen-
trations shows that our approach enables adequately robust and
accurate characterization of dynamic processes. By capturing
these differences among the signals known to have different
kinetics using the statistical framework employed in this work,
we validated our approach for application to single-cell OC data.
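The out-of-bag error rates quoted in this validation come from the bootstrap structure of the ensemble: each tree is trained on a resample of the curves and is scored only on the curves left out of that resample. The sketch below illustrates the idea with a deliberately simplified stand-in, bagged one-level decision trees ("stumps") rather than a full Random Forest; every name in it is hypothetical, and it does not reproduce the proximity matrix or the feature-importance scores.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Best single-feature threshold split (a one-level decision tree)."""
    best = None
    for f in range(len(rows[0])):
        vals = sorted({r[f] for r in rows})
        for a, b in zip(vals, vals[1:]):
            thr = (a + b) / 2                     # midpoint between neighboring values
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(l != l_lab for l in left) + sum(l != r_lab for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_lab, r_lab)
    if best is None:                              # no split possible: predict the majority
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, thr, l_lab, r_lab = stump
    return l_lab if row[f] <= thr else r_lab


def oob_error(rows, labels, n_trees=50, seed=0):
    """Fraction of samples misclassified by the trees that never saw them."""
    rng = random.Random(seed)
    n = len(rows)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap resample
        stump = fit_stump([rows[i] for i in idx], [labels[i] for i in idx])
        for i in set(range(n)) - set(idx):                    # out-of-bag samples
            votes[i][predict_stump(stump, rows[i])] += 1
    scored = [(v.most_common(1)[0][0], labels[i]) for i, v in enumerate(votes) if v]
    return sum(pred != true for pred, true in scored) / len(scored)
```

Because every curve is scored only by trees that did not train on it, the out-of-bag error behaves like a built-in cross-validation estimate and requires no separate test set.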
Biological inferences and interpretation
Comparison between different cell lines. Extracted quantita-
tive features such as slopes, intercepts, breakpoint or change-
point were compared across individual cells and cell types.
To detect differences between CP-A and CP-C features we
computed two sets of significance tests. A test of the statistical
Fig. 6 Comparison of features between CP-A and CP-C cells by means of a spline model. Three main features were extracted using the
constrained piecewise linear model: (a, b) oxygen concentration where the change of slopes in the fit occurs (change-point), (c, d) left (before slope
change) and (e, f) right (after slope change) slopes. Panels on the left show feature frequency values and those on the right show 95% confidence
intervals of the feature means.
significance of differences between the means or the medians of
the features of the two cell lines revealed significant differences
for the TimetoZero and Change-point.Oxygen features (Table 1).
The distribution of the time point when each OC kinetics
curve reaches an oxygen concentration value near zero
(TimetoZero feature) exhibits a broad range of values in both
cell types, as mentioned previously (Fig. 3). Statistical analysis
revealed significant differences between both the means
and the medians of the two cell types with p-values equal to
0.003 and 0.008, respectively (Fig. 3).
Another feature of interest is the value of oxygen at the
point where the two linear regressions of the spline model meet
(Change-point.Oxygen). At the breakpoint of the spline model
two features can be captured: oxygen concentration and time.
Oxygen concentration when the change in slopes takes place is
biologically relevant as it indicates a change in the oxygen
consumption kinetics most likely caused by alterations in the
energy production of the cell. The distributions of the Change-
point.Oxygen feature within each cell type showed character-
istics typical of a bimodal density function. Qualitatively the
distribution histograms of the two cell types show significant
similarity (Fig. 6) with a more clearly defined main peak at
6–6.5 ppm for CP-C cells. The distributions clearly indicate
marked heterogeneity in OC kinetics within the same cell type.
More subtle differences can be seen when comparing the two
cell types (Fig. 6b). One of the most notable differences is the
existence of a second, broader peak between 2–4 ppm in CP-C
cells, which is less pronounced in CP-A cells. However, the
statistical tests of the mean and median yielded p-values of
0.053 and 0.061, respectively, indicating that neither of these
parameters differs significantly between the cell types at α = 0.05.
Two other features that we analyzed were the slopes (rates)
of the OC kinetics measured in the study. Understanding how
fast individual cells consume oxygen is of great interest as it is
directly related to the energy production levels in the cell. The
distributions of the slopes showed a long tail containing only a
small number of cells, while the majority of the cells’ OC rates
were concentrated in a relatively narrow range, [−0.02, 0]
(Fig. 6). For both left and right slopes, no statistically
significant differences between their means were found when
comparing the two cell types (Fig. 6). However, the median
values of the right slope were found to be statistically different
between the two cell types with a p-value equal to 0.002
(Fig. 6).
We further explored these comparisons as a classification
problem with two classes (i.e., one cell type versus the other),
finding subtle differences between the two cell types using an
ensemble-based classifier, Random Forest. The classifier
yielded an out-of-bag error rate of 30% when
classifying single CP-A and CP-C cells based on the
extracted features (Table 1). A multidimensional plot from
the tested Random Forest (more details in the Methods
section: Comparisons and classification techniques) is shown
in Fig. 7. This plot shows differences among cell lines.
The role of intercellular interactions: comparison between OC
kinetics in isolated single and interacting cells. To explore
metabolic heterogeneity in the presence of intercellular inter-
actions, OC kinetics curves were obtained with two cells of
the same type placed into one microchamber. We compared
features extracted from the OC data of single cells (i.e., CP-A_1
and CP-C_1) with those obtained with two cells per single
chamber (i.e., CP-A_2 and CP-C_2). The same statistical
methodology was applied to CP-A_2 and CP-C_2 OC
curves as for the data acquired with single, non-interacting cells
with only minor modifications to certain features. To account
for the number of cells (one or two) per microchamber, the
values of the slopes measured in microchambers with double
occupancy were divided by two, assuming equal OC for the two
cells in a microwell, which allowed comparisons with single-cell
slopes.
We first investigated the goodness-of-fit of the spline model
applied to the OC kinetics data of interacting cells. We
compared data fits obtained with the spline model and with
simple linear regression using multiple hypothesis testing with
Bonferroni correction as described in the Methods section.
Similar to the results obtained with individual, non-interacting
cells of both cell lines, the spline model fit was found to be
statistically better than the simple linear regression model for
all measurements with double-occupancy, interacting cells
(Fig. S1, ESI†).
A set of features was extracted from the CP-A_1, CP-A_2, CP-C_1, and
CP-C_2 curves using the constrained piecewise
linear regression model. Distribution patterns similar to those
obtained with single, non-interacting cells were found for the
OC kinetics curves with interacting cells for features such as
TimetoZero, Change-point.Oxygen, Left.Slope, and Right.Slope
(description in Table 1). Statistically significant differences in
both the mean and median were found for at least one of the
four distinct groups of OC curves for the feature TimetoZero as
Fig. 7 Multidimensional scaling plot: a Random Forest classifier for
single CP-A vs. CP-C cells. This plot visualizes the scaling coordinates
of the proximity matrix obtained from a Random Forest to classify
CP-A versus CP-C cells at the single-cell level. This graphical repre-
sentation shows how the Random Forest classifier was able to find
high-dimensional interactions between data features that cluster OC
curves together.
Fig. 8 The TimetoZero feature extracted from single- and double-cells for CP-A and CP-C oxygen consumption curves. Time to zero is a
time feature extracted after removal of zero-valued tails using the CUSUM method. (a) Distribution histogram of the feature among single,
non-interacting (CP-A_1 and CP-C_1) cells and for interacting (two cells per well; CP-A_2 and CP-C_2) cells. (b) 95% confidence interval plot of
the means of TimetoZero for each experimental condition. Testing for statistically significant differences between the means or between the
location shifts (e.g., medians) showed p-values close to zero in both cases.
Fig. 9 Other features of interest extracted from oxygen consumption kinetics of single (non-interacting) and double (interacting) CP-A and CP-C
cells. The left panels show distribution histograms of the corresponding features; the right panels show 95% confidence intervals of the means of the
corresponding features. (a) and (b) Oxygen concentration values where the change of slopes in the spline model occurs. (c) and (d) Slope values of
the first linear regression of the spline model (Left.Slope). (e) and (f) Slope values of the second linear regression (Right.Slope). See Table 1 for
more detailed description of the slopes.
shown in Fig. 8. With p-values close to zero, this feature may
be an important discriminator among these non-interacting
and interacting cells (less marked differences can be observed
for CP-C_2 probably due to its small sample size). Other
extracted features such as the ones presented in Fig. 9 (oxygen
concentration at breakpoint, slopes before and after the break-
point) portrayed less distinct differences among these groups
but revealed empirical distribution patterns only available
through the study of individual OC curves. For example, oxygen
concentration at the breakpoint revealed significant differences
for at least one group among all groups, with p-values of 0.001
and 0.01 when testing for means and medians, respectively,
suggesting that CP-C_1 differs most for this feature (Fig. 9). In
contrast, slope values (adjusted for interacting cells by dividing
by two) did not differ as much across the cell groups,
except for the median of Right.Slope, which yielded a p-value
of 0.003 for at least one group differing from the others
(Fig. 9). These comparisons are possible through the application
The features extracted using the statistical framework
allowed for multiple comparisons of different phenotypes.
As seen before, the distributions of each of the features
permitted comparisons and showcased subtle differences. To
further analyze the OC curves through the extracted features,
an ensemble classifier34,35
was applied with the objective
of classifying the four groups of interest (CP-A_1, CP-A_2,
CP-C_1, and CP-C_2). A Random Forest classifier34
(see Methods) was applied to the extracted features to unravel
nonlinear relationships among the relevant features. Initially,
we built Random Forest models for pairs of classes
(i.e., CP-A_1 vs. CP-A_2, CP-C_1 vs. CP-C_2, etc.) obtaining
error rates of ~20–30% for all pairs. These models included
all extracted features. When all four data classes were included
in a single Random Forest model, the classification error rates
were found to be around 40% when all features were used in
the model and 50% for a Random Forest model that included all
features except TimetoZero (Fig. 10). The TimetoZero feature
was removed from the classification model to capture discrimi-
nant relationships among other features where differences might
not be as clear or direct as in the case of TimetoZero.
Table 2 shows the confusion matrices providing details on
how many curves were misclassified using the models with or
without the TimetoZero feature. Table 2 also shows that
the numbers of curves in the four different classes are
unbalanced. To address this problem, down-sampling was
performed for all Random Forest models applied here to lessen
the sample size effect in the learning model. Down-sampling is
a sampling technique that reduces the size of the majority class
or the class with the greatest number of samples. It is widely
used to balance the classes to minimize the overall error rate.37
In addition, Table 3 presents the feature importance scores for
both Random Forest models. It can be seen that TimetoZero
has the highest score for distinguishing between the different
experimental classes. However, when the TimetoZero feature
was removed, all features ranked similarly. Although the
predictive accuracy of the Random Forest models is not high,
their results show semi-defined clusters within
the same experimental condition or cell type. Fig. 10 shows
how the data points of the same type of experiment tend to
agglomerate in regions partially overlapping with other experi-
mental conditions. This Random Forest model extracts non-
linear patterns among the features to discriminate among
different classes. The two cell lines used in the study represent
different stages of pre-neoplastic progression in esophageal
cancer and, thus, are closely related in their phenotypic and
genotypic profiles. Therefore, it is likely that they will show
similarities in terms of oxygen consumption as well, thus
making the differentiation more difficult. More features either
from the OC curves or any other biologically relevant data
might be necessary to distinguish them clearly.
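The down-sampling step used to balance the classes before training can be sketched as follows. The version shown reduces every class to the size of the smallest one by random sampling without replacement; whether the original analysis balanced the classes exactly this way is an assumption, and the function name and seed are illustrative.

```python
import random
from collections import defaultdict


def down_sample(samples, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(members) for members in by_class.values())
    balanced_samples, balanced_labels = [], []
    for label, members in by_class.items():
        for sample in rng.sample(members, target):   # sampling without replacement
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels
```

Applied to the single-cell data analyzed here, 154 CP-A and 256 CP-C curves would be reduced to 154 curves per class, so the classifier cannot lower its error rate simply by favoring the larger class.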
Fig. 10 Multidimensional scaling plots: a Random Forest model for non-interacting and interacting CP-A and CP-C cells. This plot visualizes
the scaling coordinates of the proximity matrix obtained with a Random Forest performed to classify CP-A versus CP-C at the single- and
double-cell level. (a) Results using all features as described in Table 1. (b) Results using all features with the TimetoZero feature excluded from the
analysis.
Conclusion
The analysis and interpretation of intercellular heterogeneity
data are of fundamental importance in cell biology. There is
great interest in the scientific community in understanding
the role of heterogeneity in cellular homeostasis and
pathogenesis.28,38
In recent years, innovative technologies
have been developed to perform biological studies at the
single-cell level,24–28
including single-cell oxygen consumption
measurements. Despite the availability of these technologies,
their real potential can only be exploited utilizing effective
analytical methods capable of performing robust de-noising
and feature extraction steps on the novel type of information.
Through preliminary studies, we have identified three major
challenges when dealing with real-time phenotypic measure-
ments at the single-cell level: random noise, presence of
multiple functional states, and reliable differentiation of cell
behavior within and across different cell types (Fig. 1). In this
study, using single-cell OC data as an example, we made an
initial effort to establish a statistical framework for multiparameter
analysis of the experimental data at the single-cell
level. In our approach to analyzing single-cell data, we applied
several sets of statistical tools used in signal processing
and statistics for data modeling and feature extraction. The
validation of the method showed that experimental data can be
modeled and their features extracted reliably. The quantitative
features extracted from the single-cell experimental data using
our analysis method revealed subtle differences between
non-interacting, single cells as well as between interacting cells of
both types. This demonstrates the feasibility of the developed
methodology to reliably process the measurement data and
characterize oxygen consumption kinetics. Because of its general
applicability, our statistical framework can be utilized to address
similar challenges that arise in other single-cell data acquisition
and experimental modalities.
Methods
Dataset
Description of oxygen consumption measurements. As a first
step in acquiring and analyzing multiparameter data, our
center has developed an experimental platform for metabolic
phenotype characterization, including oxygen consumption, at
the single-cell level.27,28
Single-cell oxygen consumption rates
are on a scale of fmol min⁻¹ cell⁻¹. Because oxygen sensing
is based on the dynamic quenching of sensor luminescence
by oxygen, the signal-to-noise ratio of the measurement varies
as a function of oxygen concentration in the microchamber.
This factor needs to be taken into account especially when
applying various signal processing algorithms for de-noising
purposes. In addition, other sources of noise include detector
readout noise, intensity variations of the excitation source, and
stochastic sensor noise. For the two cell types studied in this
work, the average time required for an isolated cell to consume
all oxygen within the finite volume (~140 pL) of cell media
ranges from 30 to 90 min. Noise levels resulting from the
various sources can be significant, requiring the data to be
analyzed with a rigorous statistical framework capable of
reducing noise and extracting quantitative features.
We analyzed several sets of oxygen consumption kinetics
data from two Barrett’s esophageal epithelial cell lines (meta-
plastic CP-A and dysplastic CP-C) obtained with the single-
cell technology. The number of OC curves studied for CP-A
and CP-C were 154 and 256, respectively. The cells were
loaded into microwells and incubated for 15–30 hours before
measurements were performed. The incubation time was
selected based on previous studies of cell viability and
morphology. After incubation, microwells with cells were
hermetically sealed with a lid containing an extracellular
optical oxygen sensor. The sensor emission intensity was
collected as a function of time until oxygen concentration in
the microchamber reached zero.27
Table 2 Confusion matrices obtained with Random Forest classification models

(A) All features included:

True class                  Predicted class                   Class
(Num. curves)    CP-A_1   CP-A_2   CP-C_1   CP-C_2        error (%)
CP-A_1 (154)         75       24       51        4             51.3
CP-A_2 (118)          4       81        1       32             31.4
CP-C_1 (256)         61       22      165        8             35.5
CP-C_2 (44)           5       20        2       17             61.4

(B) Without TimetoZero feature:

True class                  Predicted class                   Class
(Num. curves)    CP-A_1   CP-A_2   CP-C_1   CP-C_2        error (%)
CP-A_1 (154)         74       29       45        6             51.9
CP-A_2 (118)         20       61       13       24             48.3
CP-C_1 (256)         60       28      142       26             44.5
CP-C_2 (44)           7       17        8       12             72.7

Individual error rates per cell type and number of cells within
a microwell are shown for Random Forest models constructed using
all features and with the TimetoZero feature excluded from the
analysis. The numbers represent the number of curves classified as
the specific predicted class by the nonlinear model. Classification error
is calculated as the percentage of curves that were misclassified.
Misclassified signals are the off-diagonal entries (shown in gray boxes in the original table).
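As an illustrative check of how the class error in Table 2 follows from the confusion matrix, the short Python sketch below recomputes it for panel (A). The function and variable names are ours and are not part of the original analysis.

```python
# Confusion matrix from Table 2(A): rows are true classes, columns are
# predicted classes, both ordered CP-A_1, CP-A_2, CP-C_1, CP-C_2.
CONFUSION_ALL_FEATURES = [
    [75, 24, 51, 4],
    [4, 81, 1, 32],
    [61, 22, 165, 8],
    [5, 20, 2, 17],
]


def class_errors(confusion):
    """Return the percentage of misclassified curves for each true class."""
    errors = []
    for i, row in enumerate(confusion):
        total = sum(row)
        # Everything off the diagonal of row i was misclassified.
        errors.append(100.0 * (total - row[i]) / total)
    return errors
```

Rounded to one decimal place, the four values returned for panel (A) agree with the class error column of the table.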
Table 3 Variable importance scores from Random Forest classification models

                            Mean decrease Gini (%)
Features                 All features    Without TimetoZero
Change-point.Time             9.03             12.93
Change-point.Oxygen          10.99             13.23
Left.B0.Coef                 10.17             12.89
Left.B1.Coef                 10.37             12.53
Right.B1.Coef                13.81             12.49
TimetoZero                   17.12               -
Kurtosis                      9.14             11.88
Skewness                      8.88             11.35
MSE.min                      10.49             12.70
These variable importance scores are calculated as the average
over all trees of a scoring measure. This scoring measure is computed
as the number of correctly classified cases when the feature matrix
values are evaluated on the grown tree minus the number of correctly classified
cases when the variable to be scored is permuted prior to tree model
evaluation.
Downloaded by Arizona State University on 14 March 2012
Published on 05 January 2012 on http://pubs.rsc.org | doi:10.1039/C2MB05429A
Mol. BioSyst., 2012, 8, 804–817. This journal is © The Royal Society of Chemistry 2012
Noise reduction techniques
The noise levels in OC data were reduced using two main
signal processing components: (1) Low-pass filtering and (2)
Outlier smoothing.
Low-pass filtering. Two common low-pass filtering techni-
ques were evaluated. A low-pass filter reduces the amplitude of
high frequencies while leaving low frequencies unchanged.
These two methods along with their parameters are briefly
described here. In addition, we discuss a goodness-of-fit
assessment to decide which of the filtering techniques performs
better for the measured OC kinetics curves.
The Savitzky–Golay (SG) filter, also called the least-squares
polynomial smoothing filter, is a finite impulse response
(FIR) filter.39
The technique fits a polynomial of fixed degree
n to a small window of the data of size (2m + 1) to estimate
the midpoint, as shown in eqn (1) and (2). This process is
repeated by sliding the data window along the total span.39,40
This type of convolution filter minimizes the least-squares error
of fitting a polynomial to window frames of the noisy data and
is popular in areas such as spectroscopy and analytical
chemistry because of its simplicity and speed.41,42
If the
data are evenly spaced and continuous, then the smoothed
value $y_j^*$ is the weighted sum of the points in the
window frame, as described in eqn (3), where $c_t$ are the
convolution weights and $N$ is the normalization constant. The original
Savitzky–Golay formulation truncates the first and last
m points of the data signal, which cannot be smoothed.
Therefore, extensions to the
Savitzky–Golay filter addressing initial and endpoint estimation
found in the literature were also implemented in this study.40,43
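$$y_t^* = \sum_{k=0}^{n} b_k t^k = b_0 + b_1 t + b_2 t^2 + \cdots + b_n t^n, \qquad t = [-m, -(m-1), \ldots, 0, \ldots, m] \tag{1}$$

$$\frac{\partial}{\partial b_k}\left[\sum_{t=-m}^{m} (y_t^* - y_t)^2\right] = 0 \tag{2}$$

$$y_j^* = \frac{\sum_{t=-m}^{m} c_t\, y_{j+t}}{N} \tag{3}$$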
In our study, a second-order polynomial fit was tested, as it is
commonly used in practice.41
Another important parameter
needed in SG filtering is the window length (here denoted m, the
number of points in the window). Common
values for this parameter are m = 11 and m = 21. We evaluated the
root-mean-squared-error (RMSE) for a range of values under
both conditions (i.e., CP-A and CP-C) as shown in Fig. S2
(ESI†). Data filtering in this study was performed using a window
size of 11, since the smoothing performance was found to be
better than with m = 21 in terms of preservation of local signal
patterns.
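The weighted sum in eqn (3) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it relies on the standard closed-form weights for a quadratic fit, takes the half-width as 5 (our reading of the 11-point window, so that 2 × 5 + 1 = 11), and leaves the first and last points unsmoothed instead of applying the endpoint extensions cited above. The function names are ours.

```python
def savgol_weights(half_width):
    """Normalized quadratic Savitzky-Golay weights c_t / N for t = -m..m.

    Uses the closed form 3(3m^2 + 3m - 1 - 5t^2) / ((2m+1)(2m-1)(2m+3)).
    """
    m = half_width
    denom = (2 * m + 1) * (2 * m - 1) * (2 * m + 3)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * t * t) / denom
            for t in range(-m, m + 1)]


def savgol_smooth(y, half_width=5):
    """Smooth interior points as in eqn (3).

    Assumption: the first and last `half_width` points are returned
    unchanged (the truncation of the original SG formulation).
    """
    m = half_width
    weights = savgol_weights(m)
    smoothed = list(y)
    for j in range(m, len(y) - m):
        smoothed[j] = sum(weights[t + m] * y[j + t] for t in range(-m, m + 1))
    return smoothed
```

Because the weights come from a least-squares quadratic fit, a noise-free quadratic signal passes through the filter unchanged, which is a convenient sanity check before applying it to measured curves.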
The second filter we applied was the Exponentially
Weighted Moving Average (EWMA). It is an infinite impulse
response (IIR) filter and represents a special case of the
moving average filter where the weights of the data points to
be averaged decay exponentially with the distance from the
most recent data point (eqn (4)). The smoothed value of $y_t$ is
obtained through
$$y_t^* = \lambda y_t + (1 - \lambda)\, y_{t-1}^* \tag{4}$$
where $\lambda$ represents the decay rate, with $0 \le \lambda \le 1$.
A small value of $\lambda$ gives more weight to older data and less to
new data, and vice versa.29,44
To detect small signal changes,
$\lambda = 0.2$ was used during the smoothing of the data curves in
this study. An RMSE evaluation across a range of $\lambda$ values
was performed as shown in Fig. S2 (ESI†). In practice,
$\lambda$ values between 0.2 and 0.3 are used.45
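The recursion in eqn (4) is short enough to write out directly. The Python sketch below is illustrative only; it assumes the recursion is started from the first raw value, a common choice that the text does not specify, and the function name is ours.

```python
def ewma(y, lam=0.2):
    """Exponentially weighted moving average following eqn (4).

    Assumption: the first smoothed value equals the first raw value.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    smoothed = []
    previous = None
    for value in y:
        if previous is None:
            previous = value
        else:
            previous = lam * value + (1.0 - lam) * previous
        smoothed.append(previous)
    return smoothed
```

With lam = 0.2, a step from 0 to 10 is followed only gradually (2.0, then 3.6, and so on), which shows how a small decay rate weights older data more heavily.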
To assess the performance of the EWMA and SG filtering
techniques, we evaluated the average root-mean-squared-error
(RMSE) between smoothed and raw data as a goodness-of-fit
criterion. Goodness-of-fit statistics, such as the coefficient of
determination (R²), the mean squared error (MSE), and the
RMSE, describe how well smoothed values fit experimental data.
Small values of the average
RMSE indicate a good fit. Both techniques showed similar
performance for the commonly chosen parameters, as displayed
in Fig. S3 (ESI†).
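For reference, the three goodness-of-fit statistics named above can be computed as in the following Python sketch. It assumes the MSE is the plain average of squared residuals (no degrees-of-freedom correction) and that the raw signal is not constant; the function name is ours.

```python
import math


def goodness_of_fit(raw, smoothed):
    """Return R^2, MSE and RMSE of smoothed values against raw data."""
    n = len(raw)
    sse = sum((r - s) ** 2 for r, s in zip(raw, smoothed))
    mean_raw = sum(raw) / n
    # Total sum of squares of the raw data around its mean.
    sst = sum((r - mean_raw) ** 2 for r in raw)
    mse = sse / n
    return {"r2": 1.0 - sse / sst, "mse": mse, "rmse": math.sqrt(mse)}
```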
Outlier detection and smoothing. The OC kinetics data
contained random sharp peaks in certain areas due to signal
loss or stochastic sensor intensity fluctuations. We detected
these outliers using traditional control chart theory with the
following equation
$$L = \bar{x} \pm w\hat{\sigma} \tag{5}$$
where L represents the upper (+) and lower (−) control limits,
$\bar{x}$ is the mean value of the response, w is the parameter that
determines the width of the limits, and $\hat{\sigma}$ is an estimated value
of variation. Data points outside the limits calculated using
eqn (5) were considered outliers. $\hat{\sigma}$ was estimated through an
initial filtering step: each signal undergoes a filtering step like
those described in the earlier subsection on low-pass
filtering, and the variation of the raw data points around the
resulting smoothed values is computed using the root-mean-squared-error
(RMSE) metric. We assumed $\hat{\sigma}$ to be
constant along a curve, which is not necessarily true. However, because $\hat{\sigma}$
is utilized for the detection of distant outliers only, this
assumption is adequate. To determine the w parameter
(control width constant), we studied several options. The value
w = 2 was chosen because, with this value, on
average about 10% of all data points within an OC kinetics curve are
detected as outliers (Fig. S4, ESI†). As expected, higher values of w
(3, 4, and 5) resulted in smaller percentages, ranging from 0% to ~5%,
and a smaller value (w = 1) resulted in a higher percentage (~25%)
of data points
detected as outliers (Fig. S4, ESI†). Hence, w = 2 seemed a
reasonable choice to reduce random noise due to outliers
without excluding too much of the actual signal data from the
analysis. After detection, the outliers were smoothed out by
using a simple 2-neighbor averaging procedure where the
outlier values are replaced with the average of their two
adjacent neighbors. The low-pass
filter was re-applied to the entire dataset afterwards.
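The detection and replacement steps can be sketched as follows. This Python example is a simplified reading of the procedure, not the authors' code: it assumes the control limits of eqn (5) are centered on the smoothed value at each time point, estimates the variation as the RMSE between raw and smoothed data, and leaves the two end points untouched because they have only one neighbor. All names are ours.

```python
import math


def smooth_outliers(raw, smoothed, w=2.0):
    """Flag points outside smoothed +/- w * sigma_hat and replace each one
    with the mean of its two adjacent raw values.

    Returns (cleaned_signal, flags), where flags[i] is True for outliers.
    """
    n = len(raw)
    # sigma_hat: RMSE of the raw data around the smoothed curve.
    sigma_hat = math.sqrt(
        sum((r - s) ** 2 for r, s in zip(raw, smoothed)) / n
    )
    flags = [abs(r - s) > w * sigma_hat for r, s in zip(raw, smoothed)]
    cleaned = list(raw)
    for i in range(1, n - 1):  # end points are left as they are
        if flags[i]:
            cleaned[i] = 0.5 * (raw[i - 1] + raw[i + 1])
    return cleaned, flags
```

After this step, the low-pass filter would be applied again to the cleaned signal, as described above.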
Feature extraction models
Cumulative sum control (CUSUM) charts: change detection.
With the use of cumulative sum (CUSUM) control charts,
small changes in the mean value are detected more efficiently
than with Shewhart control charts.29
To apply the CUSUM
procedure, the OC curves were order-reversed to identify the
deviation from zero (tail). The OC response signals portray the
behavior of oxygen consumption over time. When it reaches
its minimum value (zero) the signal shows a constant behavior
or a tail of zeros from that time point on. Hence, the time
point at which the signal reaches zero can be obtained by
capturing a deviation within the constant region of zero values
which occur at the end of the signal. Reversing the order of the
signal facilitates the application of CUSUM charts to detect
deviations from zero.
Two input parameters are needed to calculate the CUSUM
statistic ($C_k$): the subgroup size (k) and the in-control mean
(in this study $\mu_0 = 0$). The parameter $C_k$ is defined in eqn (6) by
k, $\mu_0$, and the computed mean of the sub-sample of size k ($\bar{x}_k$).
$C_k$ is calculated along the entire sample range.
$$C_k = \sum_{j=1}^{k} (\bar{x}_k - \mu_0) \tag{6}$$
Other parameters needed to determine when the process is
out of control (in this study $\mu_0 \neq 0$) are the decision interval and
the amount of shift to detect (slack). Recommended values for
these parameters are a decision interval of size 5 and a slack
value of 3.46–48
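A minimal sketch of how the reversed-signal CUSUM idea can locate the start of the zero tail is given below. It is an assumption-laden illustration rather than the procedure used in the study: it uses a one-sided tabular CUSUM on individual observations (subgroup size 1), and it interprets the slack and decision interval as multiples of a user-supplied noise standard deviation `sigma`. The function name and its parameters are ours.

```python
def time_to_zero_index(y, mu0=0.0, slack=3.0, decision_interval=5.0, sigma=1.0):
    """Return the index in y at which the trailing zero region starts,
    or None if no deviation from mu0 is detected.

    The signal is scanned from its end (order-reversed), and an upper
    one-sided CUSUM accumulates deviations above mu0 + slack * sigma.
    """
    k = slack * sigma
    h = decision_interval * sigma
    c_plus = 0.0
    for r, value in enumerate(reversed(y)):
        c_plus = max(0.0, c_plus + value - (mu0 + k))
        if c_plus > h:
            # r points of the reversed signal were in control, so the
            # tail occupies the last r positions of the original signal.
            return len(y) - r
    return None
```

For a noise-free curve such as `[50, 40, 30, 20, 10, 0, 0, 0, 0, 0]` the sketch returns 5, the position of the first zero. With noisy data the alarm may lag the true transition by a few samples, depending on the slack and decision interval.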
Piecewise linear regression model. The methodology implemented
in this paper for feature extraction consists of fitting a
piecewise linear regression model to each OC kinetics curve. In
general, piecewise linear regression is used to describe
nonlinear behavior by fitting the data to a number of linear
segments. In the methodology implemented here, two linear
regression models were constrained to connect at the same
breakpoint. We considered a special case of two linear regressions
intersecting at a single point at time $t_c$ ("change-point"),
as shown in eqn (7), with the indicator variable $I_{t \ge t_c} = 1$ when
$t \ge t_c$.49
Both linear regressions were described in one
function y with the use of the indicator variable $I_{t \ge t_c}$ to define
both regression functions, with constrained slopes $\beta_1$ and
$\beta_1 + \beta_2$, respectively, as shown in eqn (7). The slope parameters were
constrained to non-positive values due to decreasing oxygen
concentration in the microchambers.
$$y = \beta_0 + \beta_1 t + \beta_2 (t - t_c)\, I_{t \ge t_c} \tag{7}$$
$$\beta_1 \le 0 \ \text{and} \ \beta_1 + \beta_2 \le 0 \quad \forall \ \text{curves}$$
To find the change-point, a likelihood method was used to
minimize the sum squared error (SSE) of the fit of the kinetics
data to two linear regressions. During the fit, an exhaustive
search was performed along the time axis to determine the
change-point and the coefficient estimates that minimize SSE.
Once the change-point was found, the features (Table 1) were
extracted from the piecewise linear model for different experi-
mental conditions (i.e., CP-A, CP-C). The fit to the
constrained piecewise linear regression with one-breakpoint
was statistically compared to the fit to a simple linear regression
model using an F test. To perform the F test, an F statistic is
computed as shown in eqn (8) where SSEModel1 and SSEModel2
refer to the sum squared error of the simple linear regression
and the constrained piecewise linear regression models respec-
tively. Other inputs in eqn (8) are p and n; p is the number of
parameters estimated for each model (i.e., Model1 or Model2)
and n is the total number of data points in the signal.
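$$F = \frac{(\mathrm{SSE}_{\mathrm{Model1}} - \mathrm{SSE}_{\mathrm{Model2}}) / (p_{\mathrm{Model2}} - p_{\mathrm{Model1}})}{\mathrm{SSE}_{\mathrm{Model2}} / (n - p_{\mathrm{Model2}})} \tag{8}$$
If $F > F_{\alpha,\; p_{\mathrm{Model2}} - p_{\mathrm{Model1}},\; n - p_{\mathrm{Model2}}}$, then Model2 performs better.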
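The exhaustive change-point search and the F statistic of eqn (8) can be sketched together in dependency-free Python. Several simplifications are assumed and should not be read as the authors' implementation: the model of eqn (7) is fitted by ordinary least squares for every candidate change-point taken from the sampled time points, candidates whose fitted slopes violate the non-positivity constraints are simply discarded (rather than solving a constrained problem), and the number of parameters of each model is left to the caller. All function names are ours.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting; return None if it is numerically singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations.

    Returns (beta, sse), or None if the design matrix is rank deficient.
    """
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    if beta is None:
        return None
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, sse


def fit_change_point(t, y):
    """Exhaustive search for the change-point tc of eqn (7) that minimizes
    the SSE, skipping fits that violate the slope constraints."""
    best = None
    for tc in t[2:-2]:  # keep at least two points on each side
        fit = least_squares([[1.0, ti, max(ti - tc, 0.0)] for ti in t], y)
        if fit is None:
            continue
        (b0, b1, b2), sse = fit
        if b1 > 0.0 or b1 + b2 > 0.0:
            continue
        if best is None or sse < best["sse"]:
            best = {"tc": tc, "beta": (b0, b1, b2), "sse": sse}
    return best


def f_statistic(sse1, sse2, p1, p2, n):
    """F statistic of eqn (8) for nested models Model1 and Model2."""
    return ((sse1 - sse2) / (p2 - p1)) / (sse2 / (n - p2))
```

A simple linear fit for Model1 is obtained with `least_squares([[1.0, ti] for ti in t], y)`. The resulting F value would then be compared with the critical value of the F distribution at the chosen significance level, which is not computed in this sketch.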
The model comparison by an F test was performed for
every single curve, resulting in a multiple hypothesis testing
problem. A well-known problem in multiple hypothesis
testing is the increase in false positives. Several approaches, such
as the Bonferroni correction, exist to alleviate this
problem. This widely used technique is applied when multiple
statistical tests are computed simultaneously in order to reduce
false positives by reducing the value of $\alpha$, the significance
level of the test. Equivalently, instead of reducing $\alpha$, all the
p-values from the individual tests can be adjusted
as shown in eqn (9), where n is the number of
comparisons.31,50,51
$$p_{\mathrm{value.adjusted}}[c] = \min(p_{\mathrm{value}}[c] \times n,\; 1), \qquad c \in [1, n] \tag{9}$$
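Eqn (9) translates directly into code. The short Python sketch below is illustrative; the function name is ours.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values following eqn (9): each p-value is
    multiplied by the number of comparisons and capped at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]
```

For three tests with p-values 0.01, 0.04 and 0.5, the adjusted values are 0.03, 0.12 and 1, so only the first remains below a significance level of 0.05.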
Comparisons and classification techniques
Statistical significance tests. The extracted features were
studied and compared between the two cell lines using tradi-
tional statistical tools such as histograms, confidence intervals
and statistical tests of the mean and median. The statistical
significance of the difference between the means was deter-
mined using the analysis of variance (ANOVA) test which
generalizes the t-test for more than two groups but relies on
several assumptions that may or may not be met for this
particular data structure. ANOVA was performed with caution
to get a general sense of the groups’ mean from the ANOVA
hypothesis shown in eqn (10). In addition to ANOVA, we
performed significance tests for the differences between the
median values using nonparametric tests which waive the strict
assumptions inherent to ANOVA. The median or rank test was
performed using the Mann–Whitney–Wilcoxon test52,53
for a
two-level group test and the Kruskal–Wallis test54
for more
than two groups. Both tests are nonparametric approaches for
evaluating differences in the location shift of the distribution of x
for each group. Eqn (11) represents the analytical expression of
the Kruskal–Wallis test, where g is the number of groups, $n_i$ is the
number of observations
in group i, $r_{ij}$ is the rank of observation j from group i, $\bar{r}_i$ is
the mean rank of group i, $\bar{r}$ is the mean of all ranks, and N is
the total number of observations for all groups. The p-value
corresponding to a particular K is approximated through the
χ² distribution.54
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_n \tag{10}$$
$$K = (N - 1)\,\frac{\sum_{i=1}^{g} n_i (\bar{r}_i - \bar{r})^2}{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (r_{ij} - \bar{r})^2} \tag{11}$$
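As a worked illustration of eqn (11), the Python sketch below computes K from raw observations. It assumes that tied observations receive the average of the ranks they span (midranks), a standard convention that the text does not state explicitly; the p-value from the χ² approximation is not computed. The function names are ours.

```python
def midranks(values):
    """Ranks 1..N, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis statistic K of eqn (11) for a list of groups."""
    pooled = [x for group in groups for x in group]
    ranks = midranks(pooled)
    n_total = len(pooled)
    mean_rank = sum(ranks) / n_total
    between = 0.0
    start = 0
    for group in groups:
        group_ranks = ranks[start:start + len(group)]
        start += len(group)
        group_mean = sum(group_ranks) / len(group)
        between += len(group) * (group_mean - mean_rank) ** 2
    total = sum((r - mean_rank) ** 2 for r in ranks)
    return (n_total - 1) * between / total
```

For two fully separated groups of three observations each, for example `[[1, 2, 3], [4, 5, 6]]`, the statistic evaluates to 27/7 (about 3.86).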
Ensemble classifier: Random Forest. To further explore
potential relationships among several groups of OC curves,
we applied an ensemble classifier based on decision trees. The
two cell lines (CP-A and CP-C) at the single-cell or two-cell
levels (i.e., CP-A_1, CP-A_2, CP-C_1, and CP-C_2) were
defined as the four classes for the classifier model with features
from the OC curves used as predictors. The decision trees
can be applied in almost all scenarios. Therefore, they provide
a good starting point for modeling heterogeneous and
large data sets. The decision trees apply to either a numerical
or categorical response and are nonlinear, simple, and fast.
The decision trees are scale-invariant and robust to missing
values. However, a single tree is produced by a greedy algo-
rithm that generates an unstable model.34
Consequently,
ensemble methods have been used to counteract the instability
of a single tree.
Supervised ensemble methods build a set of simple models
called base learners and use a weighted outcome for each base
learner in a voting scheme to predict future data. In other
words, ensemble methods merge outputs from multiple base
learners to create a voting committee to improve performance.
Many empirical studies have shown that ensemble methods
often outperform any single base learner.35
The Random Forest classifier is an improved bagging
method that exploits the benefits of bootstrap
sampling in modeling. It grows a forest of random
decision trees on bagged samples, yielding accurate results
comparable with the best known classifiers.34
An advantageous
property of Random Forest classifiers is that they limit
overfitting through embedded out-of-bag (OOB) error estimation.
The out-of-bag error estimate for the ith tree in the Random
Forest model is computed using the fraction of cases not used
in the learning of this ith tree. Other advantages of Random
Forest models are that they are simple to train and tune in many
applications, are computationally efficient, can handle a large number
of variables, provide variable importance scores, include an embedded
method to estimate missing data, generate a proximity
matrix among cases, handle variable interactions, can be
adapted to balance error due to datasets with unbalanced
numbers of samples, and can be extended to unlabeled data
for unsupervised clustering, data views and outlier detection.34
Algorithm: a simple pseudocode for Random Forest classifier
construction is shown below.34,35
1. Select a number of cases independently, with replacement,
   from the original dataset to build the training data.
2. Use the training data to grow a tree:
   o Select v variables at random from the total number of
     input variables (V), where v ≪ V.
   o The best variable among the v predictors is chosen to maximize
     the information gain of the split.
   o Split the chosen node into two daughter nodes based on
     the best variable.
3. Repeat Steps 1 and 2 until all trees are built.
4. Output the ensemble of trees.
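The steps above can be sketched in plain Python as follows. This is a deliberately small, illustrative implementation and not the software used in the study: trees are grown until their nodes are pure, splits are binary thresholds on numeric features chosen by information gain (entropy), class labels are assumed to be strings, and prediction is an unweighted majority vote. All names are ours.

```python
import math
import random
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair with the largest information
    gain among the candidate features, or None if no split helps."""
    best, best_score = None, entropy(labels)
    for f in features:
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            score = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (f, thr), score
    return best


def grow_tree(rows, labels, v, rng):
    """Grow one tree; internal nodes are tuples, leaves are class labels."""
    if len(set(labels)) == 1:
        return labels[0]
    features = rng.sample(range(len(rows[0])), v)  # v random variables
    split = best_split(rows, labels, features)     # best variable and cut
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= thr]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            grow_tree([r for r, _ in left], [l for _, l in left], v, rng),
            grow_tree([r for r, _ in right], [l for _, l in right], v, rng))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


def random_forest(rows, labels, n_trees=25, v=1, seed=0):
    """Build n_trees trees, each on its own bootstrap sample (step 1)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], v, rng))
    return forest


def predict(forest, row):
    """Majority vote over all trees in the ensemble."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

In practice an established library implementation would normally be preferred, since it also provides out-of-bag error estimates, variable importance scores and proximities, none of which are covered by this sketch.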
Important features of Random Forest classifiers are OOB
sampling, variable importance, and proximity plots. OOB
sampling plays a role similar to cross-validation and, since the
trees of a Random Forest are grown independently, this validation can be
done along the way. Variable importance is a key feature of
Random Forests. The variables are ranked based on their
improvement in the empirical loss function among all trees,
meaning that variables that are chosen often in the trees
provide better predictive power, i.e., they minimize the loss
function. Proximities between cases are measured by putting
all the data, training and out-of-bag, through the grown trees.
If instances i and j are in the same terminal node, their
proximity increases by one, and so on through all the trees.34
The proximities are then normalized by the number of trees in
the model.
State-of-the-art visualization methods such as multidimensional
scaling36
are used to illustrate how well features discriminate
among different conditions. Multidimensional scaling represents
high-dimensional data in a lower-dimensional space (often two or
three dimensions) in order to better visualize any structure in the
data. The algorithm generates points in the lower-dimensional
space that approximately preserve the pair-wise distances between
the points in the high-dimensional space.55
Conflict of Interest: none declared.
Acknowledgements
The authors would like to thank the personnel and support of
the Center for Biosignatures Discovery Automation in the
Biodesign Institute at Arizona State University. Funding: this
research is supported by the National Institutes of Health
(NIH), National Human Genome Research Institute
(NHGRI), Center of Excellence in Genomic Science (CEGS),
grant number 5 P50 HG002360 to Deirdre R. Meldrum.
References
1 M. Lidstrom and D. R. Meldrum, Life-on-a-chip, Nat. Rev.
Microbiol., 2003, 158, 164.
2 D. J. Wang and S. Bodovitz, Single cell analysis: the new frontier in
‘omics’, Trends Biotechnol., 2010, 28(6), 281–290.
3 T. Kalisky and S. R. Quake, Single-cell genomics, Nat. Methods,
2011, 8(4), 311–314.
4 N. Navin, J. Kendall, J. Troge, P. Andrews, L. Rodgers,
J. McIndoo, K. Cook, A. Stepansky, D. Levy, D. Esposito,
L. Muthuswamy, A. Krasnitz, W. R. McCombie, J. Hicks and
M. Wigler, Tumour evolution inferred by single-cell sequencing,
Nature, 2011, 472(7341), U90–U119.
5 E. J. Kostelich and T. Schreiber, Noise reduction in chaotic time-
series data: A survey of common methods, Phys. Rev. E: Stat. Phys.,
Plasmas, Fluids, Relat. Interdiscip. Top., 1993, 48, 1752–1763.
6 S. J. Orfanidis, Introduction to Signal Processing, Prentice-Hall,
Englewood Cliffs, NJ, 1996.
7 J. Brocker, U. Parlitz and M. Ogorzalek, Nonlinear Noise
Reduction, Proc. IEEE, 2002, 90(5), 898–918.
8 M. Schena, D. Shalon, R. W. Davis and P. O. Brown, Quantitative
monitoring of gene expression patterns with a complementary
DNA microarray, Science, 1995, 270(5235), 467–470.
9 D. A. Lashkari, J. L. DeRisi, J. H. McCusker, A. F. Namath,
C. Gentile, S. Y. Hwang, P. O. Brown and R. W. Davis, Yeast
microarrays for genome wide parallel genetic and gene expression
analysis, Proc. Natl. Acad. Sci. U. S. A., 1997, 94(24), 13057–13062.
10 V. G. Cheung, M. Morley, F. Aguilar, A. Massimi,
R. Kucherlapati and G. Childs, Making and reading microarrays,
Nat. Genet., 1999, 21, 15–19.
11 S. K. Moore, Making chips to probe genes, IEEE Spectrum, 2001,
38(3), 54–60.
12 W. Torres-Garcia, W. W. Zhang, R. Johnson, G. Runger and
D. R. Meldrum, Integrative analysis of transcriptomic, proteomic
data of Desulfovibrio vulgaris: a nonlinear model to predict abundance
of undetected proteins, Bioinformatics, 2009, 25, 1905–1914.
13 W. Torres-Garcia, S. D. Brown, R. H. Johnson, W. W. Zhang,
G. Runger and D. R. Meldrum, Integrative analysis of transcrip-
tomic and proteomic data of Shewanella oneidensis: missing value
imputation using temporal datasets, Mol. BioSyst., 2011, 7(4),
1093–1104.
14 M. L. T. Lee, F. C. Kuo, G. A. Whitmore and J. Sklar, Importance
of replication in microarray gene expression studies: Statistical
methods and evidence from repetitive cDNA hybridizations, Proc.
Natl. Acad. Sci., 2000, 97(18), 9834–9839.
15 D. E. Carter, J. F. Robinson, E. M. Allister, M. W. Huff and
R. A. Hegele, Quality assessment of microarray experiments, Clin.
Biochem., 2005, 38(7), 639–642.
16 J. Seo, M. Bakay, Y. W. Chen, S. Hilmer, B. Shneiderman and
E. P Hoffman, Interactively optimizing signal-to-noise ratios in
expression profiling: project-specific algorithm selection and detection
p-value weighting in Affymetrix microarrays, Bioinformatics, 2004,
20(16), 2534–2544.
17 T. Howlader and Y. P. Chaubey, Noise Reduction of cDNA
Microarray Images Using Complex Wavelets, IEEE Trans. Image
Process., 2010, 19(8), 1953–1967.
18 Y. Saeys, I. Inza and P. Larrañaga, A review of feature selection
techniques in bioinformatics, Bioinformatics, 2007, 23(19),
2507–2517.
19 J. P. Stevens, Intermediate Statistics. A Modern Approach,
Lawrence Erlbaum Associates Publishers, Mahwah, NJ, Second edn,
1999.
20 J. X. Pan and K. T. Fang, Growth Curve Models and Statistical
Diagnostics, Springer Series in Statistics, 2002.
21 S. E. Maxwell and H. D. Delaney, Designing Experiments and
Analyzing Data: A Model Comparison Perspective, Lawrence
Erlbaum, Second edn, 2003.
22 S. Weerahandi, Generalized inference in repeated measures: Exact
methods in MANOVA and mixed models, Wiley-Interscience, 2004.
23 Applied regression analysis and other multivariable methods, ed.
D. G. Kleinbaum, L. L. Kupper and K. E. Muller, PWS Publishing
Co., Boston, MA, USA, 4th edn, 2008.
24 Y. Anis, M. Holl and D. Meldrum, Automated selection and
placement of single cells using vision-based feedback control, IEEE
Trans. Autom. Sci. Eng., 2010, 7(3), 598–606.
25 H. Zhu, M. Holl, T. Ray, S. Bhushan and D. R. Meldrum,
Characterization of deep wet etching of fused silica glass for single
cell and optical sensor deposition, J. Micromech. Microeng., 2009,
19, 6.
26 Y. Tian, B. R. Shumway, C. Youngbull, Y. Li, A. K. Y. Jen,
R. H. Johnson and D. R. Meldrum, Dually fluorescent sensing
of pH and dissolved oxygen using a membrane made from
polymerizable sensing monomers, Sens. Actuators, B, 2010, 47(2),
714–722.
27 S. Ashili, L. Kelbauskas, J. Houkal, D. Smith, Y. Tian,
C. Youngbull, H. Zhu, Y. Anis, M. Hupp, K. Lee, A. Kumar,
J. Vela, A. Shabilla, R. Johnson, M. Holl and D. Meldrum,
Automated platform for multiparameter stimulus response studies
of metabolic activity at the single-cell level, Proceedings Vol. 7929,
Microfluidics, BIOMEMS, and Medical Microsystems IX, 2011.
28 L. Kelbauskas, S. Ashili, J. Houkal, D. Smith, A. Mohammadreza,
K. Lee, A. Kumar, Y. Anis, T. Paulson, C. Youngbull, Y. Tian,
R. Johnson, M. Holl and D. Meldrum, A novel method for
multiparameter physiological phenotype characterization at the single-cell
level, Proceedings Vol. 7902, Imaging, Manipulation and Analysis of
Biomolecules, Cells, and Tissues IX, 2011.
29 D. Montgomery, Introduction to Statistical Quality Control,
Wiley Higher Education, 2005.
30 T. Molter, S. C. McQuaide, M. Zhang, M. R. Holl, L. W. Burgess,
M. E. Lidstrom and D. R. Meldrum, Algorithm advancements for
the measurement of single cell oxygen consumption rates, IEEE
International Conference CASE 2007, Automation Science and
Engineering, 2007, 386–391.
31 J. P. Shaffer, Multiple Hypothesis Testing, Annu. Rev. Psychol.,
1995, 46, 561–584.
32 J. K. Joseph, D. Bunnachak, T. J. Burke and R. W. Schrier,
A novel method of inducing and assuring total anoxia during in vitro
studies of O2 deprivation injury, J. Am. Soc. Nephrol., 1990, 1, 837–840.
33 K. C. Ho, J. K. Leach, K. Eley, R. B. Mikkelsen and P. S. Lin,
A simple method of producing low oxygen conditions with Oxyrase
for cultured cells exposed to radiation and Tirapazamine, Am. J. Clin.
Oncol., 2003, 26(4), e86–e91.
34 L. Breiman, Random forests, Mach. Learn., 2001, 45, 5–32.
35 T. Hastie, R. Tibshirani and J. H. Friedman, The Elements of
Statistical Learning—Data Mining, Inference, Prediction, Springer
Verlag, 2nd edn, 2009.
36 T. F. Cox and M. A. Cox, Multidimensional scaling, Chapman and
Hall, London, 1994.
37 L. Breiman, J. Friedman, C. J. Olshen and R. A. Stone, Classification
and Regression Trees, Wadsworth International, Belmont, CA, 1984.
38 S. J. Altschuler and L. F. Wu, Cellular Heterogeneity: Do Differences
Make a Difference?, Cell, 2010, 141(4), 559–563.
39 A. Savitzky and M. J. E. Golay, Smoothing and differentiation of
data by simplified least squares procedures, Anal. Chem., 1964,
36(8), 1627–1639.
40 R. A. Leach, C. A. Carter and J. M. Harrister, Least-squares
polynomial filters for initial point and slope estimation, Anal.
Chem., 1984, 56(13), 2304–2307.
41 P. Persson and G. Strang, Mathematical systems theory in biology,
communications, computation, and finance, Springer, 2002.
42 Z. B. Alfassi, Z. Boger and Y. Ronen, Statistical Treatment of
Analytical Data, CRC Press, Blackwell Science, Boca Raton, FL,
2005.
43 P. A. Gorry, General least-squares smoothing and differentiation
by the convolution (Savitzky–Golay) method, Anal. Chem., 1990,
62(6), 570–573.
44 B. Walczak, Wavelets in chemistry, Elsevier Science, 2000, vol. 22.
45 J. Hunter, The exponentially weighted moving average, J. Qual.
Technol., 1996, 18(4), 203–210.
46 J. Pignatiello and G. C. Runger, Comparison of multivariate
CUSUM charts, J. Qual. Technol., 1990, 22, 173–186.
47 S. S. Prabhu, G. C. Runger and D. C. Montgomery, Selection of
the subgroup size and sampling interval for a CUSUM control
chart, IEEE Trans., 1997, 29, 451–457.
48 V. Golosnoy, S. Ragulin, W. Schmid, Multivariate CUSUM chart:
properties and enhancements, AStA Advances in Statistical Analysis,
Springer, 2009, vol. 93(3), 263–279.
49 R. A. Berk, Statistical Learning from a Regression Perspective,
Springer Science + Business Media, LLC, New York, 2008.
50 Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a
practical and powerful approach to multiple testing, J. R. Stat. Soc.
Ser. B, 1995, 57, 289–300.
51 Y. Benjamini and D. Yekutieli, The control of the false discovery
rate in multiple testing under dependency, Ann. Stat., 2001, 29,
1165–1188.
52 F. Wilcoxon, Individual comparisons by ranking methods,
Biometrics Bull., 1945, 6, 80–83.
53 H. B. Mann and D. R. Whitney, On a Test of Whether one of Two
Random Variables is Stochastically Larger than the Other, Ann.
Math. Stat., 1947, 18(1), 50–60.
54 W. H. Kruskal and W. A. Wallis, Use of ranks in one-criterion
variance analysis, J. Am. Stat. Assoc., 1952, 47(260), 583–621.
55 C. H. Chen, W. Hardle, A. Unwin, M. Cox and T. F. Cox,
Handbook of data visualization. In Springer Handbooks Comp.
Statistics, chapter Multidimensional Scaling, Springer, Berlin
Heidelberg, 2008, pp. 315–347.