HDAT: Web Tools for High-Throughput Screening Data
, Taimur Hassan1
, Robert Rallo2
, and Yoram Cohen3
California Nanosystems Institute, University of California, Los Angeles,
Departament d’Enginyeria Informatica i Matematiques, Universitat Rovira i
Virgili, Av. Paisos Catalans 26, 43007 Tarragona, Catalunya, Spain
Chemical and Biomolecular Engineering Department, University of California,
Los Angeles, CA 90095
Abstract. With the increased use of High-Throughput Screening (HTS) in
toxicity studies for Engineered Nano-Materials (ENMs), there is a need for tools
that can process and analyze a vast amount of HTS data efficiently and reliably.
In order to meet this need, a set of online HTS Data Analysis Tools (HDAT) was
developed, providing certain statistical methods suitable for ENM toxicity data.
As a publicly available computational nanoinformatics infrastructure, HDAT
provides several plate normalization methods and HTS summarization statistics,
Self-Organizing Maps (SOM) based clustering analysis, and visualization using
both heatmap and SOM. HDAT has been successfully used in a number of HTS
studies for ENM toxicity. In order to help researchers utilize this nanoinformatics
tool for their own HTS studies, this work introduces the main features of HDAT
along with a usage demonstration using a real HTS data set of ENM toxicity.
1. Introduction
Nano-sized materials are increasingly utilized as common elements in many modern industrial
products and processes primarily due to their novel beneficial properties (technological, medical,
and economic benefits) arising at the nano-scale [1, 2]. At the same time, it has become evident
that some Engineered Nano-Materials (ENMs) may have adverse impacts on the environment and
human health [3-5]. As a result, there is increased public concern regarding the safety of ENMs
throughout their lifecycle . Efforts are now underway to map the general principles that
govern the toxicity potential associated with health and safety impacts of ENMs [7-13]. In this
regard, toxicity screening is critical for characterization of the potential hazard of ENMs in
order to provide essential information for risk assessment and the establishment of safe-use
guidelines for ENMs [7, 14-19]. However, it is a formidable task to generate the required toxicity
characterization data necessary to cope with the expected growth in use and diversity of ENMs.
One solution to this challenge is High-Throughput Screening (HTS) [20-23]. HTS has
been introduced to toxicological research to replace labor-intensive and descriptive
toxicological approaches . The National Academy of Sciences (NAS) has put forth a vision
and strategy for using HTS approaches as a fast, robust, and mechanistic platform to assess
multiple toxicants . US-EPA also utilized the HTS approach in its ToxCast program .
Recently, considerable effort has been devoted to the development and use of HTS methods for
ENM toxicity assessment in order to cope with the large number of existing and expected ENMs.
Advances in this research direction [9-13, 25-27] have demonstrated that HTS is a suitable
approach that can efficiently generate ENM toxicity data, as required by risk assessment
strategies and environmental and health regulatory policy development for ENMs [19, 21].
As the application of HTS expands in ENM toxicity studies, researchers are confronted
with the challenge of processing/analyzing a vast amount of HTS data efficiently for reliable
inference about ENM toxicity. Moreover, compared with chemical compounds, ENM toxicity
data generated using HTS has a high noise level due to various uncontrolled nano-effects [19, 21].
Therefore, despite the existence of publicly available tools for general HTS data analysis [28, 29],
the statistical methods provided by these tools are inadequate for robust and reliable inferences
of ENM toxicity from HTS data. Although researchers in environmental and health risk
assessment of ENMs have identified certain statistical methods well suited for ENM toxicity data
(e.g., Strictly Standardized Mean Difference (SSMD) [30-33]), these methods are not currently
offered by publicly available HTS data analysis tools.
In order to provide the statistical methods suitable for HTS data analysis in ENM risk
assessment, a set of online HTS Data Analysis Tools (HDAT, publicly available at
nanoinfo.cein.ucla.edu/public/hdat) has been developed as one of the fundamental
nanoinformatics infrastructures [19, 34-37] of the UC Center for Environmental Implications of
Nanotechnology (UC-CEIN, www.cein.ucla.edu). HDAT provides several plate normalization
methods and HTS summarization statistics, as well as Self-Organizing Maps (SOM)  based
clustering analysis and visualization using both heatmap and SOM. In the present work, the main
features of HDAT are introduced along with a usage demonstration using a real HTS data set for
ENM toxicity , so that researchers can utilize this computational nanoinformatics resource for
their own HTS data analyses.
2. HTS data analysis workflow
In ENM toxicity studies [9, 10, 12, 13, 19, 38], HTS data analysis usually follows the workflow
depicted in Figure 1.
Figure 1. Workflow of HTS data analysis for ENM toxicity
The first step of HTS data analysis is plate operations, including plate visualization,
outlier removal, and plate normalization. Plate visualization allows initial visual inspection of
data from each HTS plate (e.g., consistency among sample replicates and effectiveness of
positive/negative controls) and helps choose suitable statistical methods for the data. Outlier
removal is required by HTS data analysis to exclude abnormally deviated values for robust and
reliable inference of ENM toxicity [38, 39]. Plate normalization is an important plate operation,
which is required by HTS data analysis to account for plate-to-plate variability, remove
systematic errors (e.g., positional effects ), and compare/combine data from different plates.
In HTS experiments, replicates are commonly used in order to compensate for experimental
variability . Replicated measurements can significantly improve the reliability of estimates
for sample activity (e.g., ENM toxicity) . In plate normalization, sample wells are treated
individually irrespective of replicates. Therefore, an HTS process step is required in order to make
reliable estimates of sample activity by summarizing replicated measurements using various
statistics. Based on summarized HTS data, hit-identification  can be performed to select
samples of high activity (i.e., “hits”) for further confirmation. Heatmap generation, an HTS
process function that depicts sample activity (summarized HTS data) in colors for visual
inspection, is also part of this step.
Once HTS data is statistically summarized, various data mining tasks  can be
performed to extract useful information for ENM risk assessment and decision making. For
example, clustering analysis [12, 40] can group together ENMs of similar HTS toxicity profiles,
indicating that these ENMs might possess common toxicity mechanisms. Activity-activity
relationships identified for different HTS toxicity assays can be used to guide experiment design
(e.g., choose independent toxicity assays for HTS experiments). Structure-Activity Relationships
(SARs) [38, 41], which predict the toxicity of ENMs based on their physicochemical properties,
can also be deduced from HTS toxicity data.
In the workflow shown in Figure 1, plate operations, HTS process, and clustering analysis
are essential for HTS data analysis [21, 39], and HDAT provides various methods for each of them.
Although activity-activity relationships and SAR are also important information, these analyses
are usually performed separately using specialized tools since they require sophisticated model
development as well as additional data (e.g., physicochemical properties for ENMs) [38, 41].
3. Main features of HDAT
The web interface of HDAT is illustrated in Figure 2, through which formatted HTS data can be
uploaded for analysis and visualization. Main features of HDAT are described in the following
subsections. A real HTS data set obtained from a recent HTS study for toxicity of metal oxide
nanoparticles (NPs)  is used in the demonstration of HDAT features. The data set provided
measured toxicological responses (via four HTS assays, including surface membrane
permeability (by PI uptake), intracellular calcium flux (by Fluo4 fluorescence indicator),
Reactive Oxygen Species (ROS) production (by MitoSox Red fluorescence indicator), and
mitochondrial membrane potential (by JC1 fluorescence indicator)) of murine myeloid (RAW
264.7) cells to eight metal oxide NPs (Al2O3, CeO2, CoO, Gd2O3, HfO2, In2O3, Mn2O3, and
Ni2O3) in the size range of ~15-140 nm, over exposure concentrations of 0.39-200 mg/L, and
exposure periods of up to 24 h . A set of 384-multiwell plates (Greiner Bio-One, Monroe, NC)
for different assays were used in the HTS experiment . Each plate contained quadruplicates of
the eight NPs at each concentration as well as two columns of negative control wells (i.e., in
which cells were not exposed to NPs). This example HTS data is provided with HDAT.
Figure 2. Web interface of HDAT.
3.1 Standardized data format with flexible configuration
HDAT utilizes a standardized HTS plate data format, which contains both data and configuration
sections (Figure 3). The configuration section describes how the samples and controls are
arranged (see Figure 3a), followed by data sections providing the actual HTS data (Figure 3b). In
the configuration section, samples of the same name are recognized by HDAT as replicates
irrespective of their individual location in the HTS plate. Special labels “-1”, “1”, and “0” are
reserved for identification of negative control, positive control, and ignored wells, respectively.
By labeling wells as “ignored”, missing values, erroneous data, and undesired data can be easily
excluded from subsequent analyses. Like sample wells, these special plate wells can be arranged
at any well location. The flexible configuration feature is especially useful when HTS
experiments adopt randomized arrangement of samples and controls to reduce positional effects.
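As a rough illustration of how such a configuration can drive replicate grouping, the following Python sketch pairs a configuration grid with a data grid of the same shape. The function name `group_wells`, the in-memory list-of-lists layout, and the numeric values are assumptions made for illustration only; they do not reproduce HDAT's actual parser or file layout.

```python
from collections import defaultdict

# Reserved labels as described in the text: negative control, positive
# control, and ignored wells.
NEGATIVE, POSITIVE, IGNORED = "-1", "1", "0"


def group_wells(config, data):
    """Group plate values by configuration label, skipping ignored wells.

    `config` and `data` are hypothetical row-major grids of equal shape:
    `config` holds one label per well, `data` holds one measurement per well.
    Wells sharing a label are treated as replicates wherever they sit.
    """
    groups = defaultdict(list)
    for cfg_row, data_row in zip(config, data):
        for label, value in zip(cfg_row, data_row):
            if label == IGNORED:
                continue  # excluded from all subsequent analyses
            groups[label].append(value)
    return dict(groups)


# Made-up 2 x 4 plate with a randomized sample arrangement.
config = [
    ["-1", "CeO2", "CoO", "1"],
    ["-1", "CoO", "CeO2", "0"],
]
data = [
    [1.0, 2.5, 4.0, 9.0],
    [1.2, 4.4, 2.7, 99.0],
]
groups = group_wells(config, data)
```

Because grouping depends only on the label, moving a sample or control to a different well changes nothing downstream, which is what makes randomized layouts straightforward to analyze.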
Figure 3. Plate data format of HDAT, which comprises (a) a plate configuration section and
(b) a plate data section.
In an input file of HDAT (comma-delimited CSV file), plate data sections are listed
below their configuration section. An input file can have multiple configurations, which do not
have to be the same size and can be from different HTS experiments (Figure 4). This design
allows HDAT to perform batch analyses for multiple HTS experiments.
Figure 4. Input file structure of HDAT.
3.2. Plate operations
The plate operations offered by HDAT include plate visualization, outlier removal, and plate
normalization. The plate visualization of HDAT is highly customizable, allowing users to tune
map cell size, color scheme, and lower/upper color scale limits to best represent their plate data
(see Figure 5a for the visualization of plate “Fluo-T2” (Fluo4 assay values after 2 h exposure
period) of the example HTS data).
HDAT adopts a box-plot approach  to identify and remove outliers (abnormally
deviated values) in control wells. Given a set of data points, the box-plot method identifies those outside the
range [Q1-1.5(Q3-Q1), Q3+1.5(Q3-Q1)] as outliers (in which Q1 and Q3 are the first and third
quartiles of the data, respectively) . For a normally distributed population, data points outside
the above range are unlikely (<1%) members of the control population. Figure 5b shows the
“Fluo-T2” plate after control outliers were removed. In HDAT, outlier removal is applied only
to (positive/negative) control wells, not to sample wells, because samples usually have only a
limited number of replicates in HTS experiments .
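The box-plot rule described above can be sketched in a few lines of Python. This is a minimal sketch, not HDAT's implementation: the function name `boxplot_outlier_mask` and the control values are made up, and the quartiles use NumPy's default linear interpolation, which may differ slightly from the quartile convention HDAT uses.

```python
import numpy as np


def boxplot_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value is a box-plot outlier.

    A value is flagged when it lies outside [Q1 - k*(Q3-Q1), Q3 + k*(Q3-Q1)].
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical negative-control readings with one abnormally high well.
controls = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 10.0, 30.0])
mask = boxplot_outlier_mask(controls)
kept = controls[~mask]  # flagged wells would be relabeled as "ignored"
```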
Figure 5. Visualization of plate “Fluo-T2” from the example HTS data. (a) Raw plate data, (b)
outliers removed from negative controls. In plate visualization, negative and positive control wells
are identified by map cells with green and red borders, respectively. Sample wells have no
border, and ignored wells are left empty. In outlier removal, control wells identified as outliers
are labeled as “ignored”, which will be excluded from subsequent statistical analyses.
The plate normalization methods [29, 39, 42-45] available in HDAT are listed below (in
their formulas xi denotes a sample value with negative/positive control value represented by c-/c+;
μ and σ denote average and standard deviation, respectively):
a. Signal to negative control ratio : μxi/μc- (also known as fold increase).
b. Signal to positive control ratio : μxi/μc+.
c. Signal to noise ratio : (μxi - μc-)/σc-.
d. Normalized percent inhibition : (μc+ - μxi)/(μc+ - μc-).
e. Z-score [29, 39]: (xi - μx)/σx.
f. Robust Z-score : A robust version of Z-score using median and Median Absolute
Deviation (MAD = median(|xi - median(x)|)) in place of the average and standard deviation.
g. Median Polish [39, 42, 43, 45]: A method to reduce the positional effects . Median
polish works by alternately removing the row and column medians, and continues until
the proportional reduction in the sum of absolute residuals is less than a given threshold.
The residual of the plate well in i-th row and j-th column obtained by median polish is
given by rij = xij - x'ij = xij - (μ' + Ri + Cj), in which μ' is the estimated average of the plate
with Ri and Cj denoting the estimated systematic offsets for the i-th row and j-th column, respectively.
h. B-score [29, 39, 42-45]: Another robust analog of the Z-score which intends to reduce
measurement bias due to positional effects and is resistant to statistical outliers. B-score
can be calculated based on median polish as rij/MAD(rij).
It is important to note that the above plate normalization methods are associated with different
hypotheses about the data [29, 39]. The suitability of a normalization method for a given HTS data set
depends on whether its hypothesis holds for the data.
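To make the plate-wide methods (e to h) more concrete, the sketch below implements them with NumPy for a plate stored as a 2-D array. It is a hedged illustration rather than HDAT's code: all function names are hypothetical, the standard deviation uses the sample form (`ddof=1`), the MAD is left unscaled exactly as defined above, and the median polish stopping rule (relative reduction in the sum of absolute residuals below `tol`, capped at `max_iter` sweeps) is one reasonable reading of the description.

```python
import numpy as np


def z_score(plate):
    """Z-score: centre by the plate mean, scale by the plate standard deviation."""
    return (plate - plate.mean()) / plate.std(ddof=1)


def mad(x):
    """Median absolute deviation, median(|x - median(x)|), without rescaling."""
    return np.median(np.abs(x - np.median(x)))


def robust_z_score(plate):
    """Robust Z-score: median and MAD replace the mean and standard deviation."""
    return (plate - np.median(plate)) / mad(plate)


def median_polish(plate, max_iter=10, tol=0.01):
    """Return residuals r_ij after alternately removing row and column medians."""
    resid = np.asarray(plate, dtype=float).copy()
    prev = np.abs(resid).sum()
    for _ in range(max_iter):
        resid -= np.median(resid, axis=1, keepdims=True)  # row effects
        resid -= np.median(resid, axis=0, keepdims=True)  # column effects
        total = np.abs(resid).sum()
        # Stop once the proportional reduction falls below the threshold.
        if prev == 0 or (prev - total) / prev < tol:
            break
        prev = total
    return resid


def b_score(plate):
    """B-score: median polish residuals divided by their MAD."""
    resid = median_polish(plate)
    return resid / mad(resid)


# Synthetic 8 x 12 plate: additive row/column gradients, noise, one active well.
rng = np.random.default_rng(0)
plate = (50 + 2.0 * np.arange(8)[:, None] + 3.0 * np.arange(12)[None, :]
         + rng.normal(0, 1, (8, 12)))
plate[3, 7] += 100.0
scores = b_score(plate)
```

On a plate like this, the row and column gradients dominate the raw values, so a plain Z-score would rank wells largely by position; the B-score removes those additive positional effects first, which is the hypothesis under which it is appropriate.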
3.3. HTS process
The HTS Process of HDAT provides a number of statistical methods to make reliable estimates
of sample activity from replicated measurements, including mean (with standard
deviation), median (with MAD), Z factor [29, 33, 46], and SSMD (together with its standard
deviation) [30-33]. Among them, the mean and median are simple statistics to estimate sample
activity from replicated measurements, while Z factor and especially SSMD are advanced
statistics that consider both mean and variance of sample and control. For a sample x, its Z factor
and SSMD are defined by $1 - 3(\sigma_x + \sigma_c)/|\mu_x - \mu_c|$ and
$(\mu_x - \mu_c)/\sqrt{\sigma_x^2 + \sigma_c^2}$, respectively.
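The two advanced statistics can be computed from replicate readings as sketched below. The sketch assumes the standard definitions (Z factor as one minus three times the summed standard deviations over the absolute difference in means; SSMD as the difference in means over the square root of the summed variances) and uses sample estimates (`ddof=1`); the function names and the replicate values are hypothetical, and HDAT's own estimators may differ in detail.

```python
import numpy as np


def z_factor(sample, control):
    """Z factor of a sample against a control group."""
    s = np.asarray(sample, dtype=float)
    c = np.asarray(control, dtype=float)
    return 1.0 - 3.0 * (s.std(ddof=1) + c.std(ddof=1)) / abs(s.mean() - c.mean())


def ssmd(sample, control):
    """Strictly standardized mean difference of a sample against a control group."""
    s = np.asarray(sample, dtype=float)
    c = np.asarray(control, dtype=float)
    return (s.mean() - c.mean()) / np.sqrt(s.var(ddof=1) + c.var(ddof=1))


# Made-up quadruplicate sample wells and negative-control wells.
sample = [10.0, 12.0, 11.0, 13.0]
control = [1.0, 2.0, 3.0, 2.0]
z = z_factor(sample, control)
beta = ssmd(sample, control)
```

Unlike the Z factor, SSMD keeps its sign, so it distinguishes responses above the control from responses below it.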
HDAT offers a customizable heatmap (for which map cell size, color scheme, and lower/upper
color scale limits can be set by users) for illustration of summarized HTS data (see Figure 6 for
the heatmap generated by HDAT for the SSMD of the example HTS data). HDAT also provides
a hit-identification  feature using the above summary statistics to detect samples that induce
significant up- and/or down-regulation in certain assays. A “cleaned” heatmap will be generated
for those identified as “hits” (see Figure 7 for the “hits” identified by HDAT using a threshold of
SSMD>3, which indicates that the sample activity is significantly higher than the control value).
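Threshold-based hit identification of this kind reduces to a simple filter over the summary statistic. The sketch below is illustrative only: `identify_hits`, the sample labels, and the SSMD values are made up, and treating down-regulation as a score below the negated threshold is an assumption rather than a documented HDAT rule.

```python
def identify_hits(scores, threshold=3.0, direction="up"):
    """Return sample names whose score passes the threshold.

    direction: "up" (score > threshold), "down" (score < -threshold),
    or "both". The symmetric cut-off for "down" is an assumption.
    """
    if direction not in {"up", "down", "both"}:
        raise ValueError("direction must be 'up', 'down', or 'both'")
    hits = []
    for name, value in scores.items():
        up = value > threshold
        down = value < -threshold
        if ((direction == "up" and up) or (direction == "down" and down)
                or (direction == "both" and (up or down))):
            hits.append(name)
    return hits


# Hypothetical SSMD values keyed by NP name plus concentration index.
ssmd_by_sample = {"CoO-10": 6.2, "CeO2-10": 0.4, "Mn2O3-09": 3.5, "HfO2-10": -4.1}
up_hits = identify_hits(ssmd_by_sample)
all_hits = identify_hits(ssmd_by_sample, direction="both")
```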
Figure 6. Heatmap for the SSMD of the example HTS data. Each row of the heatmap represents
the SSMD values calculated for one plate (identified by assay name and exposure period
(-T##)). NP concentrations (0.39-200 mg/L) are identified by the numbers (01-10) appended to the
NP names.
Figure 7. “Hits” identified from the example HTS data (denoted by red cells in the map).
3.4 SOM based clustering analysis and visualization
SOM [12, 47] is provided by HDAT for identification of clusters of similar samples from multi-
dimensional HTS data (e.g., HTS using multiple toxicity assays). SOM analysis builds an
ordered two-dimensional visualization from summarized HTS data, where most similar samples
are grouped into the same SOM unit with the similarity between different SOM units indicated
by their geometric distance (i.e., the original distance relationships (topology) are preserved) [12,
47, 48]. Based on the geometric distance, the SOM units are further grouped together into
different clusters representing major groups of similar samples (see Figure 8 for the clustered
SOM built by HDAT from the SSMD of the example HTS data).
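For readers unfamiliar with SOM training, the following NumPy sketch shows the basic idea: each unit holds a weight vector, the best-matching unit (BMU) for a sample is the unit with the closest weights, and the BMU and its grid neighbours are pulled toward the sample. This is a simplified sketch under stated assumptions, not HDAT's implementation: it uses a rectangular grid rather than the hexagonal one shown in Figure 8, a linear decay of learning rate and neighbourhood radius, made-up toxicity profiles, and hypothetical function names, and it omits the subsequent grouping of SOM units into clusters.

```python
import numpy as np


def train_som(data, rows=4, cols=4, n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns (unit weights, unit grid coordinates)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # One weight vector per unit, initialised uniformly within the data range.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows * cols, data.shape[1]))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks toward 0.5
        x = data[rng.integers(len(data))]           # pick a random sample
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood on the grid
        weights += lr * h[:, None] * (x - weights)
    return weights, grid


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    x = np.asarray(x, dtype=float)
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))


# Made-up four-assay profiles: two low-response and two high-response samples.
profiles = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.2, 0.1],
    [10.0, 10.0, 10.0, 10.0],
    [9.5, 10.0, 9.8, 10.2],
])
weights, grid = train_som(profiles)
units = [best_matching_unit(weights, p) for p in profiles]
```

Samples with very different profiles end up on different units, and the distance between their units on the grid reflects how dissimilar the profiles are.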
Figure 8. Clustered SOM built based on the SSMD of the example HTS data. Five clusters
(marked in different colors) of similar NPs (in terms of the HTS profiles including the four
assays over the exposure period) were identified by the SOM analysis. Most similar NPs were
grouped into the same SOM unit (hexagon). The numbers (01-10) appended to the NP names
denote the concentrations (0.39-200 mg/L).
3.5 User assistance
To assist users, several resources such as quick-start tips, a video demo for basic
operations, and an instruction file for data formatting and upload (see Figure 1) are provided. In
addition, an example HTS data file (i.e., the metal oxide NP toxicity data used in the
demonstration of the main features of HDAT) is provided for users to explore and practice
various functions of HDAT. Also, users can report problems found with HDAT and leave their
comments online.
4. Applications and merits of HDAT
HDAT is the current standard HTS data analysis platform of UC-CEIN, which is a testament to
its utility towards achieving UC-CEIN’s goals of adopting HTS for nano-toxicological studies.
Examples of UC-CEIN’s HTS studies that utilized HDAT for data analyses include HTS based
hazard ranking for NPs (metal/oxide and quantum dots) , HTS evaluation of toxicity-related
cell signaling pathways for metal/oxide NPs , HTS assessment for the different response of
undifferentiated and differentiated primary human bronchial epithelial cells to cationic
mesoporous silica NPs (coated with polyethyleneimine) , HTS investigation for cellular
oxidative stress induced by metal oxide NPs , and HTS based toxicity labeling for nano-SAR
development (for metal oxide NPs) . Additionally, several demonstrations were held for
UC-CEIN HTS researchers to introduce HDAT statistical features and their suitability towards
various HTS data analysis.
Moreover, HDAT together with a Central Data Manager (CDM) comprises UC-CEIN’s
core nanoinformatics infrastructure. The CDM is not only a data management system for various
ENM data (e.g., HTS/non-HTS toxicity data, characterization data, ecological data, and
experiment protocols) but also a platform for a set of web applications for search, filtration, and
organization of these ENM data.
HDAT has also generated widespread interest in the nanoinformatics community. A
number of presentations, webinars, and demonstrations for HDAT have been given at major
nanoinformatics forums, including the Nanotechnology Working Group (nanoWG) meeting,
nanoinformatics workshops, and the International Conference on the Environmental Implications
of Nanotechnology (ICEIN) conferences. Beyond UC-CEIN, HDAT is also being used by
other institutions for HTS data analyses. For example, US-EPA is using HDAT in its ENM risk
assessment and computational toxicology research. The users of HDAT also include RTI
International (www.rti.org). Over the past year, HDAT has been accessed 20,000+ times by users
from 20+ countries, leveraging the international initiative in the development of nanoinformatics
resources and tools for acquisition and processing of information relevant to ENMs. Moreover,
as a general HTS data analysis tool, the application of HDAT is not limited to ENM related
research, and it can be readily used in any other area that utilizes the HTS approach for fast data
This material is based upon work supported by the National Science Foundation and the
Environmental Protection Agency under Cooperative Agreement Number DBI-0830117. Any
opinions, findings, and conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National Science Foundation or the
Environmental Protection Agency. This work has not been subjected to EPA review and no
official endorsement should be inferred. Robert Rallo also acknowledges support provided by
CICYT (Project CTQ2009-14627), Generalitat de Catalunya (2009SGR-01529) and the EU
Commission (OSIRIS, Contract No. 037017).
1.  Guo, Z. and L. Tan, Fundamentals and Applications of Nanomaterials. 1st ed. 2009: Artech House.
2.  The Project on Emerging Nanotechnologies: Consumer Products Inventory.  2010  February,
2013]; Available from: www.nanotechproject.org/inventories/consumer/.
3.  Ray, P.C., H.T. Yu, and P.P. Fu, Toxicity and Environmental Risks of Nanomaterials: Challenges
and Future Needs. Journal of Environmental Science and Health Part C‐Environmental
Carcinogenesis & Ecotoxicology Reviews, 2009. 27(1): p. 1‐35.
4.  Kahru, A. and H.‐C. Dubourguier, From ecotoxicology to nanoecotoxicology. Toxicology, 2010.
269(2–3): p. 105‐119.
5.  Jiang, W., et al., Nanoparticle‐mediated cellular response is size‐dependent. Nature
Nanotechnology, 2008. 3(3): p. 145‐150.
6.  Colvin, V.L., The potential environmental impact of engineered nanomaterials. Nature
Biotechnology, 2003. 21(10): p. 1166‐70.
7.  Nel, A., et al., Toxic potential of materials at the nanolevel. Science, 2006. 311(5761): p. 622‐627.
8.  Cattaneo, A.G., et al., Nanotechnology and human health: risks and benefits. Journal of Applied
Toxicology, 2010. 30(8): p. 730‐744.
9.  Zhang, H., et al., Use of Metal Oxide Nanoparticle Band Gap To Develop a Predictive Paradigm
for Oxidative Stress and Acute Pulmonary Inflammation. ACS Nano, 2012. 6(5): p. 4349–4368.
10.  Zhang, H., et al., Differential Expression of Syndecan‐1 Mediates Cationic Nanoparticle Toxicity in
Undifferentiated versus Differentiated Normal Human Bronchial Epithelial Cells. ACS Nano, 2011.
5(4): p. 2756‐2769.
11.  Lin, S.J., et al., High Content Screening in Zebrafish Speeds up Hazard Ranking of Transition Metal
Oxide Nanoparticles. ACS Nano, 2011. 5(9): p. 7284‐7295.
12.  Rallo, R., et al., Self‐Organizing Map Analysis of Toxicity‐Related Cell Signaling Pathways for
Metal and Metal Oxide Nanoparticles. Environmental Science & Technology, 2011. 45(4): p.