Gooey data sets: high throughput structural data from complex amphiphiles
Charlotte Broadbent
Civil Engineering, Columbia University, New York, NY 10027
Elaine DiMasi
Photon Sciences Department, Brookhaven National Lab, Upton, NY 11973
Abstract
Environmental awareness among consumers has prompted many industries, including
cosmetic companies, to turn to “green” alternatives for their products. Complex amphiphiles,
self-assembled structures of surfactant molecules, give products such as shampoo and liquid
hand soap their physical properties; new formulations can be both physically benign and
environmentally safe to synthesize. In July 2012, to determine the molecular properties of
certain complex amphiphiles under varying conditions, small-angle X-ray scattering (SAXS)
patterns of over 150 samples were obtained at the National Synchrotron Light Source (NSLS)
X6B beamline. Current analysis techniques, which treat one data frame at a time by hand, are
too slow for the amount of data acquired, so a much faster method is needed. This project
created a method for consistently treating, visualizing, and statistically analyzing these large
data sets. Using tools such as Python, we developed large-scale processing workflows in
Datasqueeze, a program designed specifically for SAXS images, and in MATLAB, whose
graphing and imaging capabilities we used. This will finally allow the July 2012 data to be
analyzed automatically and efficiently. These techniques can then be applied not only to studies
of complex amphiphiles, but also to other SAXS studies at NSLS-II or a similar facility.
I. Introduction
Over the past decade, many industries have turned to “green” alternatives for their
products in response to increased environmental awareness among consumers. Lubrizol Corp. is
no exception. It produces chemicals that are sold to companies in the personal care industry for
products such as shampoos and liquid hand soaps. Complex amphiphiles, self-assembled
structures of surfactant molecules, give these products several physical-chemical properties,
such as rheology and clarity, and could reduce skin and eye irritation. To test new formulations
of complex amphiphiles that are both physically benign and environmentally safe to synthesize,
Lubrizol was interested in using SAXS imaging, a fast method of determining the structure, and
by extension the physical properties, of the new formulations. In July 2012, representatives from
Lubrizol collected SAXS images of over 150 samples of complex amphiphiles at the NSLS X6B
beamline; this high-throughput data collection was made possible by well-plates (see Figure 1),
which hold many samples at once during automated scans.
Figure 1. Example of a well-plate used in SAXS scans. Holes in the Teflon block are covered by mylar film to
contain liquid and gel samples, then plates clamp the assembly together.
However, the July 2012 experiments produced too much data for current techniques to
handle. These techniques involve manually unwarping each individual frame (a process that
corrects distortion introduced by the detector), extracting the relevant information, and then
using that information. Not only were there over 150 samples¹, but the experiments produced
over 300 data frames, in part because of how the beamline automation works: automated scans
recorded every well in a well-plate (or in a row of a well-plate), regardless of its contents (see
Figure 1). As a result, the data from this experiment has been left largely untouched because of
the large amount of time a thorough analysis would require.
The main focus of this project was to develop a method to handle the large number of
SAXS images from the July 2012 Lubrizol experiments, an approach that we hope can also be
applied to other SAXS experiments involving high-throughput data.
II. Methods
Thorough analysis of the data first required thorough knowledge of the data. Before any
automated programs were created, I made a master spreadsheet so that most of the information
was in one place. The first column contained the file name; the second, a 1 if the file
corresponded to a sample or a 0 if it was a background; and the third, the sample name if the
previous column contained a 1. Several other columns were also added, to be filled in later with
the help of Python.
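As an illustration of how later scripts can consume the master spreadsheet, the sketch below reads it as CSV. The CSV export, the column names (`filename`, `is_sample`, `sample_name`), and the file names are all hypothetical; the text above describes the columns only by position.

```python
import csv
import io

# Hypothetical CSV export of the master spreadsheet: column 1 is the file name,
# column 2 is 1 for a sample frame or 0 for a background frame, and column 3
# holds the sample name (blank for backgrounds).
EXAMPLE_CSV = """filename,is_sample,sample_name
plate01_0001.tif,0,
plate01_0002.tif,1,SurfactantA_salt0.5
plate01_0003.tif,1,SurfactantA_salt1.0
"""


def read_master_sheet(handle):
    """Return a dict mapping file name -> {"background": bool, "sample": str or None}."""
    records = {}
    for row in csv.DictReader(handle):
        is_sample = row["is_sample"].strip() == "1"
        records[row["filename"]] = {
            "background": not is_sample,
            # The sample name is only meaningful when the flag column is 1.
            "sample": row["sample_name"].strip() if is_sample else None,
        }
    return records


sheet = read_master_sheet(io.StringIO(EXAMPLE_CSV))
print(sheet["plate01_0002.tif"])  # {'background': False, 'sample': 'SurfactantA_salt0.5'}
```

In practice `io.StringIO(EXAMPLE_CSV)` would be replaced by an open file handle on the exported spreadsheet.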
Second, since the detector used to take this data did not unwarp it automatically, we had to
decide whether unwarping was necessary. To do this, plots of intensity versus q were made for
two frames of silver behenate (AgBe), which is used to calibrate the detector position in the
Datasqueeze² software (see Figure 2): one of raw data, and one that had been unwarped.
Significant differences between the two curves at higher q showed that unwarping was
necessary (see Figure 3). We then used Python to unwarp all of the files automatically, by way
of a special script created for the beamline.
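The raw-versus-unwarped comparison can be made quantitative by computing the relative difference between the two I(q) curves in a low-q and a high-q range. The sketch below assumes both curves are already sampled on a common q grid; the toy intensities, the q split point, and the 5 % tolerance are hypothetical and are not the AgBe data from Figure 3.

```python
# Toy I(q) curves for illustration only; these are NOT values from the AgBe frames.
q = [0.05, 0.10, 0.15, 0.20, 0.25]          # scattering vector (1/Å), hypothetical grid
raw = [100.0, 80.0, 40.0, 22.0, 12.0]       # intensity from the raw frame
unwarped = [100.0, 80.0, 40.0, 20.0, 10.0]  # intensity after unwarping


def relative_difference(a, b):
    """Pointwise |a - b| / b for two I(q) curves sampled on the same q grid."""
    return [abs(x - y) / y for x, y in zip(a, b)]


def max_difference(q_values, a, b, q_lo, q_hi):
    """Largest relative difference between curves a and b for q_lo <= q < q_hi."""
    diffs = relative_difference(a, b)
    return max(d for qv, d in zip(q_values, diffs) if q_lo <= qv < q_hi)


TOLERANCE = 0.05  # hypothetical acceptance threshold (5 %)
low_q = max_difference(q, raw, unwarped, 0.0, 0.15)
high_q = max_difference(q, raw, unwarped, 0.15, float("inf"))
needs_unwarping = high_q > TOLERANCE
print(low_q, high_q, needs_unwarping)  # 0.0 0.2 True
```

Splitting by q range mirrors the observation in the text: the curves agree at low q and diverge at higher q, which is where detector distortion matters most.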
The next step in the progression was to create a MATLAB program that could plot SAXS
images. We decided to create a program that plotted them according to their well-plate position;
this decision was made in part because we did not know relevant information about the samples
(such as surfactant and salt concentration) until late in the experiment. The well-plate position,
however, was contained in the file header. Python code was created to extract this information
and import it into the master spreadsheet. From the spreadsheet, the positions were imported into
MATLAB, where the program I created plotted the images accordingly. This was done for all
eighteen well-plates.
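The original plotting program is written in MATLAB; to keep a single language here, the sketch below shows only the layout step in Python: converting well positions into grid coordinates so that each image lands in the subplot matching its physical well. The well-label format (row letter plus column number, such as `B7`) is an assumption, since the header format is not given above.

```python
import re

# Hypothetical well-label format: one row letter followed by a column number.
WELL_PATTERN = re.compile(r"^([A-Z])(\d+)$")


def well_to_index(label):
    """Convert a well label such as 'B7' to a zero-based (row, col) pair."""
    match = WELL_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized well label: {label!r}")
    row = ord(match.group(1)) - ord("A")
    col = int(match.group(2)) - 1
    return row, col


def layout_plate(frames):
    """Place frames on a grid that mirrors the well-plate.

    `frames` maps file name -> well label. Wells without a frame stay None,
    so the resulting grid keeps the physical arrangement of the plate.
    """
    positions = {name: well_to_index(well) for name, well in frames.items()}
    n_rows = max(r for r, _ in positions.values()) + 1
    n_cols = max(c for _, c in positions.values()) + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for name, (r, c) in positions.items():
        grid[r][c] = name
    return grid


grid = layout_plate({"f1.tif": "A1", "f2.tif": "A3", "f3.tif": "B2"})
for row in grid:
    print(row)
# ['f1.tif', None, 'f2.tif']
# [None, 'f3.tif', None]
```

Each non-empty cell would then be drawn in the subplot at the same row and column of an `n_rows` by `n_cols` figure.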
Figure 2. Bragg rings of silver behenate (AgBe). Known d-spacings are used to calibrate the
detector position in the Datasqueeze software.
Figure 3. AgBe raw data (blue) versus unwarped data (red). The differences in the peaks at higher
q indicate that all the images must be unwarped to maintain accuracy.
After that, the main Python code was written to create a “data dictionary”. This code first
used a directory search to find the file names of all the raw data, and from each file header it
extracted the relevant information: the well-plate position and the x-ray monitor counts (this
information was also imported into the master spreadsheet). Then, the code read the master
spreadsheet to assign appropriate values to the variables “background” (true or false) and
“sample” (the name of the actual sample). Next, all the monitor count values were averaged, and
a new variable, “normalized”, was created by dividing the average value by the monitor count of
that specific data frame. Lastly, for data frames that corresponded to samples, the “associated
background” was set to a background frame on the same well-plate. For each data frame, all of
these variables were saved in a dictionary.
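A minimal sketch of the dictionary-building step described above, assuming the header values and spreadsheet flags have already been gathered per frame. The input keys (`plate`, `monitor`, `background`, `sample`) and the rule of taking the first background frame found on each plate are assumptions for illustration; the text does not say how a background was chosen when a plate had several.

```python
from statistics import mean


def build_data_dictionary(frames):
    """Assemble per-frame records with "normalized" and "associated background".

    `frames` maps file name -> {"plate", "monitor", "background", "sample"}
    (hypothetical keys). Returns a new dict; the input is not modified.
    """
    average_monitor = mean(info["monitor"] for info in frames.values())

    # Hypothetical rule: the first background frame seen on each plate is used.
    background_by_plate = {}
    for name, info in frames.items():
        if info["background"]:
            background_by_plate.setdefault(info["plate"], name)

    data = {}
    for name, info in frames.items():
        entry = dict(info)
        # Average monitor count divided by this frame's count: multiplying the
        # frame by this factor scales it to the average incident flux.
        entry["normalized"] = average_monitor / info["monitor"]
        if not info["background"]:
            entry["associated background"] = background_by_plate.get(info["plate"])
        data[name] = entry
    return data


frames = {
    "bg1": {"plate": 1, "monitor": 1000, "background": True, "sample": None},
    "s1": {"plate": 1, "monitor": 1200, "background": False, "sample": "A"},
    "s2": {"plate": 1, "monitor": 800, "background": False, "sample": "B"},
}
data = build_data_dictionary(frames)
print(data["s2"]["normalized"], data["s2"]["associated background"])  # 1.25 bg1
```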
This data dictionary could then be used to obtain the desired results in Datasqueeze.
Python was used to create a batch file for Datasqueeze that read in each data frame
corresponding to a sample, normalized it, and subtracted its normalized associated background,
leaving diffraction data from only the sample itself. From this data, plots of intensity versus q
were made for the whole data frame as well as for “slices” of the pattern, along with plots of
intensity versus chi (azimuthal angle) to check for anisotropic samples. Fits of these peaks are in
progress.
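The Datasqueeze batch commands themselves are not reproduced here. The sketch below only illustrates the arithmetic the batch step performs, I_corrected = n_s·I_sample − n_b·I_background, on one-dimensional intensity profiles, where n_s and n_b are the "normalized" factors from the data dictionary. The profile values are toy numbers, and in the actual workflow the operation was applied to the images inside Datasqueeze.

```python
def subtract_background(sample, background, sample_norm, background_norm):
    """Return sample_norm * I_s - background_norm * I_b, point by point.

    `sample` and `background` are intensity profiles on the same q (or pixel)
    grid; the two factors are the "normalized" values from the data dictionary.
    """
    if len(sample) != len(background):
        raise ValueError("sample and background must share the same grid")
    return [sample_norm * s - background_norm * b for s, b in zip(sample, background)]


# Toy profiles for illustration only.
corrected = subtract_background([12.0, 30.0, 9.0], [10.0, 10.0, 10.0], 1.25, 1.0)
print(corrected)  # [5.0, 27.5, 1.25]
```

Normalizing both frames before subtracting matters because sample and background were recorded with different incident flux, as recorded by the monitor counts.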
III. Results
Figure 4. Example of one of the 18 different well-plate scans.
Figure 4 shows an example output of the MATLAB program. Figure 5 shows the more than
350 SAXS images distributed over eighteen well-plates. This program allows the data to be
visualized at large scale so that conclusions can be drawn immediately.
Figure 5. Five different sets of surfactant solutions distributed across 18 well-plates.
Figure 6 shows an example of one of the “data dictionaries”. The dictionaries can be
stored in a file that can be read into any other Python program, allowing the user to access any
aspect of the data.
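The storage format used in the project is not stated; the sketch below assumes JSON, which any Python program (or other language) can read back. The single entry shown is hypothetical, with field names that mirror the variables described in the Methods.

```python
import json
import os
import tempfile


def save_dictionary(data, path):
    """Write the data dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)


def load_dictionary(path):
    """Read a data dictionary previously written by save_dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# One hypothetical entry; keys must be strings for JSON (file names are).
data = {
    "plate01_0002.tif": {
        "background": False,
        "sample": "SurfactantA_salt0.5",
        "monitor": 1200,
        "normalized": 0.8333,
        "associated background": "plate01_0001.tif",
    }
}

path = os.path.join(tempfile.mkdtemp(), "data_dictionary.json")
save_dictionary(data, path)
reloaded = load_dictionary(path)
print(reloaded == data)  # True
```

JSON keeps the file human-readable; Python's `pickle` would also work but restricts readers to Python.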
Figure 6. Example of a dictionary for one SAXS image.
Figure 7 shows an example of a normalized, background-subtracted isotropic sample.
Figure 8 shows the same for an anisotropic sample; this is one of the few samples that showed
significant anisotropy (non-uniform scattering patterns). Figure 9 is an example of an intensity
versus q graph for an isotropic sample; Figure 10 is the same for an anisotropic sample. These
two graphs cannot distinguish isotropic from anisotropic samples. Because anisotropic samples
are particularly important in the analysis of the data, as they reveal information about the
structure of the amphiphiles, another method is needed to identify them. Figure 11 shows the
plot of intensity versus chi, averaged over q, for the isotropic sample; Figure 12 shows the same
for the anisotropic sample. Note the peaks in Figure 12, where the sample has higher intensity at
certain angles, compared with the relatively flat line in Figure 11.
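Inspecting intensity-versus-chi plots by eye does not scale to hundreds of frames. One simple way to flag candidates automatically, sketched below, is the relative spread of I(chi): near zero for a flat profile, larger when peaks are present. This index and its 0.1 threshold are hypothetical choices for illustration, not the criterion used in this project; the toy profiles use 8 chi bins of 45°.

```python
from statistics import mean, pstdev


def anisotropy_index(intensity_vs_chi):
    """Relative spread of I(chi): population std / mean, ~0 for a flat profile."""
    return pstdev(intensity_vs_chi) / mean(intensity_vs_chi)


def is_anisotropic(intensity_vs_chi, threshold=0.1):
    """Flag a profile as anisotropic; the 0.1 threshold is an arbitrary placeholder."""
    return anisotropy_index(intensity_vs_chi) > threshold


# Toy profiles (8 chi bins of 45 degrees), for illustration only.
flat = [10.0] * 8
peaked = [10.0, 10.0, 30.0, 10.0, 10.0, 10.0, 30.0, 10.0]  # two peaks 180 degrees apart
print(is_anisotropic(flat), is_anisotropic(peaked))  # False True
```

Any threshold would need to be tuned against samples already identified by eye, such as those in Figures 11 and 12.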
Figure 7. Example of an isotropic sample that has
been normalized and had the background
subtracted.
Figure 8. Example of an anisotropic sample that
has been normalized and had the background
subtracted.
Figure 9 (above). Example of an intensity versus q
plot for an isotropic sample.
Figure 10 (below). Example of an intensity
versus q plot for an anisotropic sample.
IV. Discussion
These programs make high-throughput SAXS data analysis possible. The MATLAB program
allows many data frames to be visualized at once, in our case as many as seventy-two, so that
conclusions can be drawn immediately. This program can also be modified slightly so that,
instead of being plotted by well-plate, the samples are plotted by surfactant concentration versus
salt concentration, or by other relevant variables, so that their effect on the diffraction patterns is
obvious.
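As a sketch of that modification, the layout step can take its rows and columns from composition instead of well position. This assumes, hypothetically, that each sample entry in the data dictionary has gained `surfactant` and `salt` concentration fields; as with the well-plate layout, the original plotting is in MATLAB and only the arrangement logic is shown here.

```python
def concentration_grid(data):
    """Arrange sample frames on a surfactant-by-salt grid.

    Assumes each sample entry has hypothetical "surfactant" and "salt"
    concentration fields; background frames are skipped.
    Returns (sorted surfactant values, sorted salt values, grid of file names).
    """
    samples = {name: e for name, e in data.items() if not e["background"]}
    surfactants = sorted({e["surfactant"] for e in samples.values()})
    salts = sorted({e["salt"] for e in samples.values()})
    grid = [[None] * len(salts) for _ in surfactants]
    for name, e in samples.items():
        grid[surfactants.index(e["surfactant"])][salts.index(e["salt"])] = name
    return surfactants, salts, grid


# Toy entries for illustration only.
data = {
    "bg": {"background": True},
    "s1": {"background": False, "surfactant": 5.0, "salt": 0.5},
    "s2": {"background": False, "surfactant": 5.0, "salt": 1.0},
    "s3": {"background": False, "surfactant": 10.0, "salt": 0.5},
}
surfactants, salts, grid = concentration_grid(data)
print(surfactants, salts, grid)  # [5.0, 10.0] [0.5, 1.0] [['s1', 's2'], ['s3', None]]
```

Sorting both axes means trends along each concentration read left to right and top to bottom in the resulting figure.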
Figure 11 (above). Example of a plot of intensity
versus chi for isotropic sample. No significant peak
shows that the sample is isotropic.
Figure 12 (below). Example of a plot of intensity
versus chi for anisotropic sample. Significant peaks
show that the sample is anisotropic.
Although this project has come far, several steps remain to ensure thorough analysis of the
data. The fit parameters of the plots need to be determined and then added to the dictionary.
Most importantly, the data still needs to be analyzed, a task that these programs make
considerably easier. The SAXS data can be used to study the morphology of these surfactant
micelles and their phase behavior in aqueous solutions. The contribution to I(q) (see Figures 9
and 10) arising from the micellar electron density (Figure 13(b)) is termed the form factor; the
contribution to I(q) from the variation in electron density in ordered domains (Figure 13(c)) is
referred to as the structure factor. The two are combined in the “interaction peak” (Figure 14³).
While the interaction peak will always contain a form factor peak for any sample of
amphiphiles, the prominence of the structure factor peak varies significantly between samples,
and it is this feature that can provide the most insight into the properties of the material for
industrial applications.
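For reference, the way the two factors combine can be written down in the simplest textbook case, the decoupling approximation for monodisperse, spherical scatterers. This is an assumption made only for illustration: the micelles studied here need not be spherical (reference 3 concerns micellar shape anisometry), and the fits in progress may use other models. Here n is the number density of micelles, Δρ the electron-density contrast with water, V the micelle volume, and R its radius; P(q) is normalized so that P(0) = 1.

```latex
I(q) \approx n\,(\Delta\rho)^2\,V^2\,P(q)\,S(q),
\qquad
P(q) = \left[\frac{3\left(\sin(qR) - qR\cos(qR)\right)}{(qR)^3}\right]^2
```

For a dilute, disordered solution S(q) ≈ 1 and the measured curve reduces to the form factor alone; as micelles begin to order, S(q) develops a peak near q ≈ 2π/d for a characteristic spacing d, which is the structure-factor contribution sketched in Figure 14.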
Figure 13. (a) Self-assembled amphiphiles in water. (b) Across a micelle, the electron density
ρ(r) varies in regions dominated by water, denser headgroups, and slightly less dense alkyl
chains. (c) Across a domain of ordered micelles, the electron density is periodic on the larger
length scale of the micelle spacing.
Figure 14. Scattering curve that illustrates typical contributions of form factor (dotted line) and
structure factor (dashed line)³.
V. References
¹ All samples were provided by Lubrizol Corporation (Ohio).
² Heiney, Paul A. Datasqueeze. Computer software. Datasqueeze Software. Vers. 3.0.4. N.p.,
7 Feb. 2015. Web. 27 July 2015.
³ Itri, R., and L. Q. Amaral. "Micellar-shape Anisometry near Isotropic–liquid-crystal Phase
Transitions." Physical Review E 47.4 (1993): 2551-2557. Print.
VI. Acknowledgements
This project was supported in part by the Brookhaven National Laboratory (BNL) Photon
Sciences Department under the BNL Supplemental Undergraduate Research Program (SURP)
(U. S. Department of Energy contract numbers DE-AC02-98CH10886 and DE-SC0012704). I
would also like to thank Vesna Stanic of LNLS and Ramiro Galleguilos of Lubrizol Corp. for
taking the original data and providing me with some essential background information about the
project. Lastly, I would like to thank my mentor, Elaine DiMasi, for giving me the
opportunity to work on this project and for her guidance.