LIMS for maize mapping project

BIOINFORMATICS
Vol. 19 no. 16 2003, pages 2022–2030
DOI: 10.1093/bioinformatics/btg274
Development of an integrated laboratory
information management system for the maize
mapping project
H. Sanchez-Villeda1, S. Schroeder1, M. Polacco1,3,
M. McMullen1,3, S. Havermann1, G. Davis1, I. Vroh-Bi1,
K. Cone2, N. Sharopova1, Y. Yim1, L. Schultz1, N. Duru1,
T. Musket1, K. Houchins3, Z. Fang1, J. Gardiner1
and E. Coe1,3,∗
1Department of Agronomy, 2Division of Biological Sciences and 3USDA-ARS, University
of Missouri, Columbia, MO 65211, USA
Received on February 4, 2003; revised on April 18, 2003; accepted on May 3, 2003
ABSTRACT
Motivation: The development of an integrated genetic and
physical map for the maize genome involves the generation
of an enormous amount of data. Managing these data requires
a system to aid in genotype scoring for different types of
markers coming from both local and remote users. In addi-
tion, researchers need an efficient way to interact with genetic
mapping software and with data files from automated DNA
sequencing. They also need ways to manage primer data
for mapping and sequencing and provide views of the integ-
rated physical and genetic map and views of genetic map
comparisons.
Results: The MMP-LIMS system has been used successfully
in a high-throughput mapping environment. The genotypes
from 957 SSR, 1023 RFLP, 189 SNP, and 177 InDel markers
have been entered and verified via MMP-LIMS. The system
is flexible and can be easily modified to manage data for other
species. The software is freely available.
Availability: To receive a copy of the iMap or cMap software,
please fill out the form on our website. The other MMP-LIMS
software is freely available at
http://www.maizemap.org/bioinformatics.htm.
Contact: coee@missouri.edu
1 INTRODUCTION
The maize mapping project (MMP) aims to develop an integrated
physical and genetic map of maize. This resource will be
useful for marker-assisted selection, map-based cloning, and
comparative genomics of crops, and will undergird sequencing
of the maize genome (Cone et al., 2002). To achieve
this goal, the MMP has utilized and developed DNA markers
including 1023 restriction fragment length polymorphisms
(RFLPs), 957 simple sequence repeats (SSRs), 10 000 overgos,
189 single nucleotide polymorphisms (SNPs), and
177 insertion/deletion (InDel) polymorphisms (Davis et al.,
1999; Sharopova et al., 2002). These markers have been
used to develop a high-resolution genetic map that serves as the
framework to anchor bacterial artificial chromosome (BAC)
contigs. This process requires high-throughput sequencing
and high-throughput SNP/InDel genotyping. The amount
of data produced is enormous. The Missouri component
of the MMP team is divided into different laboratories
dispersed throughout the campus, simultaneously using
and producing different parts of the same core data. The
genetic mapping populations involve different subsets of
individuals with their respective molecular marker data to be
stored, managed, and integrated into the maps. Optimal
use of these data requires effective methods of analysis
and management. Furthermore, the data produced in the
MMP must be disseminated to the scientific community
through informatics tools capable of handling the high volume
of data.

∗To whom correspondence should be addressed.
The requirements for laboratory databases vary consider-
ably from project to project. At present, many laboratories use
spreadsheets to manage their data. In this paper, we present
the MMP laboratory information management system (MMP-
LIMS) that we have developed to provide several functions:
(1) allow data management with detailed record keeping,
reporting, and retrieving; (2) ensure data quality and access-
ibility to the scientific community and (3) disseminate the
integrated map of maize to the scientific community through
web-based tools. This research is an example of application of
informatics to practical biology and agronomy questions. An
overview of the MMP-LIMS components and their functions
is shown in Table 1.
Published by Oxford University Press.

Table 1. MMP-LIMS functions

MMP-LIMS component, followed by a summary of its functions

MMP-LIMS Scoring Tool
    Serves as a laboratory notebook for wet lab researchers
    Allows researchers in different laboratories to interact with MMP-LIMS database
    Manages genotype data from RFLP, SSR, SNP, InDel marker types
    Validates genotype scores based on repeat reads
    Interfaces with ABI Prism Genotyper software, converting trace file data into SNP scores
    Allows user to make custom templates for genotype output/entry
    Creates input files for MapMaker by chromosome or data set
    Integrates information returned by the MapMaker software
    Also exists in a publicly available standalone version utilizing an Access database and includes an example database

Community IBM Map Data Entry Tool (CIMDE)
    Allows researchers at remote locations to enter genotype scores into the MMP-LIMS database via web-based interface
    Provides a mechanism for uploading tab-delimited score file and for uploading scores for a single marker
    Validates genotype scores for control loci
    Validates genotype scores based on marker type

SSR Finder
    Locates SSRs in DNA sequences
    Designs unique primer pairs to amplify SSR sequences

SNP Discovery Primer Design
    Designs primers for finding potential SNPs
    Performs BLAST search against existing primers

SNP/InDel Finder
    Calculates base frequencies in each position in a sequence alignment
    Searches for gaps in a sequence alignment representing InDels

Mapped Sequence Locator (MSL)
    Accepts sequence via a web-based interface and performs a BLAST search against all public maize sequence
    Returns BLAST scores, genetic map locations, and links related to sequences if available

iMap
    Graphically displays an integrated genetic and physical map
    Displays genetic marker data and contig data
    Performs search for a map location based on locus, probe, GenBank accession, or contig number
    Provides links to current WebFPC (Soderlund et al., 2000) assembly
    Displays anchors based on a set of data filters that remove ambiguous assignments

cMap
    Displays comparative associations between two genetic maps
    Gives the user text lists of the shared loci between the compared maps

2 SYSTEMS AND METHODS
Several technologies were employed during the development
of MMP-LIMS. The programming languages used in both
the user interface for the local wet lab researchers and the
remote researchers reflect an interest in creating a highly mod-
ular system and in providing each user with an efficient and
intuitive user interface.
The user interface for the local wet lab researchers was
implemented as a Visual Basic® 6.0 client application. This
provides not only efficient performance for the user,
but also a well-structured environment for development.
The system’s client–server architecture permits many users
concurrent access to a central database. Open Database
Connectivity (ODBC) provides interoperability, connects the
client application to the database, and allows interaction with
MaizeDB (MaizeDB, 2003, http://www.agron.missouri.edu/)
through proxy tables.
The web-based user interface for remote researchers utilizes
HTML for the static content. Java™ applets are used
for the other functions and give the user a more interactive
and straightforward interface than is attainable with
HTML forms. Java™ servlets and Java Database Connectivity
(JDBC™) transfer data to and from the database.
The sequence analysis modules of MMP-LIMS were imple-
mented in Perl and make use of other publicly available
programs including Primer3 (Rozen and Skaletsky, 2000),
BLAST (Altschul et al., 1990), phred (Ewing et al., 1998),
phrap (Ewing and Green, 1998), and clustalw (Thompson
et al., 1994). The web-based sequence comparison module
utilizes Perl (CGI/DBI) along with XML/XSLT and the
BLAST program.
The web-based integrated genetic and physical map dis-
play application and the comparative mapping viewer were
adapted from software used in the Rice Genome Project (Rice
Genome Research Program, 2002, http://rgp.dna.affrc.go.jp/).
Originally utilizing data stored via flat files, the code for the
integrated map viewer was converted to allow data retrieval
from the database via servlet communication. The user
interface employs a combination of a Java™ applet and Perl
CGI (Cone et al., 2002).

Fig. 1. MMP-LIMS context diagram. The modules of MMP-LIMS are shown, including the client–server tool, multiple web-based tools and
sequence analysis modules for SSRs and SNPs/InDels.
[Diagram elements: MMP-LIMS users (local wet lab users, remote wet lab users, web users); client-server tool (MMP-LIMS Scoring Tool); sequence analysis modules (SNP/InDel Pipeline with SNP Discovery Primer Design and SNP/InDel Finder; SSR Finder); MMP-LIMS web-based tools (CIMDE, Mapped Sequence Locator (MSL), iMap Viewer, cMap Viewer).]

The MMP-LIMS data are stored in a Sybase® Adaptive
Server Enterprise 11.9.2 relational database. The database
resides on a Dell Precision 330 running Red Hat Linux 7.3 with a
2.4.18-5 kernel. A standalone version of MMP-LIMS with the
same functionality also exists and was designed to
work with a Microsoft® Access database.
3 IMPLEMENTATION
MMP-LIMS provides data management for the processes
involved in generating a high-resolution genetic map includ-
ing managing SNP/InDel data, managing SSR and RFLP
data, generating the genetic map, and providing public views
of the data. The modules comprising MMP-LIMS can be
viewed as elements of a system context diagram (Fig. 1).
The modules include the MMP-LIMS Scoring Tool (Fig. 2),
the Community IBM Map Data Entry Tool (CIMDE), SSR
Finder, SNP Discovery Primer Design, SNP/InDel Finder, the
Mapped Sequence Locator (MSL), the integrated genetic and
physical map viewer (iMap), and the comparative mapping
viewer (cMap).
3.1 Managing SNP/InDel information
MMP-LIMS manages data from each step in the process of
placing SNPs/InDels on the genetic map, from finding poten-
tial SNPs/InDels with the SNP/InDel pipeline to managing
the genotype score data with MMP-LIMS Scoring Tool and
generating files for MapMaker (Lander et al., 1987) software.
The SNP/InDel pipeline works in two steps (Fig. 3A). First,
SNP primers are designed with the SNP Discovery Primer
Design module. Then the resulting primers are used to process
sequences in order to find SNPs and/or InDels via SNP/InDel
Finder. The first step in SNP discovery is to sequence a region
of DNA across multiple lines of maize to detect nucleotide
polymorphisms. The DNA segments for sequencing are amp-
lified using primer pairs designed with the SNP Discovery
Primer Design module.
DNA sequence is entered into the module, along with para-
meters including distance between primer pairs and region
of the sequence to search for primer pairs. Using the given
parameters, this script builds an input file for Primer3. The
resulting primers are returned from Primer3, and the SNP Dis-
covery Primer Design module checks for repeats in the primer
sequence and rejects those with repeats. The script can also
be set to perform a BLAST search with the primers against
all previously designed primers. The output of the script is a
list of unique SNP discovery primers.
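
The two steps described above, building the Primer3 input and rejecting primers that contain repeats, can be sketched as follows. This is a minimal illustration in Python rather than the Perl script used in the project. The tag names follow the Boulder-IO input format of current Primer3 releases, which may differ from the version available in 2003. The function names, the default product size, and the repeat rule (a 1-3 bp unit occurring at least four times in tandem) are assumptions made for the example.

```python
import re

# Hypothetical repeat rule (not taken from the paper): a 1-3 bp unit
# that occurs at least four times in tandem counts as a repeat.
REPEAT = re.compile(r"([ACGT]{1,3})\1{3,}")


def primer3_record(seq_id, template, product_size=(400, 600), region=None):
    """Return one Primer3 Boulder-IO record (tag=value lines, '=' ends it)."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Distance between the primers, expressed as a product size range.
        f"PRIMER_PRODUCT_SIZE_RANGE={product_size[0]}-{product_size[1]}",
    ]
    if region is not None:
        # Region of the sequence to search, given as (start, length).
        start, length = region
        lines.append(f"SEQUENCE_INCLUDED_REGION={start},{length}")
    lines.append("=")
    return "\n".join(lines) + "\n"


def has_repeat(primer):
    """True if the primer contains a tandem repeat under the rule above."""
    return REPEAT.search(primer.upper()) is not None


def drop_repeat_primers(pairs):
    """Keep only (forward, reverse) pairs in which neither primer has a repeat."""
    return [(fwd, rev) for fwd, rev in pairs
            if not has_repeat(fwd) and not has_repeat(rev)]
```

The optional BLAST check against previously designed primers is not shown; it would run on the pairs that survive this filter.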
Fig. 2. MMP-LIMS scoring tool overview. The genotype score management functions of MMP-LIMS Scoring Tool, including catalog-based
management of sample data and lab notebook, are shown. The diagram also includes the interfaces with MapMaker and Genotyper software,
and the remote genotype score entry tool, CIMDE.
[Diagram elements: MMP-LIMS database; MMP-LIMS Scoring Tool (user interface for genotype score entry and verification, lab notebook function, catalogs for primers, samples, populations and templates, MapMaker input file generation); CIMDE (remote genotype score entry and verification); external software (ABI Prism® Genotyper® software, MapMaker).]

The primers from the SNP Discovery Primer Design module
are used to amplify and sequence DNA in 12 different
lines of maize. Base calling of the resulting forward and
reverse sequencing trace files is performed by phred. Forward
and reverse output sequence is trimmed based on the
primers and quality scores, and each sequence is stored in
a single file. The quality scores are stored in a separate
file. Sequence assembly is then performed by phrap. Next,
a script combines the sequences into 12-sequence groups,
with each group corresponding to a single SNP discovery
(dSNP) primer pair. The clustalw program then aligns the
sequences for each primer pair and sends the output to the
SNP/InDel Finder script, which calculates base frequencies at
each position in the alignment. If all 12 sequences contain
the same base at a position, then no SNP is present. If one
sequence differs from the other 11 at a position (1:11),
then the SNP is considered questionable. Candidate SNPs
are defined by positions where at least two sequences
differ from the other 10 (2:10) or where the split is more
even: (3:9), (4:8), (5:7) or (6:6). SNP/InDel Finder also looks
for gaps in the alignment representing insertions/deletions
(InDels). These polymorphisms can then be used for genotype
analysis by the wet lab group.
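
The classification rule used by SNP/InDel Finder can be illustrated with a short sketch. It is written in Python for brevity (the original module is a Perl script) and assumes the alignment is already available as a list of equal-length strings with '-' marking gaps. The function name and the treatment of any column containing a gap as an InDel are assumptions made for the example.

```python
from collections import Counter


def classify_columns(aligned):
    """Classify each alignment column from its base frequencies.

    aligned: list of equal-length aligned sequences, '-' marks a gap.
    Returns a list of (position, base_counts, gap_count, call) tuples,
    with positions counted from 1.
    """
    results = []
    for pos, column in enumerate(zip(*aligned), start=1):
        counts = Counter(base.upper() for base in column)
        gaps = counts.pop("-", 0)
        if gaps:
            call = "indel"            # a gap in the column suggests an InDel
        elif len(counts) == 1:
            call = "no SNP"           # all sequences agree (12:0 for 12 lines)
        else:
            # Number of sequences that differ from the most common base.
            minor = sum(counts.values()) - max(counts.values())
            call = "candidate SNP" if minor >= 2 else "questionable"
        results.append((pos, dict(counts), gaps, call))
    return results
```

With 12 sequences, a 1:11 column is reported as questionable and a 2:10 or more even column as a candidate SNP, matching the thresholds given in the text.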
To manage SNP data, the MMP-LIMS Scoring Tool enables
wet lab researchers in several different laboratories to perform
genotype scoring and manage genotyping data. While the IBM
mapping population has 360 individuals, the tool can handle a
virtually unlimited number of individuals. MMP-LIMS uses
catalogs to manage and maintain data on primers, samples,
populations, and templates (Fig. 2). For example, through an
interface for the catalogs, the user can add, edit or delete SNP
or InDel primers. The system validates the information and
checks the integrity of the data among the other tables. When
deleting from the catalog, the system checks the database
tables for consistency. In particular, if a primer is already in
use in a record, then an ordinary user cannot delete that primer
from MMP-LIMS. Only the master user can delete such a primer,
through a ‘cascading’ delete that also removes all references
to it.
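
The delete rule for catalog entries can be sketched as follows. The example uses Python with an in-memory SQLite database purely for illustration; the production system runs on Sybase, and the table names, column names, and the `is_master` flag are hypothetical.

```python
import sqlite3


def demo_db():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE primer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE score (id INTEGER PRIMARY KEY, primer_id INTEGER,
                            sample TEXT, value TEXT);
        INSERT INTO primer VALUES (1, 'snp_a');
        INSERT INTO primer VALUES (2, 'snp_b');
        INSERT INTO score VALUES (1, 1, 's001', 'A');
    """)
    return conn


def delete_primer(conn, primer_id, is_master=False):
    """Delete a primer; refuse if it is in use unless the caller is the master user."""
    cur = conn.cursor()
    in_use = cur.execute(
        "SELECT COUNT(*) FROM score WHERE primer_id = ?", (primer_id,)
    ).fetchone()[0]
    if in_use and not is_master:
        return False                  # ordinary users cannot remove a primer in use
    # Cascading delete: remove the references first, then the primer itself.
    cur.execute("DELETE FROM score WHERE primer_id = ?", (primer_id,))
    cur.execute("DELETE FROM primer WHERE id = ?", (primer_id,))
    conn.commit()
    return True
```

In a relational database the same behaviour could also be enforced with foreign-key constraints; the explicit check is shown here only because it mirrors the description in the text.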
The templates catalog allows the user to create a subset of a
population’s samples for use in specific experiments. The user
can create a samples template, and then link the appropriate
samples to the template.
The MMP-LIMS Scoring Tool provides interfaces to
convert ABI Prism® Genotyper® (Applied Biosystems, 2003,
http://www.appliedbiosystems.com/products/) files into
segregation files and to import them into the MMP-LIMS database.
To convert ABI Genotyper® files, MMP-LIMS provides a
color template where users enter values of the base pair peaks
generated in the ABI sequencer for the two parental lines
used in the IBM population (B73/Mo17). Then MMP-LIMS
receives the ABI Genotyper® file, which contains the allele
information for the SNP experiment, and based on the color
template, processes the information and converts it into a
segregation file. The segregation data, consisting of a list of
scores for each SNP marker, are stored in the MMP-LIMS
database.

Fig. 3. Sequence analysis modules. The functions performed by the
two sequence analysis modules of MMP-LIMS are shown. The steps
performed by the SNP/InDel Pipeline to find primers and discover
potential SNPs/InDels are given in (A), and the process performed
by SSR Finder to locate SSRs in sequence and design unique primers
is outlined in (B).
[Panel A, SNP/InDel Pipeline: sequence and SNP parameters; 1. use Primer3 to create primers; 2. filter out primers with inverted repeats; SNP discovery primers used to amplify and sequence in N different genetic lines; 1. perform base calling with phred; 2. trim for primers (sequence and quality); 3. use phrap for sequence assembly; 4. group sequences with same dSNP primer; 5. align sequences for 1 primer pair (clustalw); 6. look for base frequencies at each position in the alignment with SNP/InDel Finder; output is a file of positional base frequencies (i.e. SNPs and InDels). Panel B, SSR Finder: sequence; 1. find repeats and generate primers; 2. check against previously discovered primers (Primer DBlast database; add each new SSR to the database); output is a formatted list of primers with ordering information.]

The MMP-LIMS Scoring Tool also functions as a laboratory
notebook for wet lab researchers. Users may store information
about specific experimental conditions including gel compos-
ition and the primer sequences used. The notebook also stores
data related to setup, such as microtiter plate layout.
The MMP-LIMS Scoring Tool also offers a web-based
query-by-example interface that allows users to create their
own queries based on markers, samples, probes or enzymes for
exporting information from the LIMS database into a standard
Microsoft® Excel spreadsheet for analysis.
In addition, a standalone version of the MMP-LIMS Scoring
Tool is available that works with an Access database. The
standalone version includes an example database populated
with maize data.
Several security features protect the data in the MMP-LIMS
database accessed via the MMP-LIMS Scoring Tool. To use the
MMP-LIMS Scoring Tool, the user is required to have a valid
user account and password. The different types of MMP-
LIMS Scoring Tool user accounts provide various levels of
system protection. For example, the administrator is able to
add new users and grant permissions to users for particular
system functions while, by default, new users can only view
the information and enter genotype scores.
The MMP-LIMS system takes advantage of a relational
database management system (RDBMS) for information stor-
age and retrieval. The RDBMS provides several important
functions including inserting, deleting, updating, retrieval,
managing concurrent requests, and handling transaction
issues such as rollback. The main MMP-LIMS database is
composed of more than 50 tables that record primer, locus,
enzymes, probes, samples, templates, users, passwords, note-
books and score information for the daily processes in the lab.
The physical data model can be found on the MMP website
(Maize Mapping Project, 2003, http://www.maizemap.org).
The model design is based on the third normal form approach
(Date, 2002). The MMP-LIMS database dedicates a large por-
tion of its tables to the MMP-LIMS Scoring Tool because of
the high level of functionality that this module provides.
3.2 Managing SSR and RFLP data
In addition to managing SNP/InDel data, MMP-LIMS handles
data generated to place SSR and RFLP markers on the genetic
map. The tools enable the researcher to locate potential SSRs
and manage the genotype score data, and encourage collaboration
by providing resources for researchers in remote locations
to enter genotype scores.
The SSR Finder tool serves three major purposes (Fig. 3B).
First, SSR Finder locates SSRs in DNA sequences. Second,
the program designs primer pairs to amplify the SSR-
containing sequence regions. Finally, SSR Finder checks
these primer pairs for uniqueness, removing any redundant
primers.
First, the sequence of interest is entered into the SSR Repeat
Finder module. SSR Repeat Finder returns a list of repeats
(SSRs) and the flanking (surrounding) sequence, which is then
sent as input into SSR Primer Designer. This module builds the
input file for Primer3 for each SSR, with user-defined parameters
for primer length, Tm, G/C content, and distance between
forward and reverse primers. The list of potential primers and
associated data from Primer3 is sent to the SSR Primer Rep
module, which runs the SSR Repeat Finder module against
the potential primer pairs and removes primer pairs that
contain a simple sequence repeat within the primer sequence.
The SSR Primer BLAST script takes the remaining primers
and their associated data, and uses the SSR sequence plus
the flanking sequence and performs a BLAST search against
all the primer pairs previously discovered in the project. It
also adds each new SSR to the Primer DBlast database after
it is checked. The program formatdb is run to regenerate the
Primer DBlast database. Next, the SSR Primer BLAST module
returns the BLAST scores for the primers. Based on these
scores, the Order Filter script creates a list of primers with
no BLAST hits and sends the list to Order Formatter. Finally,
Order Formatter returns a formatted list along with ordering
information.
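
The first stage of this process, locating repeats and their flanking sequence, can be sketched with a regular expression. The example is in Python rather than the Perl used for SSR Finder, and the thresholds (unit length of 1 to 4 bp, at least five tandem copies, 20 bp of flanking sequence) are illustrative assumptions, not the settings used in the project.

```python
import re


def find_ssrs(seq, min_repeats=5, max_unit=4, flank=20):
    """Return SSRs found in seq together with their flanking sequence.

    The repeat unit is matched lazily so that the shortest unit is
    reported (for example 'CT' rather than 'CTCT').
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        start, end = m.span()
        hits.append({
            "unit": unit,
            "repeats": (end - start) // len(unit),
            "start": start,                     # 0-based, inclusive
            "end": end,                         # 0-based, exclusive
            "left_flank": seq[max(0, start - flank):start],
            "right_flank": seq[end:end + flank],
        })
    return hits
```

The flanking sequence returned here is what a primer design step would take as input; the later uniqueness check by BLAST is outside the scope of the sketch.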
The MMP-LIMS Scoring Tool discussed previously is also
used effectively for managing SSR and RFLP genotype score
data. Individuals in the lab can perform the scoring in two
steps. First, one user analyzes the gel images or autoradio-
graphs, entering the score for each sample in a population
or template. Next, a second user verifies the scores by inde-
pendently entering each score again in the row underneath the
originally entered scores. Because the letters representing the
scores are color-coded, it is easy for the second individual to
see whether the scores match the original scores. The system can
also automatically check for mismatched scores and allow the
user to move from one cell containing a mismatched score to
the next to verify the data.
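
The automatic check for mismatched scores amounts to comparing the two rows of entries position by position. A minimal sketch in Python, with a hypothetical function name and scores represented as single-character strings:

```python
def find_mismatches(first_pass, second_pass):
    """Return the 0-based sample positions where the two entries disagree."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both rows must contain one score per sample")
    return [
        i for i, (a, b) in enumerate(zip(first_pass, second_pass))
        if a.upper() != b.upper()
    ]
```

The returned positions correspond to the cells a user would step through when resolving disagreements.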
The CIMDE is a subsystem of the MMP-LIMS system
developed to allow members of the maize community to
remotely enter genotype scores into MMP-LIMS for a subset
of 94 individual lines from the intermated B73xMo17 (IBM)
(Davis et al., 2001) mapping population.
The system is composed of two main components designed
to allow flexibility in the way researchers submit and edit
their scores, while providing an intuitive and easy-to-use inter-
face. First, the file upload function allows the user to quickly
populate the database with a batch of probes and their associ-
ated genotype scores by uploading a tab-delimited text file
via a web-based form. The second component consists of
an in-browser application that gives remote researchers the
opportunity to submit scores manually, edit scores previously
submitted, or delete scores. CIMDE is used primarily for SSR
data, but also allows the researcher to enter SNP and RFLP
information.
MMP-LIMS creates a MapMaker file from the scores sub-
mitted by the user via CIMDE. The PostScript™ version of
the map built by MapMaker is converted to a tabular format
by MMP-LIMS. Both the PostScript™ file and the table are
then e-mailed to the user.
Both components of CIMDE perform extensive validation
of the data based on the type of probe before addition to
the database is permitted. During a submission using the
file upload function, CIMDE checks the validity and the
number of scores for each record. If scores for an RFLP
probe are being processed, the system performs an addi-
tional check to ensure that a restriction enzyme name is
given for each probe. Because each probe name needs to
be unique, the system checks that the probe name does not
already exist in the database under any user’s account. An
insertion or update of records that causes the duplication
of marker names is not permitted by the system, and if
attempted, the system displays a list containing each duplic-
ate record. The manual data-editing tool also validates probe
data. The table in the graphical user interface will not allow
the user to insert invalid score values into its cells, while valid
scores are color-coded for ease of recognition. This validation
and color-coding varies based on the type of probe being
edited.
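
The checks applied to an uploaded file can be sketched as follows. The column layout (probe name, marker type, restriction enzyme, then one score per individual) and the sets of valid scores per marker type are assumptions made for this example; the paper does not specify the file format. The default of 94 scores reflects the 94-line IBM subset mentioned above.

```python
# Hypothetical sets of valid score symbols per marker type.
VALID_SCORES = {"SSR": set("ABH-"), "RFLP": set("ABH-"), "SNP": set("AB-")}


def validate_upload(text, existing_probes, n_samples=94):
    """Return a list of (line_number, message) problems in a tab-delimited upload."""
    errors = []
    seen = set(existing_probes)
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            errors.append((line_no, "too few columns"))
            continue
        probe, marker_type, enzyme = fields[0], fields[1].upper(), fields[2]
        scores = fields[3:]
        allowed = VALID_SCORES.get(marker_type)
        if allowed is None:
            errors.append((line_no, "unknown marker type"))
            continue
        if len(scores) != n_samples:
            errors.append((line_no, "wrong number of scores"))
        if any(s.upper() not in allowed for s in scores):
            errors.append((line_no, "invalid score value"))
        if marker_type == "RFLP" and not enzyme.strip():
            errors.append((line_no, "RFLP probe needs a restriction enzyme"))
        if probe in seen:
            errors.append((line_no, "duplicate probe name"))
        seen.add(probe)
    return errors
```

An upload would be accepted only when the returned list is empty.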
In order to guard user data and MMP-LIMS data, CIMDE
is equipped with protection features. To ensure that each user
only has access to his or her data, each user must create a user-
name and password and register an individual account with
the system. The user must also make a one-time submission
of a set of control scores to be validated by the system. Correct
control scores indicate that the researcher has performed the
experiments correctly, so the genotype scores that he or she
submits are accepted as valid. Once the user has logged in and
has submitted valid control scores, he or she can access both the file
upload and manual data editing functions of the application.
Control scores do not have to be submitted upon subsequent
use of CIMDE.
3.3 Production of the genetic map
Genetic map generation requires both converting genotype
scores from a set of samples into a format readable by Map-
Maker and interpreting the results returned by the software.
MMP-LIMS Scoring Tool creates input files for MapMaker
by retrieving data from MMP-LIMS database. Users can cre-
ate files using all of the mapping data or they can generate a
file for a subset of the data by creating a group and selecting
the markers and samples of interest. The MMP-LIMS Scor-
ing Tool then automatically creates the MapMaker input file.
Data from remote researchers can also be used in the creation
of the file. When needed, the system can convert scores. For
example, for recombinant inbred populations, the score ‘H’
is converted to ‘−’, while for F2 populations, the ‘H’ score
remains unchanged in the MapMaker input file.
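
The file generation and score conversion can be sketched as follows. The layout follows the MAPMAKER/EXP raw data format as we understand it (a "data type" line, a line with the numbers of individuals, loci and quantitative traits, then one line per locus beginning with an asterisk); readers should check it against the MapMaker documentation. The function name and the in-memory representation of the scores are assumptions.

```python
def mapmaker_raw(markers, population="ri self"):
    """Build MapMaker raw-file text from {locus_name: [score, ...]}.

    For recombinant inbred data types ('ri ...') the heterozygote
    score 'H' is converted to the missing-data symbol '-'.
    """
    sizes = {len(scores) for scores in markers.values()}
    if len(sizes) != 1:
        raise ValueError("every marker needs one score per individual")
    recombinant_inbred = population.startswith("ri")
    lines = [
        f"data type {population}",
        f"{sizes.pop()} {len(markers)} 0",    # individuals, loci, traits
    ]
    for name, scores in markers.items():
        row = "".join(scores).upper()
        if recombinant_inbred:
            row = row.replace("H", "-")
        lines.append(f"*{name} {row}")
    return "\n".join(lines) + "\n"
```

Calling the function with `population="f2 intercross"` leaves 'H' scores unchanged, as described for F2 populations.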
Output from MapMaker is also managed by the MMP-
LIMS Scoring Tool. The MMP-LIMS Scoring Tool extracts
genetic map information such as chromosome, map coordin-
ate, framework versus off-frame status for each probe from
the PostScript™ file returned by the MapMaker software and
results are stored in the MMP-LIMS database.
3.4 Public views of MMP data
It is imperative that the data produced by the MMP be easily
viewable by the public. MMP-LIMS provides several displays
of the mapping data, including the MSL, iMap, and cMap.
The MSL provides a web-based interface to accept input
sequence and perform a BLAST search against all public
maize sequences, including the DuPont-MMP Cornsensus
(Maize Mapping Project, 2002,
http://www.agron.missouri.edu/files_dl/MMP/Cornsensus/)
unigene set. It returns the
BLAST scores, the map location, and links to related
sequences if available. The user enters the nucleotide sequence
via the Common Gateway Interface (CGI) WWW form along
with the name of the sequence and BLAST parameters.
The CGI then performs a BLAST against a database containing
>300 000 Zea mays sequences from GenBank, and
>10 000 sequences from the Cornsensus Unigene set. The
BLAST results are returned in XML format and converted
by the CGI via XSLT. The CGI retrieves the accession num-
bers of related sequences from MaizeDB, and creates an
HTML table containing links to the related map and sequence
data in various databases including GenBank, MaizeDB,
The Institute for Genomic Research (TIGR) (The Institute
for Genomic Research, 2003, http://www.tigr.org/), Gramene
(Gramene, 2003, http://www.gramene.org/), the Arizona
Genomics Institute (AGI) (Arizona Genomics Institute,
2002, http://www.genome.arizona.edu/), the Clemson
University Genomics Institute (CUGI) (Clemson University
Genomics Institute, 2003, http://www.genome.clemson.edu/),
and Zea mays DataBase (ZmDB) (ZmDB, 2003,
http://www.zmdb.iastate.edu/).
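
The conversion of BLAST XML results into an HTML table of links can be illustrated with a small sketch. MSL performs this step with Perl CGI and XSLT; the Python version below only parses the XML and builds the rows directly. The element names are those of the NCBI BLAST XML output, while the function name and the link template URL are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape


def hits_to_html(blast_xml, link_template="https://example.org/seq/{acc}"):
    """Turn BLAST XML output into an HTML table with one row per hit."""
    root = ET.fromstring(blast_xml)
    rows = []
    for hit in root.iter("Hit"):
        acc = hit.findtext("Hit_accession", default="")
        hsp = hit.find("Hit_hsps/Hsp")        # first (best) HSP of the hit
        if hsp is None:
            continue
        bits = hsp.findtext("Hsp_bit-score", default="")
        evalue = hsp.findtext("Hsp_evalue", default="")
        url = link_template.format(acc=acc)   # placeholder link target
        rows.append(
            f'<tr><td><a href="{escape(url)}">{escape(acc)}</a></td>'
            f"<td>{escape(bits)}</td><td>{escape(evalue)}</td></tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

In the actual tool, map locations and related accession numbers retrieved from MaizeDB would be added to each row; that lookup is omitted here.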
The integrated genetic and physical map visualization tool
of MMP-LIMS, iMap (Cone et al., 2002), allows researchers
to access data related to loci on the genetic map along with
their associated contigs on the physical map. The graphical
interface displays the positions of the loci and contigs on the
genetic map and physical map, respectively. Searches may
be conducted based on the locus, probe, GenBank accession
number, or contig number.
The cMap (Fang et al., 2003) function of MMP-LIMS per-
mits the user to select and compare two genetic maps at a time
with dynamic links to data resources and text lists of the shared
loci between the compared maps. Searches can be conducted
based on locus, probe, or GenBank accession number.
4 DISCUSSION
The MMP-LIMS was designed to meet the challenges of a
high-throughput mapping project. Currently, MMP-LIMS is
being used at the Maize Mapping Project at the University of
Missouri—Columbia. The system has been used to enter and
verify 957 SSR markers, 1023 RFLP markers, 189 SNPs, and
177 InDels. MMP-LIMS is used primarily for the maize IBM
mapping population consisting of 360 samples.
MMP-LIMS has also been used for managing 590 SSRs of
the IF2 population with 56 samples and 359 SSRs of the C6
population with 93 samples. The two other populations were
used to map SSRs that are monomorphic in the IBM popula-
tion. The SSR loci from these two populations are integrated
within an enhanced version of the IBM map called the IBM
Neighbors map by interpolating the location of the marker
loci with loci shared between the IBM map and the other
maps (Cone et al., 2002).
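
The placement of a locus from another population onto the IBM map can be sketched as an interpolation between the nearest flanking shared loci. The paper states only that positions are interpolated using shared loci; the linear form used below, the function name, and the choice to return no position outside the range of shared loci are assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_position(pos_other, shared):
    """Estimate an IBM map position for a locus mapped on another map.

    pos_other: position of the locus on the other map.
    shared: list of (other_map_position, ibm_position) pairs for loci
            present on both maps.
    Returns None when the locus lies outside the shared loci.
    """
    if len(shared) < 2:
        raise ValueError("at least two shared loci are required")
    pts = sorted(shared)
    xs = [x for x, _ in pts]
    if not xs[0] <= pos_other <= xs[-1]:
        return None
    i = min(bisect_right(xs, pos_other), len(xs) - 1)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    if x1 == x0:
        return y0
    # Linear interpolation between the two flanking shared loci.
    return y0 + (pos_other - x0) * (y1 - y0) / (x1 - x0)
```

For example, a locus halfway between two shared loci on the other map is placed halfway between the same two loci on the IBM map.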
Users performing research on a species other than maize can
customize the functions of the MMP-LIMS system by adding
populations, samples, and markers specific to the species
of interest. For example, members of the Soybean Gen-
omics Consortium (Soybean Genomics Consortium, 2003,
http://www.soybeangenome.org) have requested MMP-LIMS
for customization as the system to manage the data produced
in the generation of a genetic map for the soybean genome.
A variety of LIMS systems (Table 2) are currently available,
including the LabBase (Goodman et al., 1998) system,
dnaLIMS™ by dnaTools (dnaTools, 2002,
http://www.dnatools.com/dnalims.html), the
GeneTrials LIMS system by Waban Software (Waban
Software Inc., 2002, http://www.wabansoftware.com/Lims.htm),
Sapphire Informatics 3.0 by LabVantage (LabVantage, 2002,
http://www.labvantage.com/products_sapphireinfo.htm),
the Nautilus system by Thermo LabSystems (Thermo LabSystems, 2002,
http://www.thermolabsystems.com/news/press/articles/020906-nautilus2002r2.asp),
the system by Clive G. Brown and Richard Mott from the
Bioinformatics Group at the Wellcome Trust Centre for
Human Genetics (Wellcome Trust Centre for Human
Genetics, 2001, http://bioinformatics.well.ox.ac.uk/project-
lims.html), CimBiosis™ Genotyping Workflow System
(Cimarron Software, Inc., http://www.cimsoft.com/products.html),
and Applied Biosystems GeneMapper™ Software
(Applied Biosystems, 2003). However,
these software packages do not provide the same set of features
as MMP-LIMS. Several of the software packages provide only
generic interfaces that must be customized before storing lab
data. In addition, these systems do not provide a method for
validating and verifying genotyping scores or for using different
types of markers to generate an output file for a standard
mapping tool such as MapMaker. Only some of the systems
provide the user with an interface to data from ABI DNA
sequencers. While some systems are entirely web-based, few
of the systems combine client/server lab software with
web-based data query and visualization
tools to accommodate both local and remote users. In addition,
the incorporation of sequence analysis tools for SSR and
SNP/InDel experiments is not found in the other packages.
Most of the systems were not designed to specifically handle
different types of genetic markers such as SSRs, RFLPs and
SNP/InDels.
MMP-LIMS is a complete system and the software is freely
available to the public. The system includes several levels of
security, a genotype scoring tool, a data entry tool for remote
researchers to submit data, scripts for designing SSR primers
and for locating potential SNP/InDels, a system for finding
sequences that are similar to a query sequence along with
related database links, and viewers for both an integrated
genetic/physical map and for comparison of genetic maps.
Table 2. Feature comparisons

Features (numbered columns below):
  1  Freely available to public
  2  Interface customized for genetic map data
  3  Validation and verification of genotype scores
  4  Generation of MapMaker input file with multiple marker types
  5  Different security levels
  6  Interface to data from ABI DNA sequencers
  7  Combination of both client/server lab software and web-based data query and visualization tools
  8  Incorporation of sequence analysis tools for SSR and SNP/InDel experiments
  9  Handling a variety of genetic marker data (i.e. SSRs, RFLPs, SNPs/InDels)

System                                   1    2    3    4    5    6    7    8    9
MMP-LIMS                                 y    y    y    y    y    y    y    y    y
LabBase                                  y    n    n    n    n/l  n    n    n    n
dnaLIMS™                                 n    n    n    n    n/l  y    n    y    n
GeneTrials™ LIMS                         n    y    n/l  n/l  n/l  n/l  n/l  n/l  n/l
Sapphire Informatics 3.0                 n    n/l  n/l  n/l  n/l  n/l  y    n/l  n/l
Nautilus                                 n    n/l  n/l  n/l  n/l  n/l  y    n/l  n/l
System by Brown and Mott                 y    n    n/l  n/l  n/l  y    y    n/l  y
CimBiosis™ Genotyping Workflow System    n    n    n/l  n/l  n/l  y    y    n/l  y
Applied Biosystems GeneMapper™           n    y    y    n/l  n/l  y    n/l  n/l  y

y, provided; n, not provided; n/l, not listed in article or on software website

ACKNOWLEDGEMENTS
We would like to thank the members of our advisory committee
including Sue Wessler (chair), Brad Barbazuk, Vicki
Chandler, Joe Ecker, Stan Letovsky and Antoni Rafalski.
Names are necessary to factually report on available data;
however, neither USDA nor the University of Missouri
guarantees nor warrants the standard of the product, and the use of
the name implies no approval of the product to the exclusion of
others that may also be suitable. This research was supported
by the National Science Foundation (DBI 9872655).
REFERENCES
Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J.
(1990) Basic local alignment search tool. J. Mol. Biol., 215,
403–410.
Applied Biosystems (2003) Applied Biosystems | Main. Accessed
2003 Feb 4.
Arizona Genomics Institute (2002) Dec 20. AGI Home Page.
Accessed 2003 Feb 4.
Cimarron Software, Inc. (2002) March 1. Cimarron Software, Inc.—
Products. Accessed 2003 Feb 4.
Clemson University Genomics Institute (2003) Jan 29. CUGI:
Clemson University Genomics Institute. Accessed 2003 Feb 4.
Coe,E., Cone,K., McMullen,M., Chen,S., Davis,G., Gardiner,J.,
Liscum,E., Polacco,M., Paterson,A., Sanchez-Villeda,H.,
Soderlund,C. and Wing,R. (2002) Access to the maize genome:
An integrated physical and genetic map. Plant Physiol., 128, 9–12.
Cone,K., McMullen,M., Vroh Bi,I., Davis,G., Yim,Y.-S.,
Gardiner,J., Polacco,M., Sanchez-Villeda,H., Fang,Z.,
Schroeder,S. et al. (2002) Genetic, physical and informatic
resources for maize: On the road to an integrated map. Plant
Physiol., 130, 1594–1601.
Date,C.J. (2002) An Introduction to Database Systems (Seventh
Edition). Addison Wesley Longman, Inc., Reading, MA.
Davis,G., McMullen,M., Baysdorfer,C., Musket,T., Grant,D.,
Staebell,M.S., Xu,G., Polacco,M., Koster,L., Melia-Hancock,S.
et al. (1999) A maize map standard with sequenced core mark-
ers, grass genome reference points, and 932 ESTs in a 1736-locus
map. Genetics, 152, 1137–1172.
Davis,G., Musket,T., Melia-Hancock,S., Duru,N., Sharopova,N.,
Schultz,L., McMullen,M.D., Sanchez-Villeda,H., Schroeder,S.
and Garcia,A.A. (2001) The intermated B73 x Mo17 genetic map:
a community resource. Maize Genetics Conference Abstracts,
43:W15, 62.
dnaTools (2002) Sep 28. dnaTools. Accessed 2003 Feb 4.
Ewing,B. and Green,P. (1998) Base-calling of automated sequen-
cer traces using phred. II. Error probabilities. Genome Res., 8,
186–194.
Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling
of automated sequencer traces using phred. I. Accuracy assess-
ment. Genome Res., 8, 175–185.
Fang,Z., Polacco,M., Chen,S., Schroeder,S., Hancock,D.,
Sanchez,H. and Coe,E. (2003) cMap: the comparative genetic
map viewer. Bioinformatics, 19, 416–417.
Goodman,N., Rozen,S., Stein,L. and Smith,A. (1998) The Lab-
Base system for data management in large scale biology research
laboratories. Bioinformatics, 14, 562–574.
Gramene (2003) Jan 19. Gramene. Accessed 2003 Feb 4.
LabVantage (2002) Aug 26. Sapphire Informatics 3.0 is a
browser/server-based solution. Accessed 2003 Feb 4.
Lander,E.S., Green,P., Abrahamson,J., Barlow,A., Daly,M.J.,
Lincoln,S.E. and Newburg,I. (1987) MAPMAKER: an interactive
computer package for constructing primary genetic linkage maps
of experimental and natural populations. Genomics, 1, 174–181.
MaizeDB (2003) Jan 27. Maize Genome Database—MaizeDB.
Maize Mapping Project (2003) Jan 30. Maize Mapping Project.
Maize Mapping Project (2002) Oct 8. Cornsensus Sequence Files.
Rice Genome Research Program (2002) Nov 15. Rice Genome
Research Program (RGP) Home Page. Accessed 2003 Feb 4.
Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for
general users and for biologist programmers. In Krawetz,S.
and Misener,S. (eds), Bioinformatics Methods and Protocols:
Methods in Molecular Biology. Humana Press, Totowa, NJ,
pp. 365–386.
Sharopova,N., McMullen,M.D., Schultz,L., Schroeder,S.,
Sanchez-Villeda,H., Gardiner,J., Bergstrom,D., Houchins,K.,
Melia-Hancock,S., Musket,T. et al. (2002) Development and
mapping of SSR markers for maize. Plant Mol. Biol., 48,
463–481.
Soderlund,C., Humphray,S., Dunham,A. and French,L. (2000) Contigs
built with fingerprints, markers and FPC V4.7. Genome Res.,
10, 1772–1787.
Soybean Genomics Consortium (2003) Mar 6. Soybean Genomics
Consortium. Accessed 2003 Apr 17.
The Institute for Genomic Research (2003) Jan 23. The Institute for
Genomic Research. Accessed 2003 Feb 4.
Thermo LabSystems (2002) Sep 6. Thermo LabSystems—
Company—News—Press—Thermo LabSystems delivers
Nautilus™ 2002 R2 LIMS. Accessed 2003 Feb 4.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W:
Improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res., 22,
4673–4680.
Waban Software Inc. (2002) Mar 26. Waban Software Inc. Accessed
2003 Feb 4.
Wellcome Trust Centre for Human Genetics (2001) Jan 21.
WTCHG Bioinformatics Website: Homepage. Accessed 2003
Feb 4.
ZmDB (2003) Jan 22. ZmDB: Maize Genome Database. Accessed
2003 Feb 4.