BIOINFORMATICS
Vol. 19 no. 16 2003, pages 2022–2030
DOI: 10.1093/bioinformatics/btg274
Development of an integrated laboratory
information management system for the maize
mapping project
H. Sanchez-Villeda1, S. Schroeder1, M. Polacco1,3,
M. McMullen1,3, S. Havermann1, G. Davis1, I. Vroh-Bi1,
K. Cone2, N. Sharopova1, Y. Yim1, L. Schultz1, N. Duru1,
T. Musket1, K. Houchins3, Z. Fang1, J. Gardiner1
and E. Coe1,3,∗
1Department of Agronomy, 2Division of Biological Sciences and 3USDA-ARS, University
of Missouri, Columbia, MO 65211, USA
Received on February 4, 2003; revised on April 18, 2003; accepted on May 3, 2003
ABSTRACT
Motivation: The development of an integrated genetic and
physical map for the maize genome involves the generation
of an enormous amount of data. Managing these data requires
a system to aid in genotype scoring for different types of
markers coming from both local and remote users. In addi-
tion, researchers need an efficient way to interact with genetic
mapping software and with data files from automated DNA
sequencing. They also need ways to manage primer data
for mapping and sequencing and provide views of the integ-
rated physical and genetic map and views of genetic map
comparisons.
Results: The MMP-LIMS system has been used successfully
in a high-throughput mapping environment. The genotypes
from 957 SSR, 1023 RFLP, 189 SNP, and 177 InDel markers
have been entered and verified via MMP-LIMS. The system
is flexible, and can be easily modified to manage data for other
species. The software is freely available.
Availability: To receive a copy of the iMap or cMap software,
please fill out the form on our website. The other MMP-LIMS
software is freely available at http://www.maizemap.org/
bioinformatics.htm.
Contact: coee@missouri.edu
∗To whom correspondence should be addressed.
1 INTRODUCTION
The maize mapping project (MMP) aims to develop an integrated
physical and genetic map of maize. This resource will be
useful for marker-assisted selection, map-based cloning, and
comparative genomics of crops, and will undergird sequencing
of the maize genome (Cone et al., 2002). To achieve
this goal, the MMP has utilized and developed DNA markers
including 1023 restriction fragment length polymorphisms
(RFLPs), 957 simple sequence repeats (SSRs), 10 000 overgos,
189 single nucleotide polymorphisms (SNPs), and
177 insertion/deletion (InDel) polymorphisms (Davis et al.,
1999; Sharopova et al., 2002). These markers have been
used to develop a high-resolution genetic map used as the
framework to anchor bacterial artificial chromosome (BAC)
contigs. This process requires high-throughput sequencing,
and high-throughput SNP/InDel genotyping. The amount
of data produced is enormous. The Missouri compon-
ent of the MMP team is divided into different laborator-
ies dispersed throughout the campus, simultaneously using
and producing different parts of the same core data. The
genetic mapping populations involve different subsets of indi-
viduals with their respective molecular marker data to be
stored, managed, and integrated into the maps. Optimal
use of these data requires effective methods of analysis
and management. Furthermore, the data produced in the
MMP must be disseminated to the scientific community
through informatics tools capable of handling the high volume
of data.
The requirements for laboratory databases vary consider-
ably from project to project. At present, many laboratories use
spreadsheets to manage their data. In this paper, we present
the MMP laboratory information management system (MMP-
LIMS) that we have developed to provide several functions:
(1) allow data management with detailed record keeping,
reporting, and retrieving; (2) ensure data quality and access-
ibility to the scientific community and (3) disseminate the
integrated map of maize to the scientific community through
web-based tools. This research is an example of applying
informatics to practical questions in biology and agronomy. An
overview of the MMP-LIMS components and their functions
is shown in Table 1.
Published by Oxford University Press.
Table 1. MMP-LIMS functions

MMP-LIMS component: summary of functions

MMP-LIMS Scoring Tool
  Serves as a laboratory notebook for wet lab researchers
  Allows researchers in different laboratories to interact with MMP-LIMS database
  Manages genotype data from RFLP, SSR, SNP, InDel marker types
  Validates genotype scores based on repeat reads
  Interfaces with ABI Prism Genotyper software, converting trace file data into SNP scores
  Allows user to make custom templates for genotype output/entry
  Creates input files for MapMaker by chromosome or data set
  Integrates information returned by the MapMaker software
  Also exists in a publicly available standalone version utilizing an Access database and includes an example database

Community IBM Map Data Entry Tool (CIMDE)
  Allows researchers at remote locations to enter genotype scores into the MMP-LIMS database via web-based interface
  Provides a mechanism for uploading tab-delimited score file and for uploading scores for a single marker
  Validates genotype scores for control loci
  Validates genotype scores based on marker type

SSR Finder
  Locates SSRs in DNA sequences
  Designs unique primer pairs to amplify SSR sequences

SNP Discovery Primer Design
  Designs primers for finding potential SNPs
  Performs BLAST search against existing primers

SNP/InDel Finder
  Calculates base frequencies in each position in a sequence alignment
  Searches for gaps in a sequence alignment representing InDels

Mapped Sequence Locator (MSL)
  Accepts sequence via a web-based interface and performs a BLAST search against all public maize sequences
  Returns BLAST scores, genetic map locations, and links related to sequences if available

iMap
  Graphically displays an integrated genetic and physical map
  Displays genetic marker data and contig data
  Performs search for a map location based on locus, probe, GenBank accession, or contig number
  Provides links to current WebFPC (Soderlund et al., 2000) assembly
  Displays anchors based on a set of data filters that remove ambiguous assignments

cMap
  Displays comparative associations between two genetic maps
  Gives the user text lists of the shared loci between the compared maps

2 SYSTEMS AND METHODS
Several technologies were employed during the development
of MMP-LIMS. The programming languages used in both
the user interface for the local wet lab researchers and the
remote researchers reflect an interest in creating a highly mod-
ular system and in providing each user with an efficient and
intuitive user interface.
The user interface for the local wet lab researchers was
implemented as a Visual Basic® 6.0 client application. This
choice provides not only efficient performance for the user,
but also a well-structured environment for development.
The system’s client–server architecture permits many users
concurrent access to a central database. Open Database
Connectivity (ODBC) provides interoperability, connects the
client application to the database, and allows interaction with
MaizeDB (MaizeDB, 2003, http://www.agron.missouri.edu/)
through proxy tables.
The web-based user interface for remote researchers util-
izes HTML for the static content. Java™ applets are used
for the other functions and give the user a more interact-
ive and straightforward interface than those attainable with
HTML forms. Java™ servlets and Java Database Connectivity
(JDBC™) transfer data to and from the database.
The sequence analysis modules of MMP-LIMS were imple-
mented in Perl and make use of other publicly available
programs including Primer3 (Rozen and Skaletsky, 2000),
BLAST (Altschul et al., 1990), phred (Ewing et al., 1998),
phrap (Ewing and Green, 1998), and clustalw (Thompson
et al., 1994). The web-based sequence comparison module
utilizes Perl (CGI/DBI) along with XML/XSLT and the
BLAST program.
The web-based integrated genetic and physical map dis-
play application and the comparative mapping viewer were
adapted from software used in the Rice Genome Project (Rice
Genome Research Program, 2002, http://rgp.dna.affrc.go.jp/).
Originally utilizing data stored via flat files, the code for the
integrated map viewer was converted to allow data retrieval
from the database via servlet communication. The user
interface employs a combination of a Java™ applet and Perl
CGI (Cone et al., 2002).
[Figure 1: diagram linking MMP-LIMS users (local wet lab users, remote wet lab users, web users) to the client–server tool (MMP-LIMS Scoring Tool), the sequence analysis modules (SNP/InDel Pipeline, comprising SNP Discovery Primer Design and SNP/InDel Finder; SSR Finder), and the MMP-LIMS web-based tools (CIMDE, Mapped Sequence Locator (MSL), iMap Viewer, cMap Viewer).]
Fig. 1. MMP-LIMS context diagram. The modules of MMP-LIMS are shown, including the client–server tool, multiple web-based tools and
sequence analysis modules for SSRs and SNPs/InDels.
The MMP-LIMS data are stored in a Sybase® Adaptive
Server Enterprise 11.9.2 relational database. The database
resides on a Dell Precision 330 running Red Hat Linux 7.3 with a
2.4.18-5 kernel. An additional standalone version of MMP-
LIMS exists with the same functionality and was designed to
work with a Microsoft® Access database.
3 IMPLEMENTATION
MMP-LIMS provides data management for the processes
involved in generating a high-resolution genetic map includ-
ing managing SNP/InDel data, managing SSR and RFLP
data, generating the genetic map, and providing public views
of the data. The modules comprising MMP-LIMS can be
viewed as elements of a system context diagram (Fig. 1).
The modules include the MMP-LIMS Scoring Tool (Fig. 2),
the Community IBM Map Data Entry Tool (CIMDE), SSR
Finder, SNP Discovery Primer Design, SNP/InDel Finder, the
Mapped Sequence Locator (MSL), the integrated genetic and
physical map viewer (iMap), and the comparative mapping
viewer (cMap).
3.1 Managing SNP/InDel information
MMP-LIMS manages data from each step in the process of
placing SNPs/InDels on the genetic map, from finding poten-
tial SNPs/InDels with the SNP/InDel pipeline to managing
the genotype score data with MMP-LIMS Scoring Tool and
generating files for MapMaker (Lander et al., 1987) software.
The SNP/InDel pipeline works in two steps (Fig. 3A). First,
SNP primers are designed with the SNP Discovery Primer
Design module. Then the resulting primers are used to process
sequences in order to find SNPs and/or InDels via SNP/InDel
Finder. The first step in SNP discovery is to sequence a region
of DNA across multiple lines of maize to detect nucleotide
polymorphisms. The DNA segments for sequencing are amp-
lified using primer pairs designed with the SNP Discovery
Primer Design module.
DNA sequence is entered into the module, along with para-
meters including distance between primer pairs and region
of the sequence to search for primer pairs. Using the given
parameters, this script builds an input file for Primer3. The
resulting primers are returned from Primer3, and the SNP Dis-
covery Primer Design module checks for repeats in the primer
sequence and rejects those with repeats. The script can also
be set to perform a BLAST search with the primers against
all previously designed primers. The output of the script is a
list of unique SNP discovery primers.
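The step of building a Primer3 input file from user parameters can be illustrated as follows. This is a minimal sketch, not the authors' script: the function name `primer3_record` and its arguments are hypothetical, and the tag names are those of current Primer3 releases (Boulder-IO format), which may differ from the version used in 2003. The "distance between primer pairs" is assumed here to map to the product size range, and the search region to `SEQUENCE_INCLUDED_REGION`.

```python
def primer3_record(seq_id, template, region_start, region_length,
                   product_min, product_max):
    """Return one Boulder-IO record suitable as input to primer3_core.

    Assumptions (not from the paper): tag names follow current Primer3
    releases, and region_start is interpreted by Primer3 according to
    its PRIMER_FIRST_BASE_INDEX setting.
    """
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Region of the sequence in which primers may be placed.
        f"SEQUENCE_INCLUDED_REGION={region_start},{region_length}",
        # Allowed amplicon sizes, i.e. the spacing of the primer pair.
        f"PRIMER_PRODUCT_SIZE_RANGE={product_min}-{product_max}",
        "=",  # a line holding only '=' terminates the record
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(primer3_record("clone1", "ACGT" * 50, 20, 150, 100, 300), end="")
```

The resulting text could be written to a file and passed to `primer3_core`; the later filtering for repeats and the optional BLAST search against earlier primers would then operate on the primers Primer3 returns.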
[Figure 2: diagram of the MMP-LIMS Scoring Tool and the MMP-LIMS database. Components shown: user interface for genotype score entry and verification; lab notebook function; catalogs (primers, samples, populations, templates); input file generation; CIMDE (remote genotype score entry and verification); external software (ABI Prism® Genotyper® software, MapMaker). Data flows labelled in the diagram: genotype scores, SNP genotype data, genotyping information (added via catalogs), experimental conditions/setup information, map data/segregation file data, genetic map data, and genotype data for MapMaker.]
Fig. 2. MMP-LIMS scoring tool overview. The genotype score management functions of MMP-LIMS Scoring Tool, including catalog-based
management of sample data and lab notebook, are shown. The diagram also includes the interfaces with MapMaker and Genotyper software,
and the remote genotype score entry tool, CIMDE.
The primers from the SNP Discovery Primer Design module
are used to amplify and sequence DNA in 12 different
lines of maize. Base calling of the resulting forward and
reverse sequencing trace files is performed by phred. Forward
and reverse output sequence is trimmed based on the
primers and quality scores and each sequence is stored in
a single file. The quality scores are stored in a separate
file. Sequence assembly is then performed by phrap. Next,
a script combines the sequences into 12-sequence groups
with each group corresponding to a single SNP discovery
(dSNP) primer pair. The clustalw program then aligns the
sequences for each primer pair and sends the output into the
SNP/InDel Finder script to calculate base frequencies at each
position in an alignment. If all 12 sequences contain
the same base at a position, then no SNP is present. If one
sequence differs from the other 11 at a position (1 : 11),
then the SNP call is questionable. Candidate SNPs
are defined as positions where at least two sequences
differ from the rest, that is, a split of (2 : 10), (3 : 9), (4 : 8),
(5 : 7) or (6 : 6). SNP/InDel Finder also looks for gaps in the
alignment representing insertions/deletions (InDels). These
polymorphisms can then be used for genotype analysis by the
wet lab group.
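The base-frequency rule described above can be sketched in a few lines. This is an illustrative reimplementation in Python, not the authors' Perl script; the function names and labels are hypothetical. Two details are assumptions: any column containing a gap character is reported as a possible InDel, and the number of sequences differing from the most common base is used as the minor count, so a column with three different bases is also treated as a candidate.

```python
from collections import Counter


def classify_column(column):
    """Classify one alignment column (one character per line, '-' = gap)."""
    column = column.upper()
    if "-" in column:
        return "indel"  # assumption: any gap flags a possible InDel
    counts = Counter(column)
    if len(counts) == 1:
        return "no_snp"  # e.g. 12 : 0
    # Number of sequences that differ from the most common base.
    minor = len(column) - counts.most_common(1)[0][1]
    return "questionable" if minor == 1 else "candidate_snp"


def scan_alignment(aligned):
    """Yield (1-based position, label, base counts) for polymorphic columns."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    for pos in range(length):
        column = "".join(seq[pos] for seq in aligned)
        label = classify_column(column)
        if label != "no_snp":
            yield pos + 1, label, dict(Counter(column.upper()))


if __name__ == "__main__":
    lines = ["ACGT"] * 10 + ["ACTT"] * 2  # a 2 : 10 split at position 3
    for hit in scan_alignment(lines):
        print(hit)
```

In practice the input would be the clustalw alignment of the 12 sequences amplified with one dSNP primer pair.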
To manage SNP data, the MMP-LIMS Scoring Tool enables
wet lab researchers in several different laboratories to perform
genotype scoring and manage genotyping data. While the IBM
mapping population has 360 individuals, the tool can handle a
virtually unlimited number of individuals. MMP-LIMS uses
catalogs to manage and maintain data on primers, samples,
populations and templates (Fig. 2). For example, through an interface for the catalogs,
the user can add, edit or delete SNP or InDel primers. The
system validates the information and checks the integrity of
the data among the other tables. When deleting from the catalog,
the system checks the database tables for consistency. In
particular, if a primer is already in use in a record, then an ordinary
user cannot delete that primer from MMP-LIMS. Only the master
user has the ability to perform a ‘cascading’ delete, which
deletes the primer together with all references to it.
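The deletion rule for catalog entries can be illustrated with a small in-memory database. This is a sketch under stated assumptions: SQLite stands in for the Sybase server, and the two-table schema, the function names and the `is_master` flag are hypothetical simplifications of the MMP-LIMS data model.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical two-table schema: primers and their scores."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE primer (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    con.execute(
        "CREATE TABLE score ("
        " id INTEGER PRIMARY KEY,"
        " primer_id INTEGER NOT NULL REFERENCES primer(id),"
        " value TEXT)"
    )
    return con


def delete_primer(con, primer_id, is_master=False):
    """Delete a primer; only the master user may remove one that is in use."""
    in_use = con.execute(
        "SELECT COUNT(*) FROM score WHERE primer_id = ?", (primer_id,)
    ).fetchone()[0]
    if in_use and not is_master:
        return False  # refuse: other records still reference this primer
    with con:  # one transaction: commit on success, roll back on error
        con.execute("DELETE FROM score WHERE primer_id = ?", (primer_id,))
        con.execute("DELETE FROM primer WHERE id = ?", (primer_id,))
    return True
```

Performing both deletions inside a single transaction means a failure part-way through leaves the tables unchanged, which is one way of keeping the consistency check and the cascading delete from disagreeing.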
The templates catalog allows the user to create a subset of a
population’s samples for use in specific experiments. The user
can create a samples template, and then link the appropriate
samples to the template.
[Figure 3: two flow charts. (A) SNP/InDel Pipeline. Input: sequence and SNP parameters. Primer design: 1. Use Primer3 to create primers; 2. Filter out primers with inverted repeats. The SNP discovery primers are used to amplify and sequence in N different genetic lines. Sequence processing: 1. Perform base calling with phred; 2. Trim for primers (sequence and quality); 3. Use phrap for sequence assembly; 4. Group sequences with same dSNP primer; 5. Align sequences for 1 primer pair (clustalw); 6. Look for base frequencies at each position in the alignment with SNP/InDel Finder. Output: file of positional base frequencies (i.e. SNPs and InDels). (B) SSR Finder. Input: sequence. 1. Find repeats and generate primers; 2. Check against previously discovered primers (Primer DBlast database); add each new SSR to the database. Output: formatted list of primers with ordering information.]
Fig. 3. Sequence analysis modules. The functions performed by the
two sequence analysis modules of MMP-LIMS are shown. The steps
performed by the SNP/InDel Pipeline to find primers and discover
potential SNPs/InDels are given in (A), and the process performed
by SSR Finder to locate SSRs in sequence and design unique primers
is outlined in (B).
The MMP-LIMS Scoring Tool provides interfaces to
convert ABI Prism® Genotyper® (Applied Biosystems, 2003,
http://www.appliedbiosystems.com/products/) files into segregation
files and to import them to MMP-LIMS database. To
convert ABI Genotyper® files, the MMP-LIMS provides a
color template where users enter values of the base pair peaks
generated in the ABI sequencer for the two parental lines
used in the IBM population (B73/Mo17). Then MMP-LIMS
receives the ABI Genotyper® file, which contains the allele
information for the SNP experiment, and based on the color
template, processes the information and converts it into a
segregation file. The segregation data, consisting of a list of
scores for each SNP marker, are stored in the MMP-LIMS
database.
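The conversion from peak values to segregation scores might look like the following. This sketch rests on several assumptions that the text does not state: that each sample is reduced to a list of peak sizes in base pairs, that a peak matches a parental value within a tolerance, and that scores are coded 'A' (B73 allele), 'B' (Mo17 allele), 'H' (both) and '-' (missing). The function names and the tolerance value are hypothetical.

```python
def score_sample(peaks, b73_peak, mo17_peak, tolerance=0.5):
    """Score one sample from its peak sizes (bp) against the parental peaks.

    Hypothetical coding: 'A' = B73 allele, 'B' = Mo17 allele,
    'H' = both peaks present, '-' = neither peak found.
    """
    has_b73 = any(abs(p - b73_peak) <= tolerance for p in peaks)
    has_mo17 = any(abs(p - mo17_peak) <= tolerance for p in peaks)
    if has_b73 and has_mo17:
        return "H"
    if has_b73:
        return "A"
    if has_mo17:
        return "B"
    return "-"


def segregation_row(marker, samples, b73_peak, mo17_peak):
    """Return (marker, score string) with one score per sample, in order."""
    scores = "".join(score_sample(p, b73_peak, mo17_peak) for p in samples)
    return marker, scores


if __name__ == "__main__":
    template = {"b73_peak": 101.0, "mo17_peak": 105.0}  # the 'color template'
    samples = [[101.2], [105.1], [101.0, 105.0], []]
    print(segregation_row("snp_demo", samples, **template))
```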
The MMP-LIMS Scoring Tool also functions as a laboratory
notebook for wet lab researchers. Users may store information
about specific experimental conditions including gel compos-
ition and the primer sequences used. The notebook also stores
data related to setup, such as microtiter plate layout.
The MMP-LIMS Scoring Tool also offers a web-based
query-by-example interface that allows users to create their
own queries based on markers, samples, probes or enzymes for
exporting information from the LIMS database into a standard
Microsoft® Excel spreadsheet for analysis.
In addition, a standalone version of the MMP-LIMS Scoring
Tool is available that works with an Access database. The
standalone version includes an example database populated
with maize data.
Several security features protect the data in MMP-LIMS
database accessed via MMP-LIMS Scoring Tool. To use the
MMP-LIMS Scoring Tool, the user is required to have a valid
user account and password. The different types of MMP-
LIMS Scoring Tool user accounts provide various levels of
system protection. For example, the administrator is able to
add new users and grant permissions to users for particular
system functions while, by default, new users can only view
the information and enter genotype scores.
The MMP-LIMS system takes advantage of a relational
database management system (RDBMS) for information storage
and retrieval. The RDBMS provides several important
functions, including inserting, deleting, updating, and retrieving records,
managing concurrent requests, and handling transaction
issues such as rollback. The main MMP-LIMS database is
composed of more than 50 tables that record primer, locus,
enzymes, probes, samples, templates, users, passwords, note-
books and score information for the daily processes in the lab.
The physical data model can be found on the MMP website
(Maize Mapping Project, 2003, http://www.maizemap.org).
The model design is based on the third normal form approach
(Date, 2002). The MMP-LIMS database dedicates a large por-
tion of its tables to the MMP-LIMS Scoring Tool because of
the high level of functionality that this module provides.
3.2 Managing SSR and RFLP data
In addition to managing SNP/InDel data, MMP-LIMS handles
data generated to place SSR and RFLP markers on the genetic
map. The tools enable the researcher to locate potential SSRs
and manage the genotype score data, and encourage collaboration
by providing resources for researchers in remote locations
to enter genotype scores.
The SSR Finder tool serves three major purposes (Fig. 3B).
First, SSR Finder locates SSRs in DNA sequences. Second,
the program designs primer pairs to amplify the SSR-
containing sequence regions. Finally, SSR Finder checks
these primer pairs for uniqueness, removing any redundant
primers.
First, the sequence of interest is entered into the SSR Repeat
Finder module. SSR Repeat Finder returns a list of repeats
(SSRs) and the flanking (surrounding) sequence, which is then
sent as input into SSR Primer Designer. This module builds the
input file for Primer3 for each SSR, with user-defined parameters
for primer length, Tm, G/C content, and distance between
forward and reverse primers. The list of potential primers and
associated data from Primer3 is sent to the SSR Primer Rep
module, which runs the SSR Repeat Finder module against
the potential primer pairs and removes primer pairs that con-
tain a simple sequence repeat within the primer sequence.
The SSR Primer BLAST script takes the remaining primers
and their associated data, and uses the SSR sequence plus
the flanking sequence and performs a BLAST search against
all the primer pairs previously discovered in the project. It
also adds each new SSR to the Primer DBlast database after
it is checked. The program formatdb is run to regenerate the
Primer DBlast database. Next, the SSR Primer BLAST mod-
ule returns the BLAST scores for the primers. Based on these
scores, the Order Filter script creates a list of primers with
no BLAST hits and sends the list to Order Formatter. Finally,
Order Formatter returns a formatted list along with ordering
information.
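The first stage of this process, locating repeats and their flanking sequence, can be sketched with a regular expression. This is not the SSR Repeat Finder module itself: the motif lengths, the minimum repeat count, the flank length and the function names are all hypothetical defaults chosen for illustration. The same search is reused to reject primers that themselves contain a repeat, as the SSR Primer Rep module is described as doing.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=4, flank=20):
    """Return perfect tandem repeats with their flanking sequence.

    Each hit is a dict with the motif, repeat count, 0-based start,
    end (exclusive) and up to `flank` bases on either side.
    """
    seq = sequence.upper()
    # Shortest motif first (lazy), followed by at least min_repeats - 1 copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        start, end = match.span()
        hits.append({
            "motif": motif,
            "repeats": (end - start) // len(motif),
            "start": start,
            "end": end,
            "left_flank": seq[max(0, start - flank):start],
            "right_flank": seq[end:end + flank],
        })
    return hits


def primer_contains_ssr(primer, min_repeats=3):
    """True if the primer itself contains a simple sequence repeat."""
    return bool(find_ssrs(primer, min_repeats=min_repeats))


if __name__ == "__main__":
    for hit in find_ssrs("GGCTC" + "AG" * 6 + "TTCGA"):
        print(hit)
```

The flanking sequences returned here are what a primer design step would receive; the uniqueness check against previously designed primers would still require a BLAST search, which is outside this sketch.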
The MMP-LIMS Scoring Tool discussed previously is also
used effectively for managing SSR and RFLP genotype score
data. Individuals in the lab can perform the scoring in two
steps. First, one user analyzes the gel images or autoradio-
graphs, entering the score for each sample in a population
or template. Next, a second user verifies the scores by inde-
pendently entering each score again in the row underneath the
originally entered scores. Because the letters representing the
scores are color-coded, it is easy for the second individual to
see whether the scores match the original scores. The system can
also automatically check for mismatched scores and allow the
user to move from one cell containing a mismatched score to
the next to verify the data.
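The automatic check for mismatched scores amounts to a cell-by-cell comparison of the two entry rows. A minimal sketch, with a hypothetical function name and with scores represented as one character per sample:

```python
def mismatched_cells(first_entry, second_entry):
    """Return 0-based indices where two independent score entries disagree."""
    if len(first_entry) != len(second_entry):
        raise ValueError("both entries must cover the same samples")
    return [
        index
        for index, (first, second) in enumerate(zip(first_entry, second_entry))
        if first.upper() != second.upper()
    ]


if __name__ == "__main__":
    # The second scorer disagrees on the fourth sample only.
    print(mismatched_cells("AABH-", "AABB-"))
```

A user interface could step through the returned indices to move from one mismatched cell to the next.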
The CIMDE is a subsystem of the MMP-LIMS system
developed to allow members of the maize community to
remotely enter genotype scores into MMP-LIMS for a subset
of 94 individual lines from the intermated B73xMo17 (IBM)
(Davis et al., 2001) mapping population.
The system is composed of two main components designed
to allow flexibility in the way researchers submit and edit
their scores, while providing an intuitive and easy-to-use inter-
face. First, the file upload function allows the user to quickly
populate the database with a batch of probes and their associ-
ated genotype scores by uploading a tab-delimited text file
via a web-based form. The second component consists of
an in-browser application that gives remote researchers the
opportunity to submit scores manually, edit scores previously
submitted, or delete scores. CIMDE is used primarily for SSR
data, but also allows the researcher to enter SNP and RFLP
information.
MMP-LIMS creates a MapMaker file from the scores sub-
mitted by the user via CIMDE. The PostScript™ version of
the map built by MapMaker is converted to a tabular format
by MMP-LIMS. Both the PostScript™ file and the table are
then e-mailed to the user.
Both components of CIMDE perform extensive validation
of the data based on the type of probe before addition to
the database is permitted. During a submission using the
file upload function, CIMDE checks the validity and the
number of scores for each record. If scores for an RFLP
probe are being processed, the system performs an addi-
tional check to ensure that a restriction enzyme name is
given for each probe. Because each probe name needs to
be unique, the system checks that the probe name does not
already exist in the database under any user’s account. An
insertion or update of records that causes the duplication
of marker names is not permitted by the system, and if
attempted, the system displays a list containing each duplic-
ate record. The manual data-editing tool also validates probe
data. The table in the graphical user interface will not allow
the user to insert invalid score values into its cells, while valid
scores are color-coded for ease of recognition. This validation
and color-coding varies based on the type of probe being
edited.
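The checks applied during a file upload can be sketched as a single validation pass. The column layout assumed here (probe name, marker type, restriction enzyme, then one score per individual), the score alphabet and all names are hypothetical; only the count of 94 individuals, the enzyme requirement for RFLP probes and the uniqueness of probe names come from the text.

```python
EXPECTED_SCORES = 94  # individuals in the CIMDE subset of the IBM population
# Hypothetical score alphabets; the real per-type sets are not given.
VALID_SCORES = {"SSR": set("ABH-"), "RFLP": set("ABH-"), "SNP": set("ABH-")}


def validate_upload(lines, existing_probes):
    """Return a list of (line_number, message); an empty list means valid."""
    problems = []
    seen = {name.lower() for name in existing_probes}
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            problems.append((number, "expected probe, type, enzyme and scores"))
            continue
        probe = fields[0].strip()
        marker_type = fields[1].strip().upper()
        enzyme = fields[2].strip()
        scores = [score.strip().upper() for score in fields[3:]]
        if marker_type not in VALID_SCORES:
            problems.append((number, f"unknown marker type {marker_type!r}"))
            continue
        if probe.lower() in seen:
            problems.append((number, f"duplicate probe name {probe!r}"))
        seen.add(probe.lower())
        if marker_type == "RFLP" and not enzyme:
            problems.append((number, "RFLP probe needs a restriction enzyme"))
        if len(scores) != EXPECTED_SCORES:
            problems.append((number, f"{len(scores)} scores, expected {EXPECTED_SCORES}"))
        invalid = sorted({s for s in scores if s not in VALID_SCORES[marker_type]})
        if invalid:
            problems.append((number, f"invalid scores: {invalid}"))
    return problems
```

Collecting every problem before rejecting the file, rather than stopping at the first, mirrors the behaviour described above of listing each duplicate record for the user.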
In order to guard user data and MMP-LIMS data, CIMDE
is equipped with protection features. To ensure that each user
only has access to his or her data, each user must create a user-
name and password and register an individual account with
the system. The user must also make a one-time submission
of a set of control scores to be validated by the system. If the
user’s control scores are correct, the researcher is taken to
have performed the experiments correctly, and the genotype
scores that he or she submits are accepted as
valid. Once the user has logged in and has submitted
valid control scores, he or she can access both the file
upload and manual data editing functions of the application.
Control scores do not have to be submitted upon subsequent
use of CIMDE.
3.3 Production of the genetic map
Genetic map generation requires both converting genotype
scores from a set of samples into a format readable by Map-
Maker and interpreting the results returned by the software.
MMP-LIMS Scoring Tool creates input files for MapMaker
by retrieving data from MMP-LIMS database. Users can cre-
ate files using all of the mapping data or they can generate a
file for a subset of the data by creating a group and selecting
the markers and samples of interest. The MMP-LIMS Scor-
ing Tool then automatically creates the MapMaker input file.
Data from remote researchers can also be used in the creation
of the file. When needed, the system can convert scores. For
example, for recombinant inbred populations, the score ‘H’
is converted to ‘−’, while for F2 populations, the ‘H’ score
remains unchanged in the MapMaker input file.
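The score conversion can be written as a small function. In this sketch the population type labels "RI" and "F2" and the function names are hypothetical, an ASCII hyphen stands for the missing-data symbol, and the locus line follows the MapMaker raw-file convention of an asterisk before the locus name; the header lines of a complete MapMaker input file are omitted.

```python
def to_mapmaker_scores(scores, population_type):
    """Convert a score string for the given population type.

    'RI' (recombinant inbred): heterozygous 'H' becomes missing '-'.
    'F2': scores are passed through unchanged.
    """
    if population_type == "RI":
        return scores.replace("H", "-")
    if population_type == "F2":
        return scores
    raise ValueError(f"unsupported population type: {population_type!r}")


def locus_line(name, scores, population_type):
    """Format one locus as '*name scores' for a MapMaker raw data file."""
    return f"*{name} {to_mapmaker_scores(scores, population_type)}"


if __name__ == "__main__":
    print(locus_line("umc1001", "ABH-HA", "RI"))
    print(locus_line("umc1001", "ABH-HA", "F2"))
```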
Output from MapMaker is also managed by the MMP-
LIMS Scoring Tool. The MMP-LIMS Scoring Tool extracts
genetic map information such as chromosome, map coordin-
ate, framework versus off-frame status for each probe from
the PostScript™ file returned by the MapMaker software and
results are stored in the MMP-LIMS database.
3.4 Public views of MMP data
It is imperative that the data produced by the MMP be easily
viewable by the public. MMP-LIMS provides several displays
of the mapping data, including the MSL, iMap, and cMap.
The MSL provides a web-based interface to accept input
sequence and perform a BLAST search against all public
maize sequences, including the DuPont-MMP Cornsensus
(Maize Mapping Project, 2002,
http://www.agron.missouri.edu/files_dl/MMP/Cornsensus/) unigene set. It returns the
BLAST scores, the map location, and links to related
sequences if available. The user enters the nucleotide sequence
via the Common Gateway Interface (CGI) WWW form along
with the name of the sequence and BLAST parameters.
The CGI then performs a BLAST against a database con-
taining >300 000 Zea mays sequences from GenBank, and
>10 000 sequences from the Cornsensus Unigene set. The
BLAST results are returned in XML format and converted
by the CGI via XSLT. The CGI retrieves the accession numbers
of related sequences from MaizeDB, and creates an
HTML table containing links to the related map and sequence
data in various databases including GenBank, MaizeDB,
The Institute for Genomic Research (TIGR) (The Institute
for Genomic Research, 2003, http://www.tigr.org/),
Gramene (Gramene, 2003, http://www.gramene.org/), the
Arizona Genomics Institute (AGI) (Arizona Genomics Institute,
2002, http://www.genome.arizona.edu/), the Clemson
University Genomics Institute (CUGI) (Clemson University
Genomics Institute, 2003, http://www.genome.clemson.edu/),
and Zea mays DataBase (ZmDB) (ZmDB, 2003,
http://www.zmdb.iastate.edu/).
The integrated genetic and physical map visualization tool
of MMP-LIMS, iMap (Cone et al., 2002), allows researchers
to access data related to loci on the genetic map along with
their associated contigs on the physical map. The graphical
interface displays the positions of the loci and contigs on the
genetic map and physical map, respectively. Searches may
be conducted based on the locus, probe, GenBank accession
number, or contig number.
The cMap (Fang et al., 2003) function of MMP-LIMS per-
mits the user to select and compare two genetic maps at a time
with dynamic links to data resources and text lists of the shared
loci between the compared maps. Searches can be conducted
based on locus, probe, or GenBank accession number.
4 DISCUSSION
The MMP-LIMS was designed to meet the challenges of a
high-throughput mapping project. Currently, MMP-LIMS is
being used at the Maize Mapping Project at the University of
Missouri—Columbia. The system has been used to enter and
verify 957 SSR markers, 1023 RFLP markers, 189 SNPs, and
177 InDels. MMP-LIMS is used primarily for the maize IBM
mapping population consisting of 360 samples.
MMP-LIMS has also been used for managing 590 SSRs of
the IF2 population with 56 samples and 359 SSRs of the C6
population with 93 samples. The two other populations were
used to map SSRs that are monomorphic in the IBM popula-
tion. The SSR loci from these two populations are integrated
within an enhanced version of the IBM map called the IBM
Neighbors map by interpolating the location of the marker
loci with loci shared between the IBM map and the other
maps (Cone et al., 2002).
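One simple way to place a locus from another population on the IBM map is linear interpolation between the two nearest loci shared by both maps. The sketch below assumes that scheme purely for illustration; the text does not specify the interpolation method, and the function name and argument names are hypothetical.

```python
def interpolate_position(pos, left_src, right_src, left_ibm, right_ibm):
    """Project a source-map position onto the IBM map.

    left_src/right_src are the positions of the flanking shared loci on
    the source map; left_ibm/right_ibm are the positions of the same two
    loci on the IBM map. Linear interpolation is an assumption here.
    """
    if right_src == left_src:
        raise ValueError("flanking shared loci must have distinct positions")
    fraction = (pos - left_src) / (right_src - left_src)
    return left_ibm + fraction * (right_ibm - left_ibm)


if __name__ == "__main__":
    # A locus midway between shared loci at 10 and 20 on the source map,
    # which sit at 100 and 140 on the IBM map.
    print(interpolate_position(15.0, 10.0, 20.0, 100.0, 140.0))
```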
Users performing research on a species other than maize can
customize the functions of the MMP-LIMS system by adding
populations, samples, and markers specific to the species
of interest. For example, members of the Soybean Gen-
omics Consortium (Soybean Genomics Consortium, 2003,
http://www.soybeangenome.org) have requested MMP-LIMS
for customization as the system to manage the data produced
in the generation of a genetic map for the soybean genome.
A variety of LIMS systems (Table 2) are currently available, including the
LabBase (Goodman et al., 1998) system, dnaLIMS™ by dnaTools
(dnaTools, 2002, http://www.dnatools.com/dnalims.html), the
GeneTrials LIMS system by Waban Software (Waban
Software Inc., 2002, http://www.wabansoftware.com/Lims.htm),
Sapphire Informatics 3.0 by LabVantage (LabVantage, 2002,
http://www.labvantage.com/products_sapphireinfo.htm),
the Nautilus system by Thermo LabSystems
(Thermo LabSystems, 2002,
http://www.thermolabsystems.com/news/press/articles/020906-nautilus2002r2.asp),
the system by Clive G. Brown and Richard Mott from the
Bioinformatics Group at the Wellcome Trust Centre for
Human Genetics (Wellcome Trust Centre for Human
Genetics, 2001, http://bioinformatics.well.ox.ac.uk/project-lims.html),
CimBiosis™ Genotyping Workflow System
(Cimarron Software, Inc., http://www.cimsoft.com/products.html),
and Applied Biosystems GeneMapper™ Software
(Applied Biosystems, 2003). However,
these software packages do not provide the same set of features
as MMP-LIMS. Several of the software packages provide only
generic interfaces that must be customized before storing lab
data. In addition, these systems do not provide a method for
validating and verifying genotyping scores or for using differ-
ent types of markers to generate an output file for a standard
mapping tool such as MapMaker. Only some of the systems
provide the user with an interface to data from ABI DNA
sequencers. While some systems are entirely web-based, few
of the systems provide a combination of both client/server lab
software in addition to web-based data query and visualization
tools to accommodate both local and remote users. In addi-
tion, the incorporation of sequence analysis tools for SSR and
SNP/InDel experiments is not found in the other packages.
Most of the systems were not designed to specifically handle
different types of genetic markers such as SSRs, RFLPs and
SNP/InDels.
MMP-LIMS is a complete system and the software is freely
available to the public. The system includes several levels of
security, a genotype scoring tool, a data entry tool for remote
researchers to submit data, scripts for designing SSR primers
and for locating potential SNP/InDels, a system for finding
sequences that are similar to a query sequence along with
related database links, and viewers for both an integrated
genetic/physical map and for comparison of genetic maps.
Table 2. Feature comparisons

y, provided; n, not provided; n/l, not listed in article or on
software website.

Columns:
  (1) MMP-LIMS
  (2) LabBase
  (3) dnaLIMS™
  (4) GeneTrials™ LIMS
  (5) Sapphire Informatics 3.0
  (6) Nautilus
  (7) System by Brown and Mott
  (8) CimBiosis™ Genotyping Workflow System
  (9) Applied Biosystems GeneMapper™

Feature                                     (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)
Freely available to public                   y    y    n    n    n    n    y    n    n
Interface customized for genetic map data    y    n    n    y    n/l  n/l  n    n    y
Validation and verification of genotype
  scores                                     y    n    n    n/l  n/l  n/l  n/l  n/l  y
Generation of MapMaker input file with
  multiple marker types                      y    n    n    n/l  n/l  n/l  n/l  n/l  n/l
Different security levels                    y    n/l  n/l  n/l  n/l  n/l  n/l  n/l  n/l
Interface to data from ABI DNA sequencers    y    n    y    n/l  n/l  n/l  y    y    y
Combination of both client/server lab
  software and web-based data query and
  visualization tools                        y    n    n    n/l  y    y    y    y    n/l
Incorporation of sequence analysis tools
  for SSR and SNP/InDel experiments          y    n    y    n/l  n/l  n/l  n/l  n/l  n/l
Handling a variety of genetic marker data
  (i.e. SSRs, RFLPs, SNPs/InDels)            y    n    n    n/l  n/l  n/l  y    y    y

ACKNOWLEDGEMENTS
We would like to thank the members of our advisory committee
including Sue Wessler (chair), Brad Barbazuk, Vicki Chandler,
Joe Ecker, Stan Letovsky and Antoni Rafalski. Names are necessary
to factually report on available data; however, neither USDA nor
the University of Missouri guarantees nor warrants the standard of
the product, and the use of the name implies no approval of the
product to the exclusion of others that may also be suitable. This
research was supported by the National Science Foundation
(DBI 9872655).
REFERENCES
Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J.
(1990) Basic local alignment search tool. J. Mol. Biol., 215,
403–410.
Applied Biosystems (2003) Applied Biosystems | Main. Accessed
2003 Feb 4.
Arizona Genomics Institute (2002) Dec 20. AGI Home Page.
Accessed 2003 Feb 4.
Cimarron Software, Inc. (2002) March 1. Cimarron Software, Inc.—
Products. Accessed 2003 Feb 4.
Clemson University Genomics Institute (2003) Jan 29. CUGI:
Clemson University Genomics Institute. Accessed 2003 Feb 4.
Coe,E., Cone,K., McMullen,M., Chen,S., Davis,G., Gardiner,J.,
Liscum,E., Polacco,M., Paterson,A., Sanchez-Villeda,H.,
Soderlund,C. and Wing,R. (2002) Access to the maize genome:
An integrated physical and genetic map. Plant Physiol., 128, 9–12.
Cone,K., McMullen,M., Vroh Bi,I., Davis,G., Yim,Y.-S.,
Gardiner,J., Polacco,M., Sanchez-Villeda,H., Fang,Z.,
Schroeder,S. et al. (2002) Genetic, physical and informatic
resources for maize: On the road to an integrated map. Plant
Physiol., 130, 1594–1601.
Date,C.J. (2002) An Introduction to Database Systems (Seventh
Edition). Addison Wesley Longman, Inc., Reading, MA.
Davis,G., McMullen,M., Baysdorfer,C., Musket,T., Grant,D.,
Staebell,M.S., Xu,G., Polacco,M., Koster,L., Melia-Hancock,S.
et al. (1999) A maize map standard with sequenced core markers,
grass genome reference points, and 932 ESTs in a 1736-locus
map. Genetics, 152, 1137–1172.
Davis,G., Musket,T., Melia-Hancock,S., Duru,N., Sharopova,N.,
Schultz,L., McMullen,M.D., Sanchez-Villeda,H., Schroeder,S.
and Garcia,A.A. (2001) The intermated B73 x Mo17 genetic map:
a community resource. Maize Genetics Conference Abstracts,
43:W15, 62.
dnaTools (2002) Sep 28. dnaTools. Accessed 2003 Feb 4.
Ewing,B. and Green,P. (1998) Base-calling of automated sequencer
traces using phred. II. Error probabilities. Genome Res., 8,
186–194.
Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling
of automated sequencer traces using phred. I. Accuracy
assessment. Genome Res., 8, 175–185.
Fang,Z., Polacco,M., Chen,S., Schroeder,S., Hancock,D.,
Sanchez,H. and Coe,E. (2003) cMap: the comparative genetic
map viewer. Bioinformatics, 19, 416–417.
Goodman,N., Rozen,S., Stein,L. and Smith,A. (1998) The LabBase
system for data management in large scale biology research
laboratories. Bioinformatics, 14, 562–574.
Gramene (2003) Jan 19. Gramene. Accessed 2003 Feb 4.
LabVantage (2002) Aug 26. Sapphire Informatics 3.0 is a
browser/server-based solution. Accessed 2003 Feb 4.
Lander,E.S., Green,P., Abrahamson,J., Barlow,A., Daly,M.J.,
Lincoln,S.E. and Newburg,I. (1987) MAPMAKER: an interactive
computer package for constructing primary genetic linkage maps
of experimental and natural populations. Genomics, 1, 174–181.
MaizeDB (2003) Jan 27. Maize Genome Database—MaizeDB.
Accessed 2003 Feb 4.
Maize Mapping Project (2003) Jan 30. Maize Mapping Project.
Accessed 2003 Feb 4.
Maize Mapping Project (2002) Oct 8. Cornsensus Sequence Files.
Accessed 2003 Feb 4.
Rice Genome Research Program (2002) Nov 15. Rice Genome
Research Program (RGP) Home Page. Accessed 2003 Feb 4.
Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for
general users and for biologist programmers. In Krawetz,S.
and Misener,S. (eds), Bioinformatics Methods and Protocols:
Methods in Molecular Biology. Humana Press, Totowa, NJ,
pp. 365–386.
Sharopova,N., McMullen,M.D., Schultz,L., Schroeder,S.,
Sanchez-Villeda,H., Gardiner,J., Bergstrom,D., Houchins,K.,
Melia-Hancock,S., Musket,T. et al. (2002) Development and
mapping of SSR markers for maize. Plant Mol. Biol., 48,
463–481.
Soderlund,C., Humphray,S., Dunham,A. and French,L. (2000) Contigs
built with fingerprints, markers and FPC V4.7. Genome Res.,
10, 1772–1787.
Soybean Genomics Consortium (2003) Mar 6. Soybean Genomics
Consortium. Accessed 2003 Apr 17.
The Institute for Genomic Research (2003) Jan 23. The Institute for
Genomic Research. Accessed 2003 Feb 4.
Thermo LabSystems (2002) Sep 6. Thermo LabSystems—
Company—News—Press—Thermo LabSystems delivers
Nautilus™ 2002 R2 LIMS. Accessed 2003 Feb 4.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W:
Improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, positions-specific gap
penalties and weight matrix choice. Nucleic Acids Res., 22,
4673–4680.
Waban Software Inc. (2002) Mar 26. Waban Software Inc. Accessed
2003 Feb 4.
Wellcome Trust Centre for Human Genetics (2001) Jan 21.
WTCHG Bioinformatics Website: Homepage. Accessed 2003
Feb 4.
ZmDB (2003) Jan 22. ZmDB: Maize Genome Database. Accessed
2003 Feb 4.
  • 1. BIOINFORMATICS Vol. 19 no. 16 2003, pages 2022–2030 DOI: 10.1093/bioinformatics/btg274 Development of an integrated laboratory information management system for the maize mapping project H. Sanchez-Villeda1, S. Schroeder1, M. Polacco1,3, M. McMullen1,3, S. Havermann1, G. Davis1, I. Vroh-Bi1, K. Cone2, N. Sharopova1, Y. Yim1, L. Schultz1, N. Duru1, T. Musket1, K. Houchins3, Z. Fang1, J. Gardiner1 and E. Coe1,3,∗ 1Department of Agronomy, 2Division of Biological Sciences and 3USDA-ARS, University of Missouri, Columbia, MO 65211, USA Received on February 4, 2003; revised on April 18, 2003; accepted on May 3, 2003 ABSTRACT Motivation: The development of an integrated genetic and physical map for the maize genome involves the generation of an enormous amount of data. Managing this data requires a system to aid in genotype scoring for different types of markers coming from both local and remote users. In addi- tion, researchers need an efficient way to interact with genetic mapping software and with data files from automated DNA sequencing. They also need ways to manage primer data for mapping and sequencing and provide views of the integ- rated physical and genetic map and views of genetic map comparisons. Results: The MMP-LIMS system has been used successfully in a high-throughput mapping environment. The genotypes from 957 SSR, 1023 RFLP, 189 SNP, and 177 InDel mark- ers have been entered and verified via MMP-LIMS.The system is flexible, and can be easily modified to manage data for other species. The software is freely available. Availability: To receive a copy of the iMap or cMap software, please fill out the form on our website. The other MMP-LIMS software is freely available at http://www.maizemap.org/ bioinformatics.htm. Contact: coee@missouri.edu 1 INTRODUCTION The maize mapping project (MMP) aims to develop an integ- rated physical and genetic map of maize. 
This resource will be useful for marker-assisted selection, map-based cloning, and comparative genomics of crops, and will undergird sequencing of the maize genome (Cone et al., 2002). To achieve this goal, the MMP has utilized and developed DNA markers including 1023 restriction fragment length polymorphisms (RFLPs), 957 simple sequence repeats (SSRs), 10 000 overgos, 189 single nucleotide polymorphisms (SNPs), and 177 insertion/deletion (InDel) polymorphisms (Davis et al., 1999; Sharopova et al., 2002). These markers have been used to develop a high-resolution genetic map used as the framework to anchor bacterial artificial chromosome (BAC) contigs. This process requires high-throughput sequencing, and high-throughput SNP/InDel genotyping. The amount of data produced is enormous. The Missouri component of the MMP team is divided into different laboratories dispersed throughout the campus, simultaneously using and producing different parts of the same core data. The genetic mapping populations involve different subsets of individuals with their respective molecular marker data to be stored, managed, and integrated into the maps. Optimal use of these data requires effective methods of analysis and management. Furthermore, the data produced in the MMP must be disseminated to the scientific community through informatics tools capable of handling the high volume of data.

∗To whom correspondence should be addressed.

The requirements for laboratory databases vary considerably from project to project. At present, many laboratories use spreadsheets to manage their data. In this paper, we present the MMP laboratory information management system (MMP-LIMS) that we have developed to provide several functions: (1) allow data management with detailed record keeping, reporting, and retrieving; (2) ensure data quality and accessibility to the scientific community and (3) disseminate the integrated map of maize to the scientific community through web-based tools.
This research is an example of the application of informatics to practical biology and agronomy questions. An overview of the MMP-LIMS components and their functions is shown in Table 1.

Published by Oxford University Press. Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on April 24, 2013.
  • 2. Development of an integrated LIMS for the MMP

Table 1. MMP-LIMS functions (MMP-LIMS component, followed by its summary)

MMP-LIMS Scoring Tool
    - Serves as a laboratory notebook for wet lab researchers
    - Allows researchers in different laboratories to interact with MMP-LIMS database
    - Manages genotype data from RFLP, SSR, SNP, InDel marker types
    - Validates genotype scores based on repeat reads
    - Interfaces with ABI Prism Genotyper software, converting trace file data into SNP scores
    - Allows user to make custom templates for genotype output/entry
    - Creates input files for MapMaker by chromosome or data set
    - Integrates information returned by the MapMaker software
    - Also exists in a publicly available standalone version utilizing an Access database and includes an example database
Community IBM Map Data Entry Tool (CIMDE)
    - Allows researchers at remote locations to enter genotype scores into the MMP-LIMS database via web-based interface
    - Provides a mechanism for uploading tab-delimited score file and for uploading scores for a single marker
    - Validates genotype scores for control loci
    - Validates genotype scores based on marker type
SSR Finder
    - Locates SSRs in DNA sequences
    - Designs unique primer pairs to amplify SSR sequences
SNP Discovery Primer Design
    - Designs primers for finding potential SNPs
    - Performs BLAST search against existing primers
SNP/InDel Finder
    - Calculates base frequencies in each position in a sequence alignment
    - Searches for gaps in a sequence alignment representing InDels
Mapped Sequence Locator (MSL)
    - Accepts sequence via a web-based interface and performs a BLAST search against all public maize sequence
    - Returns BLAST scores, genetic map locations, and links related to sequences if available
iMap
    - Graphically displays an integrated genetic and physical map
    - Displays genetic marker data and contig data
    - Performs search for a map location based on locus, probe, GenBank accession, or contig number
    - Provides links to current WebFPC (Soderlund et al., 2000) assembly
    - Displays anchors based on a set of data filters that remove ambiguous assignments
cMap
    - Displays comparative associations between two genetic maps
    - Gives the user text lists of the shared loci between the compared maps

2 SYSTEMS AND METHODS
Several technologies were employed during the development of MMP-LIMS. The programming languages used in both the user interface for the local wet lab researchers and the remote researchers reflect an interest in creating a highly modular system and in providing each user with an efficient and intuitive user interface.

The user interface for the local wet lab researchers was implemented as a Visual Basic® 6.0 client application. This provides not only efficient performance for the user, but also a well-structured environment for development. The system’s client–server architecture permits many users concurrent access to a central database. Open Database Connectivity (ODBC) provides interoperability, connects the client application to the database, and allows interaction with MaizeDB (MaizeDB, 2003, http://www.agron.missouri.edu/) through proxy tables.

The web-based user interface for remote researchers utilizes HTML for the static content. Java™ applets are used for the other functions and give the user a more interactive and straightforward interface than is attainable with HTML forms. Java™ servlets and Java Database Connectivity (JDBC™) transfer data to and from the database.

The sequence analysis modules of MMP-LIMS were implemented in Perl and make use of other publicly available programs including Primer3 (Rozen and Skaletsky, 2000), BLAST (Altschul et al., 1990), phred (Ewing et al., 1998), phrap (Ewing and Green, 1998), and clustalw (Thompson et al., 1994). The web-based sequence comparison module utilizes Perl (CGI/DBI) along with XML/XSLT and the BLAST program.
The web-based integrated genetic and physical map display application and the comparative mapping viewer were adapted from software used in the Rice Genome Project (Rice Genome Research Program, 2002, http://rgp.dna.affrc.go.jp/). Originally utilizing data stored via flat files, the code for the integrated map viewer was converted to allow data retrieval from the database via servlet communication. The user interface employs a combination of a Java™ applet and Perl CGI (Cone et al., 2002).

  • 3. H.Sanchez-Villeda et al.

Fig. 1. MMP-LIMS context diagram. The modules of MMP-LIMS are shown, including the client–server tool, multiple web-based tools and sequence analysis modules for SSRs and SNPs/InDels.
[Figure labels: Client-Server Tool (MMP-LIMS Scoring Tool); Sequence Analysis Modules (SNP/InDel Pipeline: SNP Discovery Primer Design, SNP/InDel Finder; SSR Finder); MMP-LIMS Web-Based Tools (CIMDE, Mapped Sequence Locator (MSL), iMap Viewer, cMap Viewer); MMP-LIMS Users (Local Wet Lab Users, Remote Wet Lab Users, Web Users).]

The MMP-LIMS data are stored in a Sybase® Adaptive Server Enterprise 11.9.2 relational database. The database resides on a Dell Precision 330 running Redhat 7.3 with a 2.4.18-5 kernel. An additional standalone version of MMP-LIMS exists with the same functionality and was designed to work with a Microsoft® Access database.

3 IMPLEMENTATION
MMP-LIMS provides data management for the processes involved in generating a high-resolution genetic map including managing SNP/InDel data, managing SSR and RFLP data, generating the genetic map, and providing public views of the data. The modules comprising MMP-LIMS can be viewed as elements of a system context diagram (Fig. 1). The modules include the MMP-LIMS Scoring Tool (Fig. 2), the Community IBM Map Data Entry Tool (CIMDE), SSR Finder, SNP Discovery Primer Design, SNP/InDel Finder, the Mapped Sequence Locator (MSL), the integrated genetic and physical map viewer (iMap), and the comparative mapping viewer (cMap).

3.1 Managing SNP/InDel information
MMP-LIMS manages data from each step in the process of placing SNPs/InDels on the genetic map, from finding potential SNPs/InDels with the SNP/InDel pipeline to managing the genotype score data with MMP-LIMS Scoring Tool and generating files for MapMaker (Lander et al., 1987) software. The SNP/InDel pipeline works in two steps (Fig. 3A).
First, SNP primers are designed with the SNP Discovery Primer Design module. Then the resulting primers are used to process sequences in order to find SNPs and/or InDels via SNP/InDel Finder.

The first step in SNP discovery is to sequence a region of DNA across multiple lines of maize to detect nucleotide polymorphisms. The DNA segments for sequencing are amplified using primer pairs designed with the SNP Discovery Primer Design module.

DNA sequence is entered into the module, along with parameters including distance between primer pairs and region of the sequence to search for primer pairs. Using the given parameters, this script builds an input file for Primer3. The resulting primers are returned from Primer3, and the SNP Discovery Primer Design module checks for repeats in the primer sequence and rejects those with repeats. The script can also be set to perform a BLAST search with the primers against all previously designed primers. The output of the script is a list of unique SNP discovery primers.

The primers from the SNP Discovery Primer Design module are used to amplify and sequence DNA in 12 different lines of maize. Base calling of the resulting forward and reverse sequencing trace files is performed by phred. Forward and reverse output sequence is trimmed based on the primers and quality scores and each sequence is stored in a single file. The quality scores are stored in a separate file.

  • 4.

Fig. 2. MMP-LIMS scoring tool overview. The genotype score management functions of MMP-LIMS Scoring Tool, including catalog-based management of sample data and lab notebook, are shown. The diagram also includes the interfaces with MapMaker and Genotyper software, and the remote genotype score entry tool, CIMDE.
[Figure labels: MMP-LIMS Database; MMP-LIMS Scoring Tool (User Interface for Genotype Score Entry and Verification; Lab Notebook Function; Catalogs: Primers, Samples, Populations, Templates); CIMDE (Remote Genotype Score Entry and Verification); External Software (ABI Prism® Genotyper® Software, MapMaker). Data flows: Genotype Scores; SNP Genotype Data; Genotyping Information Added via Catalogs; Experimental Conditions / Setup Information; Map Data / Segregation File Data; Genetic Map Data; Genotype Data for MapMaker Input File Generation.]

Sequence assembly is then performed by phrap. Next, a script combines the sequences into 12-sequence groups with each group corresponding to a single SNP discovery (dSNP) primer pair. The clustalw program then aligns the sequences for each primer pair and sends the output into the SNP/InDel Finder script to calculate base frequencies at each position in an alignment. If 12 out of 12 sequences contain the same base at a position, then no SNP is present. If one sequence differs from the other 11 at a position (1 : 11), then the possibility of a SNP is questionable. Candidate SNPs are defined by positions where at least two sequences differ from the other 10 (2 : 10) or better: (3 : 9), (4 : 8), (5 : 7) or (6 : 6). SNP/InDel Finder also looks for gaps in the alignment representing insertions/deletions (InDels). These polymorphisms can then be used for genotype analysis by the wet lab group.

To manage SNP data, the MMP-LIMS Scoring Tool enables wet lab researchers in several different laboratories to perform genotype scoring and manage genotyping data.
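The base-frequency rule described above for SNP/InDel Finder can be illustrated with a short Python sketch. This is not the authors' Perl script: the function names are hypothetical, the input is assumed to be a list of equal-length aligned sequences with gaps written as `-`, and counting every sequence that differs from the most common base as the minority is an assumed reading of the (2 : 10) rule.

```python
from collections import Counter


def classify_column(column, min_minor=2):
    """Classify one alignment column (one base per sequence, '-' for a gap).

    Returns (snp_status, has_gap, base_counts), where snp_status is
    'none', 'questionable' (e.g. 1 : 11) or 'candidate' (2 : 10 or better).
    """
    bases = [b.upper() for b in column if b != "-"]
    has_gap = len(bases) < len(column)  # a gap marks a possible InDel
    counts = Counter(bases)
    if len(counts) <= 1:
        snp = "none"  # every non-gap sequence carries the same base
    else:
        # Sequences that differ from the most common base at this position.
        minor = len(bases) - counts.most_common(1)[0][1]
        snp = "candidate" if minor >= min_minor else "questionable"
    return snp, has_gap, dict(counts)


def scan_alignment(seqs, min_minor=2):
    """Report (position, snp_status, has_gap, base_counts) for every column
    that shows a possible SNP or a gap. Positions are 1-based."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    report = []
    for pos, column in enumerate(zip(*seqs), start=1):
        snp, has_gap, counts = classify_column(column, min_minor)
        if snp != "none" or has_gap:
            report.append((pos, snp, has_gap, counts))
    return report
```

Calling `scan_alignment` on the 12 aligned sequences for one dSNP primer pair would yield one tuple per polymorphic or gapped position, which roughly corresponds to the "file of positional base frequencies" named in Fig. 3A.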
While the IBM mapping population has 360 individuals, the tool can handle a virtually unlimited number of individuals. MMP-LIMS uses catalogs (primers, samples, populations and templates) to manage and maintain the related data (Fig. 2). For example, through an interface for the catalogs, the user can add, edit or delete SNP or InDel primers. The system validates the information and checks the integrity of the data among the other tables. When deleting from the catalog, the system checks the database tables for consistency. In particular, if a primer is already in use in a record, then a user cannot delete that primer from MMP-LIMS. Only the master user has the ability to perform this type of ‘cascading’ delete, deleting all references to that primer. The templates catalog allows the user to create a subset of a population’s samples for use in specific experiments. The user can create a samples template, and then link the appropriate samples to the template.

The MMP-LIMS Scoring Tool provides interfaces to convert ABI Prism® Genotyper® (Applied Biosystems, 2003, http://www.appliedbiosystems.com/products/) files into segregation files and to import them into the MMP-LIMS database. To convert ABI Genotyper® files, MMP-LIMS provides a color template where users enter values of the base pair peaks generated in the ABI sequencer for the two parental lines used in the IBM population (B73/Mo17). Then MMP-LIMS receives the ABI Genotyper® file, which contains the allele information for the SNP experiment, and based on the color template, processes the information and converts it into a segregation file. The segregation data, consisting of a list of scores for each SNP marker, are stored in the MMP-LIMS database.

  • 5.

Fig. 3. Sequence analysis modules. The functions performed by the two sequence analysis modules of MMP-LIMS are shown. The steps performed by the SNP/InDel Pipeline to find primers and discover potential SNPs/InDels are given in (A), and the process performed by SSR Finder to locate SSRs in sequence and design unique primers is outlined in (B).
[Panel A, SNP/InDel Pipeline: Sequence and SNP Parameters; 1. Use Primer3 to create primers, 2. Filter out primers with inverted repeats; SNP Discovery Primers Used to Amplify and Sequence in N Different Genetic Lines; 1. Perform base calling with phred, 2. Trim for primers (sequence and quality), 3. Use phrap for sequence assembly, 4. Group sequences with same dSNP primer, 5. Align sequences for 1 primer pair (clustalw), 6. Look for base frequencies at each position in the alignment with SNP/InDel Finder; File of Positional Base Frequencies (i.e. SNPs and InDels).]
[Panel B, SSR Finder: Sequence; 1. Find repeats and generate primers, 2. Check against previously discovered primers; Add each new SSR to the database (Primer DBlast Database); Formatted List of Primers with Ordering Information.]

The MMP-LIMS Scoring Tool also functions as a laboratory notebook for wet lab researchers. Users may store information about specific experimental conditions including gel composition and the primer sequences used. The notebook also stores data related to setup, such as microtiter plate layout.

The MMP-LIMS Scoring Tool also offers a web-based query-by-example interface that allows users to create their own queries based on markers, samples, probes or enzymes for exporting information from the LIMS database into a standard Microsoft® Excel spreadsheet for analysis.
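The catalog deletion rule described earlier (a primer that is already used in a record cannot be deleted, except through a master-user cascading delete) maps naturally onto a foreign-key constraint. The sketch below is only an illustration under stated assumptions: it uses SQLite from the Python standard library as a stand-in for the Sybase database, and the table names, column names and the `is_master` flag are hypothetical rather than taken from the MMP-LIMS schema.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE primer (
            primer_id INTEGER PRIMARY KEY,
            name      TEXT UNIQUE NOT NULL
        );
        CREATE TABLE score_record (
            record_id INTEGER PRIMARY KEY,
            primer_id INTEGER NOT NULL REFERENCES primer(primer_id),
            sample    TEXT NOT NULL,
            score     TEXT NOT NULL
        );
    """)
    return conn


def delete_primer(conn, name, is_master=False):
    """Try to delete a primer from the catalog.

    Ordinary users are refused when any record still references the primer.
    A master user first removes the referencing records (a 'cascading'
    delete). Returns True if a primer row was removed, otherwise False.
    """
    try:
        with conn:  # commits on success, rolls back if an error is raised
            if is_master:
                conn.execute(
                    "DELETE FROM score_record WHERE primer_id = "
                    "(SELECT primer_id FROM primer WHERE name = ?)",
                    (name,),
                )
            cur = conn.execute("DELETE FROM primer WHERE name = ?", (name,))
        return cur.rowcount > 0
    except sqlite3.IntegrityError:
        return False  # the primer is still in use
```

Leaving the check to the database rather than to client code means every client (the Scoring Tool, a web tool, or an ad hoc query) is held to the same rule, which may be one reason a relational design suits this kind of catalog.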
In addition, a standalone version of the MMP-LIMS Scoring Tool is available that works with an Access database. The standalone version includes an example database populated with maize data.

Several security features protect the data in MMP-LIMS database accessed via MMP-LIMS Scoring Tool. To use the MMP-LIMS Scoring Tool, the user is required to have a valid user account and password. The different types of MMP-LIMS Scoring Tool user accounts provide various levels of system protection. For example, the administrator is able to add new users and grant permissions to users for particular system functions while, by default, new users can only view the information and enter genotype scores.

The MMP-LIMS system takes advantage of a relational database management system (RDBMS) for information storage and retrieval. The RDBMS provides several important functions including inserting, deleting, updating, retrieval, managing concurrent requests, and handling transaction issues such as rollback. The main MMP-LIMS database is composed of more than 50 tables that record primer, locus, enzymes, probes, samples, templates, users, passwords, notebooks and score information for the daily processes in the lab. The physical data model can be found on the MMP website (Maize Mapping Project, 2003, http://www.maizemap.org). The model design is based on the third normal form approach (Date, 2002). The MMP-LIMS database dedicates a large portion of its tables to the MMP-LIMS Scoring Tool because of the high level of functionality that this module provides.

3.2 Managing SSR and RFLP data
In addition to managing SNP/InDel data, MMP-LIMS handles data generated to place SSR and RFLP markers on the genetic map. The tools enable the researcher to locate potential SSRs and manage the genotype score data, and encourage collaboration by providing resources for researchers in remote locations to enter genotype scores.

The SSR Finder tool serves three major purposes (Fig. 3B).
First, SSR Finder locates SSRs in DNA sequences. Second, the program designs primer pairs to amplify the SSR-containing sequence regions. Finally, SSR Finder checks these primer pairs for uniqueness, removing any redundant primers.

In operation, the sequence of interest is first entered into the SSR Repeat Finder module. SSR Repeat Finder returns a list of repeats (SSRs) and the flanking (surrounding) sequence, which is then sent as input into SSR Primer Designer. This module builds the input file for Primer3 for each SSR, with user-defined parameters for primer length, Tm, G/C content, and distance between forward and reverse primers. The list of potential primers and associated data from Primer3 is sent to the SSR Primer Rep module, which runs the SSR Repeat Finder module against the potential primer pairs and removes primer pairs that contain a simple sequence repeat within the primer sequence. The SSR Primer BLAST script takes the remaining primers and their associated data, and uses the SSR sequence plus the flanking sequence to perform a BLAST search against all the primer pairs previously discovered in the project. It also adds each new SSR to the Primer DBlast database after it is checked. The program formatdb is run to regenerate the Primer DBlast database. Next, the SSR Primer BLAST module returns the BLAST scores for the primers. Based on these scores, the Order Filter script creates a list of primers with no BLAST hits and sends the list to Order Formatter. Finally, Order Formatter returns a formatted list along with ordering information.

  • 6.

The MMP-LIMS Scoring Tool discussed previously is also used effectively for managing SSR and RFLP genotype score data. Individuals in the lab can perform the scoring in two steps. First, one user analyzes the gel images or autoradiographs, entering the score for each sample in a population or template. Next, a second user verifies the scores by independently entering each score again in the row underneath the original entered scores. Because the letters representing the scores are color-coded, it is easy for the second individual to see whether the scores match the original scores. The system can also automatically check for mismatched scores and allow the user to move from one cell containing a mismatched score to the next to verify the data.

CIMDE is a subsystem of MMP-LIMS developed to allow members of the maize community to remotely enter genotype scores into MMP-LIMS for a subset of 94 individual lines from the intermated B73xMo17 (IBM) (Davis et al., 2001) mapping population. The system is composed of two main components designed to allow flexibility in the way researchers submit and edit their scores, while providing an intuitive and easy-to-use interface. First, the file upload function allows the user to quickly populate the database with a batch of probes and their associated genotype scores by uploading a tab-delimited text file via a web-based form. The second component consists of an in-browser application that gives remote researchers the opportunity to submit scores manually, edit scores previously submitted, or delete scores. CIMDE is used primarily for SSR data, but also allows the researcher to enter SNP and RFLP information.

MMP-LIMS creates a MapMaker file from the scores submitted by the user via CIMDE.
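The repeat-location step attributed to SSR Repeat Finder earlier in this section can be sketched with a regular expression. This is a minimal illustration rather than the project's Perl module: the function name, the default unit lengths (2 to 4 bases), the minimum of five repeats and the 20-base flank are all assumed values chosen for the example.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=4, min_repeats=5, flank=20):
    """Locate simple sequence repeats and return them with flanking sequence.

    A hit is a unit of min_unit..max_unit bases repeated at least
    min_repeats times in tandem. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    # The lazy quantifier makes the engine try the shortest unit first, so
    # (CT)8 is reported with unit 'CT' rather than 'CTCT'.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append({
            "unit": unit,
            "repeats": len(m.group(0)) // len(unit),
            "start": m.start(),
            "end": m.end(),
            "left_flank": seq[max(0, m.start() - flank):m.start()],
            "right_flank": seq[m.end():m.end() + flank],
        })
    return hits
```

The flanks returned here are what a primer-design step would search for primer sites. The same function, called with a lower `min_repeats`, could also play the role described for the SSR Primer Rep module, flagging candidate primers that themselves contain a repeat.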
The PostScript™ version of the map built by MapMaker is converted to a tabular format by MMP-LIMS. Both the PostScript™ file and the table are then e-mailed to the user.

Both components of CIMDE perform extensive validation of the data based on the type of probe before addition to the database is permitted. During a submission using the file upload function, CIMDE checks the validity and the number of scores for each record. If scores for an RFLP probe are being processed, the system performs an additional check to ensure that a restriction enzyme name is given for each probe. Because each probe name needs to be unique, the system checks that the probe name does not already exist in the database under any user’s account. An insertion or update of records that causes the duplication of marker names is not permitted by the system, and if attempted, the system displays a list containing each duplicate record.

The manual data-editing tool also validates probe data. The table in the graphical user interface will not allow the user to insert invalid score values into its cells, while valid scores are color-coded for ease of recognition. This validation and color-coding varies based on the type of probe being edited.

In order to guard user data and MMP-LIMS data, CIMDE is equipped with protection features. To ensure that each user only has access to his or her data, each user must create a username and password and register an individual account with the system. The user must also make a one-time submission of a set of control scores to be validated by the system. Correct control scores indicate that the researcher has performed the experiments correctly, so the genotype scores that he or she submits are accepted as valid. Once the user has logged in and has submitted valid control scores, he or she can access both the file upload and manual data editing functions of the application.
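The upload checks described above (valid score symbols, the expected number of scores, an enzyme name for RFLP probes, and unique probe names) can be sketched as a single validation pass. Everything concrete in this example is an assumption made for illustration: the four-column file layout (probe, marker type, enzyme, scores), the score alphabets, the error messages and the function name are hypothetical, and only the figure of 94 lines comes from the text.

```python
N_LINES = 94  # size of the IBM subset scored through CIMDE, per the text

# Hypothetical score alphabets per marker type; the paper does not list them,
# and a real system could use a different set for each type.
ALLOWED = {"SSR": "ABH-", "RFLP": "ABH-", "SNP": "ABH-"}


def validate_upload(text, existing_probes=()):
    """Validate a tab-delimited score file.

    Returns (accepted, errors): accepted is a list of record dicts and
    errors is a list of (line_number, message) tuples.
    """
    seen = set(existing_probes)  # probe names already in the database
    accepted, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 4:
            errors.append((lineno, "expected 4 tab-separated fields"))
            continue
        probe, marker_type, enzyme, scores = fields
        marker_type = marker_type.upper()
        problems = []
        if marker_type not in ALLOWED:
            problems.append("unknown marker type")
        elif set(scores) - set(ALLOWED[marker_type]):
            problems.append("invalid score symbol")
        if len(scores) != N_LINES:
            problems.append("expected %d scores, got %d" % (N_LINES, len(scores)))
        if marker_type == "RFLP" and not enzyme:
            problems.append("RFLP probe needs a restriction enzyme name")
        if probe in seen:
            problems.append("duplicate probe name")
        if problems:
            errors.extend((lineno, p) for p in problems)
        else:
            seen.add(probe)
            accepted.append({"probe": probe, "type": marker_type,
                             "enzyme": enzyme, "scores": scores})
    return accepted, errors
```

Collecting every problem for a record, instead of stopping at the first one, mirrors the behaviour the text describes for duplicates, where the system shows the user a list of each offending record.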
Control scores do not have to be submitted upon subsequent use of CIMDE.

3.3 Production of the genetic map
Genetic map generation requires both converting genotype scores from a set of samples into a format readable by MapMaker and interpreting the results returned by the software. MMP-LIMS Scoring Tool creates input files for MapMaker by retrieving data from MMP-LIMS database. Users can create files using all of the mapping data or they can generate a file for a subset of the data by creating a group and selecting the markers and samples of interest. The MMP-LIMS Scoring Tool then automatically creates the MapMaker input file. Data from remote researchers can also be used in the creation of the file. When needed, the system can convert scores. For example, for recombinant inbred populations, the score ‘H’ is converted to ‘−’, while for F2 populations, the ‘H’ score remains unchanged in the MapMaker input file.

Output from MapMaker is also managed by the MMP-LIMS Scoring Tool. The MMP-LIMS Scoring Tool extracts genetic map information such as chromosome, map coordinate, framework versus off-frame status for each probe from the PostScript™ file returned by the MapMaker software and results are stored in the MMP-LIMS database.

3.4 Public views of MMP data
It is imperative that the data produced by the MMP be easily viewable by the public. MMP-LIMS provides several displays of the mapping data, including the MSL, iMap, and cMap.

The MSL provides a web-based interface to accept input sequence and perform a BLAST search against all public maize sequences, including the DuPont-MMP Cornsensus (Maize Mapping Project, 2002, http://www.agron.missouri.edu/files_dl/MMP/Cornsensus/) unigene set. It returns the BLAST scores, the map location, and links to related sequences if available. The user enters the nucleotide sequence via the Common Gateway Interface (CGI) WWW form along with the name of the sequence and BLAST parameters.
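The score conversion and group selection described in Section 3.3 can be shown in a few lines. This sketch covers only the two cases the text names (recombinant inbred and F2); the population codes `"ri"` and `"f2"`, the function names, the dictionary input and the use of an ASCII hyphen for the missing-data score are assumptions, and no attempt is made to reproduce the MapMaker file layout itself.

```python
def convert_scores(scores, population_type):
    """Convert a string of genotype scores for a MapMaker input file.

    For recombinant inbred ('ri') populations the heterozygous score 'H'
    becomes '-'; for 'f2' populations scores pass through unchanged.
    """
    pop = population_type.lower()
    if pop == "ri":
        return scores.replace("H", "-")
    if pop == "f2":
        return scores
    raise ValueError("unsupported population type: %r" % population_type)


def select_group(scores_by_marker, markers, sample_indices, population_type):
    """Build the score strings for a user-defined group.

    scores_by_marker maps a marker name to one score character per sample;
    markers and sample_indices pick the subset of interest.
    """
    return {
        m: convert_scores(
            "".join(scores_by_marker[m][i] for i in sample_indices),
            population_type,
        )
        for m in markers
    }
```

A caller would pass the selected marker names and sample positions for the group, then write the returned strings into the input file in whatever layout the mapping software expects.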
  • 7.

The CGI then performs a BLAST search against a database containing >300 000 Zea mays sequences from GenBank, and >10 000 sequences from the Cornsensus Unigene set. The BLAST results are returned in XML format and converted by the CGI via XSLT. The CGI retrieves the accession numbers of related sequences from MaizeDB, and creates an HTML table containing links to the related map and sequence data in various databases including GenBank, MaizeDB, The Institute for Genomic Research (TIGR) (The Institute for Genomic Research, 2003, http://www.tigr.org/), Gramene (Gramene, 2003, http://www.gramene.org/), the Arizona Genomics Institute (AGI) (Arizona Genomics Institute, 2002, http://www.genome.arizona.edu/), the Clemson University Genomics Institute (CUGI) (Clemson University Genomics Institute, 2003, http://www.genome.clemson.edu/), and the Zea mays DataBase (ZmDB) (ZmDB, 2003, http://www.zmdb.iastate.edu/).

The integrated genetic and physical map visualization tool of MMP-LIMS, iMap (Cone et al., 2002), allows researchers to access data related to loci on the genetic map along with their associated contigs on the physical map. The graphical interface displays the positions of the loci and contigs on the genetic map and physical map, respectively. Searches may be conducted based on the locus, probe, GenBank accession number, or contig number.

The cMap (Fang et al., 2003) function of MMP-LIMS permits the user to select and compare two genetic maps at a time with dynamic links to data resources and text lists of the shared loci between the compared maps. Searches can be conducted based on locus, probe, or GenBank accession number.

4 DISCUSSION
The MMP-LIMS was designed to meet the challenges of a high-throughput mapping project. Currently, MMP-LIMS is being used at the Maize Mapping Project at the University of Missouri-Columbia. The system has been used to enter and verify 957 SSR markers, 1023 RFLP markers, 189 SNPs, and 177 InDels.
MMP-LIMS is used primarily for the maize IBM mapping population consisting of 360 samples. MMP-LIMS has also been used for managing 590 SSRs of the IF2 population with 56 samples and 359 SSRs of the C6 population with 93 samples. These two other populations were used to map SSRs that are monomorphic in the IBM population. The SSR loci from these two populations are integrated within an enhanced version of the IBM map called the IBM Neighbors map by interpolating the location of the marker loci with loci shared between the IBM map and the other maps (Cone et al., 2002).

Users performing research on a species other than maize can customize the functions of the MMP-LIMS system by adding populations, samples, and markers specific to the species of interest. For example, members of the Soybean Genomics Consortium (Soybean Genomics Consortium, 2003, http://www.soybeangenome.org) have requested MMP-LIMS for customization as the system to manage the data produced in the generation of a genetic map for the soybean genome.

A variety of LIMS systems (Table 2) are currently available, including the LabBase (Goodman et al., 1998) system, dnaLIMS™ by dnaTools (dnaTools, 2002, http://www.dnatools.com/dnalims.html), the GeneTrials LIMS system by Waban Software (Waban Software Inc., 2002, http://www.wabansoftware.com/Lims.htm), Sapphire Informatics 3.0 by LabVantage (LabVantage, 2002, http://www.labvantage.com/products_sapphireinfo.htm), the Nautilus system by Thermo LabSystems (Thermo LabSystems, 2002, http://www.thermolabsystems.com/news/press/articles/020906-nautilus2002r2.asp), the system by Clive G. Brown and Richard Mott from the Bioinformatics Group at the Wellcome Trust Centre for Human Genetics (Wellcome Trust Centre for Human Genetics, 2001, http://bioinformatics.well.ox.ac.uk/project-lims.html), CimBiosis™ Genotyping Workflow System (Cimarron Software, Inc., http://www.cimsoft.com/products.html), and Applied Biosystems GeneMapper™ Software (Applied Biosystems, 2003).
However, these software packages do not provide the same set of features as MMP-LIMS. Several of the software packages provide only generic interfaces that must be customized before storing lab data. In addition, these systems do not provide a method for validating and verifying genotyping scores or for using different types of markers to generate an output file for a standard mapping tool such as MapMaker. Only some of the systems provide the user with an interface to data from ABI DNA sequencers. While some systems are entirely web-based, few of the systems provide a combination of both client/server lab software in addition to web-based data query and visualization tools to accommodate both local and remote users. In addition, the incorporation of sequence analysis tools for SSR and SNP/InDel experiments is not found in the other packages. Most of the systems were not designed to specifically handle different types of genetic markers such as SSRs, RFLPs and SNP/InDels.

MMP-LIMS is a complete system and the software is freely available to the public. The system includes several levels of security, a genotype scoring tool, a data entry tool for remote researchers to submit data, scripts for designing SSR primers and for locating potential SNP/InDels, a system for finding sequences that are similar to a query sequence along with related database links, and viewers for both an integrated genetic/physical map and for comparison of genetic maps.

ACKNOWLEDGEMENTS
We would like to thank the members of our advisory committee including Sue Wessler (chair), Brad Barbazuk, Vicki Chandler, Joe Ecker, Stan Letovsky and Antoni Rafalski. Names are necessary to factually report on available data; however, neither USDA nor the University of Missouri guarantees nor warrants the standard of the product, and the use of the name implies no approval of the product to the exclusion of others that may also be suitable. This research was supported by the National Science Foundation (DBI 9872655).

  • 8.

Table 2. Feature comparisons
Columns: (1) MMP-LIMS; (2) LabBase; (3) dnaLIMS™; (4) GeneTrials™ LIMS; (5) Sapphire Informatics 3.0; (6) Nautilus; (7) System by Brown and Mott; (8) CimBiosis™ Genotyping Workflow System; (9) Applied Biosystems GeneMapper™
Legend: y, provided; n, not provided; n/l, not listed in article or on software website

(1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  Feature
y    y    n    n    n    n    y    n    n    Freely available to public
y    n    n    y    n/l  n/l  n    n    y    Interface customized for genetic map data
y    n    n    n/l  n/l  n/l  n/l  n/l  y    Validation and verification of genotype scores
y    n    n    n/l  n/l  n/l  n/l  n/l  n/l  Generation of MapMaker input file with multiple marker types
y    n/l  n/l  n/l  n/l  n/l  n/l  n/l  n/l  Different security levels
y    n    y    n/l  n/l  n/l  y    y    y    Interface to data from ABI DNA sequencers
y    n    n    n/l  y    y    y    y    n/l  Combination of both client/server lab software and web-based data query and visualization tools
y    n    y    n/l  n/l  n/l  n/l  n/l  n/l  Incorporation of sequence analysis tools for SSR and SNP/InDel experiments
y    n    n    n/l  n/l  n/l  y    y    y    Handling a variety of genetic marker data (i.e. SSRs, RFLPs, SNPs/InDels)

REFERENCES
Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
Applied Biosystems (2003) Applied Biosystems | Main. Accessed 2003 Feb 4.
Arizona Genomics Institute (2002) Dec 20. AGI Home Page. Accessed 2003 Feb 4.
Cimarron Software, Inc. (2002) March 1. Cimarron Software, Inc. - Products. Accessed 2003 Feb 4.
Clemson University Genomics Institute (2003) Jan 29. CUGI: Clemson University Genomics Institute. Accessed 2003 Feb 4.
Coe,E., Cone,K., McMullen,M., Chen,S., Davis,G., Gardiner,J., Liscum,E., Polacco,M., Paterson,A., Sanchez-Villeda,H., Soderlund,C. and Wing,R. (2002) Access to the maize genome: An integrated physical and genetic map. Plant Physiol., 128, 9–12.
Cone,K., McMullen,M., Vroh Bi,I., Davis,G., Yim,Y.-S., Gardiner,J., Polacco,M., Sanchez-Villeda,H., Fang,Z., Schroeder,S. et al. (2002) Genetic, physical and informatic resources for maize: On the road to an integrated map. Plant Physiol., 130, 1594–1601.
Date,C.J. (2002) An Introduction to Database Systems (Seventh Edition). Addison Wesley Longman, Inc., Reading, MA.
Davis,G., McMullen,M., Baysdorfer,C., Musket,T., Grant,D., Staebell,M.S., Xu,G., Polacco,M., Koster,L., Melia-Hancock,S. et al. (1999) A maize map standard with sequenced core markers, grass genome reference points, and 932 ESTs in a 1736-locus map. Genetics, 152, 1137–1172.
Davis,G., Musket,T., Melia-Hancock,S., Duru,N., Sharopova,N., Schultz,L., McMullen,M.D., Sanchez-Villeda,H., Schroeder,S. and Garcia,A.A. (2001) The intermated B73 x Mo17 genetic map: a community resource. Maize Genetics Conference Abstracts, 43:W15, 62.
dnaTools (2002) Sep 28. dnaTools. Accessed 2003 Feb 4.
Ewing,B. and Green,P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.
Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175–185.
Fang,Z., Polacco,M., Chen,S., Schroeder,S., Hancock,D., Sanchez,H. and Coe,E. (2003) cMap: the comparative genetic map viewer. Bioinformatics, 19, 416–417.
Goodman,N., Rozen,S., Stein,L. and Smith,A. (1998) The LabBase system for data management in large scale biology research laboratories. Bioinformatics, 14, 562–574.
Gramene (2003) Jan 19. Gramene. Accessed 2003 Feb 4.
LabVantage (2002) Aug 26. Sapphire Informatics 3.0 is a browser/server-based solution. Accessed 2003 Feb 4.
Lander,E.S., Green,P., Abrahamson,J., Barlow,A., Daly,M.J., Lincoln,S.E. and Newburg,I. (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics, 1, 174–181.
MaizeDB (2003) Jan 27. Maize Genome Database - MaizeDB. Accessed 2003 Feb 4.
Maize Mapping Project (2003) Jan 30. Maize Mapping Project. Accessed 2003 Feb 4.
Maize Mapping Project (2002) Oct 8. Cornsensus Sequence Files. Accessed 2003 Feb 4.
Rice Genome Research Program (2002) Nov 15. Rice Genome Research Program (RGP) Home Page. Accessed 2003 Feb 4.
Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for general users and for biologist programmers. In Krawetz,S. and Misener,S. (eds), Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp. 365–386.
Sharopova,N., McMullen,M.D., Schultz,L., Schroeder,S., Sanchez-Villeda,H., Gardiner,J., Bergstrom,D., Houchins,K., Melia-Hancock,S., Musket,T. et al. (2002) Development and mapping of SSR markers for maize. Plant Mol. Biol., 48, 463–481.
Soderlund,C., Humphray,S., Dunham,A. and French,L. (2000) Contigs built with fingerprints, markers and FPC V4.7. Genome Res., 10, 1772–1787.
Soybean Genomics Consortium (2003) Mar 6. Soybean Genomics Consortium. Accessed 2003 Apr 17.
The Institute for Genomic Research (2003) Jan 23. The Institute for Genomic Research. Accessed 2003 Feb 4.
Thermo LabSystems (2002) Sep 6. Thermo LabSystems - Company - News - Press - Thermo LabSystems delivers Nautilus™ 2002 R2 LIMS. Accessed 2003 Feb 4.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
Waban Software Inc. (2002) Mar 26. Waban Software Inc. Accessed 2003 Feb 4.
Wellcome Trust Centre for Human Genetics (2001) Jan 21. WTCHG Bioinformatics Website: Homepage. Accessed 2003 Feb 4.
ZmDB (2003) Jan 22. ZmDB: Maize Genome Database. Accessed 2003 Feb 4.