Article written by Hector Sánchez Villeda.
Hector Sánchez has developed information technology for the biological sciences for more than 20 years and is currently Founder and Director of IT Development at G2 Apps, a technology innovation company based in Querétaro, Mexico.
G2 APPS specializes in implementing LIMS (Laboratory Information Management Systems) using a multidisciplinary approach that includes a high level of life-science expertise, so that implementation is straightforward.
1) The document describes a feature extraction program developed to analyze gene expression data from the Gene Expression Omnibus (GEO) database.
2) The program was tested on human transcription factor expression data sets and was able to successfully extract gene expression information.
3) Analyzing specific gene sets from GEO files had previously been a labor-intensive task, but the object-oriented program streamlines this process using C and Perl programming languages.
Bioinformatics can be applied to climate smart horticulture in several ways:
1) It allows for crop improvement through comparative genomics between crop plants and model species to identify important genes.
2) It facilitates plant breeding by providing tools for genome analysis, marker identification, and rational gene annotation.
3) Stress-tolerant varieties can be developed by using bioinformatics databases like KEGG to identify pathways and genes involved in drought resistance.
This document provides an introduction and overview of the field of bioinformatics. It discusses how bioinformatics combines computer science and biology to analyze large amounts of biological data. Specifically, it mentions that bioinformatics uses algorithms and techniques from computer science to solve complex biological problems related to areas like molecular biology, genomics, drug discovery, and more. It also outlines some of the key applications of bioinformatics like sequence analysis, protein structure prediction, genome annotation, and comparative genomics. Finally, it provides brief descriptions of important biological databases and resources that bioinformaticians use to store and analyze genomic and protein sequence data.
Bioinformatics analyzes massive amounts of biological data like DNA sequences to uncover hidden biological information. It has many applications like molecular medicine, drug development, and microbial genome analysis. Common bioinformatics tools like BLAST are used to compare query sequences against databases to find similar sequences. BLAST works through a heuristic algorithm that finds short matches between sequences to locate potential homologs in an efficient manner. Other algorithms like Smith-Waterman and FASTA also perform sequence alignment but with different tradeoffs in accuracy and speed.
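The seed-and-extend idea behind BLAST's heuristic can be illustrated with a minimal sketch. This is not BLAST itself: the function names (`index_kmers`, `find_seeds`, `extend_seed`) are hypothetical, seeds are exact k-mer matches, and extension stops at the first mismatch, whereas real BLAST scores neighbouring words with a substitution matrix and uses a score drop-off rule.

```python
from collections import defaultdict


def index_kmers(subject, k):
    """Map every k-mer in the subject sequence to its start positions."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where an exact k-mer is shared."""
    index = index_kmers(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


def extend_seed(query, subject, q, s, k):
    """Extend a seed left and right without gaps while characters match.

    Simplifying assumption: extension stops at the first mismatch.
    Returns (query_start, subject_start, match_length).
    """
    left = 0
    while (q - left - 1 >= 0 and s - left - 1 >= 0
           and query[q - left - 1] == subject[s - left - 1]):
        left += 1
    right = k
    while (q + right < len(query) and s + right < len(subject)
           and query[q + right] == subject[s + right]):
        right += 1
    return q - left, s - left, left + right


if __name__ == "__main__":
    query, subject = "ACGTAC", "TTACGTACGG"
    for q, s in find_seeds(query, subject, k=4):
        print(extend_seed(query, subject, q, s, 4))
```

Indexing the subject once and looking up query words is what makes this approach faster than filling a full dynamic-programming matrix as Smith-Waterman does, at the cost of missing alignments that contain no exact seed.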
Bioinformatics emerged as a field in the 1970s-1980s as areas of biology increasingly relied on computational methods. There were two main types of students in bioinformatics - computer scientists interested in biology and biologists skilled in computing. The bioinformatics market continues to grow worldwide and major employers include pharmaceutical and biotech companies. A career in bioinformatics requires strong skills in biology, computing, programming, data analysis, visualization and teamwork. Opportunities exist in areas like sequence assembly, genomic analysis, functional genomics, and database administration.
This document provides an overview of bioinformatics. It defines bioinformatics as the science of collecting, analyzing and conceptualizing biological data through computational techniques. It discusses that bioinformatics involves managing, organizing and processing biological information from databases, as well as analyzing, visualizing and sharing biological data over the internet. It also outlines some of the goals of bioinformatics like organizing the human and mouse genomes, as well as some applications like genomic and protein sequence analysis, protein structure prediction, and characterizing genomes.
Bioinformatics resources and search tools - report on summer training proj... Sapan Anand
The document summarizes Vir Sapan Pratap Anand's six-week summer training project on exploring advanced concepts of computational biology, scientific communication, and pharmacovigilance. The project was conducted under the supervision of Dr. Harpreet Kaur and Miss Geetu at the Institute of Pharma Inquest. The report documents Anand's work exploring topics like bioinformatics, literature search, medical writing, clinical research, pharmacovigilance, and the Human Adverse Reaction Online Monitoring system. It includes acknowledgments, tables of contents, objectives of the study, literature reviews on relevant topics, conceptual research techniques, and results and conclusions from the training period.
This document provides an overview of bioinformatics and bioinformatics databases. It defines bioinformatics as the application of information technology to molecular biology to analyze and interpret biological data. This includes tasks like mapping and analyzing DNA and protein sequences. The document discusses how bioinformatics databases are used to store and manage the large amounts of biological data generated. It describes the characteristics of biological databases and how they are used for querying and retrieving sequence information. Key areas of bioinformatics research and important sequence databases are also summarized.
Role of bioinformatics in life sciences research Anshika Bansal
1. The document discusses bioinformatics and summarizes some of its key applications and tools. It describes how bioinformatics merges biology and computer science to solve biological problems by applying computational tools to molecular data.
2. It provides examples of common bioinformatics tasks like retrieving sequences from databases, comparing sequences, analyzing genes and proteins, and viewing 3D structures.
3. The document lists several popular databases for nucleotide sequences, protein sequences, literature, and other biological data. It also introduces common bioinformatics tools for tasks like sequence alignment, translation, and structure analysis.
Bioinformatics is the use of computers for the acquisition, management, and analysis of biological data. It combines biology, computer science, and information technology to analyze and interpret biological data. The field includes molecular medicine, gene therapy, drug development, and other applications. Common software tools used in bioinformatics include BLAST and FASTA. BLAST is an algorithm for comparing biological sequences to identify similar sequences in databases, while FASTA is a software package for protein and DNA sequence alignment.
B.sc biochem i bobi u-1 introduction to bioinformatics Rai University
This document provides an introduction to the field of bioinformatics. It defines bioinformatics as using computer science and software tools to store, retrieve, organize and analyze biological data. The history of bioinformatics began in the 1970s with early work to create protein sequence databases. Today, bioinformatics has many applications including drug design, DNA analysis, and agricultural biotechnology. It also covers several key areas including genomics, proteomics, and systems biology. Necessary skills for bioinformatics include knowledge of molecular biology, mathematics, programming, and computer proficiency.
The document provides an introduction to the field of bioinformatics. It discusses how bioinformatics applies computer science to analyze large amounts of biological data from fields like molecular biology, medicine, and biotechnology. It also outlines some of the main topics that will be covered in the course, including biological databases, gene and protein analysis, phylogenetic analysis, and gene prediction.
Bioinformatics in biotechnology by kk sahu KAUSHAL SAHU
Introduction
Bioinformatics – definition
History
Required skills
Core areas of bioinformatics
Components of bioinformatics
Nomenclature system in bioinformatics
Biological databases
Types of database
Bioinformatics tools
Applications of bioinformatics
Conclusion
References
Bioinformatics on the internet provides many resources and benefits. It allows for easy access and sharing of vast biological databases and genomic data. The internet facilitates collaboration between researchers globally and provides tools for storing, organizing, and analyzing biological information. Key resources available online include biological databases, software for data analysis, educational courses, journals, and tools for sequence analysis, structure prediction, and more. This expands the scope of bioinformatics and allows research to advance more rapidly through improved access to information and resources.
Bioinformatics is an interdisciplinary field that combines computer science, statistics, mathematics and engineering to study and process biological data, such as DNA sequences, in order to better understand biology. It involves developing methods and software tools to analyze large amounts of biological data, including sequencing genomes to understand what makes different organisms function. As data sets have grown enormously in size, bioinformatics relies on high-performance computing to make sense of it all and gain insights into normal cellular processes and how they are altered in disease states.
Bioinformatic tools in Pheromone technology THILAKAR MANI
This document discusses the role of bioinformatics tools in pheromone technology. It provides an introduction to bioinformatics and describes some commonly used bioinformatics tools, including UniProt, DDBJ, KEGG, BLAST, and PyMol. UniProt is a database of protein sequences that is composed of UniProtKB/Swiss-Prot which contains manually annotated entries and UniProtKB/TrEMBL which contains automatically annotated entries. DDBJ is a nucleotide sequence database in Japan that collaborates with EMBL and GenBank to share data. KEGG is a database that integrates genomic and chemical information and contains pathway maps and functional hierarchies. BLAST is used for sequence alignment and comparison.
This document provides an overview of bioinformatics, including its history, major areas of research, databases, tools, and applications. Bioinformatics is defined as the use of computer science and information technology to analyze and interpret biological data. The document traces the history of bioinformatics from early genetics experiments in the 1860s to advances in computing and molecular biology in the 1970s that enabled the field. It outlines major research areas like sequence analysis, genome annotation, and computational evolutionary biology. It also discusses biological databases, common bioinformatics tools, and applications of bioinformatics in fields like medicine, agriculture, and comparative genomics.
The document provides an introduction to the field of bioinformatics, including definitions, history, applications and key concepts. It discusses how bioinformatics uses computer algorithms and databases to analyze biological data like genomes, proteins and genes. Major databases that store DNA sequences are described, such as GenBank, EMBL and DDBJ. Tools for analyzing sequences like BLAST are also introduced.
Here are some suggestions for open online bioinformatics lectures and courses from famous universities:
- MIT OpenCourseWare has free bioinformatics course materials and videos from MIT courses.
- edX has massive open online courses (MOOCs) in bioinformatics from universities like Harvard, Berkeley, MIT. Some are free to audit.
- Coursera has bioinformatics courses from top universities like Johns Hopkins, University of Toronto, Peking University.
- YouTube has full lecture videos from bioinformatics courses at universities like Stanford, UC San Diego, University of Cambridge.
- Khan Academy has introductory bioinformatics lectures on topics like sequence alignment, gene finding, protein structure.
- EMBL-
This document provides an overview of bioinformatics. It begins by explaining how bioinformatics emerged from the need to analyze vast amounts of genetic sequence data produced by projects like the Human Genome Project. It then defines bioinformatics as the field that develops tools and methods for understanding biological data by combining computer science, statistics, and other disciplines. The document outlines several goals and applications of bioinformatics, such as identifying genes and their functions, modeling protein structures, comparing genomes, and its uses in medicine, microbial research, and more. It also provides a brief history of important developments in bioinformatics and DNA sequencing.
Bioinformatics & It's Scope in Biotechnology Tuhin Samanta
As an interdisciplinary field of science, bioinformatics combines biology, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological questions using mathematical and statistical techniques.
This document describes an automated robotic biorepository system developed by researchers at the University of Virginia to store and retrieve genomic DNA samples on a large scale. The system integrates three primary devices - a robotic arm, liquid handling robot, and microplate storage system. It is capable of automatically processing, analyzing, storing, and retrieving up to 250,000 genomic DNA samples in microplates held at -80°C. The system uses barcoded microplates and tubes to track samples and a database to store associated demographic and analysis data. Its main functions are to create master plates from samples, generate daughter plates for storage and retrieval, and assemble send-out plates of selected samples for distribution.
The document discusses bioinformatics and provides definitions of key terms like bioinformatics and computational biology. It describes how bioinformatics uses computational tools to analyze large biological datasets and how this has become important for managing complex molecular data. The text notes several current bottlenecks in bioinformatics like educating biologists in computational tools and limited availability of databases. It also gives examples of how bioinformatics is used for tasks like genome annotation and comparative genomics.
This document provides an overview of the field of bioinformatics. It defines bioinformatics as the intersection of biology and computer science, using computational tools to analyze and distribute biological information like DNA, RNA, and proteins. The goals of bioinformatics are to better understand cells at the molecular level by analyzing sequence and structure data. Key applications include drug design, DNA analysis, and agricultural biotechnology. The document also describes different types of biological databases like primary databases that contain raw sequence data, and secondary databases that provide additional annotation and analysis of sequences.
Data mining involves using machine learning and statistical methods to discover patterns in large datasets and is useful in bioinformatics for analyzing biological data. Bioinformatics analyzes data from sequences, molecules, gene expressions, and pathways. Data mining can help understand these rapidly growing biological datasets. Common data mining tools in bioinformatics include BLAST for sequence comparisons, Entrez for integrated database searching, and ORF Finder for identifying open reading frames. Data mining approaches are well-suited to the enormous volumes of data in bioinformatics databases.
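What an open-reading-frame search does can be shown with a short sketch. It is an illustration only, not the algorithm used by NCBI's ORF Finder: the function name `find_orfs` and the `min_codons` parameter are hypothetical, only the three forward-strand frames are scanned, and only the standard stop codons are recognised.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Scan the three forward reading frames for ATG...stop stretches.

    Returns a list of (start, end, sequence) tuples, where end is exclusive
    and the sequence includes the stop codon. min_codons counts the coding
    codons before the stop. The reverse strand is not examined here.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGACC"))
```

A fuller version would also scan the reverse complement and allow alternative start codons, which is left out here to keep the sketch short.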
This document summarizes computational analysis methods for determining expectation values commonly used in bioinformatics databases. It discusses tools like BLAST and FASTA, and databases like those hosted by NCBI, that allow querying and analyzing sequences. The expectation value (E-value) is the number of matches with an equal or better score that would be expected to occur by chance in a database of that size, with lower values indicating more significant matches. These tools and databases facilitate customizable extraction of data from sequences to enable further analysis and knowledge discovery in bioinformatics.
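For local alignment scores, the expectation value is commonly modelled with the Karlin-Altschul formula E = K · m · n · e^(−λS), where S is the raw score, m and n are the query and database lengths, and K and λ are statistical parameters of the scoring system. The sketch below computes it under stated assumptions: the function names are hypothetical, and the parameter values in the usage example (λ = 0.318, K = 0.134) are illustrative numbers of the kind reported for ungapped protein scoring, not values taken from the summarized document.

```python
import math


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalised score, so that E = m * n * 2 ** (-bit_score)."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def p_value(e):
    """Probability of at least one chance hit, assuming a Poisson model."""
    return -math.expm1(-e)


if __name__ == "__main__":
    lam, k = 0.318, 0.134          # illustrative parameters (assumption)
    m, n = 250, 50_000_000         # hypothetical query and database lengths
    for score in (40, 80, 120):
        e = e_value(score, m, n, lam, k)
        print(score, bit_score(score, lam, k), e, p_value(e))
```

For small E the probability and the expectation value are nearly equal, which is why the two are often used interchangeably in practice even though they are different quantities.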
A consistent and efficient graphical User Interface Design and Querying Organ...CSCJournals
We propose a software layer called GUEDOS-DB upon Object-Relational Database Management System ORDMS. In this work we apply it in Molecular Biology, more precisely Organelle complete genome. We aim to offer biologists the possibility to access in a unified way information spread among heterogeneous genome databanks. In this paper, the goal is firstly, to provide a visual schema graph through a number of illustrative examples. The adopted, human-computer interaction technique in this visual designing and querying makes very easy for biologists to formulate database queries compared with linear textual query representation.
Trait Mining, prediction of agricultural traits in plant genetic resources with ecological parameters. Focused Identification of Germplasm Strategy (FIGS). For the Vavilov seminars at the IPK Gatersleben 13th June 2007. Dag Endresen, Michael Mackay, Kenneth Street.
Araport is an online resource for Arabidopsis and plant research that integrates various types of data from different sources. It provides genome annotation for Arabidopsis that has been validated and updated using RNA-seq data. Data is stored and can be accessed through the ThaleMine data warehouse. Araport also features a JBrowse genome viewer and Science Apps that retrieve real-time data through web services. It is an open source project that welcomes community contributions and holds workshops to support developers.
API-Centric Data Integration for Human Genomics Reference Databases: Achieve...Genomika Diagnósticos
API-Centric Data Integration for Human Genomics Reference Databases: Achievements, Lessons Learned and Challenges
X-Meeting 2015
Authors: Jamisson Freitas, Marcel Caraciolo, Victor Diniz, Rodrigo Alexandre and João Bosco Oliveira
This document provides an overview of bioinformatics and discusses key concepts like:
- Bioinformatics combines biology, computer science, and information technology to analyze large amounts of biological data.
- High-throughput DNA sequencing has generated vast genomic data that requires bioinformatics tools and databases accessible via the internet to analyze and share.
- Popular sequence alignment tools like BLAST, FASTA, and ClustalW are used to search databases and compare sequences, helping researchers analyze genes and genomes.
There are many characteristics of biological data. All these characteristics make the management of biological information a particularly challenging problem. Here mainly we will focus on characteristics of biological information and multidisciplinary field called bioinformatics. Bioinformatics, now a days has emerged with graduate degree programs in several universities.
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022dkNET
Abstract
HuBMAP aims to catalyze the development of an open, global framework for comprehensively mapping the human body at cellular resolution. HuBMAP goals include: (1) Accelerate the development of the next generation of tools and techniques for constructing high resolution spatial tissue maps. (2) Generate foundational 3D tissue atlases. (3) Establish an open data platform. (4) Coordinate and collaborate with other funding agencies, programs, and the biomedical research community. (5) Support projects that demonstrate the value of the resources developed by the program. The HuBMAP Portal can be found at https://portal.hubmapconsortium.org and the Visible Human MOOC describes the compilation and coverage of HuBMAP data, demonstrates new single-cell analysis and mapping techniques, and introduces major features of the HuBMAP portal.
The top 3 key questions that HuBMAP can answer:
1. What assay types are best to map the human body in 3D and across scales?
2. What Common Coordinate System (CCF) is best to construct the Human Reference Atlas?
3. How can others help construct and/or use the Human Reference Atlas?
Presenters:
Katy Börner, PhD, Victor H. Yngve Distinguished Professor of Engineering and Information Science, Department of Intelligent Systems Engineering and Information Science, Indiana University
Jeffrey Spraggins, PhD, Assistant Professor, Department of Cell and Developmental Biology, Vanderbilt University
Upcoming webinars schedule: https://dknet.org/about/webinar
Closing the Gap in Time: From Raw Data to Real ScienceJustin Johnson
This document discusses science as a service (SaaS) and next-generation sequencing (NGS) data analysis. It summarizes challenges with exponential growth of NGS data, including data management, storage, analysis and sharing. It introduces Edge Bio's approach of distributing computational problems across cloud and HPC resources to avoid bottlenecks. Edge Bio provides full-service NGS analysis pipelines leveraging both commercial and open-source tools.
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...ijitcs
Sequencing projects arising from high-throughput technologies including those of sequencing DNA microarray allowed measuring simultaneously the expression levels of millions of genes of a biological sample as well as to annotate and to identify the role (function) of those genes. Consequently, to better manage and organize this significant amount of information, bioinformatics approaches have been developed. These approaches provide a representation and a more 'relevant' integration of data in order to test and validate the researchers’ hypothesis. In this context, this article describes and discusses some techniques used for the functional analysis of gene expression data.
The Rat Genome Database has enhanced their genome browser, called GBrowse, to provide researchers with a comprehensive platform for visualizing rat genomic features and comparative genomics data between rat, human and mouse. The browser displays tracks of protein-coding genes, RNA genes, SNPs, CNVs, SSLPs and more. Synteny tracks show ortholog positions across species to enable comparative analysis. The browser is being updated to accommodate additional rat strain genomes and visualize strain-specific differences, furthering insight into genotype-phenotype relationships.
This document discusses leveraging graph data structures to analyze variant data and related annotations from large genomic datasets in a scalable way. An in-memory graph database was used to model variants, annotations, and their relationships. Simple queries on the graph performed as well or better than a relational database. More complex queries and analysis, like spectral clustering of populations, were also possible with the graph model and helped identify patterns not feasible with relational approaches. The results indicate graph databases are a powerful tool for precision medicine research by enabling both known and novel analysis of large genomic datasets.
It is widely agreed that complex diseases are typically caused by joint effects of multiple genetic variations, rather than a single genetic variation. Multi-SNP interactions, also known as epistatic interactions, have the potential to provide information about causes of complex diseases, and build on GWAS studies that look at associations between single SNPs and phenotypes. However, epistatic analysis methods are both computationally expensive, and have limited accessibility for biologists wanting to analyse GWAS datasets due to being command line based. Here we present APPistatic, a prototype desktop version of a pipeline for epistatic analysis of GWAS datasets. his application combines ease-of-use, via a GUI, with accelerated implementation of BOOST and FaST-LMM epistatic analysis methods.
Next generation sequencing (NGS) allows for the massively parallel sequencing of DNA sequences. NGS technologies can sequence entire genomes in a single run and provide information useful for pathogen identification, outbreak investigation, and molecular diagnostics. NGS workflows involve sample preparation, sequencing using platforms such as Illumina or Ion Torrent, and bioinformatics analysis to assemble and interpret the large amounts of sequencing data produced. NGS has many applications including mutation discovery, microbial genome mapping, and metagenomics.
Branch: An interactive, web-based tool for building decision tree classifiersBenjamin Good
A crucial task in modern biology is the prediction of complex phenotypes, such as breast cancer prognosis, from genome-wide measurements. Machine learning algorithms can sometimes infer predictive patterns, but there is rarely enough data to train and test them effectively and the patterns that they identify are often expressed in forms (e.g. support vector machines, neural networks, random forests composed of 10s of thousands of trees) that are highly difficult to understand. In addition, it is generally unclear how to include prior knowledge in the course of their construction.
Decision trees provide an intuitive visual form that can capture complex interactions between multiple variables. Effective methods exist for inferring decision trees automatically but it has been shown that these techniques can be improved upon via the manual interventions of experts. Here, we introduce Branch, a new Web-based tool for the interactive construction of decision trees from genomic datasets. Branch offers the ability to: (1) upload and share datasets intended for classification tasks (in progress), (2) construct decision trees by manually selecting features such as genes for a gene expression dataset, (3) collaboratively edit decision trees, (4) create feature functions that aggregate content from multiple independent features into single decision nodes (e.g. pathways) and (5) evaluate decision tree classifiers in terms of precision and recall. The tool is optimized for genomic use cases through the inclusion of gene and pathway-based search functions.
Branch enables expert biologists to easily engage directly with high-throughput datasets without the need for a team of bioinformaticians. The tree building process allows researchers to rapidly test hypotheses about interactions between biological variables and phenotypes in ways that would otherwise require extensive computational sophistication. In so doing, this tool can both inform biological research and help to produce more accurate, more meaningful classifiers.
A prototype of Branch is available at http://biobranch.org/
Maize database is the most important database in the bioinformatics. so i hope it is beneficial to the B.Sc.in Agriculture and M.Sc. in Genetics and Plant Breeding.
IRJET- Gene Mutation Data using Multiplicative Adaptive Algorithm and Gene On...IRJET Journal
This document presents a methodology for analyzing gene mutation data using ontologies and association rule mining. It aims to develop a common knowledge base for genomic and proteomic analysis by integrating multiple data sources. The methodology involves using k-nearest neighbors algorithm to find similar genes, an iterative multiplicative updating algorithm to solve optimization problems, and SNCoNMF to identify co-regulatory modules between genes, microRNAs and transcription factors. The results are represented using a Bayesian rose tree for efficient visualization of associations between genetic components and diseases.
This document summarizes the BioAssay Research Database (BARD), a public database developed to provide access to bioassay data from the NIH Molecular Libraries Program (MLP). BARD has curated and migrated data from over 600 MLP projects, standardizing the metadata using a controlled vocabulary. This allows for systematic cross-assay analysis. BARD supports data depositors, data miners accessing and querying the database, and software developers building new tools using the BARD application programming interface.
LIMS for maize mapping project
BIOINFORMATICS
Vol. 19 no. 16 2003, pages 2022–2030
DOI: 10.1093/bioinformatics/btg274
Development of an integrated laboratory information management system for the maize mapping project
H. Sanchez-Villeda¹, S. Schroeder¹, M. Polacco¹,³, M. McMullen¹,³, S. Havermann¹, G. Davis¹, I. Vroh-Bi¹, K. Cone², N. Sharopova¹, Y. Yim¹, L. Schultz¹, N. Duru¹, T. Musket¹, K. Houchins³, Z. Fang¹, J. Gardiner¹ and E. Coe¹,³,∗
¹Department of Agronomy, ²Division of Biological Sciences and ³USDA-ARS, University of Missouri, Columbia, MO 65211, USA
Received on February 4, 2003; revised on April 18, 2003; accepted on May 3, 2003
ABSTRACT
Motivation: The development of an integrated genetic and physical map for the maize genome involves the generation of an enormous amount of data. Managing this data requires a system to aid in genotype scoring for different types of markers coming from both local and remote users. In addition, researchers need an efficient way to interact with genetic mapping software and with data files from automated DNA sequencing. They also need ways to manage primer data for mapping and sequencing and provide views of the integrated physical and genetic map and views of genetic map comparisons.
Results: The MMP-LIMS system has been used successfully in a high-throughput mapping environment. The genotypes from 957 SSR, 1023 RFLP, 189 SNP, and 177 InDel markers have been entered and verified via MMP-LIMS. The system is flexible, and can be easily modified to manage data for other species. The software is freely available.
Availability: To receive a copy of the iMap or cMap software, please fill out the form on our website. The other MMP-LIMS software is freely available at http://www.maizemap.org/bioinformatics.htm.
Contact: coee@missouri.edu
1 INTRODUCTION
The maize mapping project (MMP) aims to develop an integrated physical and genetic map of maize. This resource will be useful for marker-assisted selection, map-based cloning, and comparative genomics of crops, and will undergird sequencing of the maize genome (Cone et al., 2002). To achieve this goal, the MMP has utilized and developed DNA markers including 1023 restriction fragment length polymorphisms (RFLPs), 957 simple sequence repeats (SSRs), 10 000 overgos, 189 single nucleotide polymorphisms (SNPs), and 177 insertion/deletion (InDel) polymorphisms (Davis et al., 1999; Sharopova et al., 2002). These markers have been used to develop a high-resolution genetic map used as the framework to anchor bacterial artificial chromosome (BAC) contigs. This process requires high-throughput sequencing, and high-throughput SNP/InDel genotyping. The amount of data produced is enormous. The Missouri component of the MMP team is divided into different laboratories dispersed throughout the campus, simultaneously using and producing different parts of the same core data. The genetic mapping populations involve different subsets of individuals with their respective molecular marker data to be stored, managed, and integrated into the maps. Optimal use of these data requires effective methods of analysis and management. Furthermore, the data produced in the MMP must be disseminated to the scientific community through informatics tools capable of handling the high volume of data.
∗To whom correspondence should be addressed.
The requirements for laboratory databases vary considerably from project to project. At present, many laboratories use spreadsheets to manage their data. In this paper, we present the MMP laboratory information management system (MMP-LIMS) that we have developed to provide several functions: (1) allow data management with detailed record keeping, reporting, and retrieving; (2) ensure data quality and accessibility to the scientific community and (3) disseminate the integrated map of maize to the scientific community through web-based tools. This research is an example of application of informatics to practical biology and agronomy questions. An overview of the MMP-LIMS components and their functions is shown in Table 1.
Published by Oxford University Press.
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on April 24, 2013
Development of an integrated LIMS for the MMP
Table 1. MMP-LIMS functions
MMP-LIMS component: Summary

MMP-LIMS Scoring Tool:
    Serves as a laboratory notebook for wet lab researchers
    Allows researchers in different laboratories to interact with MMP-LIMS database
    Manages genotype data from RFLP, SSR, SNP, InDel marker types
    Validates genotype scores based on repeat reads
    Interfaces with ABI Prism Genotyper software, converting trace file data into SNP scores
    Allows user to make custom templates for genotype output/entry
    Creates input files for MapMaker by chromosome or data set
    Integrates information returned by the MapMaker software
    Also exists in a publicly available standalone version utilizing an Access database and includes an example database

Community IBM Map Data Entry Tool (CIMDE):
    Allows researchers at remote locations to enter genotype scores into the MMP-LIMS database via web-based interface
    Provides a mechanism for uploading tab-delimited score file and for uploading scores for a single marker
    Validates genotype scores for control loci
    Validates genotype scores based on marker type

SSR Finder:
    Locates SSRs in DNA sequences
    Designs unique primer pairs to amplify SSR sequences

SNP Discovery Primer Design:
    Designs primers for finding potential SNPs
    Performs BLAST search against existing primers

SNP/InDel Finder:
    Calculates base frequencies in each position in a sequence alignment
    Searches for gaps in a sequence alignment representing InDels

Mapped Sequence Locator (MSL):
    Accepts sequence via a web-based interface and performs a BLAST search against all public maize sequence
    Returns BLAST scores, genetic map locations, and links related to sequences if available

iMap:
    Graphically displays an integrated genetic and physical map
    Displays genetic marker data and contig data
    Performs search for a map location based on locus, probe, GenBank accession, or contig number
    Provides links to current WebFPC (Soderlund et al., 2000) assembly
    Displays anchors based on a set of data filters that remove ambiguous assignments

cMap:
    Displays comparative associations between two genetic maps
    Gives the user text lists of the shared loci between the compared maps
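The two SNP/InDel Finder functions listed in Table 1 (per-position base frequencies and gap detection in a sequence alignment) can be illustrated with a short sketch. This is not the MMP-LIMS code, which the paper says was written in Perl. The function names `column_frequencies` and `candidate_sites`, the use of `-` as the gap character, and the rule that any column with more than one base counts as a SNP candidate are all assumptions made for illustration.

```python
from collections import Counter


def column_frequencies(alignment):
    """Return, for each alignment column, a dict mapping symbol -> frequency."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    freqs = []
    # zip(*alignment) walks the alignment column by column.
    for column in zip(*alignment):
        counts = Counter(symbol.upper() for symbol in column)
        total = sum(counts.values())
        freqs.append({symbol: n / total for symbol, n in counts.items()})
    return freqs


def candidate_sites(alignment, gap="-"):
    """Split variable columns into SNP candidates and InDel candidates.

    Hypothetical rule: a column containing a gap is an InDel candidate;
    a gap-free column with more than one distinct base is a SNP candidate.
    """
    snps, indels = [], []
    for pos, freq in enumerate(column_frequencies(alignment)):
        if gap in freq:
            indels.append(pos)
        elif len(freq) > 1:
            snps.append(pos)
    return snps, indels


if __name__ == "__main__":
    toy_alignment = ["ACGTACGT", "ACGTACGT", "ACTTACGT", "ACGTA-GT"]
    print(candidate_sites(toy_alignment))
    print(column_frequencies(toy_alignment)[2])
```

A production tool would probably also weigh base-call quality (for example phred scores) before reporting a polymorphism; the sketch ignores quality entirely.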
2 SYSTEMS AND METHODS
Several technologies were employed during the development of MMP-LIMS. The programming languages used in both the user interface for the local wet lab researchers and the remote researchers reflect an interest in creating a highly modular system and in providing each user with an efficient and intuitive user interface.
The user interface for the local wet lab researchers was implemented as a Visual Basic® 6.0 client application. This provides not only efficient performance for the user, but also a well-structured environment for development. The system’s client–server architecture permits many users concurrent access to a central database. Open Database Connectivity (ODBC) provides interoperability, connects the client application to the database, and allows interaction with MaizeDB (MaizeDB, 2003, http://www.agron.missouri.edu/) through proxy tables.
The web-based user interface for remote researchers utilizes HTML for the static content. Java™ applets are used for the other functions and give the user a more interactive and straightforward interface than those attainable with HTML forms. Java™ servlets and Java Database Connectivity (JDBC™) transfer data to and from the database.
The sequence analysis modules of MMP-LIMS were implemented in Perl and make use of other publicly available programs including Primer3 (Rozen and Skaletsky, 2000), BLAST (Altschul et al., 1990), phred (Ewing et al., 1998), phrap (Ewing and Green, 1998), and clustalw (Thompson et al., 1994). The web-based sequence comparison module utilizes Perl (CGI/DBI) along with XML/XSLT and the BLAST program.
The web-based integrated genetic and physical map display
application and the comparative mapping viewer were
adapted from software used in the Rice Genome Project (Rice
Genome Research Program, 2002, http://rgp.dna.affrc.go.jp/).
Originally utilizing data stored via flat files, the code for the
integrated map viewer was converted to allow data retrieval
from the database via servlet communication. The user
interface employs a combination of a Java™ applet and Perl
CGI (Cone et al., 2002).
Fig. 1. MMP-LIMS context diagram. The modules of MMP-LIMS are shown, including the client–server tool, multiple web-based tools and
sequence analysis modules for SSRs and SNPs/InDels.
[Diagram labels: Client-Server Tool (MMP-LIMS Scoring Tool); Sequence Analysis Modules (SNP/InDel Pipeline with SNP Discovery
Primer Design and SNP/InDel Finder; SSR Finder); MMP-LIMS Web-Based Tools (CIMDE, Mapped Sequence Locator (MSL), iMap Viewer,
cMap Viewer); MMP-LIMS Users (Local Wet Lab Users, Remote Wet Lab Users, Web Users).]
The MMP-LIMS data are stored in a Sybase® Adaptive
Server Enterprise 11.9.2 relational database. The database
resides on a Dell Precision 330 running Red Hat Linux 7.3 with a
2.4.18-5 kernel. An additional standalone version of
MMP-LIMS exists with the same functionality and was designed to
work with a Microsoft® Access database.
3 IMPLEMENTATION
MMP-LIMS provides data management for the processes
involved in generating a high-resolution genetic map includ-
ing managing SNP/InDel data, managing SSR and RFLP
data, generating the genetic map, and providing public views
of the data. The modules comprising MMP-LIMS can be
viewed as elements of a system context diagram (Fig. 1).
The modules include the MMP-LIMS Scoring Tool (Fig. 2),
the Community IBM Map Data Entry Tool (CIMDE), SSR
Finder, SNP Discovery Primer Design, SNP/InDel Finder, the
Mapped Sequence Locator (MSL), the integrated genetic and
physical map viewer (iMap), and the comparative mapping
viewer (cMap).
3.1 Managing SNP/InDel information
MMP-LIMS manages data from each step in the process of
placing SNPs/InDels on the genetic map, from finding poten-
tial SNPs/InDels with the SNP/InDel pipeline to managing
the genotype score data with MMP-LIMS Scoring Tool and
generating files for MapMaker (Lander et al., 1987) software.
The SNP/InDel pipeline works in two steps (Fig. 3A). First,
SNP primers are designed with the SNP Discovery Primer
Design module. Then the resulting primers are used to process
sequences in order to find SNPs and/or InDels via SNP/InDel
Finder. The first step in SNP discovery is to sequence a region
of DNA across multiple lines of maize to detect nucleotide
polymorphisms. The DNA segments for sequencing are amp-
lified using primer pairs designed with the SNP Discovery
Primer Design module.
DNA sequence is entered into the module, along with para-
meters including distance between primer pairs and region
of the sequence to search for primer pairs. Using the given
parameters, this script builds an input file for Primer3. The
resulting primers are returned from Primer3, and the SNP Dis-
covery Primer Design module checks for repeats in the primer
sequence and rejects those with repeats. The script can also
be set to perform a BLAST search with the primers against
all previously designed primers. The output of the script is a
list of unique SNP discovery primers.
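The two steps described above (writing a Primer3 input record, then rejecting primers that contain repeats) can be sketched as follows. This is a minimal illustration in Python, not the project's script: the tag names are those of current Primer3 releases and may differ from the ones in use when MMP-LIMS was written, and the repeat rule (a unit of 1 to 3 bases occurring at least four times in a row) is a hypothetical stand-in for whatever rule the module applies.

```python
import re


def build_primer3_record(seq_id, template, region_start, region_len,
                         min_product, max_product):
    """Return one Boulder-IO record (tag=value lines closed by '=') for Primer3."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Region of the sequence to search for primers, given as start,length.
        f"SEQUENCE_INCLUDED_REGION={region_start},{region_len}",
        # Allowed amplicon size, which bounds the distance between the primers.
        f"PRIMER_PRODUCT_SIZE_RANGE={min_product}-{max_product}",
        "=",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical repeat rule: a 1-3 base unit occurring at least 4 times in a row
# (for example AAAA or ACACACAC).
REPEAT = re.compile(r"([ACGT]{1,3})\1{3,}")


def has_repeat(primer):
    """True if the primer contains a tandem repeat under the rule above."""
    return REPEAT.search(primer.upper()) is not None


def filter_primers(primers):
    """Keep only the primers that contain no tandem repeat."""
    return [p for p in primers if not has_repeat(p)]
```

The optional BLAST comparison against previously designed primers is not shown here.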
Fig. 2. MMP-LIMS scoring tool overview. The genotype score management functions of MMP-LIMS Scoring Tool, including catalog-based
management of sample data and lab notebook, are shown. The diagram also includes the interfaces with MapMaker and Genotyper software,
and the remote genotype score entry tool, CIMDE.
[Diagram labels: MMP-LIMS Scoring Tool (Lab Notebook Function; Catalogs: Primers, Samples, Populations, Templates; User Interface for
Genotype Score Entry and Verification; Input File Generation); MMP-LIMS Database; CIMDE (Remote Genotype Score Entry and Verification);
External Software (ABI Prism® Genotyper® Software, MapMaker). Data flows: Genotype Scores; SNP Genotype Data; Genotyping Information
Added via Catalogs; Experimental Conditions / Setup Information; Map Data / Segregation File Data; Genetic Map Data; Genotype Data
for MapMaker.]
The primers from the SNP Discovery Primer Design module
are used to amplify and sequence DNA in 12 different
lines of maize. Base calling of the resulting forward and
reverse sequencing trace files is performed by phred. Forward
and reverse output sequence is trimmed based on the
primers and quality scores and each sequence is stored in
a single file. The quality scores are stored in a separate
file. Sequence assembly is then performed by phrap. Next,
a script combines the sequences into 12-sequence groups
with each group corresponding to a single SNP discovery
(dSNP) primer pair. The clustalw program then aligns the
sequences for each primer pair and sends the output into the
SNP/InDel Finder script to calculate base frequencies at each
position in an alignment. If 12 out of 12 sequences contain
the same base at a position, then no SNP is present. If one
sequence is different at a position than the other 11 (1 : 11),
then the possibility of a SNP is questionable. Candidate SNPs
are defined by positions where at least two sequences are
different from the other 10 (2 : 10) or better: (3 : 9), (4 : 8),
(5 : 7) or (6 : 6). SNP/InDel Finder also looks for gaps in the
alignment representing insertions/deletions (InDels). These
polymorphisms can then be used for genotype analysis by the
wet lab group.
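The base-frequency rule described above can be illustrated with a short Python sketch. It assumes the aligned sequences are equal-length strings with '-' marking alignment gaps. Reporting any column that contains a gap as an InDel is a simplifying assumption made here, and the function names are hypothetical.

```python
from collections import Counter


def classify_column(bases):
    """Classify one alignment column by how many sequences differ from the majority."""
    if "-" in bases:
        return "indel"          # assumption: any gap marks a possible InDel
    counts = Counter(bases)
    if len(counts) == 1:
        return "no SNP"         # e.g. 12 out of 12 identical
    # Number of sequences that differ from the most common base.
    minor = len(bases) - counts.most_common(1)[0][1]
    # 1 : 11 is questionable; 2 : 10 or a more even split is a candidate.
    return "questionable" if minor == 1 else "candidate SNP"


def scan_alignment(aligned):
    """Return {position: call} for every polymorphic column (0-based positions)."""
    calls = {}
    for pos, column in enumerate(zip(*aligned)):
        call = classify_column(column)
        if call != "no SNP":
            calls[pos] = call
    return calls
```

For a 12-sequence group, a column with ten C and two T is reported as a candidate SNP, while a column with eleven A and one C is only flagged as questionable.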
To manage SNP data, the MMP-LIMS Scoring Tool enables
wet lab researchers in several different laboratories to perform
genotype scoring and manage genotyping data. While the IBM
mapping population has 360 individuals, the tool can handle a
virtually unlimited number of individuals. MMP-LIMS uses
catalogs to manage and maintain data on primers, samples,
populations and templates (Fig. 2). For example, through an
interface for the catalogs, the user can add, edit or delete SNP
or InDel primers. The system validates the information and checks
the integrity of the data among the other tables. When deleting
from the catalog, the system checks the database tables for
consistency. In particular, if a primer is already in use in a
record, then an ordinary user cannot delete that primer from
MMP-LIMS. Only the master user has the ability to perform this
type of ‘cascading’ delete, deleting all references to that primer.
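The deletion rule above can be sketched with an in-memory SQLite database. The two-table schema, the table and column names, and the is_master flag are all hypothetical; MMP-LIMS itself runs on Sybase with more than 50 tables, so this only illustrates the integrity check, not the actual implementation.

```python
import sqlite3


def make_db():
    """Create a toy catalog: primers, and score records that reference them."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE primer (name TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE score (id INTEGER PRIMARY KEY, "
               "primer TEXT NOT NULL REFERENCES primer(name))")
    return db


def delete_primer(db, name, is_master=False):
    """Delete a primer; only the master user may remove one that is in use."""
    in_use = db.execute("SELECT COUNT(*) FROM score WHERE primer = ?",
                        (name,)).fetchone()[0]
    if in_use and not is_master:
        return False            # refuse: other records still reference it
    with db:                    # one transaction: commit on success, rollback on error
        db.execute("DELETE FROM score WHERE primer = ?", (name,))
        db.execute("DELETE FROM primer WHERE name = ?", (name,))
    return True
```

Running both deletes in one transaction means a failure part-way through leaves no orphaned references, which is the kind of rollback behaviour the RDBMS provides.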
The templates catalog allows the user to create a subset of a
population’s samples for use in specific experiments. The user
can create a samples template, and then link the appropriate
samples to the template.
Fig. 3. Sequence analysis modules. The functions performed by the
two sequence analysis modules of MMP-LIMS are shown. The steps
performed by the SNP/InDel Pipeline to find primers and discover
potential SNPs/InDels are given in (A), and the process performed
by SSR Finder to locate SSRs in sequence and design unique primers
is outlined in (B).
[Diagram content. (A) SNP/InDel Pipeline. Input: sequence and SNP parameters.
1. Use Primer3 to create primers. 2. Filter out primers with inverted repeats.
The SNP discovery primers are used to amplify and sequence in N different genetic lines.
1. Perform base calling with phred. 2. Trim for primers (sequence and quality).
3. Use phrap for sequence assembly. 4. Group sequences with same dSNP primer.
5. Align sequences for 1 primer pair (clustalw). 6. Look for base frequencies at each
position in the alignment with SNP/InDel Finder. Output: file of positional base
frequencies (i.e. SNPs and InDels).
(B) SSR Finder. Input: sequence. 1. Find repeats and generate primers.
2. Check against previously discovered primers. Add each new SSR to the Primer DBlast
database. Output: formatted list of primers with ordering information.]
The MMP-LIMS Scoring Tool provides interfaces to
convert ABI Prism® Genotyper® (Applied Biosystems, 2003,
http://www.appliedbiosystems.com/products/) files into
segregation files and to import them to MMP-LIMS database. To
convert ABI Genotyper® files, the MMP-LIMS provides a
color template where users enter values of the base pair peaks
generated in the ABI sequencer for the two parental lines
used in the IBM population (B73/Mo17). Then MMP-LIMS
receives the ABI Genotyper® file, which contains the allele
information for the SNP experiment, and based on the color
template, processes the information and converts it into a
segregation file. The segregation data, consisting of a list of
scores for each SNP marker, are stored in the MMP-LIMS
database.
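One way to picture the conversion from allele peaks to segregation scores is the Python sketch below. Everything specific in it is an assumption made for illustration: the paper does not give the file layout or the matching rule, so the peak tolerance, the function names, and the score letters ('A' for the B73 allele, 'B' for the Mo17 allele, 'H' for both, '-' for missing) are hypothetical.

```python
def score_sample(peaks, b73_peak, mo17_peak, tolerance=0.5):
    """Turn the peak sizes (bp) called for one sample into a single score."""
    has_b73 = any(abs(p - b73_peak) <= tolerance for p in peaks)
    has_mo17 = any(abs(p - mo17_peak) <= tolerance for p in peaks)
    if has_b73 and has_mo17:
        return "H"   # both parental alleles present
    if has_b73:
        return "A"   # B73 allele only
    if has_mo17:
        return "B"   # Mo17 allele only
    return "-"       # no peak matched the template: missing data


def segregation_row(sample_peaks, b73_peak, mo17_peak, tolerance=0.5):
    """Score every sample of one marker and join the scores into one row."""
    return "".join(score_sample(p, b73_peak, mo17_peak, tolerance)
                   for p in sample_peaks)
```

Here the two template values play the role of the parental peak sizes that the user enters in the color template.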
The MMP-LIMS Scoring Tool also functions as a laboratory
notebook for wet lab researchers. Users may store information
about specific experimental conditions including gel composition
and the primer sequences used. The notebook also stores
data related to setup, such as microtiter plate layout.
The MMP-LIMS Scoring Tool also offers a web-based
query-by-example interface that allows users to create their
own queries based on markers, samples, probes or enzymes for
exporting information from the LIMS database into a standard
Microsoft® Excel spreadsheet for analysis.
In addition, a standalone version of the MMP-LIMS Scoring
Tool is available that works with an Access database. The
standalone version includes an example database populated
with maize data.
Several security features protect the data in MMP-LIMS
database accessed via MMP-LIMS Scoring Tool. To use the
MMP-LIMS Scoring Tool, the user is required to have a valid
user account and password. The different types of MMP-
LIMS Scoring Tool user accounts provide various levels of
system protection. For example, the administrator is able to
add new users and grant permissions to users for particular
system functions while, by default, new users can only view
the information and enter genotype scores.
The MMP-LIMS system takes advantage of a relational
database management system (RDBMS) for information stor-
age and retrieval. The RDBMS provides several important
functions including inserting, deleting, updating, retrieval,
managing concurrent requests, and handling transaction
issues such as rollback. The main MMP-LIMS database is
composed of more than 50 tables that record primer, locus,
enzymes, probes, samples, templates, users, passwords, note-
books and score information for the daily processes in the lab.
The physical data model can be found on the MMP website
(Maize Mapping Project, 2003, http://www.maizemap.org).
The model design is based on the third normal form approach
(Date, 2002). The MMP-LIMS database dedicates a large por-
tion of its tables to the MMP-LIMS Scoring Tool because of
the high level of functionality that this module provides.
3.2 Managing SSR and RFLP data
In addition to managing SNP/InDel data, MMP-LIMS handles
data generated to place SSR and RFLP markers on the genetic
map. The tools enable the researcher to locate potential SSRs
and manage the genotype score data, and encourage collaboration
by providing resources for researchers in remote locations
to enter genotype scores.
The SSR Finder tool serves three major purposes (Fig. 3B).
First, SSR Finder locates SSRs in DNA sequences. Second,
the program designs primer pairs to amplify the SSR-
containing sequence regions. Finally, SSR Finder checks
these primer pairs for uniqueness, removing any redundant
primers.
First, the sequence of interest is entered into the SSR Repeat
Finder module. SSR Repeat Finder returns a list of repeats
(SSRs) and the flanking (surrounding) sequence, which is then
sent as input into SSR Primer Designer. This module builds the
input file for Primer3 for each SSR, with user-defined parameters
for primer length, Tm, G/C content, and distance between
forward and reverse primers. The list of potential primers and
associated data from Primer3 is sent to the SSR Primer Rep
module, which runs the SSR Repeat Finder module against
the potential primer pairs and removes primer pairs that contain
a simple sequence repeat within the primer sequence.
The SSR Primer BLAST script takes the remaining primers
and their associated data, and uses the SSR sequence plus
the flanking sequence to perform a BLAST search against
all the primer pairs previously discovered in the project. It
also adds each new SSR to the Primer DBlast database after
it is checked. The program formatdb is run to regenerate the
Primer DBlast database. Next, the SSR Primer BLAST module
returns the BLAST scores for the primers. Based on these
scores, the Order Filter script creates a list of primers with
no BLAST hits and sends the list to Order Formatter. Finally,
Order Formatter returns a formatted list along with ordering
information.
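The first stage of this pipeline, locating SSRs and returning them with their flanking sequence, can be sketched in Python with a regular expression. The thresholds are assumptions chosen for illustration (motifs of 2 to 6 bases, at least five tandem copies, 20 bases of flank), and find_ssrs is a hypothetical name; the paper does not state the criteria used by SSR Repeat Finder.

```python
import re

# A 2-6 base motif followed by at least 4 more copies (5 copies in total).
# The lazy quantifier makes the shortest repeating unit win (AC, not ACAC).
SSR = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_ssrs(sequence, flank=20):
    """Return each SSR with its motif, copy number, start and flanking sequence."""
    sequence = sequence.upper()
    hits = []
    for m in SSR.finditer(sequence):
        start, end = m.span()
        motif = m.group(1)
        hits.append({
            "motif": motif,
            "copies": (end - start) // len(motif),
            "start": start,                                  # 0-based
            "left_flank": sequence[max(0, start - flank):start],
            "right_flank": sequence[end:end + flank],
        })
    return hits
```

The flanking sequence returned here is what a primer-design step would search for forward and reverse primers. Primer design and the BLAST uniqueness check are not shown.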
The MMP-LIMS Scoring Tool discussed previously is also
used effectively for managing SSR and RFLP genotype score
data. Individuals in the lab can perform the scoring in two
steps. First, one user analyzes the gel images or autoradio-
graphs, entering the score for each sample in a population
or template. Next, a second user verifies the scores by inde-
pendently entering each score again in the row underneath the
original entered scores. Because the letters representing the
scores are color-coded, it is easy for the second individual to
see that the scores match the original scores. The system can
also automatically check for mismatched scores and allow the
user to move from one cell containing a mismatched score to
the next to verify the data.
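The automatic check for mismatched scores amounts to comparing the two independently entered rows position by position. A minimal Python sketch, with a hypothetical function name and scores represented as strings of one character per sample:

```python
def mismatched_cells(first_entry, second_entry):
    """Return the sample positions (0-based) where the two entries disagree."""
    if len(first_entry) != len(second_entry):
        raise ValueError("both entries must cover the same samples")
    return [i for i, (a, b) in enumerate(zip(first_entry, second_entry))
            if a != b]
```

The returned positions are the cells a user would step through to resolve each disagreement.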
The CIMDE is a subsystem of the MMP-LIMS system
developed to allow members of the maize community to
remotely enter genotype scores into MMP-LIMS for a subset
of 94 individual lines from the intermated B73xMo17 (IBM)
(Davis et al., 2001) mapping population.
The system is composed of two main components designed
to allow flexibility in the way researchers submit and edit
their scores, while providing an intuitive and easy-to-use inter-
face. First, the file upload function allows the user to quickly
populate the database with a batch of probes and their associ-
ated genotype scores by uploading a tab-delimited text file
via a web-based form. The second component consists of
an in-browser application that gives remote researchers the
opportunity to submit scores manually, edit scores previously
submitted, or delete scores. CIMDE is used primarily for SSR
data, but also allows the researcher to enter SNP and RFLP
information.
MMP-LIMS creates a MapMaker file from the scores sub-
mitted by the user via CIMDE. The PostScript™ version of
the map built by MapMaker is converted to a tabular format
by MMP-LIMS. Both the PostScript™ file and the table are
then e-mailed to the user.
Both components of CIMDE perform extensive validation
of the data based on the type of probe before addition to
the database is permitted. During a submission using the
file upload function, CIMDE checks the validity and the
number of scores for each record. If scores for an RFLP
probe are being processed, the system performs an addi-
tional check to ensure that a restriction enzyme name is
given for each probe. Because each probe name needs to
be unique, the system checks that the probe name does not
already exist in the database under any user’s account. An
insertion or update of records that causes the duplication
of marker names is not permitted by the system, and if
attempted, the system displays a list containing each duplic-
ate record. The manual data-editing tool also validates probe
data. The table in the graphical user interface will not allow
the user to insert invalid score values into its cells, while valid
scores are color-coded for ease of recognition. This validation
and color-coding varies based on the type of probe being
edited.
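The checks listed above for the file upload function can be sketched as a single validation pass in Python. The column layout (probe name, probe type, restriction enzyme, scores), the score alphabet and the function name are assumptions made for this example; only the rules themselves (score count, score validity, enzyme required for RFLP probes, unique probe names) come from the description.

```python
VALID_SCORES = set("ABH-")   # assumed score alphabet
EXPECTED_SCORES = 94         # subset of IBM lines scored through CIMDE


def validate_upload(text, existing_probes):
    """Check tab-delimited rows of probe, type, enzyme, scores; return errors."""
    errors, seen = [], set(existing_probes)
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue                      # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            errors.append(f"line {n}: expected 4 tab-separated fields")
            continue
        probe, probe_type, enzyme, scores = (f.strip() for f in fields)
        if probe in seen:
            errors.append(f"line {n}: duplicate probe name {probe}")
        seen.add(probe)
        if len(scores) != EXPECTED_SCORES:
            errors.append(f"line {n}: expected {EXPECTED_SCORES} scores, "
                          f"got {len(scores)}")
        if set(scores) - VALID_SCORES:
            errors.append(f"line {n}: invalid score value")
        if probe_type == "RFLP" and not enzyme:
            errors.append(f"line {n}: RFLP probe needs a restriction enzyme")
    return errors
```

Collecting every error instead of stopping at the first one mirrors the behaviour described above, where the system displays a list of all duplicate records.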
In order to guard user data and MMP-LIMS data, CIMDE
is equipped with protection features. To ensure that each user
only has access to his or her data, each user must create a user-
name and password and register an individual account with
the system. The user must also make a one-time submission
of a set of control scores to be validated by the system. If the
user’s control scores are correct, it means that the researcher
has performed the experiments correctly and that the gen-
otype scores that he or she is submitting are accepted as
valid scores. Once the user has logged in and has submit-
ted valid control scores, he or she can access both the file
upload and manual data editing functions of the application.
Control scores do not have to be submitted upon subsequent
use of CIMDE.
3.3 Production of the genetic map
Genetic map generation requires both converting genotype
scores from a set of samples into a format readable by Map-
Maker and interpreting the results returned by the software.
MMP-LIMS Scoring Tool creates input files for MapMaker
by retrieving data from MMP-LIMS database. Users can cre-
ate files using all of the mapping data or they can generate a
file for a subset of the data by creating a group and selecting
the markers and samples of interest. The MMP-LIMS Scor-
ing Tool then automatically creates the MapMaker input file.
Data from remote researchers can also be used in the creation
of the file. When needed, the system can convert scores. For
example, for recombinant inbred populations, the score ‘H’
is converted to ‘−’, while for F2 populations, the ‘H’ score
remains unchanged in the MapMaker input file.
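The score conversion rule can be written as a small Python function. The population labels "RI" and "F2" and the function names are hypothetical. The '*name scores' locus line follows the MAPMAKER raw-file layout as commonly described and should be checked against the MapMaker manual; the file header is omitted here.

```python
def convert_scores(scores, population_type):
    """Apply the rule above: 'H' becomes '-' for recombinant inbreds only."""
    if population_type == "RI":
        return scores.replace("H", "-")
    if population_type == "F2":
        return scores                     # heterozygotes are kept
    raise ValueError(f"unknown population type: {population_type}")


def mapmaker_lines(markers, population_type):
    """Format {marker: scores} as one '*name scores' line per locus."""
    return [f"*{name} {convert_scores(s, population_type)}"
            for name, s in markers.items()]
```

The stated rule treats 'H' scores in recombinant inbred data as missing.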
Output from MapMaker is also managed by the MMP-
LIMS Scoring Tool. The MMP-LIMS Scoring Tool extracts
genetic map information such as chromosome, map coordin-
ate, framework versus off-frame status for each probe from
the PostScript™ file returned by the MapMaker software and
results are stored in the MMP-LIMS database.
3.4 Public views of MMP data
It is imperative that the data produced by the MMP be easily
viewable by the public. MMP-LIMS provides several displays
of the mapping data, including the MSL, iMap, and cMap.
The MSL provides a web-based interface to accept input
sequence and perform a BLAST search against all public
maize sequences, including the DuPont-MMP Cornsensus
(Maize Mapping Project, 2002,
http://www.agron.missouri.edu/files_dl/MMP/Cornsensus/) unigene set. It returns the
BLAST scores, the map location, and links to related
sequences if available. The user enters the nucleotide sequence
via the Common Gateway Interface (CGI) WWW form along
with the name of the sequence and BLAST parameters.
The CGI then performs a BLAST against a database containing
>300 000 Zea mays sequences from GenBank, and
>10 000 sequences from the Cornsensus Unigene set. The
BLAST results are returned in XML format and converted
by the CGI via XSLT. The CGI retrieves the accession numbers
of related sequences from MaizeDB, and creates an
HTML table containing links to the related map and sequence
data in various databases including GenBank, MaizeDB,
The Institute for Genomic Research (TIGR) (The Institute
for Genomic Research, 2003, http://www.tigr.org/),
Gramene (Gramene, 2003, http://www.gramene.org/), the
Arizona Genomics Institute (AGI) (Arizona Genomics Institute,
2002, http://www.genome.arizona.edu/), the Clemson
University Genomics Institute (CUGI) (Clemson University
Genomics Institute, 2003, http://www.genome.clemson.edu/),
and Zea mays DataBase (ZmDB) (ZmDB, 2003,
http://www.zmdb.iastate.edu/).
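The step from BLAST XML output to an HTML table of linked hits can be sketched as below. MSL performs this conversion with XSLT inside a Perl CGI; since the Python standard library has no XSLT processor, this sketch swaps in xml.etree.ElementTree instead. The element names follow NCBI's BLAST XML output, while the sample record, the link base URL and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up, minimal BLAST XML record used only to exercise the functions.
SAMPLE_XML = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_accession>ACC0001</Hit_accession><Hit_def>example maize EST</Hit_def>
<Hit_hsps><Hsp><Hsp_bit-score>210.5</Hsp_bit-score>
<Hsp_evalue>1e-55</Hsp_evalue></Hsp></Hit_hsps>
</Hit></Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def hits_from_xml(xml_text):
    """Extract accession, description, bit score and E-value for every hit."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        hsp = hit.find("Hit_hsps/Hsp")    # first HSP listed for the hit
        hits.append({
            "accession": hit.findtext("Hit_accession"),
            "definition": hit.findtext("Hit_def"),
            "bit_score": float(hsp.findtext("Hsp_bit-score")),
            "evalue": float(hsp.findtext("Hsp_evalue")),
        })
    return hits


def hits_to_html(hits, link_base="https://example.org/accession/"):
    """Render the hits as an HTML table with one link per accession."""
    rows = ["<tr><th>Accession</th><th>Description</th>"
            "<th>Bit score</th><th>E-value</th></tr>"]
    for h in hits:
        acc = escape(h["accession"])
        rows.append(
            f'<tr><td><a href="{link_base}{acc}">{acc}</a></td>'
            f"<td>{escape(h['definition'])}</td>"
            f"<td>{h['bit_score']}</td><td>{h['evalue']}</td></tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

In MSL the links point to several external databases and depend on accession numbers retrieved from MaizeDB; a single placeholder URL stands in for that lookup here.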
The integrated genetic and physical map visualization tool
of MMP-LIMS, iMap (Cone et al., 2002), allows researchers
to access data related to loci on the genetic map along with
their associated contigs on the physical map. The graphical
interface displays the positions of the loci and contigs on the
genetic map and physical map, respectively. Searches may
be conducted based on the locus, probe, GenBank accession
number, or contig number.
The cMap (Fang et al., 2003) function of MMP-LIMS per-
mits the user to select and compare two genetic maps at a time
with dynamic links to data resources and text lists of the shared
loci between the compared maps. Searches can be conducted
based on locus, probe, or GenBank accession number.
4 DISCUSSION
The MMP-LIMS was designed to meet the challenges of a
high-throughput mapping project. Currently, MMP-LIMS is
being used at the Maize Mapping Project at the University of
Missouri—Columbia. The system has been used to enter and
verify 957 SSR markers, 1023 RFLP markers, 189 SNPs, and
177 InDels. MMP-LIMS is used primarily for the maize IBM
mapping population consisting of 360 samples.
MMP-LIMS has also been used for managing 590 SSRs of
the IF2 population with 56 samples and 359 SSRs of the C6
population with 93 samples. The two other populations were
used to map SSRs that are monomorphic in the IBM popula-
tion. The SSR loci from these two populations are integrated
within an enhanced version of the IBM map called the IBM
Neighbors map by interpolating the location of the marker
loci with loci shared between the IBM map and the other
maps (Cone et al., 2002).
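The idea of placing a marker from another population onto the IBM map by interpolation can be illustrated with simple linear interpolation between the two flanking shared loci. The exact method used for the IBM Neighbors map is not given here, so this Python sketch is an assumption about one straightforward approach, with a hypothetical function name and made-up coordinates in the usage note.

```python
from bisect import bisect_right


def interpolate_position(pos_other, shared):
    """Project a position on another map onto the IBM map.

    shared: list of (position_on_other_map, position_on_IBM_map) pairs for
    loci present on both maps, with distinct positions on the other map.
    """
    pts = sorted(shared)
    xs = [x for x, _ in pts]
    if len(pts) < 2 or not xs[0] <= pos_other <= xs[-1]:
        raise ValueError("position is not flanked by shared loci")
    # Index of the right-hand flanking locus, clamped to the last locus.
    i = min(bisect_right(xs, pos_other), len(pts) - 1)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (pos_other - x0) * (y1 - y0) / (x1 - x0)
```

For example, with shared loci at 10.0 and 20.0 on the other map corresponding to 100.0 and 140.0 on the IBM map, a marker at 15.0 would be placed at 120.0.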
Users performing research on a species other than maize can
customize the functions of the MMP-LIMS system by adding
populations, samples, and markers specific to the species
of interest. For example, members of the Soybean Gen-
omics Consortium (Soybean Genomics Consortium, 2003,
http://www.soybeangenome.org) have requested MMP-LIMS
for customization as the system to manage the data produced
in the generation of a genetic map for the soybean genome.
A variety of LIMS systems (Table 2) are currently available,
including the LabBase (Goodman et al., 1998) system, dnaLIMS™ by dnaTools
(dnaTools, 2002, http://www.dnatools.com/dnalims.html), the
GeneTrials LIMS system by Waban Software (Waban
Software Inc., 2002, http://www.wabansoftware.com/Lims.htm),
Sapphire Informatics 3.0 by LabVantage (LabVantage, 2002,
http://www.labvantage.com/products_sapphireinfo.htm),
the Nautilus system by Thermo LabSystems (Thermo LabSystems, 2002,
http://www.thermolabsystems.com/news/press/articles/020906-nautilus2002r2.asp),
the system by Clive G. Brown and Richard Mott from the
Bioinformatics Group at the Wellcome Trust Centre for
Human Genetics (Wellcome Trust Centre for Human
Genetics, 2001, http://bioinformatics.well.ox.ac.uk/project-lims.html),
CimBiosis™ Genotyping Workflow System
(Cimarron Software, Inc., 2002, http://www.cimsoft.com/products.html),
and Applied Biosystems GeneMapper™ Software
(Applied Biosystems, 2003). However,
these software packages do not provide the same set of features
as MMP-LIMS. Several of the software packages provide only
generic interfaces that must be customized before storing lab
data. In addition, these systems do not provide a method for
validating and verifying genotyping scores or for using different
types of markers to generate an output file for a standard
mapping tool such as MapMaker. Only some of the systems
provide the user with an interface to data from ABI DNA
sequencers. While some systems are entirely web-based, few
of the systems combine client/server lab software with
web-based data query and visualization
tools to accommodate both local and remote users. In addition,
the incorporation of sequence analysis tools for SSR and
SNP/InDel experiments is not found in the other packages.
Most of the systems were not designed to specifically handle
different types of genetic markers such as SSRs, RFLPs and
SNP/InDels.
MMP-LIMS is a complete system and the software is freely
available to the public. The system includes several levels of
security, a genotype scoring tool, a data entry tool for remote
researchers to submit data, scripts for designing SSR primers
and for locating potential SNP/InDels, a system for finding
sequences that are similar to a query sequence along with
related database links, and viewers for both an integrated
genetic/physical map and for comparison of genetic maps.
Table 2. Feature comparisons
Systems, by column: (1) MMP-LIMS; (2) LabBase; (3) dnaLIMS™; (4) GeneTrials™ LIMS; (5) Sapphire Informatics 3.0;
(6) Nautilus; (7) System by Brown and Mott; (8) CimBiosis™ Genotyping Workflow System; (9) Applied Biosystems GeneMapper™
Legend: y, provided; n, not provided; n/l, not listed in article or on software website
Feature                                           (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)
Freely available to public                        y    y    n    n    n    n    y    n    n
Interface customized for genetic map data         y    n    n    y    n/l  n/l  n    n    y
Validation and verification of genotype scores    y    n    n    n/l  n/l  n/l  n/l  n/l  y
Generation of MapMaker input file with
  multiple marker types                           y    n    n    n/l  n/l  n/l  n/l  n/l  n/l
Different security levels                         y    n/l  n/l  n/l  n/l  n/l  n/l  n/l  n/l
Interface to data from ABI DNA sequencers         y    n    y    n/l  n/l  n/l  y    y    y
Combination of both client/server lab software
  and web-based data query and visualization
  tools                                           y    n    n    n/l  y    y    y    y    n/l
Incorporation of sequence analysis tools for
  SSR and SNP/InDel experiments                   y    n    y    n/l  n/l  n/l  n/l  n/l  n/l
Handling a variety of genetic marker data
  (i.e. SSRs, RFLPs, SNPs/InDels)                 y    n    n    n/l  n/l  n/l  y    y    y
ACKNOWLEDGEMENTS
We would like to thank the members of our advisory committee
including Sue Wessler (chair), Brad Barbazuk, Vicki
Chandler, Joe Ecker, Stan Letovsky and Antoni Rafalski.
Names are necessary to factually report on available data;
however, neither USDA nor the University of Missouri
guarantees nor warrants the standard of the product, and the use of
the name implies no approval of the product to the exclusion of
others that may also be suitable. This research was supported
by the National Science Foundation (DBI 9872655).
REFERENCES
Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J.
(1990) Basic local alignment search tool. J. Mol. Biol., 215,
403–410.
Applied Biosystems (2003) Applied Biosystems | Main. Accessed
2003 Feb 4.
Arizona Genomics Institute (2002) Dec 20. AGI Home Page.
Accessed 2003 Feb 4.
Cimarron Software, Inc. (2002) March 1. Cimarron Software, Inc.—
Products. Accessed 2003 Feb 4.
Clemson University Genomics Institute (2003) Jan 29. CUGI:
Clemson University Genomics Institute. Accessed 2003 Feb 4.
Coe,E., Cone,K., McMullen,M., Chen,S., Davis,G., Gardiner,J.,
Liscum,E., Polacco,M., Paterson,A., Sanchez-Villeda,H.,
Soderlund,C. and Wing,R. (2002) Access to the maize genome:
An integrated physical and genetic map. Plant Physiol., 128, 9–12.
Cone,K., McMullen,M., Vroh Bi,I., Davis,G., Yim,Y.-S.,
Gardiner,J., Polacco,M., Sanchez-Villeda,H., Fang,Z.,
Schroeder,S. et al. (2002) Genetic, physical and informatic
resources for maize: On the road to an integrated map. Plant
Physiol., 130, 1594–1601.
Date,C.J. (2002) An Introduction to Database Systems (Seventh
Edition). Addison Wesley Longman, Inc., Reading, MA.
Davis,G., McMullen,M., Baysdorfer,C., Musket,T., Grant,D.,
Staebell,M.S., Xu,G., Polacco,M., Koster,L., Melia-Hancock,S.
et al. (1999) A maize map standard with sequenced core markers,
grass genome reference points, and 932 ESTs in a 1736-locus
map. Genetics, 152, 1137–1172.
Davis,G., Musket,T., Melia-Hancock,S., Duru,N., Sharopova,N.,
Schultz,L., McMullen,M.D., Sanchez-Villeda,H., Schroeder,S.
and Garcia,A.A. (2001) The intermated B73 x Mo17 genetic map:
a community resource. Maize Genetics Conference Abstracts,
43:W15, 62.
dnaTools (2002) Sep 28. dnaTools. Accessed 2003 Feb 4.
Ewing,B. and Green,P. (1998) Base-calling of automated sequencer
traces using phred. II. Error probabilities. Genome Res., 8,
186–194.
Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling
of automated sequencer traces using phred. I. Accuracy assess-
ment. Genome Res., 8, 175–185.
Fang,Z., Polacco,M., Chen,S., Schroeder,S., Hancock,D.,
Sanchez,H. and Coe,E. (2003) cMap: the comparative genetic
map viewer. Bioinformatics, 19, 416–417.
Goodman,N., Rozen,S., Stein,L. and Smith,A. (1998) The LabBase
system for data management in large scale biology research
laboratories. Bioinformatics, 14, 562–574.
Gramene (2003) Jan 19. Gramene. Accessed 2003 Feb 4.
LabVantage (2002) Aug 26. Sapphire Informatics 3.0 is a
browser/server-based solution. Accessed 2003 Feb 4.
Lander,E.S., Green,P., Abrahamson,J., Barlow,A., Daly,M.J.,
Lincoln,S.E. and Newburg,I. (1987) MAPMAKER: an interactive
computer package for constructing primary genetic linkage maps
of experimental and natural populations. Genomics, 1, 174–181.
MaizeDB (2003) Jan 27. Maize Genome Database—MaizeDB.
Accessed 2003 Feb 4.
Maize Mapping Project (2003) Jan 30. Maize Mapping Project.
Accessed 2003 Feb 4.
Maize Mapping Project (2002) Oct 8. Cornsensus Sequence Files.
Accessed 2003 Feb 4.
Rice Genome Research Program (2002) Nov 15. Rice Genome
Research Program (RGP) Home Page. Accessed 2003 Feb 4.
Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for
general users and for biologist programmers. In Krawetz,S.
and Misener,S. (eds), Bioinformatics Methods and Protocols:
Methods in Molecular Biology. Humana Press, Totowa, NJ,
pp. 365–386.
Sharopova,N., McMullen,M.D., Schultz,L., Schroeder,S.,
Sanchez-Villeda,H., Gardiner,J., Bergstrom,D., Houchins,K.,
Melia-Hancock,S., Musket,T. et al. (2002) Development and
mapping of SSR markers for maize. Plant Mol. Biol., 48,
463–481.
Soderlund,C., Humphray,S., Dunham,A. and French,L. (2000) Contigs
built with fingerprints, markers and FPC V4.7. Genome Res.,
10, 1772–1787.
Soybean Genomics Consortium (2003) Mar 6. Soybean Genomics
Consortium. Accessed 2003 Apr 17.
The Institute for Genomic Research (2003) Jan 23. The Institute for
Genomic Research. Accessed 2003 Feb 4.
Thermo LabSystems (2002) Sep 6. Thermo LabSystems—
Company—News—Press—Thermo LabSystems delivers
Nautilus™ 2002 R2 LIMS. Accessed 2003 Feb 4.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W:
Improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, positions-specific gap
penalties and weight matrix choice. Nucleic Acids Res., 22,
4673–4680.
Waban Software Inc. (2002) Mar 26. Waban Software Inc. Accessed
2003 Feb 4.
Wellcome Trust Centre for Human Genetics (2001) Jan 21.
WTCHG Bioinformatics Website: Homepage. Accessed 2003
Feb 4.
ZmDB (2003) Jan 22. ZmDB: Maize Genome Database. Accessed
2003 Feb 4.