Pharmacoinformatics Database basics(sree)




Published in: Education

  1. 1. Pharmacoinformatics
  2. 2. <ul><li>Pharmacoinformatics is an emerging field that draws upon both bioinformatics and cheminformatics. </li></ul><ul><li>The scientific or research aspect deals with the use of technology in drug discovery, while the service aspect deals with monitoring patients who are taking a drug. </li></ul><ul><li>The scope for jobs is essentially with companies involved in drug research and clinical research. </li></ul><ul><li>The National Institute of Pharmaceutical Education and Research (NIPER) in Punjab appears to offer the only structured course in this area at the postgraduate and Ph.D. level. </li></ul><ul><li>The Bioinformatics Institute of India in NOIDA, Uttar Pradesh, also claims to offer a Ph.D. in this area. </li></ul><ul><li>Because this is an emerging field, placements are not clear, and companies would probably view pharmacoinformatics as on par with cheminformatics and bioinformatics. </li></ul><ul><li>Most pharma and biotech companies are adopting a wait-and-watch policy and do not have full-fledged departments. IBM, Sun Microsystems and Oracle are significant players in the biosystems domain. </li></ul>
  3. 3. Pharmacoinformatics <ul><li>Agenda: </li></ul><ul><li>Database Design </li></ul><ul><li>Information Management </li></ul><ul><li>Drug Information Services </li></ul>
  4. 4. <ul><li>Database Design: </li></ul><ul><li>Structure of databases </li></ul><ul><li>Sequence databases </li></ul><ul><li>Relational databases </li></ul><ul><li>Sequence analysis </li></ul><ul><li>Software resources </li></ul><ul><li>Sequence alignment </li></ul><ul><li>Database searches </li></ul><ul><li>Phylogenetic analysis </li></ul>
  5. 5. Fundamentals of Database Design
  6. 6. Agenda <ul><li>Introduction and participants' needs; </li></ul><ul><li>Review “what is a database”; </li></ul><ul><li>Understand the difference between data and information; </li></ul><ul><li>What is the purpose of a database system; </li></ul><ul><li>How to select a database system; </li></ul><ul><li>Database definitions and fundamental building blocks; </li></ul>
  7. 7. Agenda (2) <ul><li>Database development: the first steps; </li></ul><ul><li>Quality control issues; </li></ul><ul><li>Data entry considerations; </li></ul>
  8. 8. What is a database <ul><li>A database is any organized collection of data. Some examples of databases you may encounter in your daily life are: </li></ul><ul><ul><li>a telephone book </li></ul></ul><ul><ul><li>T.V. Guide </li></ul></ul><ul><ul><li>airline reservation system </li></ul></ul><ul><ul><li>motor vehicle registration records </li></ul></ul><ul><ul><li>papers in your filing cabinet </li></ul></ul><ul><ul><li>files on your computer hard drive. </li></ul></ul><ul><ul><li>Banking </li></ul></ul>
  9. 9. Data vs. information: What is the difference? <ul><li>What is data? </li></ul><ul><ul><li>Data can be defined in many ways. Information science defines data as unprocessed information. </li></ul></ul><ul><li>What is information? </li></ul><ul><ul><li>Information is data that has been organized and communicated in a coherent and meaningful manner. </li></ul></ul><ul><ul><li>Data is converted into information, and information is converted into knowledge. </li></ul></ul><ul><ul><li>Knowledge: information evaluated and organized so that it can be used purposefully. </li></ul></ul>
  10. 10. Why do we need a database? <ul><li>Keep records of our: </li></ul><ul><ul><li>Clients </li></ul></ul><ul><ul><li>Staff </li></ul></ul><ul><ul><li>Volunteers </li></ul></ul><ul><li>To keep a record of activities and interventions; </li></ul><ul><li>Keep sales records; </li></ul><ul><li>Develop reports; </li></ul><ul><li>Perform research </li></ul><ul><li>Longitudinal tracking </li></ul>
  11. 11. What is the ultimate purpose of a database management system? To transform: Data → Information → Knowledge → Action
  12. 12. More about database definition <ul><li>What is a database? </li></ul><ul><li>Quite simply, it’s an organized collection of data. A database management system (DBMS) such as Access, FileMaker, Lotus Notes, Oracle or SQL Server provides you with the software tools you need to organize that data in a flexible manner. It includes tools to add, modify or delete data from the database, ask questions (or queries) about the data stored in the database, and produce reports summarizing selected contents. </li></ul>
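
The tools listed on the slide above (add, modify, delete, query, report) can be sketched with Python's built-in `sqlite3` module. This is only an illustration: SQLite stands in for the DBMS products named on the slide, and the `contacts` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def demo_crud():
    """Add, modify, delete, query, and report on a small in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table used only for this illustration.
    conn.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    # Add: insert three records.
    conn.executemany(
        "INSERT INTO contacts (name, city) VALUES (?, ?)",
        [("Asha", "Mohali"), ("Ravi", "Noida"), ("Meena", "Noida")],
    )
    # Modify: change one record.
    conn.execute("UPDATE contacts SET city = ? WHERE name = ?", ("Delhi", "Ravi"))
    # Delete: remove one record.
    conn.execute("DELETE FROM contacts WHERE name = ?", ("Asha",))
    # Query: ask a question about the stored data.
    names = [row[0] for row in conn.execute("SELECT name FROM contacts ORDER BY name")]
    # Report: summarize selected contents (number of contacts per city).
    report = dict(conn.execute("SELECT city, COUNT(*) FROM contacts GROUP BY city"))
    conn.close()
    return names, report
```

Calling `demo_crud()` returns the remaining names and a per-city count, which is the kind of summary a DBMS report tool would produce.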
  13. 13. For example: Databases in Bioinformatics <ul><li>Outlook contacts </li></ul><ul><li>Aspira Association MIS </li></ul><ul><li>KidTrax </li></ul><ul><li>GIS-GPS systems </li></ul>
  14. 14. Example: 2
  15. 15. What is a database? <ul><li>A collection of... </li></ul><ul><ul><li>structured </li></ul></ul><ul><ul><li>searchable (index) -> table of contents </li></ul></ul><ul><ul><li>updated periodically (release) -> new edition </li></ul></ul><ul><ul><li>cross-referenced ( hyperlinks ) -> links with other db </li></ul></ul><ul><li>… data </li></ul><ul><li>Includes also associated tools (software) necessary for db access, db updating, db information insertion, db information deletion…. </li></ul>
  16. 16. Types of Databases <ul><li>Non-relational databases </li></ul><ul><li>Non-relational databases place information in field categories that we create, so that the information can be sorted and disseminated the way we need it. The data in a non-relational database, however, is limited to that program and cannot be extracted and applied to other software programs or to other database files within a school or administrative system. The data can only be &quot;copied and pasted.&quot; Example: a spreadsheet. </li></ul><ul><li>Relational databases </li></ul><ul><li>In relational databases, fields can be used in a number of ways (and can be of variable length), provided that they are linked in tables. A relational database is built on a database model that provides for logical connections among files (known as tables) by including identifying data from one table in another table. </li></ul>
  17. 17. Data structure <ul><li>In computer science , a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently. </li></ul><ul><li>Data structures are used in almost every program or software system </li></ul><ul><li>Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, B-trees are particularly well-suited for implementation of databases, while compiler implementations usually use hash tables to look up identifiers. </li></ul><ul><li>Principle : </li></ul><ul><li>Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address --a bit string that can be itself stored in memory and manipulated by the program. </li></ul><ul><li>The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. </li></ul>
  18. 18. <ul><li>Common data structures </li></ul><ul><li>Array: a systematic arrangement of objects, usually in rows and columns. </li></ul><ul><li>Linked list: a linked list (or more precisely, a &quot;singly-linked list&quot;) is a data structure that consists of a sequence of nodes, each of which contains a reference (i.e., a link) to the next node in the sequence. </li></ul><ul><li>Hash table: a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). </li></ul><ul><li>Heap: a specialized tree-based data structure that satisfies the heap property. </li></ul>
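
The four structures on the slide above can be illustrated in a few lines of Python. This is a minimal sketch: Python's `list`, `dict`, and `heapq` stand in for the array, hash table, and heap, and the `Node` class and all sample values are made up for the example.

```python
import heapq


class Node:
    """One node of a singly-linked list: a value plus a link to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def linked_list_to_list(head):
    """Follow the links from the head node and collect the values in order."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Array: elements are addressed by position.
scores = [7, 3, 9]

# Linked list: each node refers to the next one.
head = Node("A", Node("C", Node("G")))

# Hash table: keys (names) map to values (telephone numbers).
phone_book = {"Alice": "555-0100", "Bob": "555-0199"}

# Heap: heapq keeps the smallest item at the front (min-heap property).
heap = []
for item in [5, 1, 4]:
    heapq.heappush(heap, item)
smallest = heapq.heappop(heap)
```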
  19. 19. <ul><li>B-tree: a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time. </li></ul><ul><li>Red-black tree: a type of self-balancing binary search tree, typically used to implement associative arrays. It organizes pieces of comparable data, such as text fragments or numbers. </li></ul><ul><li>Trie: a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. </li></ul>
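
As an illustration of the last item, here is a minimal trie sketch in Python that stores an associative array with string keys. The class name, method names, and the idea of using short DNA strings as keys are assumptions made for the example, not part of the original slides.

```python
class Trie:
    """A prefix tree mapping string keys to values."""

    def __init__(self):
        self.children = {}      # next character -> child Trie node
        self.value = None       # value stored at this node, if any
        self.has_value = False  # distinguishes "no value" from a stored None

    def insert(self, key, value):
        """Walk down one character at a time, creating nodes as needed."""
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value
        node.has_value = True

    def get(self, key, default=None):
        """Return the value stored under key, or default if key is absent."""
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.has_value else default

    def keys_with_prefix(self, prefix):
        """Return all stored keys that start with prefix, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, key = stack.pop()
            if current.has_value:
                found.append(key)
            for ch, child in current.children.items():
                stack.append((child, key + ch))
        return sorted(found)
```

Because keys that share a prefix share a path from the root, a prefix search only has to visit the subtree below that prefix.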
  20. 20. <ul><li>Language support: </li></ul><ul><li>Most assembly languages and some low-level languages (e.g., BCPL) generally lack support for data structures. </li></ul><ul><li>Many high-level programming languages, and some higher-level assembly languages (e.g., MASM), on the other hand, have special syntax or other built-in support for certain data structures. </li></ul><ul><li>Most programming languages also come with standard libraries that implement the most common data structures, e.g., the C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET Framework. </li></ul>
  21. 21. <ul><li>Sequence database: </li></ul><ul><li>In the field of bioinformatics, a sequence database is a large collection of computerized (&quot;digital&quot;) nucleic acid sequences, protein sequences, or other sequences stored on a computer. A database can include sequences from only one organism (e.g., a database for all proteins in Saccharomyces cerevisiae), or it can include sequences from all organisms whose DNA has been sequenced. </li></ul><ul><li>Example: protein structure database. In biology, a protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way. </li></ul>
  22. 22. <ul><li>Examples of protein structure databases include (in alphabetical order): </li></ul><ul><ul><li>Database of Macromolecular Movements: describes the motions that occur in proteins and other macromolecules, particularly using movies. </li></ul></ul><ul><ul><li>JenaLib: the Jena Library of Biological Macromolecules is aimed at a better dissemination of information on three-dimensional biopolymer structures, with an emphasis on visualization and analysis. </li></ul></ul><ul><ul><li>MODBASE: a database of three-dimensional protein models calculated by comparative modeling. </li></ul></ul><ul><ul><li>PDBe: the European resource for the collection, organisation and dissemination of data on biological macromolecular structures, and a member of the Worldwide Protein Data Bank. </li></ul></ul><ul><ul><li>OCA: a browser-database for protein structure/function. OCA integrates information from KEGG, OMIM, PDBselect, Pfam, PubMed, SCOP, SwissProt, and others. </li></ul></ul><ul><ul><li>OPM: provides spatial positions of protein three-dimensional structures with respect to the lipid bilayer. </li></ul></ul><ul><ul><li>PDB Lite: derived from OCA, PDB Lite was provided to make it as easy as possible to find and view a macromolecule within the PDB. </li></ul></ul><ul><ul><li>PDBsum: provides an overview of macromolecular structures in the PDB, giving schematic diagrams of the molecules in each structure and of the interactions between them. </li></ul></ul><ul><ul><li>PDBTM: the Protein Data Bank of Transmembrane Proteins, a selection of the PDB. </li></ul></ul><ul><ul><li>PDBWiki: a community-annotated knowledge base of biological molecular structures. </li></ul></ul><ul><ul><li>Protein: the NIH protein database, a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. </li></ul></ul><ul><ul><li>Proteopedia: the collaborative, 3D encyclopedia of proteins and other molecules. A wiki that contains a page for every entry in the PDB (>50,000 pages), with a Jmol view that highlights functional sites and ligands. It offers an easy-to-use scene-authoring tool so you don't have to learn the Jmol script language to create customized molecular scenes. Custom scenes are easily attached to &quot;green links&quot; in descriptive text that display those scenes in Jmol. </li></ul></ul><ul><ul><li>SCOP: the Structural Classification of Proteins, a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. </li></ul></ul><ul><ul><li>SWISS-MODEL Repository: a database of annotated protein models calculated by homology modeling. </li></ul></ul><ul><ul><li>TOPSAN: the Open Protein Structure Annotation Network, a wiki designed to collect, share and distribute information about protein three-dimensional structures. </li></ul></ul>
  23. 23. <ul><li>Sequence analysis </li></ul><ul><li>Definition: The term &quot;sequence analysis&quot; in biology implies subjecting a DNA or peptide sequence to sequence alignment, sequence database searches, repeated sequence searches, or other bioinformatics methods on a computer. </li></ul><ul><li>Sequence analysis in molecular biology and bioinformatics is an automated, computer-based examination of characteristic fragments, e.g. of a DNA strand. It includes the following topics: </li></ul><ul><li>Comparing sequences in order to find similarities and dissimilarities between them (sequence alignment). </li></ul><ul><li>Identifying gene structures, reading frames, distributions of introns and exons, and regulatory elements. </li></ul><ul><li>Finding and comparing point mutations or single nucleotide polymorphisms (SNPs) in organisms in order to obtain genetic markers. </li></ul><ul><li>Revealing the evolution and genetic diversity of organisms. </li></ul><ul><li>Functional annotation of genes. </li></ul><ul><li>In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers. In molecular biology and genetics, the same process is called simply &quot;sequencing&quot;. </li></ul><ul><li>In marketing, sequence analysis is often used in analytical customer relationship management applications, such as NPTB (Next Product to Buy) models. </li></ul>
  24. 24. <ul><li>Sequence Analysis in Molecular Biology: </li></ul><ul><li>Sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity. It generally falls into two types: </li></ul><ul><li>- Pairwise alignment: alignment between two sequences </li></ul><ul><li>- Multiple alignment: alignment between more than two sequences </li></ul><ul><li>Existing methods for pairwise alignment include: the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, and BLAST </li></ul><ul><li>Existing methods for multiple alignment include: ClustalW, PROBCONS, MUSCLE, MAFFT, DIALIGN, T-Coffee, POA, and MANGO. </li></ul><ul><li>Motif Finding </li></ul><ul><li>Motif Prediction </li></ul><ul><li>Methodology </li></ul><ul><li>The tasks that lie in the space of sequence analysis are often non-trivial to resolve and require the use of relatively complex approaches. Of the many types of methods used in practice, the most popular include: </li></ul><ul><li>Artificial Neural Network </li></ul><ul><li>Hidden Markov Model </li></ul><ul><li>Support Vector Machine </li></ul><ul><li>Clustering </li></ul><ul><li>Bayesian Network </li></ul><ul><li>Regression Analysis </li></ul>
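
To make pairwise alignment concrete, the following is a minimal sketch of the scoring step of Needleman-Wunsch global alignment. The scoring scheme (match +1, mismatch -1, gap -1) is an assumption chosen for illustration; real tools use substitution matrices and more elaborate gap penalties, and this sketch returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            # Either align a[i-1] with b[j-1], or put a gap in one sequence.
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap
            gap_in_a = score[i][j - 1] + gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)
    return score[-1][-1]
```

For example, `needleman_wunsch_score("ACGT", "AGT")` aligns three matching bases and one gap under the assumed scores. Smith-Waterman uses the same table-filling idea but clamps scores at zero to find the best local alignment.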
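  25. 25. <ul><li>List of Computational Chemistry Software – Resources </li></ul><ul><ul><li>Bioinformatics Software </li></ul></ul><ul><ul><li>Cheminformatics Software </li></ul></ul><ul><ul><li>LIMS Software </li></ul></ul><ul><ul><li>Computer-Assisted Molecular Modeling Software </li></ul></ul><ul><ul><li>CADD - Biopolymer Modeling Software </li></ul></ul><ul><ul><li>CADD - General Modeling Software </li></ul></ul><ul><ul><li>CADD - Conformational Search Software </li></ul></ul><ul><ul><li>CADD - General Tools </li></ul></ul><ul><ul><li>CADD - Molecular Mechanics/Dynamics Software </li></ul></ul><ul><ul><li>CADD - Quantum Chemistry Software </li></ul></ul><ul><ul><li>CADD - Display Software </li></ul></ul><ul><ul><li>Structural Chemistry Software </li></ul></ul><ul><ul><li>Structural Chemistry Software for Xray Analysis </li></ul></ul><ul><ul><li>Structural Chemistry Software for IR Analysis </li></ul></ul><ul><ul><li>Structural Chemistry Software for MS Analysis </li></ul></ul><ul><ul><li>Structural Chemistry Software for NMR Analysis </li></ul></ul><ul><ul><li>General Software Tools </li></ul></ul>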
  26. 26. <ul><li>Lists of Software for Bioinformatics: </li></ul><ul><li>Sequence Databases, for example: </li></ul><ul><ul><li>AceDB: a genome database. </li></ul></ul><ul><ul><li>BioCyc: databases that provide electronic reference sources on the pathways and genomes of different organisms. </li></ul></ul><ul><ul><li>Biopendium: brings together information on sequence, structure and function relationships for all gene products in the public domain. </li></ul></ul><ul><ul><li>CAMELEON: a set of multiple sequence alignment tools with links to databases of known 3D structural fragments. </li></ul></ul><ul><ul><li>ERGO Light: a curated database of public and proprietary genomic DNA, with connected similarities, functions, pathways, functional models, clusters and more. </li></ul></ul><ul><ul><li>Expasy: the site contains a 2-D gel data database, a search engine and links to several gel databases throughout the world. </li></ul></ul><ul><ul><li>GAIA 22: a Chromosome 22 specific version of the GAIA database. GAIA is a data analysis and storage system for genomic sequence and its annotation. As a data analysis engine it accepts raw genomic sequence and automatically adds significant annotation. </li></ul></ul><ul><ul><li>GeneCards: a database of human genes, their products and their involvement in diseases. </li></ul></ul><ul><ul><li>GENESEQ: was a database of protein and nucleic acid sequences extracted from world-wide patent documents. </li></ul></ul><ul><ul><li>GeneWorks: was an integrated sequence analysis and database searching package. </li></ul></ul><ul><ul><li>ISYS(TM): the National Center for Genome Resources' new product that integrates independent bioinformatic software tools and databases. </li></ul></ul><ul><ul><li>OligoMaster: a multi-user oligonucleotide cataloguing application designed to help biologists manage and organise their oligonucleotide collections, available in versions for Windows, Macintosh and Linux. </li></ul></ul><ul><ul><li>PhyloPat: provides phylogenetic pattern analysis of eukaryotic genes. </li></ul></ul><ul><ul><li>ProteinCenter(™): integrates the contents of a large number of public protein sequence databases and your experimental systems biology data. </li></ul></ul><ul><ul><li>Relibase: a web-based tool for searching and analysing protein ligand structures in the PDB. </li></ul></ul>
  27. 27. <ul><ul><li>ResNet: a comprehensive database of molecular networks and protein interactions, derived from automatic analysis of the whole of PubMed. </li></ul></ul><ul><ul><li>The Rosetta Resolver System: provides high-capacity data storage, retrieval and analysis of gene expression data. The system is ideal for life science research organizations that need to assess compound specificity or toxicity, identify new genes or therapeutic targets, or compare and analyze large databases of expression profiles. </li></ul></ul><ul><ul><li>SGD: a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. </li></ul></ul><ul><ul><li>SRS: a database integration and biological information search system. It is capable of querying 400 different molecular biology, bibliographic, compound data, genetic and medical databases via a single interface. </li></ul></ul><ul><ul><li>Software Solution for BioMedicine (SSBM): offers high-speed analysis of both public and proprietary genetic databases within the security of the corporate firewall. </li></ul></ul><ul><ul><li>Vector NTI: a Macintosh- and Windows-based molecular biology support system. </li></ul></ul><ul><li>Pathway Analysis Tools </li></ul><ul><li>Structure Prediction and Analysis Tools </li></ul><ul><li>Sequence Analysis Tools </li></ul><ul><li>Sequence Management Tools </li></ul><ul><li>Visualization Tools </li></ul>
  28. 28. <ul><li>Sequence Analysis Tools: Software resources: </li></ul><ul><li>AAT: Analysis and Annotation Tool, used to identify genes by comparing cDNA and protein sequence databases. </li></ul><ul><li>ABI PRISM </li></ul><ul><li>AcaClone pDRAW32 </li></ul><ul><li>AGCT </li></ul><ul><li>AlleleID </li></ul><ul><li>Antheprot: protein analysis software. </li></ul><ul><li>Array Designer 4 </li></ul><ul><li>arraySCOUT TM: a gene expression data analysis application. </li></ul><ul><li>Artemis: a free genome viewer and annotation tool. </li></ul><ul><li>Asterias: a suite of freely accessible web-based genomic data analysis programs. </li></ul><ul><li>Bio Image: a life sciences software information company which carries a wide variety of electrophoresis image analysis software for Windows, Powermac, and UNIX. </li></ul><ul><li>BioinformatiX: enterprise software which provides an environment for the analysis of microarray data. </li></ul><ul><li>BioRainbow Analysis Tools: a collection of software tools for binding site prediction, weight matrix search, regulatory sequences analysis, microarray analysis, footprint. </li></ul><ul><li>bioSCOUT®: a comprehensive and customizable bioinformatics package. </li></ul><ul><li>BioTools: offers three primary bioinformatics products: GeneTool for DNA sequence analysis, PepTool for protein sequence analysis, and ChromaTool for chromatogram analysis. </li></ul><ul><li>BlockSearch: a quantitative method for the elucidation of unknown protein functions. </li></ul><ul><li>Bosque: a distributed software environment oriented to managing the computational resources involved in typical phylogenetic analyses. </li></ul><ul><li>Clann: software for investigating phylogenomic information using supertrees. </li></ul><ul><li>CURVES: by Richard Lavery and Heinz Sklenar, a very useful nucleic acid helical analysis program. </li></ul><ul><li>DNADynamo: general-purpose software for DNA and protein sequence analysis. </li></ul><ul><li>DNASIS: a robust sequence analysis software package that delivers industry-standard functionality. </li></ul><ul><li>DNPTrapper: a shotgun sequencing assembly editing tool, specifically designed for finishing and analysis of repeated regions. </li></ul><ul><li>EuGene and SAm: a menu-based DNA and protein sequence analysis package. </li></ul><ul><li>Genchek: developed by Ocimum Biosolutions, a comprehensive, LIMS-based, user-friendly nucleotide and polypeptide sequence analysis tool with a backend relational database. </li></ul><ul><li>Genehound(™): offers a new approach to identifying coding regions in prokaryotic genomes. </li></ul><ul><li>GeneInform: an easy-to-operate gene expression management and analysis tool that saves cost and time by facilitating the collection, storage, analysis, and sharing of gene expression data. </li></ul>
  29. 29. <ul><li>Gene Inspector(™) 1.5: a powerful and versatile combination of an electronic laboratory notebook and sequence analysis package for biologists. </li></ul><ul><li>GeneLinker: products described as the easiest way for researchers to start analyzing gene expression data. </li></ul><ul><li>GeneJockey: a program for editing, manipulation, and analysis of nucleic acid and protein sequences. </li></ul><ul><li>GENEMARK: a gene-finding tool available from the Georgia Institute of Technology that uses an algorithm based on non-homogeneous Markov chain models. </li></ul><ul><li>GENEPARSER: a coding region recognition program from the University of Colorado that uses potential similarity between the query sequence and known amino acid sequences. </li></ul><ul><li>GeneSifter™: a Web-based microarray analysis system that combines data management and analytical functions with integrated, current gene annotation from databases such as Unigene and LocusLink. </li></ul><ul><li>GeneSolve: a single-user desktop software package for analyzing nucleic acid sequence information. </li></ul><ul><li>GeneStudio Pro: from GeneStudio, Inc., a newly developed suite of molecular biology programs for Windows. </li></ul><ul><li>GeneWorks: an integrated sequence analysis and database searching package on the Macintosh, previously marketed by Oxford Molecular Group. </li></ul><ul><li>GenomeBrowser: a powerful software tool that simplifies the process of analysis, annotation, and manipulation of genetic sequences. </li></ul><ul><li>Genie: from LBNL, a gene finder based on generalized hidden Markov models to locate multi-exon genes. </li></ul><ul><li>Etc… </li></ul>
  30. 30. Relational Database <ul><li>Definition: </li></ul><ul><ul><li>Data stored in tables that are associated by shared attributes (keys). </li></ul></ul><ul><ul><li>Any data element (or entity) can be found in the database through the name of the table, the attribute name, and the value of the primary key. </li></ul></ul>
  31. 31. Relational Database Definitions <ul><li>Entity: Object, Concept or event (subject) </li></ul><ul><li>Attribute: a Characteristic of an entity </li></ul><ul><li>Row or Record: the specific characteristics of one entity </li></ul><ul><li>Table: a collection of records </li></ul><ul><li>Database: a collection of tables </li></ul>
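
The definitions above can be sketched with Python's built-in `sqlite3` module: two tables associated by a shared key, plus a lookup that finds one data element from the table name, the attribute name, and the primary key value. The `compound` and `assay` tables, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: each table lists the attributes it is known to have.
SCHEMA = {
    "compound": {"compound_id", "name"},
    "assay": {"assay_id", "compound_id", "result"},
}


def build_db():
    """Create two tables linked by the shared attribute compound_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compound (
            compound_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE assay (
            assay_id    INTEGER PRIMARY KEY,
            compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
            result      TEXT NOT NULL
        );
        INSERT INTO compound VALUES (1, 'compound-A'), (2, 'compound-B');
        INSERT INTO assay VALUES (10, 1, 'active'), (11, 1, 'inactive'), (12, 2, 'active');
        """
    )
    return conn


def lookup(conn, table, attribute, key_column, key_value):
    """Find one data element by table name, attribute name, and primary key value."""
    # Table and column names cannot be bound as SQL parameters, so check them
    # against the known schema before building the statement.
    if table not in SCHEMA or not {attribute, key_column} <= SCHEMA[table]:
        raise ValueError("unknown table or attribute")
    row = conn.execute(
        f"SELECT {attribute} FROM {table} WHERE {key_column} = ?", (key_value,)
    ).fetchone()
    return row[0] if row else None


def results_for(conn, compound_name):
    """Follow the shared key from compound to assay (a join)."""
    rows = conn.execute(
        "SELECT a.result FROM assay AS a "
        "JOIN compound AS c ON a.compound_id = c.compound_id "
        "WHERE c.name = ? ORDER BY a.assay_id",
        (compound_name,),
    )
    return [r[0] for r in rows]
```

Here each row of `compound` describes one entity, each column is an attribute, and `assay.compound_id` is the shared attribute that associates the two tables.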
  32. 64. <ul><li>Overview of Phylogenetic Analysis </li></ul><ul><li>Phylogenetic analysis is the process used to determine the evolutionary relationships between organisms. </li></ul><ul><li>The results of an analysis can be drawn in a hierarchical diagram called a cladogram or phylogram (phylogenetic tree). </li></ul><ul><li>The branches in a tree are based on the hypothesized evolutionary relationships (phylogeny) between organisms. </li></ul><ul><li>Each member of a branch (a branch is also known as a monophyletic group) is assumed to be descended from a common ancestor. </li></ul><ul><li>Originally, phylogenetic trees were created using morphology, but determining evolutionary relationships now includes matching patterns in nucleic acid and protein sequences. </li></ul><ul><li>Example: a phylogenetic tree can be constructed from mitochondrial DNA (mtDNA) sequences for the family Hominidae. This family includes gorillas, chimpanzees, orangutans, and humans. </li></ul><ul><li>Searching NCBI for Phylogenetic Data </li></ul><ul><li>The NCBI taxonomy Web site includes phylogenetic and taxonomic information from many sources. These sources include the published literature, Web databases, and taxonomy experts. While the NCBI taxonomy database is not a phylogenetic or taxonomic authority, it can be useful as a gateway to the NCBI biological sequence databases. </li></ul>
  33. 71. <ul><li>Principles of data organization </li></ul><ul><li>Database --a collection of related structured information about entities </li></ul><ul><li>File -- a collection of records </li></ul><ul><li>Record--a set of fields </li></ul><ul><li>Field --a single characteristic of an entity </li></ul><ul><li>Character--a symbol used in data field </li></ul>
  34. 85. Selecting a Database Management System <ul><li>Database management systems (or DBMSs) can be divided into two categories -- desktop databases and server databases.   </li></ul><ul><li>Generally speaking, desktop databases are oriented toward single-user applications and reside on standard personal computers (hence the term desktop).  </li></ul><ul><li>Server databases contain mechanisms to ensure the reliability and consistency of data and are geared toward multi-user applications. </li></ul>
  35. 87. Selecting a database system: Needs Analysis <ul><li>The needs analysis process will be specific to your organization but, at a minimum, should answer the following questions: </li></ul><ul><li>How many records will we warehouse, and for how long? </li></ul><ul><li>Who will be using the database and what tasks will they perform? </li></ul><ul><li>How often will the data be modified? Who will make these modifications? </li></ul><ul><li>Who will be providing IT support for the database? </li></ul><ul><li>What hardware is available? Is there a budget for purchasing additional hardware? </li></ul><ul><li>Who will be responsible for maintaining the data? </li></ul><ul><li>Will data access be offered over the Internet? If so, what level of access should be supported? </li></ul>
  36. 88. Some Definitions <ul><li>A file: a group or collection of similar records, like INST6031 Fall Student File, American History 1850-1866 file, Basic Food Group Nutrition File </li></ul><ul><li>A record book: a &quot;rolodex&quot; of data records, like address lists, inventory lists, classes or thematic units, or groupings of other unique records that are combined into one list (found in AppleWorks, FileMaker Pro software). </li></ul><ul><li>A field: one category of information, e.g., Name, Address, Semester Grade, Academic topic </li></ul><ul><li>A record: one piece of data, e.g., one student's information, a recipe, a test question </li></ul><ul><li>A layout: a design for a database that contains field names and possibly graphics. </li></ul><ul><li>Database glossary </li></ul>
  37. 89. Fundamental building blocks <ul><li>Tables are the fundamental building blocks of any database. If you're familiar with spreadsheets, you'll find database tables extremely similar. Take a look at this example of a table from a sample database: </li></ul><ul><li>The table above contains the employee information for our organization: characteristics like name, date of birth and title. Examine the construction of the table and you'll find that each column of the table corresponds to a specific employee characteristic (or attribute, in database terms). Each row corresponds to one particular employee and contains his or her information. That's all there is to it! If it helps, think of each one of these tables as a spreadsheet-style listing of information. </li></ul>
  38. 90. Where do we start? <ul><li>Let’s explore your “paper system” </li></ul><ul><ul><li>Client intake forms </li></ul></ul><ul><ul><li>Job application forms </li></ul></ul><ul><ul><li>Funders' reports </li></ul></ul><ul><li>Database modeling: </li></ul><ul><ul><li>Define required fields from “forms” or required reports </li></ul></ul><ul><ul><li>Avoid repetition </li></ul></ul><ul><ul><li>Keep it simple </li></ul></ul><ul><ul><li>Identify a unique identifier or primary key </li></ul></ul>
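
The modeling advice above ("avoid repetition" and "identify a unique identifier") can be sketched in plain Python. The example assumes a hypothetical paper intake form where the client's name and phone number are rewritten on every visit; the field names `client_id`, `name`, `phone`, `date`, and `service` are made up for illustration.

```python
def normalize(intake_rows):
    """Split flat intake-form rows into a clients table and a visits table.

    Client details are stored once, keyed by client_id (the primary key);
    each visit keeps only that key instead of repeating name and phone.
    """
    clients = {}  # client_id -> {"name": ..., "phone": ...}
    visits = []   # (client_id, date, service)
    for row in intake_rows:
        client_id = row["client_id"]
        # Store the client's details only the first time the key is seen.
        clients.setdefault(client_id, {"name": row["name"], "phone": row["phone"]})
        visits.append((client_id, row["date"], row["service"]))
    return clients, visits


# Hypothetical rows as they might be copied from paper intake forms.
forms = [
    {"client_id": 1, "name": "A. Client", "phone": "555-0100", "date": "2024-01-05", "service": "intake"},
    {"client_id": 1, "name": "A. Client", "phone": "555-0100", "date": "2024-02-09", "service": "follow-up"},
    {"client_id": 2, "name": "B. Client", "phone": "555-0101", "date": "2024-02-10", "service": "intake"},
]
clients, visits = normalize(forms)
```

With this split, a change to a client's phone number is made in one place rather than on every visit record.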
  39. 91. Some Quality Control Considerations <ul><li>Remember “garbage in – garbage out”. Some examples and how to prevent this. </li></ul><ul><li>Quality management encompasses three distinct processes: quality planning, quality control, and quality improvement. </li></ul><ul><li>Quality planning in relation to database systems design: </li></ul><ul><ul><li>Who will perform data entry? </li></ul></ul><ul><ul><li>Training? On-line help? </li></ul></ul><ul><ul><li>How will data entry be performed? </li></ul></ul>
  40. 92. Data entry considerations <ul><li>Define “must enter” fields: no record is complete unless such and such is entered. </li></ul><ul><li>Make data entry foolproof. Example: a grade level might be entered as 8, 8th, or eight. Using a pull-down menu with the correct data format avoids these inconsistencies. </li></ul>
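
A pull-down menu is a user-interface control, but the same two rules can also be checked when a record is saved. The sketch below assumes hypothetical required fields (`name`, `grade`) and grade levels 1 through 12; it rejects incomplete records and converts the inconsistent grade spellings mentioned above into one format.

```python
import re

# Assumption for this example: a record needs these fields, and grades run 1-12.
REQUIRED_FIELDS = ("name", "grade")
GRADE_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}


def normalize_grade(raw):
    """Convert inputs such as 8, '8', '8th', '8 th', or 'eight' to the integer 8."""
    text = str(raw).strip().lower()
    if text in GRADE_WORDS:
        return GRADE_WORDS[text]
    match = re.fullmatch(r"(\d{1,2})\s*(st|nd|rd|th)?", text)
    if match:
        grade = int(match.group(1))
        if 1 <= grade <= 12:
            return grade
    raise ValueError(f"unrecognized grade level: {raw!r}")


def validate_record(record):
    """Reject records with missing required fields and store the grade in one format."""
    missing = [
        field for field in REQUIRED_FIELDS
        if record.get(field) is None or not str(record[field]).strip()
    ]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {"name": str(record["name"]).strip(), "grade": normalize_grade(record["grade"])}
```

Restricting input at entry time (the pull-down menu) and validating at save time are complementary; the second catches data that arrives through imports or other routes.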
  41. 93. Data Entry – additional considerations <ul><li>Barcode scanners </li></ul><ul><ul><li>USB or </li></ul></ul><ul><ul><li>Wireless attached to a Palm or Pocket PC </li></ul></ul><ul><li>Pocket PC </li></ul><ul><ul><li>WiFi 802.11g, Bluetooth </li></ul></ul><ul><ul><li>Wireless networks (real-time on demand systems) </li></ul></ul>
  42. 94. PEOPLE THAT WORK WITH DATABASES <ul><li>System Analysts </li></ul><ul><li>Database Designers </li></ul><ul><li>Application Developers </li></ul><ul><li>Database Administrators </li></ul><ul><li>End Users </li></ul>
  43. 95. System Analysts <ul><li>communicate with each prospective database user group in order to understand its </li></ul><ul><ul><li>information needs </li></ul></ul><ul><ul><li>processing needs </li></ul></ul><ul><li>develop a specification of each user group’s information and processing needs </li></ul><ul><li>develop a specification integrating the information and processing needs of the user groups </li></ul><ul><li>document the specification </li></ul>
  44. 96. Database Designers <ul><li>choose appropriate structures to represent the information specified by the system analysts </li></ul><ul><li>choose appropriate structures to store the information in a normalized manner in order to guarantee integrity and consistency of data </li></ul><ul><li>choose appropriate structures to guarantee an efficient system </li></ul><ul><li>document the database design </li></ul>
  45. 97. Application Developers <ul><li>implement the database design </li></ul><ul><li>implement the application programs to meet the program specifications </li></ul><ul><li>test and debug the database implementation and the application programs </li></ul><ul><li>document the database implementation and the application programs </li></ul>
  46. 98. Database Administrators <ul><li>Manage the database structure </li></ul><ul><li>Manage data activity </li></ul><ul><li>Manage the database management system </li></ul><ul><ul><li>generate database application performance reports </li></ul></ul><ul><ul><li>investigate user performance complaints </li></ul></ul><ul><ul><li>assess need for changes in database structure or application design </li></ul></ul><ul><ul><li>modify database structure </li></ul></ul><ul><ul><li>evaluate and implement new DBMS features </li></ul></ul><ul><ul><li>tune the database </li></ul></ul><ul><li>Establish the database data dictionary </li></ul><ul><ul><li>data names, formats, relationships </li></ul></ul><ul><ul><li>cross-references between data and application programs </li></ul></ul>
  47. 99. End Users <ul><li>Parametric end users constantly query and update the database. They use canned transactions to support standard queries and updates. </li></ul><ul><li>Casual end users occasionally access the database, but may need different information each time. They use sophisticated query languages and browsers. </li></ul><ul><li>Sophisticated end users have complex requirements and need different information each time. They are thoroughly familiar with the capabilities of the DBMS. </li></ul>