GBSN - Microbiology (Unit 6) Human and Microbial interaction
1. Databases for bioinformatics and their types
1. WHAT is a database?
• A collection of data that needs to be:
– Structured
– Searchable
– Updated (periodically)
– Cross referenced
• Challenge:
– To turn “meaningless” data into useful information that can be
accessed and analysed in the best way possible.
For example:
HOW would YOU organise all biological sequences so that the
biological information is optimally accessible?
You need an appropriate database management system (DBMS)
2. DBMS
• Internal organization
– Controls speed and flexibility
• A set of programs that
– Store
– Extract
– Modify
[Figure: USER(S) reach the Database through the DBMS programs: Store, Extract, Modify]
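The three DBMS operations can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name `sequence`, its columns and the example record are assumptions made for this sketch, not part of the slides.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT, residues TEXT)"
)

# Store: insert a record.
conn.execute(
    "INSERT INTO sequence (name, residues) VALUES (?, ?)", ("seq1", "ATGGCC")
)

# Extract: read the record back.
stored = conn.execute(
    "SELECT residues FROM sequence WHERE name = ?", ("seq1",)
).fetchone()[0]

# Modify: update the record in place, then read it again.
conn.execute(
    "UPDATE sequence SET residues = ? WHERE name = ?", ("ATGGCCTAA", "seq1")
)
modified = conn.execute(
    "SELECT residues FROM sequence WHERE name = ?", ("seq1",)
).fetchone()[0]

print(stored, modified)
conn.close()
```

The user never touches the storage layout directly; every step goes through the DBMS.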
4. Relational databases
• Data is stored in multiple related tables
• Data relationships across tables can be
one-to-one, one-to-many (many-to-one) or many-to-many
• A few rules allow the database to be
viewed in many ways
• Let's convert the “course details” to a
relational database
5. Course details: our flat file database
Name       Depart.    Course   E1  E2  E3  P1  P2
Student 1  Chemistry  Biology  A   B   B   A   C   …..
Student 2  Ecology    Maths    A   D   A   A   A   …..
Student 2  Ecology    Biology  A   B   A   A   A   …..
Student 1  Chemistry  English  A   A   A   A   A   …..
Student 1  Chemistry  Maths    C   C   B   A   A   …..
…
6. Normalize (1NF) …
• We remove repeating records (rows)
sID  Name      dID
1    Student1  1
2    Student2  2

cID  Course
1    Biology
2    Maths
3    English

dID  Department
1    Chemistry
2    Ecology

sID  cID  E1  E2  E3  P1  P2
1    1    A   B   B   A   C   …..
2    2    A   D   A   A   A   …..
2    1    A   B   A   A   A   …..
1    3    A   A   A   A   A   …..
1    2    C   C   B   A   A   …..
…
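Primary keys: sID, cID and dID, each in the table it identifies
Foreign keys: dID in the student table; sID and cID in the results table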
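The normalized tables can be sketched in SQLite as follows. The table names (`department`, `student`, `course`, `grade`) are assumptions, since the slide leaves the tables unnamed; the column names and rows are taken from the slide, limited to the five result columns shown. The final query joins the tables back into the original flat view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Table names are assumptions; column names follow the slide.
conn.executescript("""
CREATE TABLE department (dID INTEGER PRIMARY KEY, Department TEXT NOT NULL);
CREATE TABLE student (
    sID  INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    dID  INTEGER NOT NULL REFERENCES department(dID)
);
CREATE TABLE course (cID INTEGER PRIMARY KEY, Course TEXT NOT NULL);
CREATE TABLE grade (
    sID INTEGER NOT NULL REFERENCES student(sID),
    cID INTEGER NOT NULL REFERENCES course(cID),
    E1 TEXT, E2 TEXT, E3 TEXT, P1 TEXT, P2 TEXT,
    PRIMARY KEY (sID, cID)
);
""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Chemistry"), (2, "Ecology")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Student1", 1), (2, "Student2", 2)])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Biology"), (2, "Maths"), (3, "English")])
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "A", "B", "B", "A", "C"),
        (2, 2, "A", "D", "A", "A", "A"),
        (2, 1, "A", "B", "A", "A", "A"),
        (1, 3, "A", "A", "A", "A", "A"),
        (1, 2, "C", "C", "B", "A", "A"),
    ],
)

# Rebuild the flat "course details" view by following the foreign keys.
flat_rows = conn.execute("""
    SELECT s.Name, d.Department, c.Course, g.E1, g.E2, g.E3, g.P1, g.P2
    FROM grade g
    JOIN student s    ON s.sID = g.sID
    JOIN department d ON d.dID = s.dID
    JOIN course c     ON c.cID = g.cID
    ORDER BY s.sID, c.cID
""").fetchall()

for row in flat_rows:
    print(row)
```

Each name is stored once, and the join recovers all five rows of the flat file.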
8. Relational Databases
• What have we achieved?
– No repeating information
– Less storage space
– Better reality representation
– Easy modification/management
– Easy usage of any combination of records
Remember:
the DBMS has programs to access and edit this
information, so it does not matter that the numeric
primary keys are hard for a human to read
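A small sketch of the "easy modification" point, using plain Python structures in place of a real DBMS. The new department name "Biochemistry" is a hypothetical value chosen for illustration; the rows come from the course details example.

```python
# Flat file: the department name is repeated on every row of a student.
flat = [
    ("Student 1", "Chemistry", "Biology"),
    ("Student 2", "Ecology", "Maths"),
    ("Student 2", "Ecology", "Biology"),
    ("Student 1", "Chemistry", "English"),
    ("Student 1", "Chemistry", "Maths"),
]

# Renaming a department in the flat file means editing every row that repeats it.
flat_edits = sum(1 for _, dept, _ in flat if dept == "Chemistry")

# Normalized form: the name is stored once and referenced by its key (dID).
departments = {1: "Chemistry", 2: "Ecology"}
students = {1: ("Student1", 1), 2: ("Student2", 2)}  # sID -> (Name, dID)

departments[1] = "Biochemistry"  # hypothetical new name: a single edit

# Every lookup through the foreign key now sees the new name.
name, d_id = students[1]
resolved = departments[d_id]

print(flat_edits, resolved)
```

Three edits in the flat file become one edit in the relational layout, and no row can be left out of date by accident.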
9. Accessing database information
• A request for data from a database is
called a query
• Queries can be of three forms:
– Choose from a list of parameters
– Query by example (QBE)
– Query language
Query by Example (QBE) reports allow end users to query, insert, update, and delete
values in a database table or view.
In the QBE build wizard, you choose which data to display in the report. Alternatively,
you can let end users make their own queries in the QBE report's customization form.
Because the QBE system formulates the actual query, QBE is easier to learn than
formal query languages, such as the standard Structured Query Language (SQL).
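The difference between a query language and query by example can be sketched as below. The helper `query_by_example` is hypothetical and only imitates the idea (the system, not the user, writes the SQL); it is not the interface of any real QBE product. The table reuses the course list from the normalization example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (cID INTEGER PRIMARY KEY, Course TEXT)")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(1, "Biology"), (2, "Maths"), (3, "English")])

# Query language: the user writes the SQL statement directly.
sql_result = conn.execute(
    "SELECT cID FROM course WHERE Course = ?", ("Maths",)
).fetchall()

ALLOWED_COLUMNS = {"cID", "Course"}


def query_by_example(example):
    """Hypothetical QBE-style helper: build the SQL from a partly filled-in row."""
    unknown = set(example) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Column names are checked against a fixed list; values go in as parameters.
    where = " AND ".join(f"{col} = ?" for col in example)
    sql = "SELECT cID, Course FROM course"
    if where:
        sql += f" WHERE {where}"
    return conn.execute(sql, tuple(example.values())).fetchall()


# Query by example: the user only fills in the values they are looking for.
qbe_result = query_by_example({"Course": "Maths"})

print(sql_result, qbe_result)
```

The third form, choosing from a list of parameters, would be a fixed menu of such pre-built queries.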
10. Distributed databases
• From local to global attitude
• Data appears to be in one location but is most definitely
not
• A definition: Two or more data files in different locations,
periodically synchronized by the DBMS to keep data in
all locations consistent (e.g. locations A, B and C)
• An intricate network for combining and sharing
information
• Administrators praise fast network technologies!!!
• Users praise the internet!!!
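Periodic synchronization can be pictured with a toy model. The sketch assumes each record carries a version counter and that the highest version wins; this rule, the record names and the sequences are assumptions for illustration, and real distributed DBMSs use more careful protocols.

```python
def synchronize(replicas):
    """Bring every replica to the newest version of each record (toy rule)."""
    merged = {}
    for replica in replicas:
        for key, (version, value) in replica.items():
            # Keep the copy with the highest version number.
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    # Push the merged state back to every location.
    for replica in replicas:
        replica.clear()
        replica.update(merged)
    return merged


# Three locations holding key -> (version, value); all records are made up.
site_a = {"rec1": (2, "ATGC"), "rec2": (1, "GGCC")}
site_b = {"rec1": (1, "ATG")}
site_c = {"rec3": (1, "TTAA")}

synchronize([site_a, site_b, site_c])
print(site_b)
```

After the call, A, B and C hold the same three records, so a user can query any location and get the same answer.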
11. Three main Points
• Database proliferation
– Dozens to hundreds at the moment
• More and more scientific discoveries result
from inter-database analysis and mining
• Rising complexity of required data combinations
– E.g. translational medicine: “from bench to
bedside” (genomic data vs. clinical data)
Proliferation = great and rapid increase in numbers; Grid = a network of evenly
spaced horizontal and vertical lines;
Semantic = related to the meaning.
12. Biological databases
• Like any other database
– Data organization for optimal analysis
• Data is of different types
– Raw data (DNA, RNA, protein sequences)
– Curated data (annotated DNA, RNA and protein
sequences and structures,
expression data)
14. A short word on problems
• Even today we face some key limitations
– There is no standard format
• Every database or program has its own format
– There is no standard nomenclature
• Every database has its own names
– Data is not fully optimized
• Some datasets have missing information without any indication
of it
– Data errors
• Data is sometimes of poor quality, erroneous or misspelled
• Errors propagate through automated (computer) annotation
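Two of these problems, silently missing fields and erroneous sequence characters, can at least be detected automatically. The sketch below is a minimal check under stated assumptions: the field names (`id`, `organism`, `sequence`) and the restricted DNA alphabet (A, C, G, T, N) are choices made for illustration, not a standard.

```python
VALID_DNA = set("ACGTN")  # assumed alphabet; real formats allow more ambiguity codes


def check_record(record, required=("id", "organism", "sequence")):
    """Return a list of problems found in one record (empty list = no problem found)."""
    problems = []
    for field in required:
        # An absent or empty field counts as missing information.
        if not record.get(field):
            problems.append(f"missing {field}")
    sequence = record.get("sequence") or ""
    bad = sorted(set(sequence.upper()) - VALID_DNA)
    if bad:
        problems.append("invalid characters: " + "".join(bad))
    return problems


# Made-up records for illustration.
good = {"id": "rec1", "organism": "E. coli", "sequence": "ATGGCC"}
flawed = {"id": "rec2", "organism": "", "sequence": "ATGXC"}

print(check_record(good))
print(check_record(flawed))
```

A check like this flags the record instead of letting the gap or typo pass into later analyses, which is where computer annotation would otherwise propagate it.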