Interview with NCBI Staff Scientist Carol Scott
1. Interview with Carol Scott
PhD, Bioengineering
Bioinformatics Scientist and Curator
Conserved Domain Database
A project of the U.S. National Library of Medicine at the
National Institutes of Health,
National Center for Biotechnology Information
Katie Rapp
LBSC 690
March 1, 2011
2. Protein is Everything!
Every living thing is made up of unique, identifiable proteins
Examples: human hemoglobin, insulin, proteins in
fungus, bacteria, plants
Proteins are made of different combinations of amino acids
20 naturally occurring amino acids; they are like beads on a necklace,
and their order determines the type of protein
Proteins do the work inside cells
Examples: Hemoglobin carries oxygen in the blood, insulin regulates
glucose metabolism
3. Problems with Proteins
Proteins do the work inside cells, so when there are
problems, such as diseases, they are often caused by a
defective protein
Example: sickle cell anemia (a change in a single amino acid in
hemoglobin is the difference between healthy and ill)
Medical researchers study proteins at the molecular level in
order to find cures for diseases
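The sickle cell example can be made concrete with a short sketch. It is an illustration added here, not part of the interview: it compares the first eight amino acids of normal adult beta-globin (one of the hemoglobin chains) with the same stretch in the sickle cell variant, using one-letter amino acid codes. The function name `substitutions` is hypothetical, and positions are counted from 1 on the mature protein.

```python
# First eight residues of mature human beta-globin, one-letter amino acid codes.
NORMAL_BETA_GLOBIN = "VHLTPEEK"
# Same stretch in the sickle cell variant: glutamic acid (E) at position 6
# is replaced by valine (V).
SICKLE_BETA_GLOBIN = "VHLTPVEK"


def substitutions(reference, variant):
    """Return (position, reference_residue, variant_residue) for each mismatch.

    Positions are 1-based. Hypothetical helper for illustration only.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref, var)
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


print(substitutions(NORMAL_BETA_GLOBIN, SICKLE_BETA_GLOBIN))
```

The two strings differ at exactly one position, which is the "one change in one amino acid" described on the slide.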
4. Conserved Domains – Motivation behind the database
The amino acid chains that make up proteins are coiled and
folded. Blocks of coiled and folded amino acids that recur in
different proteins are referred to as “conserved domains.”
Conserved domains have specific functions and 3-dimensional shapes
It is useful for researchers to be able to compare related
conserved domains in different proteins, but in the past there
was no practical way to do this
5. Conserved Domain Database - Development
This database was developed to meet the needs of
researchers
Project began in 2001; Carol Scott has worked on it since
2002
Worked with software developers to produce a highly
interactive database
6. Conserved Domain Database - Curators
Carol Scott and other curators create the data in the
database from lists of amino acid sequences found in other
databases
They take amino acid sequences from millions of proteins
and link them based on structural and functional similarities
They work with programmers to create the interface and
visual output of the database
Curators also find and provide links to information about each
protein, including journal articles, related proteins and other resources
7. Conserved Domain Database - Challenges
Not all amino acid sequence information is reliable, so curators
must choose carefully where they get the basic data to put
into their database
The process of creating the comparisons in the CDD is very
complex and time-consuming
Software exists to help find these comparisons, but much
work must be done manually based on knowledge of the
chemical attributes of the amino acids
The project is currently facing budgetary cutbacks that
affect staffing and perhaps the future of the database
8. Conserved Domain Database - Results
Enables scientists to search for specific amino acid chains of
interest to them
Examples: genetic studies, mutation studies, and studies of the size,
shape and function of proteins
They can find and compare similar chemical alignments in
different proteins
These alignments can provide insight into the functions of
different parts of protein molecules
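As a rough illustration of what "finding a similar stretch in different proteins" means, the sketch below locates the longest run of identical amino acids shared by two sequences. This is a toy added for illustration and not the method the database uses: the two sequences are invented, the function name `longest_shared_block` is hypothetical, and real domain comparison tolerates substitutions and gaps rather than requiring exact matches.

```python
def longest_shared_block(seq_a, seq_b):
    """Return (block, start_in_a, start_in_b) for the longest exact shared run.

    Start positions are 0-based. If nothing is shared, returns ("", 0, 0).
    Hypothetical helper; exact matching is a simplification.
    """
    best_len, end_a, end_b = 0, 0, 0
    # prev[j] holds the length of the shared run ending at the previous
    # residue of seq_a and residue j of seq_b.
    prev = [0] * (len(seq_b) + 1)
    for i, res_a in enumerate(seq_a, start=1):
        curr = [0] * (len(seq_b) + 1)
        for j, res_b in enumerate(seq_b, start=1):
            if res_a == res_b:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, end_a, end_b = curr[j], i, j
        prev = curr
    block = seq_a[end_a - best_len:end_a]
    return block, end_a - best_len, end_b - best_len


# Invented sequences, not real proteins.
PROTEIN_A = "MKTAYGDSGKSTLLNQ"
PROTEIN_B = "AAPLGDSGKSTVVER"

print(longest_shared_block(PROTEIN_A, PROTEIN_B))
```

In this toy case the shared block sits at different positions in the two sequences, which is why an alignment step is needed before the corresponding parts of two proteins can be compared.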
11. Conserved Domain Database Users – Who Are They?
The database is freely accessible to anyone over the internet
It is used frequently by researchers around the world
Users include anyone studying proteins, from high school and
college students to senior researchers at NIH, pharmaceutical
companies and bioengineering firms, as well as genetic researchers
The database can be used to spur further research into areas
where defects in proteins could be repaired using genetic
engineering