Interview with NCBI Staff Scientist Carol Scott

Interview with Carol Scott
PhD, Bioengineering
Bioinformatics Scientist and Curator
Conserved Domain Database
A project of the National Center for Biotechnology Information,
U.S. National Library of Medicine at the National Institutes of Health
Katie Rapp
LBSC 690
March 1, 2011

Protein is Everything!
- Every living thing contains unique, identifiable proteins
  - Examples: human hemoglobin and insulin; proteins in fungi, bacteria, and plants
- Proteins are chains of amino acids linked in different combinations
  - There are 20 standard amino acids; they are like beads on a necklace, and their order determines the type of protein
- Proteins do the work inside cells
  - Examples: hemoglobin carries oxygen in the blood; insulin regulates glucose metabolism
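To make the "beads on a necklace" idea concrete: protein sequences are conventionally written as strings of one-letter amino acid codes. The minimal Python sketch below checks that a string uses only the 20 standard codes and shows that two chains with the same beads in a different order are still different sequences. The function names and the two short chains are hypothetical, chosen only for illustration.

```python
from collections import Counter

# The 20 standard amino acids, written with their one-letter codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only standard one-letter codes."""
    return len(sequence) > 0 and all(
        residue in STANDARD_AMINO_ACIDS for residue in sequence.upper()
    )


def composition(sequence: str) -> Counter:
    """Count how many of each amino acid ("bead") the chain contains."""
    return Counter(sequence.upper())


# Two hypothetical chains built from exactly the same beads in a different order.
chain_a = "MKTAYIAK"
chain_b = "KAIYATKM"

print(is_valid_protein(chain_a), is_valid_protein(chain_b))  # True True
print(composition(chain_a) == composition(chain_b))  # True: same beads
print(chain_a == chain_b)  # False: different order, different protein
```

Comparing composition separately from the full string shows why order, not just ingredients, identifies a protein.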

Problems with Proteins
- Because proteins do the work inside cells, diseases are often caused by a defective protein
  - Example: sickle cell anemia, in which a change in a single amino acid of hemoglobin (glutamic acid replaced by valine) is enough to make the difference between health and disease
- Medical researchers study proteins at the molecular level in order to find cures for diseases
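The sickle cell example can be shown as a position-by-position sequence comparison. This minimal sketch compares the first ten residues of the mature hemoglobin beta chain (VHLTPEEKSA) with the sickle cell variant, where the glutamic acid (E) at position 6 becomes valine (V). The helper `find_substitutions` is hypothetical and assumes the two sequences are already the same length and aligned residue for residue.

```python
def find_substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """List (position, original, replacement) for every differing residue.

    Assumes both sequences have equal length and are aligned position by
    position; positions are numbered from 1, as is usual in biology.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_residue, var_residue)
        for position, (ref_residue, var_residue) in enumerate(
            zip(reference, variant), start=1
        )
        if ref_residue != var_residue
    ]


# First ten residues of the mature hemoglobin beta chain.
normal_hbb = "VHLTPEEKSA"
# Sickle cell hemoglobin: glutamic acid (E) at position 6 replaced by valine (V).
sickle_hbb = "VHLTPVEKSA"

for position, before, after in find_substitutions(normal_hbb, sickle_hbb):
    print(f"{before}{position}{after}")  # prints E6V
```

The output uses the common "original, position, replacement" shorthand for a single amino acid substitution.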

Conserved Domains – Motivation behind the database
- The amino acid chains that make up proteins are coiled and folded. Distinct folded units that recur, with similar amino acid sequences, in many different proteins are referred to as “conserved domains.”
- Conserved domains have specific functions and 3-dimensional shapes
- It is useful for researchers to be able to compare related conserved domains in different proteins, but in the past there was no practical way to do this

Conserved Domain Database - Development
- This database was developed to meet the needs of researchers
- The project began in 2001; Carol Scott has worked on it since 2002
- Curators worked with software developers to produce a highly interactive database

Curators
- Carol Scott and other curators build the database's content from amino acid sequences collected from other databases
- They take amino acid sequences from millions of proteins and link them based on structural and functional similarities
- They work with programmers to create the interface and visual output of the database
- Curators also find and add links to information about each protein, including journal articles, other resources, and related proteins

Conserved Domain Database - Challenges
- Not all amino acid sequence information is reliable, so curators must carefully choose the sources of the basic data they put into the database
- Building the comparisons in the CDD is very complex and time-consuming
- Software exists to help find these comparisons, but much of the work must be done manually, based on knowledge of the chemical attributes of the amino acids
- The project is currently facing budget cuts that affect staffing and may threaten the future of the database
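Part of that manual judgment is deciding whether a difference between two aligned amino acids is chemically similar. The Python sketch below uses one simplified grouping of the 20 standard amino acids by side-chain chemistry to label each mismatch as conservative or non-conservative. The grouping, the names `CHEMICAL_GROUPS` and `classify_mismatches`, and the example fragments are all assumptions for illustration; real curation and scoring tools use finer-grained measures such as substitution matrices.

```python
# Simplified grouping by side-chain chemistry (an illustrative assumption;
# textbooks differ on where to place residues such as G, C, H, and Y).
CHEMICAL_GROUPS = {
    "hydrophobic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
# Reverse lookup: one-letter code -> group name.
GROUP_OF = {aa: group for group, members in CHEMICAL_GROUPS.items() for aa in members}


def classify_mismatches(seq1: str, seq2: str) -> list[tuple[int, str, str, str]]:
    """Label each mismatch between two aligned, equal-length sequences.

    Expects uppercase one-letter codes, with "-" for gaps (gaps are skipped).
    Returns (position, residue1, residue2, label), where label is
    "conservative" if both residues fall in the same chemical group.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for position, (a, b) in enumerate(zip(seq1, seq2), start=1):
        if a == b or "-" in (a, b):
            continue  # identical residues and gaps are not mismatches
        same_group = GROUP_OF[a] == GROUP_OF[b]
        label = "conservative" if same_group else "non-conservative"
        results.append((position, a, b, label))
    return results


# Two hypothetical aligned fragments.
for position, a, b, label in classify_mismatches("KLVDE", "RIVDV"):
    print(position, a, b, label)
```

Here K/R (both positive) and L/I (both hydrophobic) count as conservative, while E/V (negative vs. hydrophobic) does not.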

Results
- The CDD enables scientists to search for specific amino acid chains of interest to them
  - Applications include genetic studies, mutation studies, and studies of the size, shape, and function of proteins
- They can find and compare alignments of similar amino acid sequences in different proteins
  - These alignments can provide insight into the functions of different parts of protein molecules
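A simple way to quantify how alike two aligned regions are is percent identity: the share of aligned, gap-free positions where both sequences carry the same amino acid. This minimal sketch assumes the alignment has already been produced and is given as two equal-length strings with "-" for gaps. The function name and fragments are hypothetical, and this is a simple illustration, not the scoring the CDD itself uses.

```python
def percent_identity(aligned1: str, aligned2: str) -> float:
    """Percent of aligned, gap-free positions where the residues are identical.

    Assumes both strings come from the same alignment: equal length,
    with "-" marking a gap.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    # Only columns where neither sequence has a gap are compared.
    columns = [(a, b) for a, b in zip(aligned1, aligned2) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Two hypothetical aligned fragments; the gap in the second skips one column.
fragment1 = "MKTAYIAKQR"
fragment2 = "MKSAYI-KQR"
print(f"{percent_identity(fragment1, fragment2):.1f}%")  # prints 88.9%
```

Excluding gap columns is one common convention; other tools divide by the full alignment length instead, which gives a lower number.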

Output – 3-Dimensional Structures

Output - Superfamilies

Users – Who Are They?
- The database is freely accessible to anyone over the Internet
- It is used frequently by researchers around the world
- Users include anyone who studies proteins, from high school and college students to senior researchers at NIH, pharmaceutical companies, and bioengineering firms, as well as genetic researchers
- The database can spur further research into areas where protein defects might be repaired using genetic engineering

Questions?
