DATABASES
eMelanoBase: An Online Locus-Specific Variant
Database for Familial Melanoma
David C.Y. Fung,1 Elizabeth A. Holland,1 Therese M. Becker,1 Nicholas K. Hayward,2
Brigitte Bressac-de Paillerets,3 Melanoma Genetics Consortium, and Graham J. Mann1*
1Westmead Institute for Cancer Research, University of Sydney, Westmead Millennium Institute, Westmead, NSW, Australia;
2Queensland Institute of Medical Research, Brisbane, Australia;
3Institut Gustave Roussy, Villejuif, France
Communicated by David N. Cooper
A proportion of melanoma-prone individuals in both familial and non-familial contexts has been shown
to carry inactivating mutations in either CDKN2A or, rarely, CDK4. CDKN2A is a complex locus that
encodes two unrelated proteins from alternately spliced transcripts that are read in different frames. The
alpha transcript (exons 1a, 2, and 3) produces the p16INK4A cyclin-dependent kinase inhibitor, while
the beta transcript (exons 1b and 2) is translated as p14ARF, a stabilizing factor of p53 levels through
binding to MDM2. Mutations in exon 2 can impair both polypeptides, and insertions and deletions in
exons 1a, 1b, and 2 can theoretically generate p16INK4A-p14ARF fusion proteins. No online
database currently takes into account all the consequences of these genotypes, a situation compounded by
some problematic previous annotations of CDKN2A-related sequences and descriptions of their
mutations. As an initiative of the international Melanoma Genetics Consortium, we have therefore
established a database of germline variants observed in all loci implicated in familial melanoma
susceptibility. Such a comprehensive, publicly accessible database is an essential foundation for research
on melanoma susceptibility and its clinical application. Our database serves two types of data as defined
by HUGO. The core dataset includes the nucleotide variants on the genomic and transcript levels, amino
acid variants, and citation. The ancillary dataset includes keyword description of events at the
transcription and translation levels and epidemiological data. The application that handles users’ queries
was designed in the model-view-controller architecture and was implemented in Java. The object-
relational database schema was deduced using functional dependency analysis. We hereby present our
first functional prototype of eMelanoBase. The service is accessible via the URL
www.wmi.usyd.edu.au:8080/melanoma.html. Hum Mutat 21:2–7, 2002. © 2002 Wiley-Liss, Inc.
KEY WORDS: melanoma, familial; database; germline variants; model-view-controller architecture; CDKN2A;
CDK4
DATABASES:
CDKN2A – OMIM: 600160, 155601 (melanoma, familial); GDB: 335362; GenBank: L41934, U12818-20;
HGMD: CDKN2A
CDK4 – OMIM: 123829; GDB: 204022; GenBank: U37022; HGMD: CDK4
www.wmi.usyd.edu.au:8080/melanoma.html (eMelanoBase)
INTRODUCTION
A proportion of melanoma-prone individuals in
both familial (MIM# 155601) and non-familial
contexts has been shown to carry inactivating
mutations in either CDKN2A (MIM# 600160) or,
rarely, CDK4 (MIM# 123829). CDKN2A is a locus
with dual functionality, encoding two unrelated
proteins in distinct reading frames. The alpha
transcript (exons 1a, 2, and 3) produces the
p16INK4A cyclin-dependent kinase inhibitor known
to inhibit cyclin D1/CDK4 or CDK6 complexes. The
beta transcript (exons 1b and 2) produces the protein
p14ARF, which is a positive regulator of p53 levels
through binding to MDM2.
Received 4 June 2002; accepted revised manuscript 16 September 2002.
*Correspondence to: Graham J. Mann, M.D., Ph.D., Westmead
Institute for Cancer Research, Westmead Millennium Institute,
Darcy Rd., Westmead, NSW 2145, Australia.
E-mail: gmann@mail.usyd.edu.au
Grant sponsor: Australian National Health and Medical
Research Council; Grant numbers: 991331; 211172.
DOI 10.1002/humu.10149
Published online in Wiley InterScience (www.interscience.wiley.com).
Germline mutations affecting all exons except exon 3 have been
observed in melanoma-susceptible individuals; however, those
uniquely affecting p14ARF are rare, despite clear evidence from
animal knockout models that it is a potent tumor suppressor
gene and melanoma susceptibility locus in its own
right [Walker and Hayward, 2002]. Because of the
dual encoding by exon 2 (the shared exon 3 is beyond
the translation stop codon of p14ARF), mutations in
that exon can impair both the p16INK4A and
p14ARF polypeptides [Rizos et al., 2001]. Furthermore,
frameshifts caused by insertions and deletions
in exons 1a, 1b, or 2 can, in theory, generate
p16INK4A-p14ARF fusion proteins whose functions
are yet to be understood.
Despite the clear association of these loci with
inherited susceptibility to melanoma, there is significant
uncertainty in the degree of risk conferred by
variant alleles [Kefford et al., 1999]. This risk is known to be
modified by prior sun exposure and by red-hair-associated
variant alleles of the melanocortin-1 receptor
(MC1R) [Cannon-Albright et al., 1994; van der
Velden et al., 2001; Box et al., 2001]. Geographic
location is a major determinant of penetrance of
CDKN2A mutations, but it is not yet clear to what
extent this reflects sun exposure or ethnicity-associated
pigmentation [Bishop et al., 2002]. The
international Melanoma Genetics Consortium was
founded in 1997 to facilitate such research and to
develop guidelines for the use of genetic information
in risk assessment [Kefford et al., 1999]. A comprehensive,
publicly accessible database of germline
variants in genes that influence melanoma susceptibility
is an essential foundation for such research and
its clinical application.
There are currently three data services available
online that present information about CDKN2A
mutations. Twelve p16INK4A and two CDK4 cDNA
mutations are described in Online Mendelian Inheritance
in Man [2000]. Forty-eight p16INK4A and
two CDK4 cDNA mutations are presented in
the Human Genetics Mutation Database (HGMD,
Cardiff). A list of 148 p16INK4A mutations is
provided by the Radium Hospital (Norway) as a
downloadable Excel spreadsheet. None of these services takes
into account all the consequences of these genotypes,
especially those that impact on p14ARF function.
This situation has been compounded by some
problematic previous annotations of CDKN2A-related
sequences and descriptions of their mutations,
including reliance on initially published cDNA
sequences that were incomplete at the N-terminus.
Somatic variants from cell lines and primary tumors
are also included in two of the services. Such data are
more difficult to interpret because they do not provide
direct genetic evidence of a relationship with cancer
susceptibility. We therefore established a database of
germline variants observed in all loci implicated in
familial melanoma susceptibility. Its purpose is to
collate information for each reported genotype on its
molecular outcomes, biological effects, methods of
detection, and population genetics. As an initiative of
the international Melanoma Genetics Consortium,
we hereby describe our first functional prototype of
eMelanoBase.
SYSTEMS AND METHODS
Software Development Process
The process was divided into four stages: inception, elaboration,
construction, and transition [Shaw, 2001]. In the
inception stage, the functional requirements of the system
and the types of users targeted were defined. In the elaboration
stage, use cases were written to model possible interactions
between different types of users and the various functions of
the system (Fig. 1). The software architecture to be
implemented was also determined at this stage. In the
construction stage, use cases were converted into a
series of engineering diagrams [Boggs and Boggs, 1999] that
were used collectively as a guide for coding Java classes. The
transition stage focused on how to bring the service to the
users. It included software testing, code optimization, and
Web service hosting.
Implementation
To handle the complex mapping of every user's
query to the appropriate visual presentation, the service was
developed based on the model-view-controller (MVC) design
pattern [Petschulat, 2001] (Fig. 2). Here the three sets of
objects are implemented as Java beans and servlets.
Communication between the servlets and the database is made
possible by the Java database connectivity (JDBC) driver
[White et al., 2000].
The database schema was derived from functional dependency
analysis [Johnson, 1997] (Fig. 3). The core data includes
gene symbols of disease loci, DNA variants, transcript and
translation changes, and citation. The ancillary data includes
functional changes, methods of detection, geographical association,
and genetic epidemiology. In this article, the terms
“entity” and “table” are used interchangeably in the context of
relational database, whereas the term “class” has the same
meaning as “entity” in the context of object-relational
database.
FIGURE 1. Use case diagram of eMelanoBase. The solid frame
represents the boundary of the system. The ovals represent the
functions of the system and the ball-and-stick figures represent
the roles that interact with the system. The numbering of
each function is for identification only, not the sequence of
execution. [Diagram labels: roles Contributor, Curator, and User;
functions 1. Query data, 2. Return data, 3. Process data
(3.1. Create record, 3.2. Delete record, 3.3. Update record),
4. Notify users, 5. Submit data; relationship label is-part-of.]
All software modeling was done using the Rational Rose
Academic Edition (Rational Corporation, Lexington, MA).
Java coding was done using Borland JBuilder 5 Personal Edition
(Borland International, Scotts Valley, CA).
Deployment
The service was first developed and tested on a Linux 2.4/
dual Pentium III 700 workstation (Dell Corporation, Austin,
TX). It was then hosted on two Linux 2.4/dual Pentium III 500
servers (Dell Corporation, Austin, TX) in a master/slave
configuration. Each server was installed with Java Develop-
ment Kit 1.3 (Sun Microsystems, Palo Alto, CA), Apache
Tomcat-Jakarta 3.2 server [Apache Group, 2002], PostgreSQL
7.0 [PostgreSQL Global Development Group, 2002], and
JDBC 7.0-1.2 driver. The server behind the firewall stores the
master copy of eMelanoBase and provides Intranet service for
local users and the curator, whereas the one located outside the
firewall stores the slave copy and provides the Internet service
for international users. The database was distributed as a text
file in the structured query language (SQL) format.
RESULTS AND DISCUSSION
Database Statistics
The database currently contains 34 genomic
variants of CDKN2A, which give rise to 55 RNA
variants and 53 amino-acid variants. Most variants (85%)
are missense mutations. Of these,
60% are nucleotide transversions and 40% are
transitions. A total of 58% of all variants occur
in exon 2. Of these, all but four variants alter the
polypeptide sequences of both p14ARF and
p16INK4A, which may impair their function. Coverage
is currently being extended to include all
published genomic variants.
Software Design
The MVC design is an object-oriented pattern in
which the application is made up of three sets of
objects: model, view, and controller. The model
represents the data or application objects, which are currently
implemented as Java beans. They often map to
the entities in the database schema and contain
methods for accessing and modifying data.
The view represents the visual presentation of the
model and is generated by Java servlets. The controller, also
implemented as servlets, represents the logic for
manipulating the model. It determines what database
transactions are required for populating the model. The
greatest strength of MVC design is the complete
uncoupling of these three sets of objects allowing the
model to be reused. The model can be presented in
various views without being modified or a new
controller can be added without having to alter the
model [Goodwill, 2001]. Hence MVC design is well
suited to data-driven Web-based information systems
such as genome databases (for example, GDB) and variant
databases of multiple loci, especially if
they contain large datasets of genetic epidemiology
and citations. It is, however, unsuitable for Web sites that
require very few database transactions because they
contain mainly static content. In this case, Java Server
Pages can be used instead of servlets, and embedding
the data querying methods as code fragments in the
appropriate pages will be sufficient [Petschulat, 2001].
Web Reporting
FIGURE 2. Schematic diagram of MVC architecture. The solid
arrows represent data flow between components. The dotted
frame represents the Java Web server Jakarta-Tomcat, which
contains the Java servlets required for the Web service.
[Diagram labels: Web browser (request, response); Jakarta-Tomcat
Server containing Controller (servlets), View (servlets), and
Model (bean; set(), get()); JDBC; DBMS.]
FIGURE 3. Database schema in UML. Each box is an entity or
a data class. The arrows in broken lines represent dependency
between classes. For example, Gene class depends on the existence
of GeneObj and GenEleObj classes. Solid lines represent
bi-directional relationships between classes.
[Classes shown: DNAEvent, RNAEvent, PeptEvent, Gene, GeneObj,
GenEleObj, ProtEvent, Citation, Expression, Kindred, GeoArea,
Method, Ethnicity, Population.]
Queried data was returned to the user in Web pages
generated by the view servlets. Each page is an
instance of the class HTMLDocument, which inherits
the toString() method from the HTMLObject parent
class (Fig. 4). The toString() method in turn generates
the necessary HTML tags. There are two benefits to
this design. The first is reducing the two most
common errors in HTML programming, i.e., missing
tags and broken tags. The second benefit is allowing
the developer to model the mechanism of assembling
various HTML components into a document by the
view servlets.
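The HTML document model described under Web Reporting can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the published classes: the names HtmlObject, HtmlDocument, HtmlTable, HtmlTableRow, and HtmlTableCell are hypothetical (the cell class does not appear in Fig. 4), and a List is used for the children. What it shows is how generating both tags of a pair from one inherited toString() method prevents missing or unbalanced tags.

```java
import java.util.ArrayList;
import java.util.List;

// Base class: holds child objects and always emits a matched open/close tag pair.
abstract class HtmlObject {
    protected final List<Object> children = new ArrayList<>();

    // Each subclass only names its tag; it never writes angle brackets itself.
    protected abstract String tag();

    HtmlObject add(Object child) {
        children.add(child);
        return this;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("<").append(tag()).append(">");
        for (Object child : children) {
            sb.append(child); // nested HtmlObjects render themselves recursively
        }
        return sb.append("</").append(tag()).append(">").toString();
    }
}

class HtmlTableCell extends HtmlObject {
    @Override protected String tag() { return "td"; }
}

class HtmlTableRow extends HtmlObject {
    @Override protected String tag() { return "tr"; }
}

class HtmlTable extends HtmlObject {
    @Override protected String tag() { return "table"; }
}

class HtmlDocument extends HtmlObject {
    @Override protected String tag() { return "html"; }
}
```

A view would assemble a page by containment, for example new HtmlDocument().add(new HtmlTable().add(new HtmlTableRow().add(new HtmlTableCell().add("MGC00001")))), and then write the result of toString() to the response. Text content is not escaped in this sketch.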
Data Access Optimization
Data access performance is optimized on the
database side and the servlet side. On the database side,
two rules were applied to the design of SQL queries.
The first was that no query should contain more than
three table joins because joins are resource expensive.
The second was that navigation between entities should
rely on PostgreSQL-generated object IDs (OIDs)
instead of user-defined primary and foreign keys.
eMelanoBase is therefore an object-relational database.
The intention is to limit resources spent on
creating and searching indices. Although indices can
enhance the performance of the DBMS engine, excessive
use of them can put heavy demand on memory and
CPU usage. The difference between relational and
object-relational models is illustrated in Figure 5. To
query data related to the entity Gene using the
relational model (Fig. 5a), PostgreSQL would have to
generate six indices for three primary keys and three
foreign keys to facilitate table searches. In the object-relational
model (Fig. 5b), Gene is considered a
composite of three other classes: GeneObj, GenEleObj,
and DnaEvent. Its attributes are OIDs that
reference these classes. Because an OID is automatically
generated and indexed by PostgreSQL for every
newly created record, OIDs form the only index
required for facilitating the same transaction.
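The two query-design rules can be made concrete with a small sketch. The query text below is an assumption written in present-day SQL join syntax, with table and column names taken from Fig. 5b; the exact statements used by eMelanoBase are not given in the article. The oid column exists only in tables created with OIDs, which was the default in the PostgreSQL releases of that period and was removed in PostgreSQL 12. The helper class OidQueries and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OidQueries {
    // Rule 1 from the text: no query should contain more than three table joins.
    static final int MAX_JOINS = 3;

    // Rule 2 from the text: navigate between entities through OIDs rather than
    // user-defined primary and foreign keys. Gene holds OID references to
    // GeneObj and DnaEvent (Fig. 5b); the names here are illustrative.
    static final String VARIANTS_BY_GENE_SYMBOL =
          "SELECT d.mutid, d.dnachge "
        + "FROM gene g "
        + "JOIN geneobj o ON g.geneobject = o.oid "
        + "JOIN dnaevent d ON g.variant = d.oid "
        + "WHERE o.symbol = ?";

    private static final Pattern JOIN =
        Pattern.compile("\\bJOIN\\b", Pattern.CASE_INSENSITIVE);

    // Counts explicit JOIN keywords; a crude check that is adequate for simple statements.
    static int countJoins(String sql) {
        Matcher m = JOIN.matcher(sql);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    static boolean withinJoinLimit(String sql) {
        return countJoins(sql) <= MAX_JOINS;
    }
}
```

The `?` placeholder is intended for a JDBC PreparedStatement, so that a gene symbol such as CDKN2A is bound as a parameter rather than concatenated into the statement.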
On the servlet side, a connection pool holds
already opened JDBC connections, thereby
reducing the need to open and close them
repeatedly.
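A connection pool of the kind described can be sketched as below. The article does not say which pool implementation was used, so this is a generic, minimal version under that assumption: the class SimplePool and its methods are hypothetical, and the pooled type is left as a type parameter so that the sketch runs without a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Fixed-size pool: resources are created once and then reused by callers.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // the expensive "open" happens only here
        }
    }

    // Blocks until a resource is free instead of opening a new one.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    // Returns the resource to the pool rather than closing it.
    void giveBack(T resource) {
        idle.offer(resource);
    }

    int available() {
        return idle.size();
    }
}
```

With JDBC, T would be java.sql.Connection and the factory would wrap a call to DriverManager.getConnection, converting the checked SQLException into an unchecked one. A servlet would call borrow() at the start of a request and giveBack() in a finally block.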
Hosting Technology
The biggest difference between server-side Java and
conventional HTTP/CGI is single-process multithreading.
Each Java servlet is an in-memory object that can be
shared by multiple threads. Once a servlet is loaded, it
persists in memory while handling new HTTP
requests and will only be garbage collected by the
Java virtual machine if no more requests have been
received within a defined time span. This is much
more efficient than reloading the same CGI script for
each HTTP request, i.e., single-process single threading.
The same property also allows multiple servlets to
access the same Java bean at run time instead of
executing database transactions repeatedly. The other
advantage of Java is the versatility of the Java
development kit. It provides interfaces for database
access, security features, network support, file access,
and other packages extending the functionality of
servlets.
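The single-process, multithreaded model can be imitated outside a servlet container with a short sketch. The classes SharedHandler and ThreadedServerDemo are hypothetical stand-ins, not part of the Servlet API or of eMelanoBase; the sketch only assumes that one long-lived object serves many requests, each on its own worker thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a servlet: one instance, created once, shared by all request threads.
class SharedHandler {
    // Shared state must be thread-safe because requests run concurrently.
    private final AtomicInteger requestsServed = new AtomicInteger();

    String service(String query) {
        requestsServed.incrementAndGet();
        return "result for " + query;
    }

    int requestsServed() {
        return requestsServed.get();
    }
}

class ThreadedServerDemo {
    // Dispatches the given number of requests to one handler using a thread pool.
    static SharedHandler serve(int requests, int threads) {
        SharedHandler handler = new SharedHandler(); // loaded once, not per request
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            final int n = i;
            workers.execute(() -> handler.service("q" + n));
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

In the CGI model each of these requests would start a fresh process and reload the script; here only the cheap per-request work is repeated.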
Nomenclature
While most variants can be described in the latest
draft of HUGO nomenclature [den Dunnen and
Antonarakis, 2000], there are two exceptional cases
found in CDKN2A. The first case is the creation of a
new initiator by an AGG>ATG substitution within
the non-coding region of exon 1a (g.71G>T, record
ID: MGC00001) [Liu et al., 1999]. The consequential
translation event is AGG>Met at the 32nd nucleotide
upstream of the original initiator leading to the
translation of the 5′ UTR of p16INK4A transcript to a
new N-terminal of 11 residues. We suggest describing
this event as r.-34_32>p.M1. The second case is a
19-bp deletion in exon 2 (g.225-243del, record ID:
MGC00045), which gave rise to two translation
events [Gruis et al., 1995]. One is the out-of-frame
translation of p16INK4A transcript starting from
codon 76 resulting in a polypeptide of 138 residues
in length (p.A76fs139X). The other is the translation
of a p14ARF/p16INK4A chimera in which the N-terminal
portion consists of the first 90 codons of
p14ARF, and the C-terminal consists of codons 82–156
of p16INK4A. We suggest describing the event as
p.1-90p14:82-156p16. We also suggest adding a new
classifier “complex, chimera” to the current EBI
mutation event controlled vocabulary for this event
type.
FIGURE 4. Simplified class diagram of HTML document model.
The solid arrows represent class inheritance. Lines with a
trapezoid end represent class containment. The HTMLDocument,
HTMLTable, and HTMLTableRow classes inherit the toString()
method from HTMLObject class. The relationship
among the daughter classes is class containment where
HTMLDocument is a container of HTMLTable, which in turn
is a container of HTMLTableRow.
[Diagram labels: HTMLObject with attribute #htmlObjects : Vector
and operations +toHTML() : String and +HTMLObject(); subclasses
HTMLDocument, HTMLTable, HTMLTableRow.]
FIGURE 5. Object-based and relational schemas. a and b
show how the two models differ from each other using the
same four classes. All classes are marked <<Persistent>>;
associations carry multiplicities of 1 and 1..*.
(A) Relational model
GenEleObj: <<Primary key>>ID:INT2; EleType:VARCHAR(10); GenBank_UID:INT4
GeneObj: <<Primary key>>ID:INT2; Symbol:VARCHAR(20); GeneName:VARCHAR(40); ChrNo:VARCHAR(2); Cytoband:VARCHAR(12); OMIM:INT8
Gene: <<Foreign key>>GeneObj_REF:INT2; <<Foreign key>>GenEleObj_REF:INT2; <<Foreign key>>DnaEvent_REF:VARCHAR(8)
DnaEvent: <<Primary key>>ID:VARCHAR(8); DnaChge:VARCHAR(20); Status:VARCHAR(10); Species:VARCHAR(30); EventType:VARCHAR(40)
(B) Object-relational model
GenEleObj: EleType:VARCHAR(10); GenBank_UID:INT4
GeneObj: Symbol:VARCHAR(20); GeneName:VARCHAR(40); ChrNo:VARCHAR(40); Cytoband:VARCHAR(12); OMIM:INT8
Gene: GeneObject:GeneObj::OID; GeneElement:GenEleObj::OID; Variant:DnaEvent::OID
DnaEvent: MutID:VARCHAR(8); DnaChge:VARCHAR(20); Status:VARCHAR(10); Species:VARCHAR(30); EventType:VARCHAR(40)
den Dunnen and Antonarakis [2000] recommended
counting the nucleotide “A” of the initiating
codon as the first nucleotide. We apply this
numbering system to RNA and cDNA variants but
retain GenBank numbering for genomic variants.
This is because the initiator-based numbering system
allows users to quickly check from the ordered list of
RNA variants on the Web page, ListVarCard, as to
whether their newly detected point mutation has
already been published and curated. The GenBank
numbering system allows variants located within any
gene elements to be described in a uniform style, i.e.,
g.[nucleotide number][variant]. Hyperlinks to the
appropriate genomic and cDNA sequences are
provided on the Web page, DNAVarCard, for users’
reference.
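The two numbering systems can be related with a short sketch. It is illustrative only: the class VariantNames, its methods, and the parameter genBankPositionOfA (the GenBank position of the "A" of the initiating codon) are hypothetical and are not taken from eMelanoBase. The conversion ignores introns, so it is meaningful only within the first exon and its 5′ flanking sequence, and it follows the convention that the "A" of the initiator is nucleotide 1, the nucleotide immediately upstream is -1, and there is no nucleotide 0.

```java
class VariantNames {
    // Genomic description in the uniform style g.[nucleotide number][variant].
    static String genomic(int genBankPosition, String variant) {
        if (genBankPosition < 1) {
            throw new IllegalArgumentException("GenBank positions start at 1");
        }
        return "g." + genBankPosition + variant;
    }

    // Converts a GenBank position to initiator-based numbering.
    // Assumption: no intron lies between the position and the initiating codon.
    static int toInitiatorBased(int genBankPosition, int genBankPositionOfA) {
        int offset = genBankPosition - genBankPositionOfA;
        // Positions at or after the A count from 1; upstream positions count from -1.
        return offset >= 0 ? offset + 1 : offset;
    }
}
```

For example, genomic(71, "G>T") returns the string g.71G>T.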
Data Submission and Quality Control
At present, contributors inside and outside the
Melanoma Genetics Consortium can submit novel
data by electronic mail to a single curator located in
Westmead, Australia. The curator checks the
accuracy of the nucleotide position where the variant
occurs on the genomic and transcript levels, the
codon number where a substitution occurs, the
methods of detection, and details about the contributor.
If the contributor is a student, his/her
supervisor's name is required. In the near future, a
submission form will be provided online with
JavaScript form validation.
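Some of the checks listed above are mechanical and could be automated before a record reaches the curator. The sketch below is written in Java as a server-side counterpart to the planned client-side form validation; it is an assumption about how such checks might look, and the Submission fields and the SubmissionValidator class are hypothetical. Biological accuracy of a position or codon number would still need the curator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for the fields of one submitted variant.
class Submission {
    String contributorName;
    boolean student;
    String supervisorName;
    int genomicPosition;
    String detectionMethod;
}

class SubmissionValidator {
    // Returns a list of problems; an empty list means the mechanical checks passed.
    static List<String> validate(Submission s) {
        List<String> problems = new ArrayList<>();
        if (isBlank(s.contributorName)) {
            problems.add("contributor name is required");
        }
        // Rule from the text: a student's submission must name the supervisor.
        if (s.student && isBlank(s.supervisorName)) {
            problems.add("supervisor name is required for students");
        }
        if (s.genomicPosition < 1) {
            problems.add("genomic position must be a positive integer");
        }
        if (isBlank(s.detectionMethod)) {
            problems.add("method of detection is required");
        }
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```

Collecting every problem in one pass, rather than stopping at the first, lets a contributor correct a submission in a single round.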
Data Security
The biggest advantage of the master/slave deploy-
ment is data protection. The firewall that separates
the servers is configured to allow data to be
transferred from the master to the slave but not vice
versa. If the data in the Internet server is corrupted, it
will be replaced with another copy duplicated from the
master. To prevent data corruption, contributors can
submit data only to the curator’s e-mail account
residing in a separate server. Data management
servlets will be available to the curator only on the
Intranet server. The shortcoming of this configuration
is that a network of servers is required, in this case,
three of them. It is not always an affordable solution
for small laboratories. The master database and Java
classes are also copied onto a magnetic disk as a
backup for the curator, so that the entire service can
be rebuilt even if both servers are compromised.
FUTURE DIRECTIONS
Table 1 shows the milestones that have been
completed and those yet to be completed. The
database is being progressively populated with first
the core data and then the auxiliary data on published
CDKN2A and CDK4 germline variants under the
supervision of a subcommittee of the international
Melanoma Genetics Consortium [Kefford et al.,
1999]. Data on unpublished variants and novel loci
will be curated and added progressively. Eventually,
the database content will reflect the most current
genetic epidemiological profile of familial melanoma.
TABLE 1. Milestones
Milestone | Tasks to perform | Status
1. Implement data schema | Deduce schema using functional dependency analysis. Build schema in PostgreSQL on local workstation. | Completed
2. Implement proof-of-concept prototype servlet layer | Implement servlet classes on local workstation. | Completed
3. Populate database schema | Populate database with core data on CDKN2A variants. Populate the entities Gene, GeneObj, GenEleObj, DnaEvent, and RnaEvent. | Completed
4. Alpha testing | Host the prototype on test server for local user evaluation. | Completed
5. Refactorisation | Extract commonly shared methods among servlets into abstract classes. | Completed
6. Beta release | Host the functional prototype on Internet server for public access. | Completed
7. Populate database schema | Populate database with auxiliary data on CDKN2A variants. Populate the entities Citation, GeoArea, and Kindred. | Pending
8. Implement servlets for accessing population genetics data | Implement controller servlets for accessing the entities GeoArea, Population and Kindred. Implement view servlets for data presentation on test server. | Completed
9. Implement data management tool | Implement servlets for creating and editing data records on Intranet server. | Completed
10. Implement messaging servlet | Implement servlets for automatic notification of database updates. | Pending
11. Full release 001 | Host the fully functional release on Internet server. | Pending
ACKNOWLEDGMENTS
We thank Christopher Liddle and Adrian Plummer
for technical advice on network security issues and
providing computing resources for hosting the service.
We thank Helen Rizos and members of the Melanoma
Genetics Consortium for their advice and assistance in
curation, and Jennifer Cruz for programming support.
REFERENCES
Apache Group. 2002. http://java.apache.org.
Bishop DT, Demenais F, Goldstein AM, Pollock P, Holland EA,
Gruis N, Harland M, Ghiorzo P, Platz A, Hansson J,
Bianchi-Scarrà A, Bergman W, Bressac de Paillerets B, Mann GJ,
Hayward NK, Tucker MA, Newton Bishop J, Melanoma
Genetics Consortium. 2002. Geographical variation in
CDKN2A penetrance for melanoma. J Natl Cancer Inst
94:894–903.
Boggs M, Boggs W. 1999. Mastering UML with Rational Rose.
Alameda: Sybex Inc. p 1–30.
Box NF, Duffy DL, Chen W, Stark M, Martin NG, Sturm RA,
Hayward NK. 2001. MC1R genotype modifies risk of
melanoma in families segregating CDKN2A mutations.
Am J Hum Genet 69:765–773.
Cannon-Albright LA, Meyer LJ, Goldgar DE, Lewis CM,
McWhorter WP, Jost M, Harrison D, Anderson DE, Zone JJ,
Skolnick MH. 1994. Penetrance and expressivity of the
chromosome 9p melanoma susceptibility locus (MLM).
Cancer Res 54:6041–6044.
den Dunnen JT, Antonarakis S. 2000. Mutation nomenclature
extensions and suggestions to describe complex mutations:
a discussion. Hum Mutat 15:7–12.
Goodwill J. 2001. Developing Java Servlets, 2nd ed.
Indianapolis: Sams Publishing. p 8–10.
Gruis NA, Sandkuijl LA, van der Velden PA, Bergman W,
Frants RR. 1995. CDKN2 explains part of the clinical
phenotype in Dutch familial atypical multiple-mole
melanoma (FAMMM) syndrome families. Melanoma Res
5:169–177.
Johnson JL. 1997. Database models, languages, design. Oxford:
Oxford University Press. p 745–795.
Kefford RF, Newton Bishop JA, Bergman W, Tucker MA. 1999.
Counseling and DNA testing for individuals perceived to be
genetically predisposed to melanoma: a consensus statement
of the Melanoma Genetics Consortium. J Clin Oncol
17:3245–3251.
Liu L, Dilworth D, Gao L, Monzon J, Summers A, Lassam N,
Hogg D. 1999. Mutation of the CDKN2A 5′UTR creates an
aberrant initiation codon and predisposes to melanoma.
Nat Genet 21:128–132.
Online Mendelian Inheritance in Man, OMIM™. 2000.
Baltimore, MD: McKusick-Nathans Institute for Genetic
Medicine, Johns Hopkins University; Bethesda, MD:
National Center for Biotechnology Information, National
Library of Medicine. www.ncbi.nlm.nih.gov/omim/.
Petschulat S. 2001. JSPs or servlets—which architecture is
right for you? Java Report 6:54–57.
PostgreSQL Global Development Group. 2002. http://postgresql.org.
Rizos H, Darmanian AP, Holland EA, Mann GJ, Kefford RF.
2001. Mutations in the INK4a/ARF melanoma susceptibility
locus functionally impair p14ARF. J Biol Chem 276:
41424–41434.
Shaw D. 2001. Extreme programming in context. Systems
Developer 2:20–26.
van der Velden PA, Sandkuijl LA, Bergman W, Pavel S, van
Mourik L, Frants RR, Gruis NA. 2001. Melanocortin-1
receptor variant R151C modifies melanoma risk in
Dutch families with melanoma. Am J Hum Genet 69:
774–779.
Walker GJ, Hayward NK. 2002. p16INK4A and p14ARF
tumour suppressors in melanoma: lessons from the mouse.
Lancet 359:7–8.
White S, Fisher M, Cattell R, Hamilton G, Hapner M. 2000.
JDBC API tutorial and reference. Boston: Addison-Wesley.
p 46–127.
 

Similar to DF-HMe-melanobase2002

SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...csandit
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...cscpconf
 
Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...
Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...
Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...IRJET Journal
 
Serine Integrases in Genetic Circuit Design
Serine Integrases in Genetic Circuit DesignSerine Integrases in Genetic Circuit Design
Serine Integrases in Genetic Circuit DesignDylan MacPhail
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Solutions
 
Next generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasNext generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasMuhammadAbbaskhan9
 
RT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationRT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationIJAEMSJORNAL
 
Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...
Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...
Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...Christo Ananth
 
CRISPRCas9 Gene Therapy Delivery Strategies.pdf
CRISPRCas9 Gene Therapy Delivery Strategies.pdfCRISPRCas9 Gene Therapy Delivery Strategies.pdf
CRISPRCas9 Gene Therapy Delivery Strategies.pdfDoriaFang
 
De novo transcriptome assembly of solid sequencing data in cucumis melo
De novo transcriptome assembly of solid sequencing data in cucumis meloDe novo transcriptome assembly of solid sequencing data in cucumis melo
De novo transcriptome assembly of solid sequencing data in cucumis melobioejjournal
 
DE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELO
DE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELODE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELO
DE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELObioejjournal
 
Corrected 2e-5
Corrected 2e-5Corrected 2e-5
Corrected 2e-5Dago Noel
 
Corrected 2e-5
Corrected 2e-5Corrected 2e-5
Corrected 2e-5Dago Noel
 
New foreground markers for Drosophila cell segmentation using marker-controll...
New foreground markers for Drosophila cell segmentation using marker-controll...New foreground markers for Drosophila cell segmentation using marker-controll...
New foreground markers for Drosophila cell segmentation using marker-controll...IJECEIAES
 
Effects of splicing mutations on NF2 transcripts
Effects of splicing mutations on NF2 transcriptsEffects of splicing mutations on NF2 transcripts
Effects of splicing mutations on NF2 transcriptsBianca Heinrich
 
Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...
Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...
Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...daranisaha
 
Cimetta et al., 2013
Cimetta et al., 2013Cimetta et al., 2013
Cimetta et al., 2013Fran Flores
 

Similar to DF-HMe-melanobase2002 (20)

1207.2600
1207.26001207.2600
1207.2600
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
 
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL...
 
Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...
Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...
Deep Learning for Leukemia Detection: A MobileNetV2-Based Approach for Accura...
 
Serine Integrases in Genetic Circuit Design
Serine Integrases in Genetic Circuit DesignSerine Integrases in Genetic Circuit Design
Serine Integrases in Genetic Circuit Design
 
Qi liu 08.08.2014
Qi liu 08.08.2014Qi liu 08.08.2014
Qi liu 08.08.2014
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic Solutions
 
Next generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasNext generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad Abbas
 
RT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferationRT-PCR and DNA microarray measurement of mRNA cell proliferation
RT-PCR and DNA microarray measurement of mRNA cell proliferation
 
Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...
Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...
Mutation Prediction for Coronaviruses using Genome Sequence and Recurrent Neu...
 
CRISPRCas9 Gene Therapy Delivery Strategies.pdf
CRISPRCas9 Gene Therapy Delivery Strategies.pdfCRISPRCas9 Gene Therapy Delivery Strategies.pdf
CRISPRCas9 Gene Therapy Delivery Strategies.pdf
 
De novo transcriptome assembly of solid sequencing data in cucumis melo
De novo transcriptome assembly of solid sequencing data in cucumis meloDe novo transcriptome assembly of solid sequencing data in cucumis melo
De novo transcriptome assembly of solid sequencing data in cucumis melo
 
DE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELO
DE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELODE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELO
DE NOVO TRANSCRIPTOME ASSEMBLY OF SOLID SEQUENCING DATA IN CUCUMIS MELO
 
Corrected 2e-5
Corrected 2e-5Corrected 2e-5
Corrected 2e-5
 
Corrected 2e-5
Corrected 2e-5Corrected 2e-5
Corrected 2e-5
 
New foreground markers for Drosophila cell segmentation using marker-controll...
New foreground markers for Drosophila cell segmentation using marker-controll...New foreground markers for Drosophila cell segmentation using marker-controll...
New foreground markers for Drosophila cell segmentation using marker-controll...
 
Effects of splicing mutations on NF2 transcripts
Effects of splicing mutations on NF2 transcriptsEffects of splicing mutations on NF2 transcripts
Effects of splicing mutations on NF2 transcripts
 
Seminar on crispr
Seminar on crisprSeminar on crispr
Seminar on crispr
 
Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...
Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...
Prognostic Value of LINC01600 and CASC15 as Competitive Endogenous RNAs in Lu...
 
Cimetta et al., 2013
Cimetta et al., 2013Cimetta et al., 2013
Cimetta et al., 2013
 

DF-HMe-melanobase2002

  • 1. DATABASES
eMelanoBase: An Online Locus-Specific Variant Database for Familial Melanoma
David C.Y. Fung,1 Elizabeth A. Holland,1 Therese M. Becker,1 Nicholas K. Hayward,2 Brigitte Bressac-de Paillerets,3 Melanoma Genetics Consortium, and Graham J. Mann1*
1 Westmead Institute for Cancer Research, University of Sydney, Westmead Millennium Institute, Westmead, NSW, Australia; 2 Queensland Institute of Medical Research, Brisbane, Australia; 3 Institut Gustave Roussy, Villejuif, France
Communicated by David N. Cooper
A proportion of melanoma-prone individuals in both familial and non-familial contexts has been shown to carry inactivating mutations in either CDKN2A or, rarely, CDK4. CDKN2A is a complex locus that encodes two unrelated proteins from alternately spliced transcripts that are read in different frames. The alpha transcript (exons 1a, 2, and 3) produces the p16INK4A cyclin-dependent kinase inhibitor, while the beta transcript (exons 1b and 2) is translated as p14ARF, a stabilizing factor of p53 levels through binding to MDM2. Mutations in exon 2 can impair both polypeptides, and insertions and deletions in exons 1a, 1b, and 2 can theoretically generate p16INK4A-p14ARF fusion proteins. No online database currently takes into account all the consequences of these genotypes, a situation compounded by some problematic previous annotations of CDKN2A-related sequences and descriptions of their mutations. As an initiative of the international Melanoma Genetics Consortium, we have therefore established a database of germline variants observed in all loci implicated in familial melanoma susceptibility. Such a comprehensive, publicly accessible database is an essential foundation for research on melanoma susceptibility and its clinical application. Our database serves two types of data as defined by HUGO. The core dataset includes the nucleotide variants on the genomic and transcript levels, amino acid variants, and citation.
The ancillary dataset includes keyword description of events at the transcription and translation levels and epidemiological data. The application that handles users’ queries was designed in the model-view-controller architecture and was implemented in Java. The object-relational database schema was deduced using functional dependency analysis. We hereby present our first functional prototype of eMelanoBase. The service is accessible via the URL www.wmi.usyd.edu.au:8080/melanoma.html. Hum Mutat 21:2–7, 2002. © 2002 Wiley-Liss, Inc.
KEY WORDS: melanoma, familial; database; germline variants; model-view-controller architecture; CDKN2A; CDK4
DATABASES:
CDKN2A – OMIM: 600160, 155601 (melanoma, familial); GDB: 335362; GenBank: L41934, U12818-20; HGMD: CDKN2A
CDK4 – OMIM: 123829; GDB: 204022; GenBank: U37022; HGMD: CDK4
www.wmi.usyd.edu.au:8080/melanoma.html (eMelanoBase)
Received 4 June 2002; accepted revised manuscript 16 September 2002.
*Correspondence to: Graham J. Mann, M.D., Ph.D., Westmead Institute for Cancer Research, Westmead Millennium Institute, Darcy Rd., Westmead, NSW 2145, Australia. E-mail: gmann@mail.usyd.edu.au
Grant sponsor: Australian National Health and Medical Research Council; Grant numbers: 991331; 211172.
DOI 10.1002/humu.10149
Published online in Wiley InterScience (www.interscience.wiley.com).
INTRODUCTION
A proportion of melanoma-prone individuals in both familial (MIM# 155601) and non-familial contexts has been shown to carry inactivating mutations in either CDKN2A (MIM# 600160) or, rarely, CDK4 (MIM# 123829). CDKN2A is a locus with dual functionality, encoding two unrelated proteins in distinct reading frames. The alpha transcript (exons 1a, 2, and 3) produces the p16INK4A cyclin-dependent kinase inhibitor known to inhibit cyclinD1/CDK4 or CDK6 complexes. The
  • 2. beta transcript (exons 1b and 2) produces the protein p14ARF, which is a positive regulator of p53 levels through binding to MDM2. Germline mutations have been observed in melanoma-susceptible individuals that affect all exons, except exon 3; however, those uniquely affecting p14ARF are rare, despite clear evidence from animal knockout models that it is a potent tumor suppressor gene and melanoma susceptibility locus in its own right [Walker and Hayward, 2002]. Because of the dual encoding by exon 2 (the shared exon 3 is beyond the translation stop codon of p14ARF), mutations in that exon can impair both the p16INK4A and p14ARF polypeptides [Rizos et al., 2001]. Furthermore, frameshifts caused by insertions and deletions in exons 1a, 1b, or 2 can, in theory, generate p16INK4A-p14ARF fusion proteins whose functions are yet to be understood.
Despite the clear association of these loci with inherited susceptibility to melanoma, there is significant uncertainty in the degree of risk conferred by variant alleles [Kefford et al., 1999]. It is known to be modified by prior sun exposure and by red-hair associated variant alleles of the melanocortin receptor (MC1R) [Cannon-Albright et al., 1994; van der Velden et al., 2001; Box et al., 2001]. Geographic location is a major determinant of penetrance of CDKN2A mutations but it is not yet clear to what extent this reflects sun exposure or ethnicity-associated pigmentation [Bishop et al., 2002]. The international Melanoma Genetics Consortium was founded in 1997 to facilitate such research and to develop guidelines for the use of genetic information in risk assessment [Kefford et al., 1999]. A comprehensive, publicly accessible database of germline variants in genes that influence melanoma susceptibility is an essential foundation for such research and its clinical application.
There are currently three data services available online that present information about CDKN2A mutations.
Twelve p16INK4A and two CDK4 cDNA mutations are described in Online Mendelian Inheritance in Man [2000]. Forty-eight p16INK4A and two CDK4 cDNA mutations are presented in the Human Gene Mutation Database (HGMD, Cardiff). A list of 148 p16INK4A mutations is provided by the Radium Hospital (Norway) as a downloadable Excel spreadsheet. None of these services takes into account all the consequences of these genotypes, especially those that affect p14ARF function. This situation has been compounded by some problematic previous annotations of CDKN2A-related sequences and descriptions of their mutations, including reliance on initially published cDNA sequences that were incomplete at the N-terminus. Somatic variants from cell lines and primary tumors are also included in two of the services. Such data are more difficult to interpret because they do not provide direct genetic evidence of a relationship with cancer susceptibility.
We therefore established a database of germline variants observed in all loci implicated in familial melanoma susceptibility. Its purpose is to collate information for each reported genotype on its molecular outcomes, biological effects, methods of detection, and population genetics. As an initiative of the international Melanoma Genetics Consortium, we hereby describe our first functional prototype of eMelanoBase.
SYSTEMS AND METHODS
Software Development Process
The process was divided into four stages: inception, elaboration, construction, and transition [Shaw, 2001]. In the inception stage, the functional requirements of the system and the types of users targeted were defined. In the elaboration stage, use cases were written to model possible interactions between different types of users and the various functions of the system (Fig. 1). The software architecture to be implemented was also determined at this stage.
In the construction stage, use cases were converted into a series of engineering diagrams [Boggs and Boggs, 1999] that were used collectively as a guide for coding Java classes. The transition stage focused on how to bring the service to the users. It included software testing, code optimization, and Web service hosting.
Implementation
In order to handle the complex mapping of every user’s query to the appropriate visual presentation, the service was developed based on the model-view-controller (MVC) design pattern [Petschulat, 2001] (Fig. 2). Here the three sets of objects are implemented as Java beans and servlets. Communication between the servlets and the database is made possible by the Java database connectivity (JDBC) driver [White et al., 2000].
[Figure 1 diagram labels. Roles: User, Contributor, Curator. Functions: 1. Query data; 2. Return data; 3. Process data; 3.1. Create record; 3.2. Delete record; 3.3. Update record; 4. Notify users; 5. Submit data. Relation label: is-part-of. System: eMelanoBase.]
FIGURE 1. Use case diagram of eMelanoBase. The solid frame represents the boundary of the system. The ovals represent the functions of the system and the ball-and-stick figures represent the roles that interact with the system. The numbering of each function is for identification only, not the sequence of execution.
The database schema was derived from functional dependency analysis [Johnson, 1997] (Fig. 3). The core data includes
  • 3. gene symbols of disease loci, DNA variants, transcript and translation changes, and citation. The ancillary data includes functional changes, methods of detection, geographical association, and genetic epidemiology. In this article, the terms “entity” and “table” are used interchangeably in the context of relational database, whereas the term “class” has the same meaning as “entity” in the context of object-relational database. All software modeling was done using the Rational Rose Academic Edition (Rational Corporation, Lexington, MA). Java coding was done using Borland JBuilder 5 Personal Edition (Borland International, Scotts Valley, CA).
Deployment
The service was first developed and tested on a Linux 2.4/dual Pentium III 700 workstation (Dell Corporation, Austin, TX). It was then hosted on two Linux 2.4/dual Pentium III 500 servers (Dell Corporation, Austin, TX) in a master/slave configuration. Each server was installed with Java Development Kit 1.3 (Sun Microsystems, Palo Alto, CA), Apache Tomcat-Jakarta 3.2 server [Apache Group, 2002], PostgreSQL 7.0 [PostgreSQL Global Development Group, 2002], and JDBC 7.0-1.2 driver. The server behind the firewall stores the master copy of eMelanoBase and provides Intranet service for local users and the curator, whereas the one located outside the firewall stores the slave copy and provides the Internet service for international users. The database was distributed as a text file in the structured query language (SQL) format.
RESULTS AND DISCUSSION
Database Statistics
The database currently contains 34 genomic variants of CDKN2A which gave rise to 55 RNA variants and 53 amino-acid variants. A majority of variants are missense mutations (85%). Out of these, 60% are nucleotide transversions and 40% are transitions. A total of 58% of all variants occurred in exon 2. Of these, all but four variants alter the polypeptide sequences of both p14ARF and p16INK4A, which may impair their function.
Coverage is currently being extended to include all published genomic variants.
Software Design
The MVC design is an object-oriented pattern in which the application is made up of three sets of objects: model, view, and controller. The model represents the data or application objects, currently implemented as Java beans. They often map to the entities in the database schema and contain methods for accessing and modifying data. The view represents the visual presentation of the model generated by Java servlets. The controller, also implemented as servlets, represents the logic for manipulating the model. It determines what database transactions are required for populating the model. The greatest strength of MVC design is the complete uncoupling of these three sets of objects, which allows the model to be reused. The model can be presented in various views without being modified, or a new controller can be added without having to alter the model [Goodwill, 2001]. Hence MVC design is well suited to data-driven Web-based information systems such as variant databases of multiple loci, especially if they contain large datasets of genetic epidemiology and citations, and genome databases, for example, GDB. It is, however, unsuitable for Web sites that require very few database transactions because they contain mainly static content. In this case, Java Server Pages can be used instead of servlets, and embedding the data querying methods as code fragments in the appropriate pages will be sufficient [Petschulat, 2001].
Web Reporting
Queried data was returned to the user in Web pages generated by the view servlets. Each page is an instance of the class HTMLDocument, which inherits the toString() method from the HTMLObject parent class (Fig. 4). The toString() method in turn generates the necessary HTML tags. There are two benefits to this design.
[Figure 2 diagram labels: Web browser; request; response; Jakarta-Tomcat Server; Controller (servlets); View (servlets); Model (bean); set(); get(); JDBC; DBMS.]
FIGURE 2. Schematic diagram of MVC architecture. The solid arrows represent data flow between components. The dotted frame represents the Java Web server Jakarta-Tomcat, which contains the Java servlets required for the Web service.
[Figure 3 diagram entities: DNAEvent, RNAEvent, PeptEvent, ProtEvent, Gene, GeneObj, GenEleObj, Citation, Expression, Kindred, GeoArea, Method, Ethnicity, Population.]
FIGURE 3. Database schema in UML. Each box is an entity or a data class. The arrows in broken lines represent dependency between classes. For example, Gene class depends on the existence of GeneObj and GenEleObj classes. Solid lines represent bi-directional relationships between classes.
The first is reducing the two most
  • 4. common errors in HTML programming, i.e., missing tags and broken tags. The second benefit is allowing the developer to model the mechanism of assembling various HTML components into a document by the view servlets.
Data Access Optimization
Data access performance is optimized on the database side and on the servlet side. On the database side, two rules were applied to the design of SQL queries. The first was that no query should contain more than three table joins, because they are resource expensive. The second was that navigation between entities should rely on PostgreSQL-generated object IDs (OID) instead of user-defined primary and foreign keys. eMelanoBase is therefore an object-relational database. The intention is to limit resources spent on creating and searching indices. Although indices can enhance the performance of the DBMS engine, their excessive use can put heavy demand on memory and CPU usage. The difference between relational and object-relational models is illustrated in Figure 5. To query data related to the entity Gene using the relational model (Fig. 5a), PostgreSQL would have to generate six indices for three primary keys and three foreign keys to facilitate table searches. In the object-relational model (Fig. 5b), Gene is considered a composite of three other classes: GeneObj, GenEleObj, and DnaEvent. Its attributes are OIDs that reference these classes. Because an OID is automatically generated and indexed by PostgreSQL for every newly created record, OIDs form the only index required for facilitating the same transaction. On the servlet side, a connection pool is used for holding already opened JDBC connections, thereby reducing the need to re-open and close them repeatedly.
Hosting Technology
The biggest difference between server-side Java and conventional HTTP/CGI is single process-multithreading. Each Java servlet is an in-memory object that can be shared by multiple threads.
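The servlet-side connection pool mentioned under Data Access Optimization can be illustrated with a minimal sketch. This is not the eMelanoBase code: the class name SimplePool and its methods are hypothetical, the syntax is current Java rather than JDK 1.3, and the pooled type is left generic so the idea can be shown without a live database. In a servlet the pooled resource would presumably be a java.sql.Connection opened once at start-up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool; T would be java.sql.Connection in a servlet. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // open every resource once, up front
        }
    }

    /** Blocks until a pooled resource is free, then hands it to the caller. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
    }

    /** Returns a resource to the pool instead of closing it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    /** Number of resources currently idle in the pool. */
    int available() {
        return idle.size();
    }
}
```

Because the queue is thread-safe, several request threads can share one pool, which matches the single process-multithreading model of servlets; the cost of opening a connection is paid once per pooled resource rather than once per request.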
Once a servlet is loaded, it persists in memory while handling new HTTP requests and will only be garbage collected by the Java virtual machine if no more requests have been received within a defined time span. This is much more efficient than reloading the same CGI script for each HTTP request, i.e., single process-single threading. The same property also allows multiple servlets to access the same Java bean at run time instead of executing database transactions repeatedly. The other advantage of Java is the versatility of the Java development kit. It provides interfaces for database access, security features, network support, file access, and other packages extending the functionality of servlets.
Nomenclature
While most variants can be described in the latest draft of HUGO nomenclature [den Dunnen and Antonarakis, 2000], there are two exceptional cases found in CDKN2A. The first case is the creation of a new initiator by an AGG>ATG substitution within the non-coding region of exon 1a (g.71G>T, record ID: MGC00001) [Liu et al., 1999]. The consequential translation event is AGG>Met at the 32nd nucleotide upstream of the original initiator, leading to the translation of the 5′ UTR of the p16INK4A transcript to a new N-terminal of 11 residues. We suggest describing this event as r.-34_32>p.M1.
[Figure 4 diagram labels: HTMLObject (#htmlObjects : Vector; +toHTML() : String; +HTMLObject()); HTMLDocument; HTMLTable; HTMLTableRow.]
FIGURE 4. Simplified class diagram of HTML document model. The solid arrows represent class inheritance. Lines with a trapezoid end represent class containment. The HTMLDocument, HTMLTable, and HTMLTableRow classes inherit the toString() method from HTMLObject class. The relationship among the daughter classes is class containment where HTMLDocument is a container of HTMLTable, which in turn is a container of HTMLTableRow.
The second case is a
[Figure 5 diagram content. All four classes are stereotyped Persistent; association multiplicities shown are 1 and 1..*.
(A) Relational schema:
GeneObj: ID:INT2 (primary key); Symbol:VARCHAR(20); GeneName:VARCHAR(40); ChrNo:VARCHAR(2); Cytoband:VARCHAR(12); OMIM:INT8
GenEleObj: ID:INT2 (primary key); EleType:VARCHAR(10); GenBank_UID:INT4
DnaEvent: ID:VARCHAR(8) (primary key); DnaChge:VARCHAR(20); Status:VARCHAR(10); Species:VARCHAR(30); EventType:VARCHAR(40)
Gene: GeneObj_REF:INT2 (foreign key); GenEleObj_REF:INT2 (foreign key); DnaEvent_REF:VARCHAR(8) (foreign key)
(B) Object-relational schema:
GeneObj: Symbol:VARCHAR(20); GeneName:VARCHAR(40); ChrNo:VARCHAR(40); Cytoband:VARCHAR(12); OMIM:INT8
GenEleObj: EleType:VARCHAR(10); GenBank_UID:INT4
DnaEvent: MutID:VARCHAR(8); DnaChge:VARCHAR(20); Status:VARCHAR(10); Species:VARCHAR(30); EventType:VARCHAR(40)
Gene: GeneObject:GeneObj::OID; GeneElement:GenEleObj::OID; Variant:DnaEvent::OID]
FIGURE 5. Object-based and relational schemas. a and b show how the two models differ from each other using the same four classes.
  • 5. 19-bp deletion in exon 2 (g.225-243del, record ID: MGC00045), which gave rise to two translation events [Gruis et al., 1995]. One is the out-of-frame translation of the p16INK4A transcript starting from codon 76, resulting in a polypeptide of 138 residues in length (p.A76fs139X). The other is the translation of a p14ARF/p16INK4A chimera in which the N-terminal portion consists of the first 90 codons of p14ARF, and the C-terminal consists of codons 82–156 of p16INK4A. We suggest describing the event as p.1-90p14:82-156p16. We also suggest adding a new classifier “complex, chimera” to the current EBI mutation event controlled vocabulary for this event type. den Dunnen and Antonarakis [2000] recommended counting the nucleotide “A” of the initiating codon as the first nucleotide. We applied this numbering system to RNA and cDNA variants but still apply GenBank numbering to genomic variants. This is because the initiator-based numbering system allows users to quickly check from the ordered list of RNA variants on the Web page, ListVarCard, whether their newly detected point mutation has already been published and curated. The GenBank numbering system allows variants located within any gene elements to be described in a uniform style, i.e., g.[nucleotide number][variant]. Hyperlinks to the appropriate genomic and cDNA sequences are provided on the Web page, DNAVarCard, for users’ reference.
Data Submission and Quality Control
At present, contributors inside and outside the Melanoma Genetics Consortium can submit novel data by electronic mail to a single curator located in Westmead, Australia. The curator will check the accuracy of the nucleotide position where the variant occurs on the genomic and transcript levels, the codon number where a substitution occurs, the methods of detection, and details about the contributor. If the contributor is a student, his/her supervisor’s name is required.
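One of the curator's checks described above, that a reported codon number agrees with the reported nucleotide position, reduces to simple arithmetic under initiator-based numbering. The sketch below is illustrative only: the class CodonCheck and its methods are hypothetical names, not part of eMelanoBase, and it assumes the position is counted along the coding sequence with the "A" of the initiating codon as nucleotide 1.

```java
/** Hypothetical curation helper; names are illustrative, not eMelanoBase code. */
final class CodonCheck {
    private CodonCheck() {}

    /** Codon containing a coding position, where position 1 is the A of the initiator ATG. */
    static int codonOf(int codingPosition) {
        if (codingPosition < 1) {
            throw new IllegalArgumentException("position must lie in the coding sequence (>= 1)");
        }
        return (codingPosition + 2) / 3; // integer ceiling of position / 3
    }

    /** True when the codon reported by a contributor matches the reported position. */
    static boolean isConsistent(int codingPosition, int reportedCodon) {
        return codonOf(codingPosition) == reportedCodon;
    }
}
```

Under this convention, coding positions 301 to 303 all fall in codon 101 and position 304 starts codon 102, so a submission pairing position 304 with codon 101 would be flagged for the curator to review.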
In the near future, a submission form will be provided online with JavaScript form validation.
Data Security
The biggest advantage of the master/slave deployment is data protection. The firewall that separates the servers is configured to allow data to be transferred from the master to the slave but not vice versa. If the data on the Internet server is corrupted, it will be replaced with another copy duplicated from the master. To prevent data corruption, contributors can submit data only to the curator’s e-mail account residing on a separate server. Data management servlets will be available to the curator only on the Intranet server. The shortcoming of this configuration is that a network of servers is required, in this case, three of them. It is not always an affordable solution for small laboratories. The master database and Java classes are also copied onto a magnetic disk as a backup for the curator, so that the entire service can be rebuilt even if both servers are compromised.
FUTURE DIRECTIONS
Table 1 shows the milestones that have been completed and those yet to be completed.
TABLE 1. Milestones
Milestone | Tasks to perform | Status
1. Implement data schema | Deduce schema using functional dependency analysis. Build schema in PostgreSQL on local workstation. | Completed
2. Implement proof-of-concept prototype servlet layer | Implement servlet classes on local workstation. | Completed
3. Populate database schema | Populate database with core data on CDKN2A variants. Populate the entities Gene, GeneObj, GenEleObj, DnaEvent, and RnaEvent. | Completed
4. Alpha testing | Host the prototype on test server for local user evaluation. | Completed
5. Refactorisation | Extract commonly shared methods among servlets into abstract classes. | Completed
6. Beta release | Host the functional prototype on Internet server for public access. | Completed
7. Populate database schema | Populate database with auxiliary data on CDKN2A variants. Populate the entities Citation, GeoArea, and Kindred. | Pending
8. Implement servlets for accessing population genetics data | Implement controller servlets for accessing the entities GeoArea, Population and Kindred. Implement view servlets for data presentation on test server. | Completed
9. Implement data management tool | Implement servlets for creating and editing data records on Intranet server. | Completed
10. Implement messaging servlet | Implement servlets for automatic notification of database updates. | Pending
11. Full release 001 | Host the fully functional release on Internet server. | Pending
The database is being progressively populated with first the core data and then the auxiliary data on published
CDKN2A and CDK4 germline variants under the supervision of a subcommittee of the international Melanoma Genetics Consortium [Kefford et al., 1999]. Data on unpublished variants and novel loci will be curated and added progressively. Eventually, the database content will reflect the most current genetic epidemiological profile of familial melanoma.

ACKNOWLEDGMENTS

We thank Christopher Liddle and Adrian Plummer for technical advice on network security issues and for providing computing resources for hosting the service. We thank Helen Rizos and members of the Melanoma Genetics Consortium for their advice and assistance in curation, and Jennifer Cruz for programming support.

REFERENCES

Apache Group. 2002. http://java.apache.org.

Bishop DT, Demenais F, Goldstein AM, Pollock P, Holland EA, Gruis N, Harland M, Ghiorzo P, Platz A, Hansson J, Bianchi-Scarrà A, Bergman W, Bressac de Paillerets B, Mann GJ, Hayward NK, Tucker MA, Newton Bishop J, Melanoma Genetics Consortium. 2002. Geographical variation in CDKN2A penetrance for melanoma. J Natl Cancer Inst 94:894–903.

Boggs M, Boggs W. 1999. Mastering UML with Rational Rose. Alameda: Sybex Inc. p 1–30.

Box NF, Duffy DL, Chen W, Stark M, Martin NG, Sturm RA, Hayward NK. 2001. MC1R genotype modifies risk of melanoma in families segregating CDKN2A mutations. Am J Hum Genet 69:765–773.

Cannon-Albright LA, Meyer LJ, Goldgar DE, Lewis CM, McWhorter WP, Jost M, Harrison D, Anderson DE, Zone JJ, Skolnick MH. 1994. Penetrance and expressivity of the chromosome 9p melanoma susceptibility locus (MLM). Cancer Res 54:6041–6044.

den Dunnen JT, Antonarakis S. 2000. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum Mutat 15:7–12.

Goodwill J. 2001. Developing Java Servlets, 2nd ed. Indianapolis: Sams Publishing. p 8–10.

Gruis NA, Sandkuijl LA, van der Velden PA, Bergman W, Frants RR. 1995. CDKN2 explains part of the clinical phenotype in Dutch familial atypical multiple-mole melanoma (FAMMM) syndrome families. Melanoma Res 5:169–177.

Johnson JL. 1997. Database models, languages, design. Oxford: Oxford University Press. p 745–795.

Kefford RF, Newton Bishop JA, Bergman W, Tucker MA. 1999. Counseling and DNA testing for individuals perceived to be genetically predisposed to melanoma: a consensus statement of the Melanoma Genetics Consortium. J Clin Oncol 17:3245–3251.

Liu L, Dilworth D, Gao L, Monzon J, Summers A, Lassam N, Hogg D. 1999. Mutation of the CDKN2A 5'UTR creates an aberrant initiation codon and predisposes to melanoma. Nat Genet 21:128–132.

Online Mendelian Inheritance in Man, OMIM™. 2000. Baltimore, MD: McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University; Bethesda, MD: National Center for Biotechnology Information, National Library of Medicine. www.ncbi.nlm.nih.gov/omim/.

Petschulat S. 2001. JSPs or servlets—which architecture is right for you? Java Report 6:54–57.

PostgreSQL Global Development Group. 2002. http://postgresql.org.

Rizos H, Darmanian AP, Holland EA, Mann GJ, Kefford RF. 2001. Mutations in the INK4a/ARF melanoma susceptibility locus functionally impair p14ARF. J Biol Chem 276:41424–41434.

Shaw D. 2001. Extreme programming in context. Systems Developer 2:20–26.

van der Velden PA, Sandkuijl LA, Bergman W, Pavel S, van Mourik L, Frants RR, Gruis NA. 2001. Melanocortin-1 receptor variant R151C modifies melanoma risk in Dutch families with melanoma. Am J Hum Genet 69:774–779.

Walker GJ, Hayward NK. 2002. p16INK4A and p14ARF tumour suppressors in melanoma: lessons from the mouse. Lancet 359:7–8.

White S, Fisher M, Cattell R, Hamilton G, Hapner M. 2000. JDBC API tutorial and reference. Boston: Addison-Wesley. p 46–127.