
Public disclosure of biological sequences in global patent practice
Osmat A. Jefferson a,b,*, Deniz Köllhofer a,b, Prabha Ajjikuttira a,b, Richard A. Jefferson a,b
a Queensland University of Technology, Brisbane, QLD 4000, Australia
b Cambia, P.O. Box 3200, Canberra, ACT 2601, Australia
Article info
Article history:
Received 5 January 2015
Received in revised form 20 July 2015
Accepted 23 August 2015
Available online xxx
Keywords:
Patent
Biological patent
Patent sequence
Patent office
Sequence listings
Patent sequence data
Patent sequence download
PatSeq tools
Patent disclosure
Abstract
Biological sequences are an important part of global patenting, with unique challenges for their effective
and equitable use in practice and in policy. Because their function can only be determined with
computer-aided technology, the form in which sequences are disclosed matters greatly. Similarly, the
scope of patent rights sought and granted requires computer readable data and tools for comparison.
Critically, the primary data provided to the national patent offices, and thence to the public, must be
comprehensive, standardized, timely and meaningful. It is not yet. The proposed global Patent Sequence
(PatSeq) Data platform can enable national and regional jurisdictions to meet the desired standards.
© 2015 Cambia. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
In the traditional working of the patent system, an inventor
secures governmental rights to exclude others from making, using,
or selling his/her invention for a limited time in exchange for
publicly disclosing the full details of the invention - what is called
‘the teachings’. The teachings derived from the disclosure and the
practice of an invention enable the public to use the invention
through licensing, to use the invention freely without license
outside the jurisdiction, scope and timeframe of protection, build
upon the invention through research and development, improve
upon it, or design around it to advance scientific and technological
capabilities and ultimately to benefit society.
In the contemporary use of patents to secure rights over genetic
material, the quality of these teachings has come under public
scrutiny and the role of patent offices in the disclosure process has
been challenged [1,2].
Within patent documents, genetic sequences have been viewed
both legally and practically as either chemical compounds or as
information-encoding elements. Within the context of patent
eligibility or infringement issues, the value of their structure and
function has gained importance as various jurisdictions, including
the United States and Europe, attempt to balance competing
interests either in favor of the inventors, as is the case in Europe, or
the public, as is the case in the USA [3].
As genetic sequences are made up of combinations of four bases,
designated as A, C, G, and T (U) in the case of DNA (RNA), or of 20
amino acids, each with different chemical properties and designated
with single or triple letter codes in the case of protein, they can
only be interpreted using specialized computer software tools. Such
tools clarify the structure, function and similarity of any sequence
relative to other sequences. Therefore, during the disclosure
process, the applicant, the patent office, and upon publication, the
public should be able to access the disclosed sequence data and use
the computer tools to interrogate it within the context of all known
sequence listings to interpret, understand, and value their
combined effect on biological innovations. While some patent offices
claim to have internal computer-mediated searching, analysis and
visual tools to interpret the contextual value or meaning of patent
sequences, public access is still lacking. Moreover, creating patent
landscapes that can integrate sequence information with global
patent rights and disclosures remains expensive, slow and
cumbersome for the public and for those professionals who cannot
afford the costly services of commercial providers.
* Corresponding author. Cambia/QUT, P.O. Box 3200, Canberra ACT 2601,
Australia.
E-mail address: Osmat@cambia.org (O.A. Jefferson).
Contents lists available at ScienceDirect
World Patent Information
journal homepage: www.elsevier.com/locate/worpatin
http://dx.doi.org/10.1016/j.wpi.2015.08.005
0172-2190/© 2015 Cambia. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
World Patent Information 43 (2015) 12–24

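The alphabets described in the Introduction (four bases for DNA or RNA, 20 amino acids for protein) are what sequence software works with. As a minimal sketch only, the Python snippet below guesses the molecule type of a raw sequence string from the letters it contains. The function name, the "unknown" fallback and the decision order are illustrative assumptions, not part of any patent office tool, and ambiguity codes are ignored.

```python
# Minimal sketch: guess the molecule type of a sequence from its letters.
# All names here are illustrative assumptions, not an existing API.

DNA_BASES = set("ACGT")
RNA_BASES = set("ACGU")
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def guess_molecule_type(sequence: str) -> str:
    """Return 'DNA', 'RNA', 'PRT' or 'unknown' based only on the alphabet used."""
    letters = set(sequence.upper())
    if not letters:
        raise ValueError("empty sequence")
    if letters <= DNA_BASES:
        return "DNA"
    if letters <= RNA_BASES:
        return "RNA"
    if letters <= AMINO_ACIDS:
        return "PRT"
    return "unknown"


if __name__ == "__main__":
    for example in ("GGTTGGTGTGGTTGG", "ggu", "MKV", "XYZ"):
        print(example, guess_molecule_type(example))
```

Because A, C, G and T are also valid amino acid codes, a short peptide written only with those letters would be reported as DNA. This is one reason why the declared type and other metadata in a sequence listing matter for interpretation.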
Rules for handling of sequences in patent prosecution,
implemented by the United States Patent and Trademark Office (USPTO) and
other major patent offices in the 1990s, required the submission of
any sequence (nucleotide or peptide) disclosed in any national or
foreign application [4]. At that time, the standard disclosure format,
known as “Sequence Listing”, was simple and file submissions were
accepted either electronically or on paper [5]. As sequence
disclosures grew exponentially over time, more legal rulings were
introduced regarding submissions and compliance
with standard formats. While the major offices recommended
international standards such as ST.25 [6] for the disclosure of
sequence listings in submitted patent applications, the
submitted file formats remained flexible until recently (Table 1). Full
compliance with ST.25 and the inclusion of the associated
metadata, such as the origin of the sequence, its length and type,
function, and other markup in a computer readable format [7], were
actually achieved in only a few offices; variations in the readability
of file formats of disclosed sequence data and in its accurate
matching when transferred to public databases persist [8,9]. For
example, from 2001 until 2007, most international applications did
not comply with ST.25 text format rules: the disclosed
sequences were in TIFF or PDF files and contained non-ASCII binary
data (Table 1, “Format of published sequences” category at WIPO in
2007).
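To make the compliance point concrete, the following is a minimal Python sketch of two checks a reader could run on an ST.25-style text listing: whether the file is pure ASCII, and whether the declared length matches the residues actually present. It relies on the ST.25 numeric identifiers <210> (SEQ ID NO), <211> (length), <212> (type), <213> (organism) and <400> (sequence). The function names and the sample listing are invented for illustration. The sketch handles nucleotide sequences only, since protein sequences in ST.25 use three-letter codes that would need separate handling, and it is not a validator for the full standard.

```python
import re

# Matches a numeric identifier line such as "<211> 15".
TAG = re.compile(r"^<(\d{3})>\s*(.*)$")


def is_ascii(text: str) -> bool:
    """True if every character is in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in text)


def parse_listing(text: str) -> list:
    """Parse a simplified ST.25-style listing into one dict per SEQ ID NO."""
    records = []
    current = None
    in_sequence = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = TAG.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            in_sequence = False
            if tag == "210":  # a new SEQ ID NO starts a new record
                current = {"seq_id": int(value), "residues": ""}
                records.append(current)
            elif current is not None:
                if tag == "211":
                    current["length"] = int(value)
                elif tag == "212":
                    current["type"] = value
                elif tag == "213":
                    current["organism"] = value
                elif tag == "400":
                    in_sequence = True
        elif in_sequence and current is not None:
            # Keep letters only; drops spacing and the running position counts.
            current["residues"] += re.sub(r"[^A-Za-z]", "", line)
    return records


def length_mismatches(records: list) -> list:
    """Return the SEQ ID NOs whose declared length differs from the residue count."""
    return [r["seq_id"] for r in records if r.get("length") != len(r["residues"])]


# Invented sample listing, for illustration only.
SAMPLE = """<110> Example Applicant
<120> Example title
<160> 1
<210> 1
<211> 15
<212> DNA
<213> Artificial Sequence
<400> 1
ggttggtgtg gttgg                                                  15
"""

if __name__ == "__main__":
    parsed = parse_listing(SAMPLE)
    print(is_ascii(SAMPLE), parsed, length_mismatches(parsed))
```

A listing published only as a TIFF or PDF image cannot be processed this way at all, which is the practical consequence of the non-compliance described above.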
2. Availability of published patent sequences to the public
Each of the major patent offices adopted a strategy regarding the
publication and provision of sequence listings to the public. Table 1,
column “Format of published sequence listings”, depicts the
practice adopted over time by the USPTO, the World Intellectual Property
Organization (WIPO), and the European Patent Office (EPO). Throughout the past 25
years, variations have existed among these offices. For example, the
published sequence data from US patent documents is available for
bulk download in various file formats; however, the USPTO does
not offer a sequence search facility to interrogate the data. The
office passes its published data to the National Center for
Biotechnology Information (NCBI) [10]. This center provides a
comprehensive public sequence search facility, BLAST, allowing
contextual interrogation of sequence data, and a world class
database, GenBank [11], hosting nucleotide and peptide sequences
primarily from large sequencing projects and individual labs as well as
over 12 million sequences from US granted patents since 1982.
In an effort to enable access to and interrogation of larger sets of
patent sequence data, NCBI and two other major public database
providers, the European Bioinformatics Institute (EMBL-EBI) [12]
and the DNA Databank of Japan (DDBJ) [13], initiated an informal
collaboration in the early 1990s, the International Nucleotide
Sequence Database Collaboration (INSDC) [14], to exchange
nucleotide (DNA or RNA) sequences, but not protein sequences, including those
disclosed in patents [15], and to allow public access to and interrogation
of the data.
Similarly, the European Patent Office releases its published
sequence listings, mainly from published patent applications, to
EMBL-EBI, which incorporates them into the ENA database [16] within the
patent data class (PAT). The sequences are served to the public
along with other sequence listings received from partner
institutions. The EMBL-EBI databases also provide access to
protein sequences in the Universal Protein Resource (UniProt) [17].
Unlike NCBI, EMBL-EBI parses the received sequence listings and
extracts associated metadata before serving them in the ENA database
[18]. Furthermore, EMBL-EBI provides non-redundant sequence
databases based on patent sequences stored in ENA and protein
databases. The non-redundant databases are created at two levels
and contain additional annotation, patent family information and
links to patent literature [19,20].
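The idea of a non-redundant patent sequence collection can be sketched in a few lines of Python. The snippet below shows only one plausible first step, assumed here for illustration: collapsing identical sequences into a single entry that keeps a list of the documents disclosing them. It is not a description of how EMBL-EBI builds its two levels, and the document identifiers are invented placeholders.

```python
from collections import defaultdict


def build_non_redundant(entries):
    """Group (document, seq_id, sequence) entries by identical sequence.

    Returns a dict mapping each normalized sequence to the list of
    (document, seq_id) pairs in which it was disclosed.
    """
    groups = defaultdict(list)
    for document, seq_id, sequence in entries:
        # Normalize: remove whitespace and ignore letter case.
        key = "".join(sequence.split()).upper()
        groups[key].append((document, seq_id))
    return dict(groups)


if __name__ == "__main__":
    # Placeholder identifiers, not real patent numbers.
    entries = [
        ("DOC-A", 1, "ggttggtgtg gttgg"),
        ("DOC-B", 4, "GGTTGGTGTGGTTGG"),
        ("DOC-C", 2, "ACGTACGTAC"),
    ]
    for sequence, sources in build_non_redundant(entries).items():
        print(sequence, sources)
```

In this sketch the three input entries collapse to two unique sequences, and the shared sequence keeps links to both of its source documents. Grouping by patent family would require family data that the sketch does not model.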
Sequence listings disclosed in published patent documents from
the Japan Patent Office (JPO) and the Korean Intellectual Property Office
(KIPO) are shared through DDBJ, which is administered by the
Center for Information Biology of the National Institute of Genetics
in Japan. The databank includes the nucleotide-based sequence
listings from patent documents published in Japan and Korea since
1997 [21]. In 2010, two amendments were introduced into this
database. First, the NCBI taxonomy ID was added to each sequence
listing based on the original organism declared for that sequence in
the patent application, and the newly revised entries for nucleotides
and proteins were released in May 2010 with a scheduled update
once per year [22]. The second amendment included the release of
protein sequence listings from JPO and KIPO for FTP download
and, later, the availability of a sequence similarity search facility for
protein sequence listings from USPTO, EPO, JPO, and KIPO [23].
Other public databases that provide access to, and search facilities
for, still smaller collections of published patent sequences include
the Patome@Korea database, which serves nucleotide and protein patent
sequences provided by the Korean Intellectual Property Office
(KIPO) [24] from 2004 to 2008 and is maintained by the Korean
Bioinformation Center (KOBIC). Similarly, NASDAP, a semi-public
Chinese database, provided free sequence search services to
explore Chinese gene patents (applications and grants from
1999 to February 2006), but it appeared to be no longer available at our latest
search in May 2015. The database covered 123,218 sequence listings
from 8563 Chinese patents acquired from the State Intellectual
Property Office as hard copies or images [25].
3. Why do we need a global and transparent patent sequence
dataset?
Because NCBI, EMBL-EBI, and DDBJ each decide which sequence listing data
to include in their databases, and what sequence search facility to
provide on what data and when, an accurate and comprehensive
accounting of published sequence data as disclosed in patents is
hard to achieve. Upon reviewing the maze of available patent
sequences from public and commercial sources, Andree et al.
(2008) reported that each public database still has a unique dataset
and that, for any comprehensive searching and analysis, users may need
to access and use several databases [26].
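The observation that each database holds a partly unique dataset can be expressed as simple set arithmetic. The Python sketch below is illustrative only: the database names and record identifiers are made-up placeholders, not actual holdings. It reports, for each source, which records appear nowhere else and which records it lacks relative to the union of all sources.

```python
def coverage_report(databases):
    """Compare record-identifier sets across several databases.

    `databases` maps a database name to a set of record identifiers.
    For each database the report gives its size, the identifiers found
    only there, and the identifiers it is missing from the overall union.
    """
    all_ids = set().union(*databases.values())
    report = {}
    for name, ids in databases.items():
        others = set().union(
            *(other_ids for other, other_ids in databases.items() if other != name)
        )
        report[name] = {
            "total": len(ids),
            "unique": sorted(ids - others),
            "missing": sorted(all_ids - ids),
        }
    return report


if __name__ == "__main__":
    # Placeholder data for illustration; not real database contents.
    holdings = {
        "db_one": {"s1", "s2", "s3"},
        "db_two": {"s2", "s3", "s4"},
        "db_three": {"s3", "s5"},
    }
    for name, summary in coverage_report(holdings).items():
        print(name, summary)
```

Whenever every source has a non-empty "unique" list, no single database is sufficient for a comprehensive search, which is the situation Andree et al. describe.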
Moreover, Cambia's 2011 survey of patent offices [27] reveals that,
over the past twenty years, there has been progress in
harmonizing sequence filing rules, but sharing that knowledge in a
meaningful way and at a global level with the public has lagged, as
has ensuring compliance with these rules both by applicants and
internally. An optimally functioning patent office embracing such a
public disclosure responsibility would meet certain standards.
Biological inventions often disclose biological sequences, such
as DNA or proteins or portions of them, which may or may not be
claimed. Their teaching value depends on a clear
understanding of their nature and function, a clear differentiation
between what is disclosed and what is claimed, and knowledge of how such
sequences are used in follow-on inventions and in innovations
(products and services), by whom, and where in the world. For
example, zooming in on the GALNT18 gene in PatSeq Analyzer [28]
reveals that a 15-mer portion in the 3′ end region
(GGTTGGTGTGGTTGG) can be, or has been, used in several patent
documents in different contexts. Table 2 lists the issued patents
that reference that sequence in the claims under various SEQ
IDs. The table also depicts the corresponding claim referencing the
SEQ ID, the claim category based on the use of that SEQ ID within
each patent, the applicant name, and the filing date.
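Finding where a short sequence such as the 15-mer above recurs across patent documents is, at its simplest, an exact substring search on both strands. The Python sketch below is a hedged illustration of that idea, not the method used by PatSeq Analyzer. The listing identifiers and the surrounding flanking bases are invented, and real tools would also handle mismatches and ambiguity codes.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list:
    """Return every 0-based start position of needle in haystack (overlaps allowed)."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def search_listings(query: str, listings: dict) -> list:
    """Search each listed sequence for the query on both strands.

    `listings` maps (document, seq_id) to a nucleotide string.
    Returns tuples of (document, seq_id, strand, position).
    """
    forward = query.upper()
    reverse = reverse_complement(forward)
    hits = []
    for (document, seq_id), sequence in listings.items():
        text = sequence.upper()
        for position in find_all(text, forward):
            hits.append((document, seq_id, "+", position))
        if reverse != forward:  # avoid double counting self-complementary queries
            for position in find_all(text, reverse):
                hits.append((document, seq_id, "-", position))
    return hits


if __name__ == "__main__":
    # Invented placeholder listings built around the 15-mer from the text.
    listings = {
        ("DOC-A", 1): "AAAGGTTGGTGTGGTTGGAAA",
        ("DOC-B", 7): "TTCCAACCACACCAACCTT",
        ("DOC-C", 2): "ACGTACGT",
    }
    for hit in search_listings("GGTTGGTGTGGTTGG", listings):
        print(hit)
```

A match alone says nothing about whether the sequence is claimed or merely disclosed. That distinction requires linking each SEQ ID to the claims, as Table 2 does.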

Table 1
Changes in patentability requirements for nucleotide or peptide sequences and in submitted and published sequence formats as adopted by the United States Patent and Trademark Office (USPTO), World Intellectual Property Organization (WIPO), and European Patent Office since the 1990s.
Columns: Patent office; Entered into force; Patentability requirements for nucleotide or peptide sequences; Format of submitted sequence listings; Format of published sequence listings/Comments; Reference.
USPTO, entered into force 01-10-1990
Patentability requirements: Every sequence described as cited art, used in a comparison figure or table, or not claimed or disclosed in the specification, claims, and figures is covered by the sequence rules and must appear in a “Sequence Listing” section. “Sequence Listing” refers to a standard format for the submission of even one “unbranched” nucleotide or amino acid sequence. Branched sequences are excluded. The rules apply to any nucleic acid sequence of ten or more nucleotides or peptide sequence of four or more amino acids.
Format of submitted sequence listings: Submission in the standard format “Sequence Listing” was done either on paper or Compact Disk-R.
Format of published sequence listings/Comments: “A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=20010000241). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3)” (http://www.uspto.gov/web/offices/pac/mpep/s2435.html)
Reference: 55 Federal Register No. 84, p. 18230; Robert Wax and James Coburn. Sequence Rule Compliance - Separating the Wizards from the Muggles. 22 Biotechnology Law Report 397, Number 4 (August 2003). http://online.liebertpub.com/doi/abs/10.1089/073003103769015915?journalCode=blr
USPTO, entered into force 19-11-1996
Patentability requirements: The requirements for restriction pursuant to 37 CFR 1.141(a) were waived and applicants were permitted to claim, and have examined, in a single application, up to ten independent and distinct inventions described by their nucleotide sequences. For unity of invention determinations pursuant to 37 CFR 1.475 et seq., up to ten independent and distinct molecules described by their nucleotide sequence in a single patent application can be searched and examined in international applications or national stage applications filed under 35 USC 371, with four more additional sequences if applicants paid additional fees for search and/or examination.
Format of submitted sequence listings: Submission in the standard format “Sequence Listing” was done either on paper or Compact Disk-R.
Format of published sequence listings/Comments: “Sequence data may also be accessed in a more readily searchable manner from the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov or from a commercial vendor. The USPTO forwards a copy of the sequence data to NCBI when a patent including a “Sequence Listing” is granted, and when an application containing a sequence is published pursuant to 35 U.S.C. 122(b). If NCBI elects to include the sequence data in one of its databases, NCBI indexes the sequence data according to patent or patent application publication number. There is currently no fee for the public to use the NCBI site.” (http://www.uspto.gov/web/offices/pac/mpep/s2435.html)
Reference: 1192 Off. Gaz. Pat. Office 68 http://www.uspto.gov/web/offices/pac/dapp/opla/preognotice/sequence02212007.pdf
USPTO, entered into force 01-07-1998
Patentability requirements: The requirements for patent applications containing nucleotide sequence and/or amino acid disclosures were published to set an international standard with a language neutral format and using numeric identifiers rather than the current subject headings for “Sequence Listings”. Rules under Title 37 Code of Federal Regulations (CFR) § 1.821-1.825 apply ONLY to applications containing sequences that include at least ten nucleotides (four or more of which are specifically defined) or four or more amino acids (of which four or more are specifically defined) or both. The rules were amended to be consistent with the new WIPO standard, ST.25 (https://www.federalregister.gov/articles/2009/08/11/E9-19179/requirements-for-patent-applications-containing-nucleotide-sequence-andor-amino-acid-sequence#p-27)
Format of submitted sequence listings: A Sequence Listing must be submitted “as a computer-readable American Standard Code for Information Interchange (ASCII) file (the CRF) on a diskette (Compact Disk-Recordable (CD-R) for large Sequence Listings), as well as a printed version of the same (or, again, CD-R, as 37 CFR § 1.52(e) contains the requirement of filing a second CD-R in lieu of the “paper” copy). Additionally, a statement must accompany the Sequence Listing (“Statement to support…”) verifying that (1) the submission does not contain new matter and (2) the paper and electronic copy of the listing are the same.” (http://www.uspto.gov/web/offices/com/sol/og/con/files/cons082.htm)
Format of published sequence listings/Comments: “37 CFR 1.821(e) requires the submission of a copy of the “Sequence Listing” in computer readable form. The information on the computer readable form will be entered into the Office's database for searching and printing nucleotide and amino acid sequences. This electronic database will also enable the Office to exchange patented sequence data, in electronic form, with the Japanese Patent Office and the European Patent Office. It should be noted that the Office's database complies with the confidentiality requirement imposed by 35 U.S.C. 122. Pending application sequences are maintained in the database separately from published or patented sequences. That is, the Office will not exchange or make public any information on any sequence until the patent application containing that information is published or matures into a patent, or as otherwise allowed by 35 U.S.C. 122.” (http://www.uspto.gov/web/offices/pac/mpep/s2422.html)
Reference: pp. 29,620-29,643 http://www.gpo.gov/fdsys/pkg/FR-1998-06-01/pdf/98-14194.pdf
USPTO, entered into force 07-11-2000
Patentability requirements: 37 CFR 1.52(e) was amended to provide for the filing of tables comprising sequence listings on compact discs. The disc must be either a read-only or a write-once disc. The disc must also be ASCII compliant and the specification must contain a cross-reference to it. (http://www.ladas.com/Patents/Patent Practice/USPractice/USPatLawRevisions-9_.html) 37 CFR Section 1.821(c) was also amended to provide “that a ‘‘Sequence Listing’’ must be submitted either: (1) on paper, or (2) on a compact disc, as defined in the amended § 1.52(e) and as further specified in § 1.823(a)(2). For nucleotide and/or amino acid sequences, no change is made to the computer readable form (CRF) practice under § 1.821(e)”. The requirement for a paper copy of the sequences under § 1.821(c) is modified to allow applicants to satisfy that section with either a paper version or a submission on a CD-ROM or CD-R (submitted in duplicate). Submission on compact disc is in addition to and not a replacement for the CRF required under § 1.821(e) (http://www.gpo.gov/fdsys/pkg/FR-2000-09-08/pdf/00-22392.pdf)
Format of submitted sequence listings: A Sequence Listing may be submitted as “1. Paper and disc (containing an ASCII text computer-readable form on CD) 2. ASCII text uploaded via Electronic Filing System (EFS) 3. “Paperless Submission” consisting of multiple CD submissions, but no paper.” (http://www.seqidno.com/sequence-listing-services/rules-summary/) (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of published sequence listings/Comments: US sequence rules were effective as of July 1, 1998 whereas WIPO ST.25 (the Sequence Rules) were effective as of January 1, 1999 (Wax and Coburn's paper 2003). Standard definitions of “specifically defined” nucleotides and amino acids are used based on the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2.
Reference: pp. 54620-54681 http://www.gpo.gov/fdsys/pkg/FR-2000-09-08/pdf/00-22392.pdf, http://www.seqidno.com/sequence-listing-services/rules-summary/
USPTO, entered into force 14-10-2006
Patentability requirements: With the release of the Electronic Filing System (EFS) version 1.1, filing of sequence listings became easier. Filers would need to submit only a single .txt file for a sequence listing, provided the file is ASCII compliant (to serve both the paper copy required by § 1.821(c) and the CRF required by § 1.821(e)), and 37 C.F.R. § 1.52(e)(5) requires that the specification be amended to contain a reference to the material in the text file in a separate paragraph which identifies the name of the text file, the date of its creation, and the size of the text file in bytes. Sequence listing text files submitted by EFS-Web have a size limit of 100 megabytes. (http://www.patentdocs.org/2009/01/sequence-listing-efiling-options-using-efsweb.html)
Format of submitted sequence listings: Electronic submission using EFS-Web version 1.1, preferably as a .txt file (.pdf files are acceptable but discouraged) (http://www.patentdocs.org/2006/11/hasslefree_fili.html)
Format of published sequence listings/Comments: The requirements of US sequence rules are less stringent than the requirements of WIPO Standard ST.25 (1998). Under ST.25: (1) submissions from Mac computers are not accepted; (2) the answers in fields <221> and <222> must use selections from Tables 5 and 6 of WIPO Standard ST.25; (3) any free text in field <223> will not be translated and thus must appear in the specification; (4) a CRF will not be considered to be part of the disclosure or published if filed after the filing of an application under the PCT; and (5) paragraphs 24 and 39 of WIPO Standard ST.25 require specific compliance criteria within the sequence listing. MPEP (Robert Wax and James Coburn. Sequence Rule Compliance - Separating the Wizards from the Muggles. 22 Biotechnology Law Report 397, Number 4 (August 2003)).
Reference: Sections XIII, XVII, and XVIII of the EFS-Web Legal Framework http://www.patentdocs.org/2009/01/sequence-listing-efiling-options-using-efsweb.html, http://www.patentdocs.org/2006/11/hasslefree_fili.html
USPTO, entered into force 27-03-2007
Patentability requirements: USPTO rescinds the partial waiver of 37 CFR 1.141 et seq. for restriction practice in national applications filed under 35 U.S.C. 111(a), and 37 CFR 1.475 et seq. for unity of invention determinations in both PCT international applications and the resulting national stage applications under 35 U.S.C. 371. “For National applications, polynucleotide inventions will be considered for restriction, rejoinder, and examination practice in accordance with the standards set forth in MPEP Chapter 800 (except for MPEP 803.04 which is superseded by this Notice). Claims to polynucleotide molecules will be considered for independence, relatedness, distinction and burden as for claims to any other type of molecule. For International applications and national stage filings of international applications under 35 U.S.C. 371, unity of invention determination will be made in view of PCT Rule 13.2, 37 CFR 1.475 and Chapter 10 of the ISPE Guidelines. Unity of invention will exist when the polynucleotide molecules, as claimed, share a general inventive concept, i.e., share a technical feature which makes a contribution over the prior art.” (http://www.uspto.gov/web/offices/com/sol/og/2007/week13/patsequ.htm)
Format of submitted sequence listings: Electronic submission using EFS-Web version 1.1, preferably as a .txt file (.pdf files are acceptable but discouraged) (http://www.patentdocs.org/2006/11/hasslefree_fili.html)
Format of published sequence listings/Comments: There is NO filing fee for submitting a sequence listing as part of a U.S. patent application. There is a filing fee for a sequence listing filed in an international application IF the application is more than 30 pages: a $13 filing fee for each page over 30 pages. There are NO page fees for sequence listings submitted via Electronic Filing System-Web in the proper text format (http://www.gpo.gov/fdsys/pkg/FR-2009-08-11/pdf/E9-19179.pdf). Under 37 CFR 1.16(s) and 1.492(j), both U.S. and international patent applications with paper sequence listings that exceed 100 pages may be subject to an application size fee of $270 (or $135 for small entities) for each additional 50 pages or fraction thereof (https://www.federalregister.gov/articles/2009/08/11/E9-19179/requirements-for-patent-applications-containing-nucleotide-sequence-andor-amino-acid-sequence#p-27).
Reference: OG Notices: 27 March 2007 http://www.uspto.gov/web/offices/com/sol/og/2007/week13/patsequ.htm, http://www.patentdocs.org/2006/11/hasslefree_fili.html
WIPO, entered into force 01-07-1992
Patentability requirements: International applications with a nucleotide and/or amino acid sequence disclosure need to contain a listing of the sequence in the description, in a format complying with WIPO Standard ST.23 and in accordance with Annex C of the Administrative Instructions. The International Search Authority may invite applicants to furnish a listing of the sequence in a machine readable form provided for in the Administrative Instructions (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of submitted sequence listings: Only the USPTO mandated submission of sequence listings in machine readable form, with the file encoded in a subset of ASCII. The European Patent Office made the requirement mandatory in January 1993. The JPO recommended it with its own character code, and IP Australia did not require it but accepted submissions in ASCII. Patent offices in Austria, Russia, Sweden, and the UK did not require submission of sequence listings in machine readable form. (http://www.uspto.gov/web/offices/pac/mpep/old/E5R16_AI.pdf) (rev 14, November 1992)
Format of published sequence listings/Comments: Optical Character Recognition (OCR) format was not adopted by the USPTO, whereas it was adopted by some jurisdictions and complied with either WIPO ST.22 or ST.23.
Reference: Rules 5.2 and 13ter of PCT regulations, Sections 208, 513, 610, and Annex C of the Administrative Instructions (AI) as in force from July 1, 1992 http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
WIPO, entered into force 28-11-1994
Patentability requirements: USPTO, EPO, and JPO worked with WIPO to establish the basis of what was to become the WIPO Standard, ST.25 (1998). (http://www.wipo.int/meetings/en/doc_details.jsp?doc_id=4412, PCT/MIA/VI/15)
Format of submitted sequence listings: Various formats were entertained.
Format of published sequence listings/Comments: USPTO, EPO, and JPO proposed the development of a single patent sequence database with a front and back ends (http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529)
Reference: PCT/MIA/V/1 and PCT/MIA/V/2 http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529
WIPO, entered into force 01-04-1995
Patentability requirements: If International Search Authority is prepared to transcribe the sequence listing into a machine readable form, it may request payment for the cost of such transcription (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of submitted sequence listings: Various formats were entertained. USPTO, EPO, and JPO proposed the development of a single patent sequence database with a front and back ends (http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529)
Format of published sequence listings/Comments: The format of sequence listings in paper and electronic form differs based on different patent offices' requirements, and sequence listings were required to be translated for consideration in the national stage. (http://www.uspto.gov/web/offices/com/sol/notices/fr019819.html)
Reference: Rule 13ter.1(a) of PCT regulations http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
WIPO, entered into force 01-07-1998
Patentability requirements: A sequence listing would need to be a separate part of the description in accordance with Annex C of Administrative Instructions and if that sequence listing contains any free text, that free text would need to appear in the main part of the description as well in the language thereof. WIPO new standard, ST.25, replaced ST.23 and ST.24 and established the international meaning of “Sequence listing” for nucleotide and/or amino acid sequence disclosure and allowed applicants to submit a single sequence listing that is acceptable to all receiving offices, International Search, and Preliminary Examining Authorities (for the international phase) and designated and elected offices (for the national phase). See Annex C in http://www.uspto.gov/web/offices/pac/mpep/old/E7R0_AI.pdf
Format of submitted sequence listings: Submission of only one sequence listing in paper and electronic form will be required now and no translation is needed. Computer readable form is only required when a competent authority requires it (http://www.wipo.int/standards/en/pdf/03-25-01.pdf) (http://www.wipo.int/wipostad/en/standards/st25-en/1-0/view#2255) (http://www.wipo.int/standards/en/pdf/archives/03-25-01arc2009.pdf) (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of published sequence listings/Comments: International applications in electronic form would have mandatory data elements: 1. Applicant Name, 2. Title of Invention, 3. Number of SEQ ID NOs, 4. SEQ ID NO:, 5. Length (sequence length expressed in number of base pairs or amino acids), 6. Type (type of molecule sequenced in SEQ ID NO: x, either DNA, RNA or PRT; if a nucleotide sequence contains both DNA and RNA fragments, the value shall be “DNA”), 7. Organism (Genus Species (that is, scientific name) or “Artificial Sequence” or “Unknown”) Sequence.
Reference: Rules 5.2 and 13ter.1(a) of PCT regulations and section 513 of Administrative instructions http://www.wipo.int/wipostad/en/standards/st25-en/1-0/view#2255 http://www.wipo.int/standards/en/pdf/archives/03-25-01arc2009.pdf
WIPO, entered into force 01-03-2001
Patentability requirements: New instructions were put in place to deal with the filing, format, fees, preparation, and publications of extremely large international applications containing nucleotide and/or amino acid sequence listings. Sequence listings will be published on the Internet on the date of publication of the rest of the international application. (http://www.uspto.gov/web/offices/pac/mpep/old/E8R0_AI.pdf)
Format of submitted sequence listings: For sequence listings filed as parts of international applications, the new Section 801(a) of the Administrative Instructions allowed the applicant to file the sequence listings (and/or tables) “(i) only on an electronic medium in the computer readable form referred to in Annex C; or (ii) both on an electronic medium in that computer readable form and on paper in the written form referred to in Annex C”; “Tables filed in computer readable form under Section 801(a) shall comply with one of the following character formats: (i) UTF-8-encoded Unicode 3.0; or (ii) XML format conforming to the “Application-Body” Document Type Definition referred to in Appendix I of Annex F; at the option of the competent Authority.” (http://www.uspto.gov/web/offices/pac/mpep/old/E8R1_AI.pdf)
Format of published sequence listings/Comments: Under Section 805, publication of international applications in electronic form is at the discretion of the Director General. “As from 2 August 2001, the sequence listing parts of the international applications filed under Section 801 of the Administrative Instructions under the PCT will be published on the Internet on the date of publication of the rest of the international application of which it forms a part. Publication of a given international application containing a sequence listing part filed under Section 801 will thus comprise two elements published on the same day: (i) a paper pamphlet, as now, for all parts other than the sequence listing part, and (ii) a new electronic portion for the sequence listing part only; cross-references between the two elements will be included for the sake of clarity.” (http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf)
Reference: Part 8 of the Administrative Instructions (sections 801-806). http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf
01-01-2003 Standard options were introduced for Electronic Filing and Processing of
International Applications. Part 7 does not apply to international
applications containing sequence listings. Part 8 applies. However, if
applicants submit such applications electronically, they will be subject
to Part 7 and NOT part 8 of the administrative instructions. All technical
Requirements for the Presentation of Tables Related to Nucleotide and
Amino Acid Sequence Listings in International Patent Applications
under the PCT were provided in a new annex, Annex C-bis
Annex F provided the standard for electronic filing, but the details were published in the PCT Gazette Special Issue No. S-04/2001 dated 27 December 2001. That issue was not available on the WIPO website on 10/4/2012, but a more recent version of Annex F is available, for example, at http://www.wipo.int/pct/en/texts/pdf/ai_anf.pdf. Submission of sequence listings remained in computer-readable form and on paper. While the PCT charged for sequence listings submitted electronically as txt files, the USPTO did not charge.
Sequence listings to be available on the
internet in multiple formats.
Part 7 of the Administrative
Instructions Annex C-bis
and Annex F http://www.
wipo.int/edocs/pctndocs/
en/2001/pct_news_2001_8.
pdf
01-04-2005 USPTO implemented the restriction requirement as of November 1996 to limit an applicant's claims to no more than 10 nucleotide sequences in one application. PCT/MIA/VI/9 Administrative Instructions under PCT were silent on the restriction requirement.
Rule 13ter. of PCT regulations was amended to
provide consistent procedures before all
authorities and to request compliance with
either the electronic form or paper filing of
sequence listings contained in the international
applications in accordance with the Standard
established in Annex C (http://www.wipo.int/
pct/en/texts/pdf/pct_regulations_history.pdf)
Sequence listings to be available on the
internet in multiple formats.
Rule 13ter. of PCT
regulations http://www.
wipo.int/pct/en/texts/pdf/
pct_regulations_history.pdf
01-10-2007 New publication system was in place to provide: “ XML daily update
files. All SLs [sequence Listings] will be included (i.e. including the SLs
extracted from the pamphlets). SLs embedded in the description will be
gradually removed. A new structure is available as follows: publication/
year/week/WO_number for the SL files, updates/year/month for the
update files. All subsequently published SLs will be added to the
publication week directory and reported in the update file. All
subsequently deleted/replaced/added SLs will trigger the update of the
corresponding international application publication content and will be
reported in the update file.” (http://www.wipo.int/patentscope/en/
news/pctdb/2007/news_0010.html)
From 2001 until 2007, most sequence listings
were from the “mixed mode” electronic
submission (PCT application on paper whereas
the sequence listings filed electronically).
From 2001 until 2007, most PCT applications DID NOT comply with ST.25 text format rules. Most sequence listings were in TIF or pdf to TIF files and contained NON ASCII compliant text. (http://www.fiz-karlsruhe.de/uploads/tx_ptgsarelatedfiles/0210_wipo_bbm.pdf)
PatentScope, WIPO, and
STN International Website
http://www.wipo.int/
patentscope/en/news/
pctdb/2007/news_0010.
html
01-07-2009 In view of the practice of electronic submission for sequence listings, Part 8 of the Administrative Instructions (Sections 801–806) and Annex C-bis became irrelevant and were deleted from the Administrative Instructions. (http://www.wipo.int/edocs/pctndocs/en/2009/pct_news_2009_07.pdf) (http://www.wipo.int/pct/en/newslett/2009/06/article_0002.html)
A number of other modifications were also introduced to the Administrative Instructions under the PCT in relation to the international filing fees: 1. mixed mode sequence listing filing (sequence listing and tables are filed in electronic form while the remainder of the international application is filed on paper, when the receiving Office accepts the filing of such “mixed mode” applications), will no longer be possible, 2. there will no longer be a page fee payable for sequence listings filed in text format as part of an international application filed in electronic form, 3. full page fees will be payable for all pages of a sequence listing filed in image format (for example, PDF format) or on paper, 4. sequence listings, filed only for the purposes of international search, will become publicly available, 5. Full page fees for tables containing sequence listings regardless of the format submitted in (image or paper or electronic) (http://www.wipo.int/export/sites/www/pct/en/texts/pdf/ai_9.pdf)
“Where a copy of a ST.25-compliant text format sequence listing has been furnished to the ISA under Rule 13ter.1 (for the purposes of international search only), the ISA will forward a copy of such a sequence listing to the International Bureau. The International Bureau will make a copy of all sequence listings in text format received publicly available on PATENTSCOPE®” (http://www.wipo.int/pct/en/newslett/2009/06/article_0002.html). The 2009 rules did not seem to impact on the compliance with ST.25 text format rules and WIPO is still sorting the 1999–2006 backlog of sequence listings (mostly image files) (http://www.fiz-karlsruhe.de/uploads/tx_ptgsarelatedfiles/0210_wipo_bbm.pdf)
Section 707(a-bis) of the Administrative Instructions http://www.wipo.int/export/sites/www/pct/en/texts/pdf/ai_9.pdf
01-01-2011 Paragraphs 2, 3bis, 4bis, 38 and 42 of Annex C of the Administrative Instructions under the PCT were amended in relation to the correction, rectification or amendment of sequence listings. These changes are only applicable in respect to international applications filed on or after January 1, 2011. See page 1 in http://www.wipo.int/edocs/pctndocs/en/2012/pct_news_2012_13.pdf
Annex C of the Administrative
Instructions, Rule 13ter http://www.
wipo.int/edocs/pctndocs/en/2012/pct_
news_2012_13.pdf
(April 3, 2012) A circular will be sent to all receiving Offices, International Searching
Authorities and designated Offices to introduce the new ST.26 XML
standard with links to example sequence listings in XML format (and
comparing features using ST.25 standard with those using the new
ST.26 XML standard). The Circular will inquire on when and how
implementation of the new standard can be facilitated and
accomplished over time.
Currently, the sequence listing software tool,
PatentIn, is being replaced by BISSAP, which is
expected to support both ST.25 and a draft
version of the new ST.26 XML standard. BISSAP
is being developed by European Patent Office
and will be used across all offices to help in the
preparation and processing of sequence listings.
(www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/
pct_wg_5_14.doc)
Thirty percent of the sequence listings downloadable from the WIPO website are in txt format; the rest are in unsearchable image or pdf formats that are difficult to render searchable.
PCT/WG/5/14 www.wipo.
int/edocs/mdocs/pct/en/
pct_wg_5/pct_wg_5_14.
doc
EPO 01-01-1993 Rule 27 a (1) If nucleotide or amino acid sequences are disclosed in the
European patent application the description shall contain a sequence
listing conforming to the rules laid down by the President of the
European Patent Office for the standardized representation of
nucleotide and amino acid sequences. (4) A sequence listing filed after
the date of filing shall not form part of the description.
European Patent Convention (EPC
1973) Rule 27a (1), (4) (OJ EPO 1992,
342 ff). http://www.epo.org/law-
practice/legal-texts/html/epc/1973/e/
r27a.html
October 2,1998 Rule 27 a amended (2) The President of the European Patent Office may
require that, in addition to the written application documents, a
sequence listing in accordance with paragraph 1 be submitted on a data
carrier prescribed by him accompanied by a statement that the
information recorded on the data carrier is identical to the written
sequence listing, (3) If a sequence listing is filed or corrected after the
date of filing, the applicant shall submit a statement that the sequence
listing so filed or corrected does not include matter which goes beyond
the content of the application as filed.
The Sequence is to be submitted on a data
carrier
EPC 1998, R. 27a(2), (3) (Suppl. No. 2 to
OJ EPO 11/1998) http://www.epo.org/
law-practice/legal-texts/html/epc/
1973/e/r27a.html
13-12-2007 Rule 30 was introduced to meet the requirements of European patent applications relating to nucleotide and amino acid sequences. Art. 56, 57, 80, R. 42 are relevant here. (1) If nucleotide or amino acid sequences are disclosed in the European patent application, the description shall contain a sequence listing conforming to the rules laid down by the President of the European Patent Office for the standardized representation of nucleotide and amino acid sequences, (2) A sequence listing filed after the date of filing shall not form part of the description, (3) Where the applicant has not filed a sequence listing complying with the requirements under paragraph 1 at the date of filing, the European Patent Office shall invite the applicant to furnish such a sequence listing and pay the late furnishing fee. If the applicant does not furnish the required sequence listing and pay the required late furnishing fee within a period of two months after such an invitation, the application shall be refused.
Disclosed sequences within the meaning of Rule 30(1) in the European patent application are to be represented in a sequence listing which conforms to WIPO Standard ST.25. They can be filed electronically and on paper. In such a case, a copy of the sequence listing must also be submitted in computer-readable form. (Special edition No. 3, OJ EPO 2007, C.1 and C2)
Access to published patent sequence data is via the EBI's website, and patent sequences can be purchased and bulk downloaded from the EPO site (http://www.epo.org/searching/free/publication-server/sequence-listings.html)
EPC (1973) to EPC: Rule 30 replaced Rule 27a(1) and (4). Rule 27a(2), (3) was deleted and a new clause (3) was added to Rule 30. http://www.epo.org/law-practice/legal-texts/html/epc/2010/e/r30.html
28-04-2011 European Patent Office in collaboration with national patent offices and
the European Bioinformatics Institute developed BiSSAP, which is a
computer program designed to facilitate submission of sequence
listings in patent applications (http://archive.epo.org/epo/pubs/oj011/
06_11/06_3761.pdf)
“BiSSAP can be used to prepare and verify
sequences, generate the sequence listing files
for submission, import existing sequence
listings in WIPO ST. 25 and convert between
sequence listing formats (WIPO ST. 25 and XML
proposal). It also contains a “batch verification”
module allowing users to verify collections of
sequence listings.”
Art. 6(2) Dec. of the President of the EPO dated 28 April 2011 on the filing of sequence listings, OJ EPO 2011, 372 requires conversion of sequence listings into a pdf format. If the resulting files are not searchable, public access will be affected.
Rule 30 EPC, Rule 5.2 PCT,
and the Decision of the
President and Notice from
the EPO dated 28 April 2011
(OJ EPO 6/2011, 372 ff).
http://www.epo.org/law-
practice/legal-texts/html/
epc/2010/e/r30.html,
http://archive.epo.org/epo/
pubs/oj011/06_11/06_
3761.pdf
18-10-2013 Filing of Sequence listings was amended to: “1.1 If nucleotide or amino
acid sequences are disclosed in a European patent application, the
description must contain a sequence listing complying with WIPO
Standard ST.25 (Standard for the Presentation of Nucleotide and Amino
Acid Sequence Listings in Patent Applications - hereinafter referred to as
the “Standard”) (Rule 30(1) EPC in conjunction with Article 1 of the
decision of the President).”
“1.4 Under Article 1(1) of the decision of the
President, sequence listings must be submitted
in electronic form, i.e. in text format (TXT).
Further information about the document format
is set out in the Standard. The sequence listing
should no longer be filed on paper or, in the case
of electronic filing of the application, in PDF
format (see Article 1(1) and (2) of the decision of
the President). If the applicant also files the
sequence listing of his own accord on paper or
in PDF format, he must submit a statement that
the sequence listings in electronic form and on
paper or in PDF format are identical. In this case,
the paper or PDF form will be disregarded in the
further procedure.” http://archive.epo.org/epo/
pubs/oj013/11_13/11_5423.pdf
Check user feedback on the proposed XML format ST.26 for public access to sequence listings at (http://documents.epo.org/projects/babylon/eponet.nsf/0/97F67F6DDAF14D59C1257A070032C35B/$File/CL2012-522-0425User%20feedback-on-ST.26-final.pdf)
Notice from EPO http://archive.epo.org/epo/
pubs/oj013/11_13/11_5423.pdf
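
Table 1 repeatedly distinguishes ST.25 text-format listings from image or PDF submissions. The following is a minimal sketch, not an official tool, of why the text format is machine readable: it reads the ST.25 numeric identifiers <210> (SEQ ID NO), <211> (length), <212> (type) and <400> (sequence) from plain text. The sample listing and the function name are illustrative assumptions, and the sketch handles nucleotide sequences only; amino acid listings use three-letter codes and would need separate handling.

```python
import re
from typing import Dict, List


def parse_st25_nucleotides(text: str) -> List[Dict[str, str]]:
    """Collect SEQ ID, declared length, type and residues per sequence."""
    records: List[Dict[str, str]] = []
    current = None
    in_sequence = False
    for line in text.splitlines():
        tag = re.match(r"\s*<(\d{3})>\s*(.*)", line)
        if tag:
            code, value = tag.group(1), tag.group(2).strip()
            if code == "210":  # a new sequence record starts here
                current = {"seq_id": value, "residues": ""}
                records.append(current)
                in_sequence = False
            elif current is not None and code == "211":
                current["length"] = value
            elif current is not None and code == "212":
                current["type"] = value
            elif current is not None and code == "400":
                in_sequence = True  # residue lines follow this identifier
            else:
                in_sequence = False
            continue
        if in_sequence and current is not None:
            # Drop spacing and the running residue count at the line end.
            current["residues"] += re.sub(r"[^A-Za-z]", "", line).lower()
    return records


# Illustrative listing (not taken from any filed application).
SAMPLE = """<110> Example Applicant
<160> 1
<210> 1
<211> 15
<212> DNA
<213> Artificial Sequence
<400> 1
ggttggtgtg gttgg                                                        15
"""

records = parse_st25_nucleotides(SAMPLE)
for rec in records:
    declared = int(rec["length"])
    print(rec["seq_id"], rec["type"], rec["residues"], declared == len(rec["residues"]))
```

A listing supplied only as a TIF or PDF image offers no such identifiers to read, which is the practical obstacle to searchability noted in the table.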

Table 2
Issued patents that reference a 15-mer portion of the GALNT18 gene (GGTTGGTGTGGTTGG) in the claims under various SEQ IDs, along with the corresponding claim, the manually analyzed claim category, the applicant name, and the filing date of that patent document.
Patent number_SEQ ID Claims Claim category Filing date Applicant
US_5840867_A_21 A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ
ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G
(SEQ ID NO:21),
G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G
(SEQ ID NO:22)
Artificial 05-03-1994 GILEAD SCIENCES INC
(SEQ ID NO:21),
(SEQ ID NO:22),
Artificial 05-03-1994 GILEAD SCIENCES INC
(SEQ ID NO:21),
(SEQ ID NO:22),
Sequence claimed 05-03-1994 GILEAD SCIENCES INC
US_5756291_A_29 A method to detect the presence or absence of thrombin, which method comprises: a) contacting a
sample suspected of containing thrombin with a single-stranded DNA aptamer coupled to a label
under conditions wherein a complex between thrombin and the aptamer is formed; and b)
detecting the presence or absence of said complex indicating the presence or absence of thrombin;
wherein said aptamer comprises the sequence:##STR17## wherein N is A, T or G. The method of
claim 1 wherein N is T. The method of claim 1 wherein said aptamer comprises the
sequence:##STR18## wherein: N is A, T, or G; and Z is an integer from 2 to 5. The method of claim 3
wherein Z is 3. The method of claim 4 wherein said aptamer comprises the sequence:##STR19##
The method of claim 5 wherein N is T (SEQ ID NO: 29).
Probe or primer used in a method claim 06-07-1995 GILEAD SCIENCES INC
US_6323185_B1_53 An oligonucleotide having a nucleotide sequence chosen from the group consisting of SEQ ID NOS 2–27, 29, 31–39, 46–52 and 53–87, wherein said nucleotide sequence is optionally modified at the 3′ terminus or 5′ terminus by attachment of a substituent moiety selected from the group consisting of propylamine, poly-L-lysine, cholesterol, fatty acid chains of length 2 to 24 carbons, and vitamin E.
Sequence claimed 17/07/1996 US HEALTH
US_5691145_A_3 An oligonucleotide which forms an intramolecular G-quartet structure, the oligonucleotide being
labeled with a donor fluorophore and an acceptor, the donor fluorophore and the acceptor selected
such that fluorescence of the donor fluorophore is quenched by the acceptor when the
oligonucleotide forms the G-quartet structure and quenching of donor fluorophore fluorescence is
reduced upon unfolding of the G-quartet structure. The oligonucleotide of claim 1 consisting of SEQ
ID NO:3 labeled with the donor fluorophore and the acceptor.
Artificial 27/08/1996 BECTON DICKINSON CO
US_5882870_A_1 A kit for the reversible anticoagulation of blood comprising: a nucleic acid ligand which binds thrombin, said nucleic acid ligand selected from the group consisting of a nucleic acid ligand having SEQ ID NO:1, a nucleic acid ligand having SEQ ID NO:2, a bi-directional nucleic acid ligand comprising two oligonucleotide segments having SEQ ID NO:5 linked at their respective 3′ ends to a phosphodiester group each of which is linked to a hexaethylene glycol chain, and a bi-directional nucleic acid ligand comprising two oligonucleotide segments having SEQ ID NO:6 linked at their respective 3′ ends to a phosphodiester group each of which is linked to a glycerol derivative; and a reversing agent which has greater affinity for the nucleic acid ligand than does thrombin, said reversing agent selected from the group consisting of compositions comprising a nucleic acid sequence complementary to that of said nucleic acid ligand, single-stranded DNA binding proteins, copper (II), mercury (II), silver (I) and platinum complexes.
Subpart 14/01/1998 BECTON DICKINSON CO
US_6780850_B1_1 A composition comprising: a nucleic acid, that is derivatized at the 5′ or 3′ end or at both the 5′ and 3′ ends with streptavidin or a variant of streptavidin that retains biotin binding activity, that specifically binds to thrombin, wherein said nucleic acid is 2′-fluoropyrimidine RNA or 2′-aminopyrimidine RNA. The composition of claim 1, wherein the nucleic acid comprises nucleotides having the sequence of SEQ ID NO: 1 or SEQ ID NO: 2. The composition of claim 13, wherein the nucleic acid comprises nucleotides having the RNA sequence corresponding to SEQ ID NO: 1 or SEQ ID NO: 2.
Subpart 22/06/2000 TRIUMF

Because a single gene patent [29] may disclose anywhere from one to millions of sequences, yet claim only a selection of them, individually in some patent family members or in different combinations in other versions or family members, the volume of redundant data can be overwhelming. At present, sifting through such information requires the use of all public databases and/or the costly use of combined searches from commercially available databases [30].
What is needed is an open public dataset, with built-in transparency metrics, that gives access to globally published patent sequence data, links to the corresponding gene patents, and can be interrogated through a sequence search facility. Below we report progress towards building such a public facility.
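
To make the idea of interrogating sequence data concrete, the sketch below runs an exact search for the 15-mer of Table 2 (GGTTGGTGTGGTTGG) on both strands of a few sequences. It is only an illustration under stated assumptions: the record identifiers and sequences are invented, and exact substring matching is a stand-in for, not a description of, how PatSeq Finder or any similarity search tool works.

```python
from typing import Dict, List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; unknown symbols become N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def find_motif(motif: str, sequences: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """List (record id, strand, 0-based start) for every exact match."""
    forward = motif.upper()
    queries = (("+", forward), ("-", reverse_complement(forward)))
    hits: List[Tuple[str, str, int]] = []
    for record_id, seq in sequences.items():
        text = seq.upper()
        for strand, query in queries:
            start = text.find(query)
            while start != -1:
                hits.append((record_id, strand, start))
                start = text.find(query, start + 1)  # allow overlapping hits
    return hits


# Hypothetical records, invented for this example.
SEQUENCES = {
    "EXAMPLE_1": "AAGGTTGGTGTGGTTGGCC",
    "EXAMPLE_2": "TTCCAACCACACCAACCTT",
    "EXAMPLE_3": "ACGTACGT",
}

hits = find_motif("GGTTGGTGTGGTTGG", SEQUENCES)
for record_id, strand, start in hits:
    print(record_id, strand, start)
```

Searching both strands matters because a claim may recite either orientation of the same double-stranded sequence.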
4. PatSeq data platform
The new open and interactive online platform, PatSeq Data [31],
enables access to patents disclosing genetic sequences and bulk
downloads of disclosed sequence data based on jurisdiction,
document type, and either sequence type or sequence location. It
also serves as a global, open repository for national systems to
enable public sharing of sequence data associated with patents.
Three new transparency metrics were implemented in PatSeq Data to foster confidence in the quality and quantity of the data whenever it is made available: 1) a detailed account of which jurisdictions provide (or do not provide) sequence listing data, what they provide, and how their data compare with that of the PatSeq database; 2) the ability to dynamically monitor and compare the degree of overlap between data sources at each release date, including the public databases and the patent offices; and 3) the ability to link from PatSeq Data to the Lens and other PatSeq tools, such as PatSeq Finder, to conduct sequence searches, view family members, and download relevant information, including original patent documents (Fig. 1).
Using the second metric, for example, we were able to identify missing sequence entries from 1990 to 2001 and missing bulk sequence listings with more than 250 k sequence entries in GenBank. Moreover, we learned that protein sequence records from EBI do not match those of GenBank or the DDBJ patent division and may not be synchronized with the EPO data. Table 3 depicts the progress made so far on the PatSeq database and shows the latest release data, from May 29, 2015. In summary, the current holdings of the PatSeq database are 232,590,639 sequences corresponding to more than 425,028 biological patent documents, and the plan is to continue parsing and adding any available data, especially from the EPO, in the near future.
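
One way to read the overlap figures is as shares of each source's holdings. The sketch below does this for the three public databases, using sequence counts transcribed from Table 3. The Jaccard index is our own choice of summary statistic for this illustration; the article does not state which formula PatSeq Data uses, and the function and variable names are assumptions.

```python
from typing import Dict, Tuple

# Sequence counts transcribed from Table 3 (May 29, 2015 release).
HOLDINGS: Dict[str, int] = {
    "EMBL_EBI": 38_020_495,
    "NCBI": 35_921_386,
    "DDBJ": 36_969_064,
}
SHARED: Dict[Tuple[str, str], int] = {
    ("DDBJ", "NCBI"): 35_144_194,
    ("DDBJ", "EMBL_EBI"): 36_123_009,
    ("NCBI", "EMBL_EBI"): 35_806_474,
}


def overlap_report(
    holdings: Dict[str, int], shared: Dict[Tuple[str, str], int]
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Express each shared count as a share of both sources and as Jaccard."""
    report = {}
    for (a, b), common in shared.items():
        union = holdings[a] + holdings[b] - common
        report[(a, b)] = {
            "share_of_a": common / holdings[a],
            "share_of_b": common / holdings[b],
            "jaccard": common / union,
        }
    return report


report = overlap_report(HOLDINGS, SHARED)
for (a, b), row in report.items():
    print(
        f"{a} / {b}: {row['share_of_a']:.1%} of {a}, "
        f"{row['share_of_b']:.1%} of {b}, Jaccard {row['jaccard']:.3f}"
    )
```

Tracking these ratios at each release would show whether the sources are converging or drifting apart, which is the purpose the second metric is described as serving.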
Fig. 1. Dossier view of an example jurisdiction, the United States of America, in PatSeq Data. The view compares biological patent holdings found in the Lens with those of national and regional patent office databases, shows sequence disclosures across jurisdictions over time, and allows users to download sequence collections or link to the Lens to perform other searches and analyses. The figure also depicts the newly introduced transparency metrics, which account for sequence listings across public databases and national and regional patent offices.

Considering that patents can be important for domestic and global policymaking, sequence-filing rules within each jurisdiction were also tracked, whenever the information was available, and displayed in the Dossier view of PatSeq Data. Sequence location within a patent document is shown wherever possible, and sequence types are depicted by publication year and document type. For the patent offices that act as an International Search Authority, we developed timelines of relevant legal changes during the past 30 years (Fig. 2).
Fig. 2. Dossier view of an example jurisdiction, the United States of America, in PatSeq Data, showing the timelines for patentability requirements for sequences disclosed in patent applications (upper sections) and legal changes (legislative, administrative, and court cases; lower sections). Clicking on an event allows users to view a short description of that event and to expand the view to check the reference for that information.

Table 3
Shared data between the various data sources based on PatSeq database holdings as of the May 29, 2015 release date. Available sequence and document counts are shown for each database and as shared between two databases.
Data available/shared between two databases (a)    Sequence count    Patent count
EMBL_EBI                                           38,020,495        209,951
EMBL_EBI shares with USPTO                         11,982,981        71,242
EMBL_EBI shares with WIPO                          5,082,962         13,579
NCBI                                               35,921,386        184,878
NCBI shares with USPTO                             12,084,974        72,534
NCBI shares with WIPO                              4,710,219         10,524
DDBJ                                               36,969,064        201,130
DDBJ shares with USPTO                             11,536,423        61,114
DDBJ shares with WIPO                              4,778,925         11,789
DDBJ shares with NCBI                              35,144,194        170,752
DDBJ shares with EMBL_EBI                          36,123,009        183,885
NCBI shares with EMBL_EBI                          35,806,474        183,338
USPTO                                              157,531,318       214,512
WIPO                                               35,273,473        29,662
CIPO                                               17,040,805        38,284
(a) Sequence listings from EPO full text documents are yet to be included in the database.
O.A. Jefferson et al. / World Patent Information 43 (2015) 12–24

As users enter the PatSeq Data site, they are offered a globe, map or table [32] summary view in which they can monitor the latest holdings of the PatSeq database as of the release date shown under the total holdings. Users can brush over the patent or sequence data by publication year and document type, link to each year's patent collection in the Lens, where they can explore other PatSeq tools such as PatSeq Finder, or simply download the disclosed sequences. Sequences are downloadable by document type, sequence type, and sequence location in the document. For example, the title “Grants: Nucleotides (all)” refers to nucleotide sequences disclosed in granted patent documents regardless of where they are referenced in the documents, whereas “Grants: Nucleotides (in Claims)” refers to the subset of that collection in which the nucleotide sequences are referenced in the claims of the granted patent documents. The data are available at no cost to non-commercial users and for a fee to commercial users.
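
The relationship between download collections such as “Grants: Nucleotides (all)” and “Grants: Nucleotides (in Claims)” can be sketched as a simple filter over tagged records. The record structure, field names and sample entries below are hypothetical and chosen only for illustration; they do not describe the actual PatSeq Data schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class SequenceRecord:
    record_id: str
    doc_type: str              # "grant" or "application"
    seq_type: str              # "nucleotide" or "peptide"
    locations: FrozenSet[str]  # where the sequence is referenced


def select(
    records: List[SequenceRecord],
    doc_type: str,
    seq_type: str,
    location: Optional[str] = None,
) -> List[SequenceRecord]:
    """Filter by document type and sequence type, optionally by location."""
    return [
        r
        for r in records
        if r.doc_type == doc_type
        and r.seq_type == seq_type
        and (location is None or location in r.locations)
    ]


# Hypothetical records, invented for this example.
RECORDS = [
    SequenceRecord("R1", "grant", "nucleotide", frozenset({"claims", "specification"})),
    SequenceRecord("R2", "grant", "nucleotide", frozenset({"specification"})),
    SequenceRecord("R3", "grant", "peptide", frozenset({"claims"})),
    SequenceRecord("R4", "application", "nucleotide", frozenset({"claims"})),
]

grants_all = select(RECORDS, "grant", "nucleotide")
grants_in_claims = select(RECORDS, "grant", "nucleotide", "claims")
print([r.record_id for r in grants_all])
print([r.record_id for r in grants_in_claims])
```

By construction the in-claims collection is always a subset of the all-locations collection, which mirrors the relationship described in the text.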
By using the graphical globe or map function and hovering over a jurisdiction, users can view in a floating tooltip the type of contextual information available from that jurisdiction at the time of their visit and access it in the dossier view; if sequence disclosures were shared with us, users can also download them. Once in the dossier view, for instance for Germany [33], users may choose to explore and link to all related biological patents available for Germany in the Lens, view the mechanism by which Germany publicly shares the data, learn about its format and coverage whenever provided, examine annual biological patent holdings, and compare these holdings with the declared holdings of the patent office, as they become available.
Under “Sequences”, users can learn more about the nature of the disclosed sequences. For example, while brushing over sequence holdings in a particular year, the proportion of sequence types, their distribution in the patent documents (in claims, summary, drawings, examples, and specification), and the sources of data are displayed dynamically, allowing for a direct comparison of data sources. The statistics reflect PatSeq data as of the published release date shown in the summary view and in each jurisdiction's dossier view, and will update automatically with the regular data feed updates (currently at monthly intervals) or as more sequence listing data sources are added to the master database. For example, on April 15, 2015, the contents of the PatSeq database increased by 25%.
The other relevant contextual information in the dossier view includes: a) sequence filing rules for either nucleotide or peptide sequences in that jurisdiction, based on the Cambia 2011 and WIPO 2001 surveys [34]; b) a timeline of relevant legal changes in the jurisdictions that act as an International Search Authority (Fig. 2); and c) contact details of the officer who contributed the information, or a link to the patent office website for more details if that information is provided on the official website.
Before releasing this facility, more than 50 patent offices were consulted and some of their requested features were incorporated in PatSeq Data. Moreover, offices such as USPTO, IP Australia, CIPO, and GPTO have contributed sequence data to be included in the PatSeq database. While others promised to do so, some offices were not in a position to provide the data, such as the Danish Patent and Trademark Office, or the Israel patent office, which does not even publish the sequence data along with the patent (personal communications). As it is now clear that many patent offices simply do not have access to the analytic or server capabilities to host their sequences in a useful manner, the PatSeq Data tool and the entire PatSeq facility, as a global non-government activity, offer them such a service. The Lens collaborative project will continue reaching out to other patent offices to demonstrate the public value of the PatSeq facility.
5. Conclusion
Many governments face tough policy choices around the protection or use of IP on biological technologies and materials. Adding new data from diverse patent offices, and complementing the missing data through patent family association, will enable users to compare patenting activity between jurisdictions and engage in better-informed debates on the appropriate degree of gene patenting to optimize economic and social impacts. PatSeq Data allows offices to upload and share their holdings, and users to download and analyze sequence sets associated with global patent documents.
Acknowledgments
This work was supported, in part, by the Bill & Melinda Gates Foundation, Global Health Grant ID 52239; Gordon and Betty Moore Foundation “Grant GBMF3465”; Queensland University of Technology “Grant 321121-0023/08”; and Queensland University of Technology and Syngenta Crop Protection AG “Research collaboration No: 1400001566”. We thank the Lens team for their continued support and improvement of the Lens functionalities, and Small Multiples, a private visualization company in Sydney, Australia, for implementing the open source-globe feature in the platform design of PatSeq Data. We also appreciate the assistance of Nina Prasolova and Innokenti Epichev in the research phase of this project.
References
[1] A. Devlin, The misunderstood function of disclosure, Pat. Law Harv. J. Law Technol. 23 (2010) 401–446.
[2] P. Drahos, Rethinking the Role of the Patent Office from the Perspective of Responsive Regulation, Chapter 5 in Emerging Markets and the World Patent Order: The Forces of Change by F.M. Abbott, C.M. Correa and P. Drahos, Edward Elgar Publishing, Cheltenham, 2014, pp. 78–99.
[3] J. Kraus, T. Takenaka, Construction of an Efficient and Balanced Patent System: Patentability and Patent Scope of Isolated DNA Sequence Under US Patent Act and EU Biotech Directive, Chapter 11 in Constructing European Intellectual Property: Achievements and New Perspectives by C. Geiger, Edward Elgar Publishing, Cheltenham, 2013, pp. 255–270.
[4] http://seqdata.uspto.gov/sequence.html?DocID=20010000241.
[5] R. Wax, J. Coburn, Sequence rule compliance - separating the wizards from the muggles, Biotechnol. Law Rep. 22 (2003) 397–400.
[6] http://www.wipo.int/standards/en/pdf/03-25-01.pdf.
[7] www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/pct_wg_5_14.doc.
[8] R. Jones, Errors in patent application sequence listings, Nat. Biotechnol. 21 (2003) 1239–1240.
[9] https://www.stn-international.org/uploads/tx_ptgsarelatedfiles/0210_wipo_bbm.pdf.
[10] http://www.ncbi.nlm.nih.gov/genbank/.
[11] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, D.L. Wheeler, GenBank, Nucleic Acids Res. 33 (Database issue) (2005) D34–D38. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC540017/.
[12] http://www.ebi.ac.uk.
[13] http://www.ddbj.nig.ac.jp/.
[14] http://www.insdc.org.
[15] I. Karsch-Mizrachi, Y. Nakamura, G. Cochrane, The international nucleotide sequence database collaboration, Nucleic Acids Res. 40 (Database issue) (2012) D33–D37. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244996/.
[16] http://www.ebi.ac.uk/ena.
[17] G. Cochrane, P. Aldebert, N. Althorpe, M. Andersson, W. Baker, A. Baldwin, K. Bates, S. Bhattacharyya, P. Browne, A. van den Broek, et al., EMBL nucleotide sequence database: developments in 2005, Nucleic Acids Res. 34 (Database issue) (2006) D10–D15. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347492/.
[18] http://www.ebi.ac.uk/ebisearch/advancedsearchwizard.ebi?domain=patentdb.
[19] W. Li, H. McWilliam, A.R. de la Torre, A. Grodowski, I. Benediktovich, M. Goujon, S. Nauche, R. Lopez, Non-redundant patent sequence databases with value added annotations at two levels, Nucleic Acids Res. 38 (Database issue) (2010) D52–D56. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808894/.
[20] J. McDowall, Prioritizing patent sequence search results using annotation-rich data, World Pat. Inf. 33 (2011) 236.
[21] K. Okubo, H. Sugawara, T. Gojobori, Y. Tateno, DDBJ in preparation for overview of research activities behind data submissions, Nucleic Acids Res. 34 (Database issue) (2006) D6–D9. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347473/.
[22] Eli Kaminuma, Takehide Kosuge, Yuichi Kodama, Hideo Aono, Jun Mashima, Takashi Gojobori, Hideaki Sugawara, Osamu Ogasawara, Toshihisa Takagi, Kousaku Okubo, Yasukazu Nakamura, DDBJ progress report, Nucleic Acids Res. 39 (Database issue) (2011) D22–D27.
[23] Ibid
[24] http://verdi.kobic.re.kr/patome_kr_en/.
[25] http://www.intellogist.com/wiki/NASDAP.
[26] P.J. Andree, et al., A comparative study of patent sequence databases, World Pat. Inf. 30 (2008) 300–308.
[27] O.A. Jefferson, D. Köllhofer, T.H. Ehrich, R.A. Jefferson, Transparency tools in gene patenting for informing policy and practice, Nat. Biotechnol. 31 (2013) 1086–1093. http://www.nature.com/nbt/journal/v31/n12/full/nbt.2755.html.
[28] https://www.lens.org/lens/bio/patseqanalyzer#psa//homo_sapiens/latest/
chromosome/11/11494656-11612692.
[29] We use the term ‘gene patent’ to include patents and patent applications that
disclose and/ or claim nucleotide or peptide sequences. Thus not all ‘gene
patents’ in this use have enforceable rights, nor do they necessarily include
sequences as essentially claimed material
[30] Ibid, Supra note 26.
[31] https://www.lens.org/lens/bio/patseqdata.
[32] https://www.lens.org/lens/bio/patseqdata#globe/; https://www.lens.org/lens/
bio/patseqdata#map/US/; and https://www.lens.org/lens/bio/patseqdata#table
/US/.
[33] http://patseqdev.lens.org/lens/bio/patseqdata#globe/DE/.
[34] WIPO Secretariat 48, WIPO, Geneva, 2001. Available at: http://www.wipo.int/
edocs/mdocs/tk/en/wipo_grtkf_ic_1/wipo_grtkf_ic_1_6.pdf.