Public disclosure of biological sequences in global patent practice
Osmat A. Jefferson a, b, *, Deniz Köllhofer a, b, Prabha Ajjikuttira a, b, Richard A. Jefferson a, b
a Queensland University of Technology, Brisbane, QLD 4000, Australia
b Cambia, P.O Box 3200, Canberra, ACT 2601, Australia
Article info
Article history:
Received 5 January 2015
Received in revised form 20 July 2015
Accepted 23 August 2015
Available online xxx
Keywords:
Patent
Biological patent
Patent sequence
Patent office
Sequence listings
Patent sequence data
Patent sequence download
PatSeq tools
Patent disclosure
Abstract
Biological sequences are an important part of global patenting, with unique challenges for their effective
and equitable use in practice and in policy. Because their function can only be determined with
computer-aided technology, the form in which sequences are disclosed matters greatly. Similarly, the
scope of patent rights sought and granted requires computer readable data and tools for comparison.
Critically, the primary data provided to the national patent offices, and thence to the public, must be
comprehensive, standardized, timely and meaningful. It is not yet. The proposed global Patent Sequence
(PatSeq) Data platform can enable national and regional jurisdictions to meet the desired standards.
© 2015 Cambia. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
In the traditional working of the patent system, an inventor
secures governmental rights to exclude others from making, using,
or selling his/her invention for a limited time in exchange for
publicly disclosing the full details of the invention - what is called
‘the teachings’. The teachings derived from the disclosure and the
practice of an invention enable the public to use the invention
through licensing, to use the invention freely without license
outside the jurisdiction, scope and timeframe of protection, build
upon the invention through research and development, improve
upon it, or design around it to advance scientific and technological
capabilities and ultimately to benefit society.
In the contemporary use of patents to secure rights over genetic
material, the quality of these teachings has come under public
scrutiny and the role of patent offices in the disclosure process has
been challenged [1,2].
Within patent documents, genetic sequences have been viewed
both legally and practically as either chemical compounds or as
information-encoding elements. Within the context of patent
eligibility or infringement issues, the value of their structure and
function has gained importance as various jurisdictions, including
the United States and Europe, attempt to balance competing
interests either in favor of the inventors, as is the case in Europe,
or of the public, as is the case in the USA [3].
As genetic sequences are made up of combinations of four bases,
designated as A, C, G, and T (U) in the case of DNA (RNA), or of 20
amino acids, each with different chemical properties and designated
with single- or triple-letter codes in the case of protein, they can
only be interpreted using specialized computer software tools. Such
tools clarify the structure, function and similarity of any sequence
relative to other sequences. Therefore, during the disclosure
process, the applicant, the patent office and, upon publication, the
public should be able to access the disclosed sequence data and use
the computer tools to interrogate it within the context of all known
sequence listings to interpret, understand, and value their
combined effect on biological innovations. While some patent offices
claim to have internal computer-mediated searching, analysis and
visual tools to interpret the contextual value or meaning of patent
sequences, public access is still lacking. Moreover, creating patent
landscapes that can integrate sequence information with global
patent rights and disclosures remains expensive, slow and
cumbersome to the public and to those professionals who cannot
afford the costly services of commercial providers.
* Corresponding author. Cambia/QUT, P.O. Box 3200, Canberra ACT 2601, Australia.
E-mail address: Osmat@cambia.org (O.A. Jefferson).
World Patent Information 43 (2015) 12-24
journal homepage: www.elsevier.com/locate/worpatin
http://dx.doi.org/10.1016/j.wpi.2015.08.005
0172-2190/© 2015 Cambia. Published by Elsevier Ltd.
Rules for the handling of sequences in patent prosecution,
implemented by the United States Patent and Trademark Office (USPTO) and
other major patent offices in the 1990s, required the submission of
any sequence (nucleotide or peptide) disclosed in any national or
foreign application [4]. At that time, the disclosure standard format,
known as “Sequence Listing”, was simple and file submissions were
accepted either electronically or on paper [5]. As sequence
disclosures grew exponentially over time, more legal rulings were
introduced regarding submissions and compliance
with standard formats. While the major offices recommended
international standards such as ST.25 [6] for the disclosure of
sequence listings in submitted patent applications, the
submitted file formats remained flexible until recently (Table 1). Full
compliance with ST.25 and the inclusion of the associated
metadata, such as the origin of the sequence, its length and type,
function, and other markup in a computer readable format [7], were
actually achieved in only a few offices; variations in the readability
of file formats of disclosed sequence data and in its accurate
matching when transferred to public databases persist [8,9]. For
example, from 2001 until 2007, most international applications did
not comply with ST.25 text format rules, and the disclosed
sequences were in TIFF or PDF files and contained non-ASCII binary
data (Table 1, “Format of published sequences” category at WIPO in
2007).
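To make the notion of a computer readable listing concrete, the following is a minimal sketch of reading an ST.25-style text listing and checking two of the properties discussed above: that the file is plain ASCII, and that the declared length matches the residues actually present. The numeric identifiers used (<210> SEQ ID NO, <211> length, <212> type, <213> organism, <400> sequence) are our reading of ST.25 and should be checked against the standard; the sample listing, the function names and the `SeqEntry` class are hypothetical, and the sketch handles nucleotide sequences only (protein listings use three-letter codes and would need separate handling).

```python
import re
from dataclasses import dataclass

# A line such as "<211> 15": a three-digit numeric identifier and its value.
TAG_LINE = re.compile(r"^<(\d{3})>\s*(.*)$")


@dataclass
class SeqEntry:
    seq_id: int      # <210>, assumed to be the SEQ ID NO
    length: int      # <211>, declared sequence length
    mol_type: str    # <212>, e.g. DNA, RNA or PRT
    organism: str    # <213>
    residues: str    # letters collected after <400>


def is_ascii_text(raw: bytes) -> bool:
    """Return True if the bytes decode as pure ASCII (no binary content)."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


def parse_listing(text: str) -> list:
    """Collect one SeqEntry per <210> block of a simplified listing."""
    records = []
    current = None
    in_sequence = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.group(1), match.group(2).strip()
            in_sequence = tag == "400"
            if tag == "210":
                current = {"210": value, "residues": ""}
                records.append(current)
            elif current is not None and not in_sequence:
                current[tag] = value
        elif in_sequence and current is not None:
            # Keep letters only: drops spacing and the running position count.
            current["residues"] += re.sub(r"[^A-Za-z]", "", line)
    return [
        SeqEntry(int(r["210"]), int(r.get("211", "0")),
                 r.get("212", ""), r.get("213", ""), r["residues"])
        for r in records
    ]


# Hypothetical one-sequence listing, for illustration only.
SAMPLE = """\
<110> Example Applicant
<120> Example title
<160> 1
<210> 1
<211> 15
<212> DNA
<213> Artificial Sequence
<400> 1
ggttggtgtg gttgg                                                        15
"""

if __name__ == "__main__":
    print("ASCII only:", is_ascii_text(SAMPLE.encode("ascii")))
    for entry in parse_listing(SAMPLE):
        status = "ok" if len(entry.residues) == entry.length else "length mismatch"
        print(entry.seq_id, entry.mol_type, entry.organism, entry.length, status)
```

A listing published only as an image (TIFF or scanned PDF) fails the first check outright, which is why such files cannot be interrogated without transcription.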
2. Availability of published patent sequences to the public
Each of the major patent offices adopted a strategy regarding the
publication and provision of sequence listings to the public. Table 1,
column “Format of published sequence listings”, depicts the
practice adopted over time by the USPTO, the World Intellectual Property
Organization (WIPO), and the European Patent Office (EPO). Throughout the past 25
years, variations have existed among these offices. For example, the
published sequence data from US patent documents is available for
bulk downloads under various file formats; however, the USPTO does
not offer a sequence search facility to interrogate the data. The
office passes its published data to the National Center for
Biotechnology Information (NCBI) [10]. This center provides a
comprehensive public sequence search facility, BLAST, allowing
contextual interrogation of sequence data, and a world-class
database, GenBank [11], hosting nucleotide and peptide sequences from
primarily large sequencing projects and individual labs as well as
over 12 million sequences from US granted patents since 1982.
In an effort to enable access and interrogation of larger sets of
patent sequence data, NCBI and two other major public database
providers, the European Bioinformatics Institute (EMBL-EBI) [12]
and the DNA Databank of Japan (DDBJ) [13], initiated an informal
collaboration in the early 1990s, the International Nucleotide
Sequence Database Collaboration (INSDC) [14], to exchange
nucleotide (DNA or RNA), but not protein, sequences, including those
disclosed in patents [15], and to allow public access and interrogation
of the data.
Similarly, the European Patent Office releases its published
sequence listings, mainly from published patent applications, to
EMBL-EBI, which incorporates them into the ENA [16] database within the
patent data class (PAT). The sequences are served to the public
along with other sequence listings received from partner
institutions. The EMBL-EBI databases also provide access to
protein-based sequences in the Universal Protein Resource (UniProt) [17].
Unlike NCBI, EMBL-EBI parses the received sequence listings and
extracts associated metadata before serving it in the ENA database
[18]. Furthermore, EMBL-EBI provides non-redundant sequence
databases based on patent sequences stored in ENA and protein
databases. The non-redundant databases are created at two levels
and contain additional annotation, patent family information and
links to patent literature [19,20].
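As an illustration of what a non-redundant collection means in practice, the sketch below collapses identical sequences that were disclosed in different documents into a single entry that keeps pointers back to every source. This is not EMBL-EBI's procedure, which is not described here; it is a simplified exact-match example, and the document identifiers and the function names are hypothetical.

```python
from collections import defaultdict


def normalize(sequence: str) -> str:
    """Remove whitespace and fold case so that formatting differences are ignored."""
    return "".join(sequence.split()).upper()


def collapse(records):
    """Group (document, seq_id, sequence) records by identical normalized sequence.

    Returns a dict mapping each distinct sequence to the list of
    (document, seq_id) pairs in which it was disclosed.
    """
    groups = defaultdict(list)
    for document, seq_id, sequence in records:
        groups[normalize(sequence)].append((document, seq_id))
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical disclosures; "DOC-A" and "DOC-B" are made-up identifiers.
    disclosures = [
        ("DOC-A", 1, "ggttggtgtg gttgg"),
        ("DOC-B", 7, "GGTTGGTGTGGTTGG"),
        ("DOC-B", 8, "ACGT"),
    ]
    for sequence, sources in collapse(disclosures).items():
        print(sequence, sources)
```

Grouping by exact identity is only the simplest possible level of redundancy removal; grouping by patent family or by sequence similarity would need additional data and tools.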
Sequence listings disclosed in published patent documents from the
Japan Patent Office (JPO) and the Korean Intellectual Property Office
(KIPO) are shared through DDBJ, which is administered by the
Center for Information Biology of the National Institute of Genetics
in Japan. The Databank includes the nucleotide-based sequence
listings from patent documents published in Japan and Korea since
1997 [21]. In 2010, two amendments were introduced into this
database. First, the NCBI taxonomy ID was added to each sequence
listing based on the original organism declared for that sequence in
the patent application; the newly revised entries for nucleotides
and proteins were released in May 2010, with a scheduled update
once per year [22]. The second amendment included the release of
protein sequence listings from JPO and KIPO for FTP downloading
and, later, the availability of a sequence similarity search facility for
protein sequence listings from USPTO, EPO, JPO, and KIPO [23].
Other public databases that provide access to, and search facilities
for, yet smaller collections of published patent sequences include the
Patome@Korea database, which serves nucleotide and protein patent
sequences provided by the Korean Intellectual Property Office
(KIPO) [24] from 2004 to 2008 and is maintained by the Korean
Bioinformation Center (KOBIC). Similarly, NASDAP, a semi-public
Chinese database, provided free sequence search services to
explore Chinese gene patents (applications and grants from
1999 to Feb 2006), but it appeared to be no longer available at our latest
search in May 2015. The database covered 123,218 sequence listings
from 8563 Chinese patents acquired from the State Intellectual
Property Office as hard copies or images [25].
3. Why do we need a global and transparent patent sequence
dataset?
Because NCBI, EMBL-EBI, and DDBJ each decide which sequence listing data
to include in their databases, and what sequence search facility to
provide on what data and when, accurate and comprehensive
accounting of published sequence data as disclosed in patents is
hard to achieve. Upon reviewing the maze of the available patent
sequences from public or commercial sources, Andree et al.
(2008) reported that each public database still has a unique dataset
and that, for any comprehensive searching and analysis, users may need
to access and use several databases [26].
Moreover, Cambia's 2011 survey of patent offices [27] reveals that,
over the past twenty years, there has been progress in
harmonizing sequence filing rules, but sharing that knowledge in a
meaningful way and at a global level with the public has lagged, as
has ensuring compliance with these rules both by applicants and
internally. An optimally functioning patent office embracing such a
public disclosure responsibility would meet certain standards.
Biological inventions often disclose biological sequences, such
as DNA or proteins or portions of them, which may or may not be
claimed. Their teaching value depends on obtaining a clear
understanding of their nature and function, a clear differentiation
between what is disclosed and what is claimed, and knowledge of how such
sequences are used in follow-on inventions and in innovations
(products and services), by whom, and where in the world. For
example, zooming in on the GALNT18 gene in PatSeq Analyzer [28]
reveals that a 15-mer portion in the 3′ end region
(GGTTGGTGTGGTTGG) can be/has been used in several patent
documents in different contexts. Table 2 lists the issued patents
that reference that sequence in the claims and under various SEQ
IDs. The table also depicts the corresponding claim referencing the
SEQ ID, the claim category based on the use of that SEQ ID within
each patent, the applicant name, and the filing date.
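The kind of question raised by the 15-mer example, namely where a short sequence occurs within other disclosed sequences, can be sketched as an exact-match search on both strands. This is only an illustration and not how PatSeq Analyzer works; real tools use alignment methods that tolerate mismatches. The function names and the test sequences are hypothetical.

```python
# Complement table for upper-case DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return sequence.translate(COMPLEMENT)[::-1]


def _positions(needle: str, haystack: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) exact matches."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def find_motif(motif: str, sequence: str) -> dict:
    """Locate a motif, and its reverse complement, within a nucleotide sequence.

    Both inputs are upper-cased and RNA 'U' is treated as 'T'.
    """
    motif = motif.upper().replace("U", "T")
    sequence = sequence.upper().replace("U", "T")
    return {
        "forward": _positions(motif, sequence),
        "reverse": _positions(reverse_complement(motif), sequence),
    }


if __name__ == "__main__":
    motif = "GGTTGGTGTGGTTGG"  # the 15-mer quoted in the text
    # Made-up flanking bases around the motif, for illustration only.
    print(find_motif(motif, "aaaggttggtgtggttggaaa"))
    print(find_motif(motif, "TTCCAACCACACCAACCTT"))
```

Running such a search over every SEQ ID of every published listing is only possible when the listings are available as text, which is the practical reason the disclosure format matters.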
Table 1
Changes in patentability requirements for nucleotide or peptide sequences and in submitted and published sequence formats as adopted by the United States Patent and Trademark Office (USPTO), the World Intellectual Property Organization (WIPO), and the European Patent Office since the 1990s.
Columns, in the order given for each entry: Patent office; Entered into force; Patentability requirements for nucleotide or peptide sequences; Format of submitted sequence listings; Format of published sequence listings/Comments; Reference.
USPTO, entered into force 01-10-1990
Patentability requirements: Every sequence described as cited art, used in a comparison figure or table, or not claimed or disclosed in the specification, claims, and figures is covered by the sequence rules and must appear in a “Sequence Listing” section. “Sequence Listing” refers to a standard format for the submission of even one “unbranched” nucleotide or amino acid sequence. Branched sequences are excluded. The rules apply to any nucleic acid sequence of ten or more nucleotides or peptide sequence of four or more amino acids.
Format of submitted sequence listings: Submission in the standard format “Sequence Listing” was done either on paper or Compact Disk-R.
Format of published sequence listings/Comments: “A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=20010000241). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3)” (http://www.uspto.gov/web/offices/pac/mpep/s2435.html)
Reference: 55 Federal Register No. 84, p. 18230; ROBERT WAX and JAMES COBURN. Sequence Rule Compliance-Separating the Wizards from the Muggles. 22 Biotechnology Law Report 397 Number 4 (August 2003). http://online.liebertpub.com/doi/abs/10.1089/073003103769015915?journalCode=blr
USPTO, entered into force 19-11-1996
Patentability requirements: The requirements for restriction pursuant to 37 CFR 1.141(a) were waived and applicants were permitted to claim, have examined, in a single application, up to ten independent and distinct inventions described by their nucleotide sequences. And for unity of invention determinations pursuant to 37 CFR 1.475 et seq., up to ten, independent and distinct molecules described by their nucleotide sequence in a single patent application can be searched and examined in international applications or national stage applications filed under 35 USC 371 with four more additional sequences if applicants paid additional fees for search and/or examination.
Format of submitted sequence listings: Submission in the standard format “Sequence Listing” was done either on paper or Compact Disk-R.
Format of published sequence listings/Comments: “Sequence data may also be accessed in a more readily searchable manner from the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov or from a commercial vendor. The USPTO forwards a copy of the sequence data to NCBI when a patent including a “Sequence Listing” is granted, and when an application containing a sequence is published pursuant to 35 U.S.C. 122(b). If NCBI elects to include the sequence data in one of its databases, NCBI indexes the sequence data according to patent or patent application publication number. There is currently no fee for the public to use the NCBI site.” (http://www.uspto.gov/web/offices/pac/mpep/s2435.html)
Reference: 1192 Off. Gaz. Pat. Office 68 http://www.uspto.gov/web/offices/pac/dapp/opla/preognotice/sequence02212007.pdf
USPTO, entered into force 01-07-1998
Patentability requirements: The requirements for patent applications containing nucleotide sequence and/or amino acid disclosures were published to set an international standard with a language neutral format and using numeric identifiers rather than the current subject headings for “Sequence Listings”. Rules under Title 37 Code for Federal Regulation (CFR) § 1.821-1.825 apply ONLY to applications containing sequences that include at least ten nucleotides (four or more of which are specifically defined) or four or more amino acids (of which four or more are specifically defined) or both. The rules were amended to be consistent with the new WIPO standard, ST.25 (https://www.federalregister.gov/articles/2009/08/11/E9-19179/requirements-for-patent-applications-containing-nucleotide-sequence-andor-amino-acid-sequence#p-27)
Format of submitted sequence listings: A Sequence Listing must be submitted “as a computer-readable American Standard Code for Information Interchange (ASCII) file (the CRF) on a diskette (Compact Disk-Recordable (CD-R) for large Sequence Listings), as well as a printed version of the same (or, again, CD-R[RAW2], as 37 CFR § 1.52(e) contains the requirement of filing a second CD-R in lieu of the “paper” copy). Additionally, a statement must accompany the Sequence Listing (“Statement to support…”) verifying that (1) the submission does not contain new matter and (2) the paper and electronic copy of the listing are the same.” (http://www.uspto.gov/web/offices/com/sol/og/con/files/cons082.htm)
Format of published sequence listings/Comments: “37 CFR 1.821(e) requires the submission of a copy of the “Sequence Listing” in computer readable form. The information on the computer readable form will be entered into the Office's database for searching and printing nucleotide and amino acid sequences. This electronic database will also enable the Office to exchange patented sequence data, in electronic form, with the Japanese Patent Office and the European Patent Office. It should be noted that the Office's database complies with the confidentiality requirement imposed by 35 U.S.C. 122. Pending application sequences are maintained in the database separately from published or patented sequences. That is, the Office will not exchange or make public any information on any sequence until the patent application containing that information is published or matures into a patent, or as otherwise allowed by 35 U.S.C. 122.” (http://www.uspto.gov/web/offices/pac/mpep/s2422.html)
Reference: 63 Federal Register No. 104, pp. 29,620-29,643 http://www.gpo.gov/fdsys/pkg/FR-1998-06-01/pdf/98-14194.pdf
USPTO, entered into force 07-11-2000
Patentability requirements: 37 CFR 1.52(e) as amended to provide for the filing of tables comprising sequence listings on compact discs. The disc must be either a read-only or a write-once disc. The disc must also be ASCII compliant and the specification must contain a cross-reference to it. (http://www.ladas.com/Patents/Patent Practice/USPractice/USPatLawRevisions-9_.html) 37 CFR Section 1.821(c) was also amended to provide “that a ‘Sequence Listing’ must be submitted either: (1) on paper, or (2) on a compact disc, as defined in the amended § 1.52(e) and as further specified in § 1.823(a)(2). For nucleotide and/or amino acid sequences, no change is made to the computer readable form (CRF) practice under § 1.821(e)”. The requirement for a paper copy of the sequences under § 1.821(c) is modified to allow applicants to satisfy that section with either a paper version or a submission on a CD-ROM or CD-R (submitted in duplicate). Submission on compact disk is in addition to and not a replacement for the CRF required under § 1.821(e) (http://www.gpo.gov/fdsys/pkg/FR-2000-09-08/pdf/00-22392.pdf)
Format of submitted sequence listings: A Sequence Listing may be submitted as “1. Paper and disc (containing an ASCII text computer-readable form on CD) 2. ASCII text uploaded via Electronic Filing System (EFS) 3. “Paperless Submission” consisting of multiple CD submissions, but no paper.” (http://www.seqidno.com/sequence-listing-services/rules-summary/) (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of published sequence listings/Comments: US sequence rules were effective as of July 1, 1998 whereas WIPO ST.25 (the Sequence Rules) were effective as of January 1, 1999 (Wax and Coburn's paper 2003). Standard definitions of “specifically defined” nucleotides and amino acids are used based on the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2.
Reference: 65 Federal Register No. 175, pp 54620-54681 http://www.gpo.gov/fdsys/pkg/FR-2000-09-08/pdf/00-22392.pdf, http://www.seqidno.com/sequence-listing-services/rules-summary/
USPTO, entered into force 14-10-2006
Patentability requirements: With the release of the Electronic Filing System (EFS) version 1.1, filing of sequence listings became easier. Filers would need to submit only a single .txt file for a sequence listing, provided the file is ASCII compliant (to serve both the paper copy required by § 1.821(c) and the CRF required by § 1.821(e)), and 37 C.F.R. § 1.52(e)(5) requires that the specification be amended to contain a reference to the material in the text file in a separate paragraph which identifies the name of the text file, the date of its creation, and the size of the text file in bytes. Sequence listing text files submitted by EFS-Web have a size limit of 100 megabytes. (http://www.patentdocs.org/2009/01/sequence-listing-efiling-options-using-efsweb.html)
Format of submitted sequence listings: Electronic submission using EFS-Web version 1.1, preferably as a .txt file (.pdf files are acceptable but discouraged) (http://www.patentdocs.org/2006/11/hasslefree_fili.html)
Format of published sequence listings/Comments: The requirements of US sequence rules are less stringent than the requirements of WIPO Standard ST.25 (1998). Under ST.25: (1) submissions from Mac computers are not accepted; (2) the answers in fields <221> and <222> must use selections from Tables 5 and 6 of WIPO Standard ST.25; (3) any free text in field <223> will not be translated and thus must appear in the specification; (4) a CRF will not be considered to be part of the disclosure or published if filed after the filing of an application under the PCT; and (5) Paragraphs 24 and 39 of WIPO Standard ST.25 require specific compliance criteria within the sequence listing. MPEP (ROBERT WAX and JAMES COBURN. Sequence Rule Compliance Separating the Wizards from the Muggles. 22 Biotechnology Law Report 397 Number 4 (August 2003)).
Reference: Sections XIII, XVII, and XVIII of the EFS-Web Legal Framework http://www.patentdocs.org/2009/01/sequence-listing-efiling-options-using-efsweb.html, http://www.patentdocs.org/2006/11/hasslefree_fili.html
USPTO, entered into force 27-03-2007
Patentability requirements: USPTO rescinds the partial waiver of 37 CFR 1.141 et seq. for restriction practice in national applications filed under 35 U.S.C. 111(a), and 37 CFR 1.475 et seq. for unity of invention determinations in both PCT international applications and the resulting national stage applications under 35 U.S.C. 371. “For National applications, polynucleotide inventions will be considered for restriction, rejoinder, and examination practice in accordance with the standards set forth in MPEP Chapter 800 (except for MPEP 803.04 which is superseded by this Notice). Claims to polynucleotide molecules will be considered for independence, relatedness, distinction and burden as for claims to any other type of molecule. For International applications and national stage filings of international applications under 35 U.S.C. 371, unity of invention determination will be made in view of PCT Rule 13.2, 37 CFR 1.475 and Chapter 10 of the ISPE Guidelines. Unity of invention will exist when the polynucleotide molecules, as claimed, share a general inventive concept, i.e., share a technical feature which makes a contribution over the prior art.” (http://www.uspto.gov/web/offices/com/sol/og/2007/week13/patsequ.htm)
Format of submitted sequence listings: Electronic submission using EFS-Web version 1.1 preferably as .txt file (.pdf are acceptable but discouraged) (http://www.patentdocs.org/2006/11/hasslefree_fili.html)
Format of published sequence listings/Comments: There is NO filing fee for submitting a sequence listing as part of a U.S. patent application. There is a filing fee for a sequence listing filed in an international application IF the application is more than 30 pages. A $13 filing fee for each page over 30 pages. There are NO page fees for sequence listings submitted via Electronic Filing System-Web in the proper text format (http://www.gpo.gov/fdsys/pkg/FR-2009-08-11/pdf/E9-19179.pdf) Under 37 CFR 1.16(s) and 1.492(j), both U.S. and international patent applications with paper sequences listings that exceed 100 pages, may be subject to an application size fee of $270 (or $135 for small entities) for each additional 50 pages or fraction thereof (https://www.federalregister.gov/articles/2009/08/11/E9-19179/requirements-for-patent-applications-containing-nucleotide-sequence-andor-amino-acid-sequence#p-27).
Reference: OG Notices: 27 March 2007 http://www.uspto.gov/web/offices/com/sol/og/2007/week13/patsequ.htm, http://www.patentdocs.org/2006/11/hasslefree_fili.html
WIPO, entered into force 01-07-1992
Patentability requirements: International applications with a nucleotide and/or amino acid sequence disclosure need to contain a listing of the sequence in the description, in a format complying with WIPO Standard ST.23 and in accordance with Annex C of the Administrative Instructions. The International Search Authority may invite applicants to furnish a listing of the sequence in a machine readable form provided for in the Administrative Instructions (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of submitted sequence listings: Only the USPTO mandated submission of sequence listings in machine readable form, with the file encoded in a subset of ASCII. The European Patent Office made the requirement mandatory in January 1993. JPO recommended it with its own character code, and IP Australia did not require it but accepted the submission in ASCII. Patent offices in Austria, Russia, Sweden, and the UK did not require submission of sequence listings in machine readable form. (http://www.uspto.gov/web/offices/pac/mpep/old/E5R16_AI.pdf) (rev 14, November 1992)
Format of published sequence listings/Comments: Optical Character Recognition (OCR) format was not adopted by the USPTO, whereas it was adopted by some jurisdictions and complied with either WIPO ST.22 or ST.23.
Reference: Rules 5.2 and 13ter of PCT regulations, Sections 208, 513, 610, and Annex C of the Administrative Instructions (AI) as in force from July 1, 1992 http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
WIPO, entered into force 28-11-1994
Patentability requirements: USPTO, EPO, and JPO worked with WIPO to establish the basis of what was to become the WIPO Standard, ST.25 (1998). (http://www.wipo.int/meetings/en/doc_details.jsp?doc_id=4412, PCT/MIA/VI/15)
Format of submitted sequence listings: Various formats were entertained.
Format of published sequence listings/Comments: USPTO, EPO, and JPO proposed the development of a single patent sequence database with a front and back ends (http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529)
Reference: PCT/MIA/V/1 and PCT/MIA/V/2 http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529
WIPO, entered into force 01-04-1995
Patentability requirements: If International Search Authority is prepared to transcribe the sequence listing into a machine readable form, it may request payment for the cost of such transcription (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of submitted sequence listings: Various formats were entertained. USPTO, EPO, and JPO proposed the development of a single patent sequence database with a front and back ends (http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529)
Format of published sequence listings/Comments: The format of sequence listings in paper and electronic form differs based on different patent offices requirements and sequence listings were required to be translated for consideration in the national stage. (http://www.uspto.gov/web/offices/com/sol/notices/fr019819.html)
Reference: Rule 13ter.1(a) of PCT regulations http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
WIPO, entered into force 01-07-1998
Patentability requirements: A sequence listing would need to be a separate part of the description in accordance with Annex C of Administrative Instructions and if that sequence listing contains any free text, that free text would need to appear in the main part of the description as well in the language thereof. WIPO new standard, ST.25, replaced ST.23 and ST.24 and established the international meaning of “Sequence listing” for nucleotide and/or amino acid sequence disclosure and allowed applicants to submit a single sequence listing that is acceptable to all receiving offices, International Search, and Preliminary Examining Authorities (for the international phase) and designated and elected offices (for the national phase). See Annex C in http://www.uspto.gov/web/offices/pac/mpep/old/E7R0_AI.pdf
Format of submitted sequence listings: Submission of only one sequence listing in paper and electronic form will be required now and no translation is needed. Computer readable form is only required when a competent authority requires it (http://www.wipo.int/standards/en/pdf/03-25-01.pdf) (http://www.wipo.int/wipostad/en/standards/st25-en/1-0/view#2255) (http://www.wipo.int/standards/en/pdf/archives/03-25-01arc2009.pdf) (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf)
Format of published sequence listings/Comments: International applications in electronic form would have mandatory data elements: 1. Applicant Name, 2. Title of Invention, 3. Number of SEQ ID NOs, 4. SEQ ID NO:, 5. Length (sequence length expressed in number of base pairs or amino acids), 6. Type (type of molecule sequenced in SEQ ID NO: x, either DNA, RNA or PRT; if a nucleotide sequence contains both DNA and RNA fragments, the value shall be “DNA”), 7. Organism (Genus Species (that is, scientific name) or “Artificial Sequence” or “Unknown”) Sequence.
Reference: Rules 5.2 and 13ter.1(a) of PCT regulations and section 513 of Administrative instructions http://www.wipo.int/wipostad/en/standards/st25-en/1-0/view#2255 http://www.wipo.int/standards/en/pdf/archives/03-25-01arc2009.pdf
WIPO, entered into force 01-03-2001
Patentability requirements: New instructions were put in place to deal with the filing, format, fees, preparation, and publications of extremely large international applications containing nucleotide and/or amino acid sequence listings. Sequence listings will be published on the Internet on the date of publication of the rest of the international application. (http://www.uspto.gov/web/offices/pac/mpep/old/E8R0_AI.pdf)
Format of submitted sequence listings: Sequence listings, filed as parts of the international applications under the new Section 801(a) of the Administrative Instructions allowed the applicant to file the sequence listings (and/or tables) as: “(i) only on an electronic medium in the computer readable form referred to in Annex C; or (ii) both on an electronic medium in that computer readable form and on paper in the written form referred to in Annex C;” “Tables filed in computer readable form under Section 801(a) shall comply with one of the following character formats: (i) UTF-8-encoded Unicode 3.0; or (ii) XML format conforming to the “Application-Body” Document Type Definition referred to in Appendix I of Annex F; at the option of the competent Authority.” (http://www.uspto.gov/web/offices/pac/mpep/old/E8R1_AI.pdf)
Format of published sequence listings/Comments: Under Section 805, publication of international applications in electronic form is at the discretion of the Director General. “As from 2 August 2001, the sequence listing parts of the international applications filed under Section 801 of the Administrative Instructions under the PCT will be published on the Internet on the date of publication of the rest of the international application of which it forms a part. Publication of a given international application containing a sequence listing part filed under Section 801 will thus comprise two elements published on the same day: (i) a paper pamphlet, as now, for all parts other than the sequence listing part, and (ii) a new electronic portion for the sequence listing part only; cross-references between the two elements will be included for the sake of clarity.” (http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf)
Reference: Part 8 of the Administrative Instructions (sections 801-806). http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf
01-01-2003 Standard options were introduced for Electronic Filing and Processing of
International Applications. Part 7 does not apply to international
applications containing sequence listings. Part 8 applies. However, if
applicants submit such applications electronically, they will be subject
to Part 7 and not Part 8 of the Administrative Instructions. All technical
Requirements for the Presentation of Tables Related to Nucleotide and
Amino Acid Sequence Listings in International Patent Applications
under the PCT were provided in a new annex, Annex C-bis.
Annex F provided the standard for electronic
filing but the details were published in the PCT
Gazette Special Issue No. S-04/2001 dated 27
December 2001. This issue was not available on the WIPO
website on 10/4/2012, but a more recent
version of Annex F is available, for example, at
http://www.wipo.int/pct/en/texts/pdf/ai_anf.pdf.
Submission of sequence listings remained in
computer-readable form and on paper. While the
PCT charged for sequence listings submitted
electronically as txt files, the USPTO did not
charge.
Sequence listings to be available on the
internet in multiple formats.
Part 7 of the Administrative
Instructions Annex C-bis
and Annex F http://www.
wipo.int/edocs/pctndocs/
en/2001/pct_news_2001_8.
pdf
01-04-2005 USPTO implemented the restriction requirement as of November 1996
to limit an applicant's claims to no more than 10 nucleotide sequences in
one application. PCT/MIA/VI/9 Administrative Instructions under PCT
were silent on the restriction requirement.
Rule 13ter. of PCT regulations was amended to
provide consistent procedures before all
authorities and to request compliance with
either the electronic form or paper filing of
sequence listings contained in the international
applications in accordance with the Standard
established in Annex C (http://www.wipo.int/
pct/en/texts/pdf/pct_regulations_history.pdf)
Sequence listings to be available on the
internet in multiple formats.
Rule 13ter. of PCT
regulations http://www.
wipo.int/pct/en/texts/pdf/
pct_regulations_history.pdf
01-10-2007 New publication system was in place to provide: “ XML daily update
files. All SLs [sequence Listings] will be included (i.e. including the SLs
extracted from the pamphlets). SLs embedded in the description will be
gradually removed. A new structure is available as follows: publication/
year/week/WO_number for the SL files, updates/year/month for the
update files. All subsequently published SLs will be added to the
publication week directory and reported in the update file. All
subsequently deleted/replaced/added SLs will trigger the update of the
corresponding international application publication content and will be
reported in the update file.” (http://www.wipo.int/patentscope/en/
news/pctdb/2007/news_0010.html)
From 2001 until 2007, most sequence listings
were from the “mixed mode” electronic
submission (PCT application on paper whereas
the sequence listings filed electronically).
From 2001 until 2007, most PCT
applications did not comply with ST.25
text format rules. Most sequence
listings were in TIF or PDF-to-TIF files and
contained non-ASCII-compliant text.
(http://www.fiz-karlsruhe.de/uploads/
tx_ptgsarelatedfiles/0210_wipo_bbm.
pdf)
PatentScope, WIPO, and
STN International Website
http://www.wipo.int/
patentscope/en/news/
pctdb/2007/news_0010.
html
01-07-2009 In view of the practice of electronic submission for sequence listings,
Part 8 of the Administrative Instructions (Sections 801–806) and Annex
C-bis became irrelevant and were deleted from the administrative
instructions. (http://www.wipo.int/edocs/pctndocs/en/2009/pct_news_
2009_07.pdf) (http://www.wipo.int/pct/en/newslett/2009/06/article_
0002.html)
A number of other modifications were also
introduced to the Administrative Instructions
under the PCT in relation to the international
filing fees: 1. mixed mode sequence listing filing
(sequence listing and tables are filed in
electronic form while the remainder of the
international application is filed on paper, when
the receiving Office accepts the filing of such
“Where a copy of a ST.25-compliant
text format sequence listing has been
furnished to the ISA under Rule 13ter.1
(for the purposes of international
search only), the ISA will forward a copy
of such a sequence listing to the
International Bureau
The International Bureau will make a
Section 707(a-bis) of the
administrative Instructions
http://www.wipo.int/
export/sites/www/pct/en/
texts/pdf/ai_9.pdf
“mixed mode” applications, will no longer be
possible, 2. there will no longer be a page fee
payable for sequence listings filed in text format
as part of an international application filed in
electronic form, 3. full page fees will be payable
for all pages of a sequence listing filed in image
format (for example, PDF format) or on paper, 4.
sequence listings, filed only for the purposes of
international search, will become publicly
available, 5. Full page fees for tables containing
sequence listings regardless of the format
submitted in (image or paper or electronic)
(http://www.wipo.int/export/sites/www/pct/
en/texts/pdf/ai_9.pdf)
copy of all sequence listings in text
format received publicly available on
PATENTSCOPE®
” (http://www.wipo.int/
pct/en/newslett/2009/06/article_0002.
html). The 2009 rules did not seem to
impact on the compliance with ST.25
text format rules and WIPO is still
sorting the 1999–2006 backlog of
sequence listings (mostly image files)
(http://www.fiz-karlsruhe.de/uploads/
tx_ptgsarelatedfiles/0210_wipo_bbm.
pdf)
01-01-2011 Paragraphs 2, 3bis, 4bis, 38 and 42 of Annex C of the Administrative
Instructions under the PCT were amended in relation to the correction,
rectification or amendment of sequence listings. These changes are only
applicable in respect of international applications filed on or after
January 1, 2011. See page 1 in http://www.wipo.int/edocs/pctndocs/en/
2012/pct_news_2012_13.pdf
Annex C of the Administrative
Instructions, Rule 13ter http://www.
wipo.int/edocs/pctndocs/en/2012/pct_
news_2012_13.pdf
(April 3, 2012) A circular will be sent to all receiving Offices, International Searching
Authorities and designated Offices to introduce the new ST.26 XML
standard with links to example sequence listings in XML format (and
comparing features using ST.25 standard with those using the new
ST.26 XML standard). The Circular will inquire on when and how
implementation of the new standard can be facilitated and
accomplished over time.
Currently, the sequence listing software tool,
PatentIn, is being replaced by BISSAP, which is
expected to support both ST.25 and a draft
version of the new ST.26 XML standard. BISSAP
is being developed by European Patent Office
and will be used across all offices to help in the
preparation and processing of sequence listings.
(www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/
pct_wg_5_14.doc)
Thirty percent of the sequence listings
downloadable from the WIPO website are
in txt format; the rest are in unsearchable
image or PDF formats that are
difficult to render in searchable form.
PCT/WG/5/14 www.wipo.
int/edocs/mdocs/pct/en/
pct_wg_5/pct_wg_5_14.
doc
EPO 01-01-1993 Rule 27 a (1) If nucleotide or amino acid sequences are disclosed in the
European patent application the description shall contain a sequence
listing conforming to the rules laid down by the President of the
European Patent Office for the standardized representation of
nucleotide and amino acid sequences. (4) A sequence listing filed after
the date of filing shall not form part of the description.
European Patent Convention (EPC
1973) Rule 27a (1), (4) (OJ EPO 1992,
342 ff). http://www.epo.org/law-
practice/legal-texts/html/epc/1973/e/
r27a.html
October 2,1998 Rule 27 a amended (2) The President of the European Patent Office may
require that, in addition to the written application documents, a
sequence listing in accordance with paragraph 1 be submitted on a data
carrier prescribed by him accompanied by a statement that the
information recorded on the data carrier is identical to the written
sequence listing, (3) If a sequence listing is filed or corrected after the
date of filing, the applicant shall submit a statement that the sequence
listing so filed or corrected does not include matter which goes beyond
the content of the application as filed.
The Sequence is to be submitted on a data
carrier
EPC 1998, R. 27a(2), (3) (Suppl. No. 2 to
OJ EPO 11/1998) http://www.epo.org/
law-practice/legal-texts/html/epc/
1973/e/r27a.html
13-12-2007 Rule 30 was introduced to meet the requirements of European patent
applications relating to nucleotide and amino acid sequences. Art. 56,
57, 80
R. 42 are relevant here. (1) If nucleotide or amino acid sequences are
disclosed in the European patent application, the description shall
contain a sequence listing conforming to the rules laid down by the
President of the European Patent Office for the standardized
representation of nucleotide and amino acid sequences, (2) A sequence
listing filed after the date of filing shall not form part of the description,
Disclosed sequences within the meaning of Rule
30(1) in the European patent application are to
be represented in a sequence listing which
conforms to WIPO Standard ST. 25. They can be
filed electronically and on paper. In such a case,
a copy of the sequence listing must also be
submitted in computer-readable form. (Special
edition No. 3, OJ EPO 2007, C.1 and C2)
Access to published patent sequence
data is via the EBI's website; patent
sequences can also be purchased and
bulk downloaded from the EPO site (http://www.
epo.org/searching/free/publication-
server/sequence-listings.html)
EPC (1973) to EPC: Rule 30
replaced Rule 27a(1) and
(4). Rule 27a(2), (3) was
deleted and a new clause
(3) was added to Rule 30.
http://www.epo.org/law-
practice/legal-texts/html/
epc/2010/e/r30.html
Table 1 (continued)
Patent
office
Entered
into force
Patentability requirements for nucleotide or peptide sequences Format of submitted sequence listings Format of published sequence listings/
Comments
Reference
(3) Where the applicant has not filed a sequence listing complying with
the requirements under paragraph 1 at the date of filing, the European
Patent Office shall invite the applicant to furnish such a sequence listing
and pay the late furnishing fee. If the applicant does not furnish the
required sequence listing and pay the required late furnishing fee
within a period of two months after such an invitation, the application
shall be refused.
28-04-2011 European Patent Office in collaboration with national patent offices and
the European Bioinformatics Institute developed BiSSAP, which is a
computer program designed to facilitate submission of sequence
listings in patent applications (http://archive.epo.org/epo/pubs/oj011/
06_11/06_3761.pdf)
“BiSSAP can be used to prepare and verify
sequences, generate the sequence listing files
for submission, import existing sequence
listings in WIPO ST. 25 and convert between
sequence listing formats (WIPO ST. 25 and XML
proposal). It also contains a “batch verification”
module allowing users to verify collections of
sequence listings.”
Art. 6(2) Dec. of the President of the EPO
dated 28 April 2011 on the filing of
sequence listings, OJ EPO 2011, 372
requires conversion of sequence listings
into PDF format. If the converted listings are not
searchable, public access will be
affected.
Rule 30 EPC, Rule 5.2 PCT,
and the Decision of the
President and Notice from
the EPO dated 28 April 2011
(OJ EPO 6/2011, 372 ff).
http://www.epo.org/law-
practice/legal-texts/html/
epc/2010/e/r30.html,
http://archive.epo.org/epo/
pubs/oj011/06_11/06_
3761.pdf
18-10-2013 Filing of Sequence listings was amended to: “1.1 If nucleotide or amino
acid sequences are disclosed in a European patent application, the
description must contain a sequence listing complying with WIPO
Standard ST.25 (Standard for the Presentation of Nucleotide and Amino
Acid Sequence Listings in Patent Applications - hereinafter referred to as
the “Standard”) (Rule 30(1) EPC in conjunction with Article 1 of the
decision of the President).”
“1.4 Under Article 1(1) of the decision of the
President, sequence listings must be submitted
in electronic form, i.e. in text format (TXT).
Further information about the document format
is set out in the Standard. The sequence listing
should no longer be filed on paper or, in the case
of electronic filing of the application, in PDF
format (see Article 1(1) and (2) of the decision of
the President). If the applicant also files the
sequence listing of his own accord on paper or
in PDF format, he must submit a statement that
the sequence listings in electronic form and on
paper or in PDF format are identical. In this case,
the paper or PDF form will be disregarded in the
further procedure.” http://archive.epo.org/epo/
pubs/oj013/11_13/11_5423.pdf
Check user feedback on the proposed
XML format ST.26 for public access to
sequence listings at (http://documents.
epo.org/projects/babylon/eponet.nsf/0/
97F67F6DDAF14D59C1257A070032C35B/$File/CL2012-522-0425User%
20feedback-on-ST.26-final.pdf)
Notice from EPO http://archive.epo.org/epo/
pubs/oj013/11_13/11_5423.pdf
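Several rows of Table 1 turn on whether a listing arrives as ST.25 text rather than as an image or PDF. The following minimal sketch illustrates why that matters: a text listing can be read and checked by a few lines of code. It assumes the ST.25 numeric identifiers <210> (SEQ ID NO), <211> (length), <212> (type), <213> (organism) and <400> (sequence). The sample fragment and the function names are hypothetical, written for this illustration only, and the length check is meaningful only for nucleotide entries, since ST.25 writes amino acids as three-letter codes.

```python
import re

# Illustrative ST.25-style fragment (made up for this sketch, not a real filing).
SAMPLE = """\
<210> 1
<211> 15
<212> DNA
<213> Artificial Sequence
<400> 1
ggttggtgtg gttgg                                                      15
"""

TAG = re.compile(r"^<(\d{3})>\s*(.*)$")


def parse_st25(text):
    """Return one dict per <210> entry found in ST.25-style text."""
    entries, current, in_seq = [], None, False
    for line in text.splitlines():
        match = TAG.match(line.strip())
        if match:
            code, value = match.group(1), match.group(2).strip()
            in_seq = False
            if code == "210":  # start of a new sequence entry
                current = {"seq_id": int(value), "residues": ""}
                entries.append(current)
            elif current is None:
                continue  # general-information tags before the first entry
            elif code == "211":
                current["length"] = int(value)
            elif code == "212":
                current["type"] = value
            elif code == "213":
                current["organism"] = value
            elif code == "400":
                in_seq = True  # following lines carry the residues
        elif in_seq and current is not None:
            # Keep letters only: drops spacing and the running residue count.
            current["residues"] += re.sub(r"[^A-Za-z]", "", line)
    return entries


def length_mismatches(entries):
    """SEQ ID numbers whose declared <211> length differs from the residues read."""
    return [e["seq_id"] for e in entries if e.get("length") != len(e["residues"])]


entries = parse_st25(SAMPLE)
problems = length_mismatches(entries)
```

No comparable check is possible on a scanned TIF or image-only PDF without first recovering the text, which is the practical concern raised in the comments column of the table.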
Table 2
Issued patents that reference a 15-mer portion of the GALNT18 gene (GGTTGGTGTGGTTGG) in the claims under various SEQ IDs along with the corresponding claim, the manually analyzed claim category, the applicant name, and the
filing date of that patent document.
Patent number_SEQ ID Claims Claim category Filing date Applicant
US_5840867_A_21 A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ
ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G
(SEQ ID NO:21),
G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G
(SEQ ID NO:22)
Artificial 05-03-1994 GILEAD SCIENCES INC
US_5840867_A_22 A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ
ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G
(SEQ ID NO:21),
G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G
(SEQ ID NO:22),
Artificial 05-03-1994 GILEAD SCIENCES INC
US_5840867_A_19 A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ
ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G
(SEQ ID NO:21),
G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G
(SEQ ID NO:22),
Sequence claimed 05-03-1994 GILEAD SCIENCES INC
US_5756291_A_29 A method to detect the presence or absence of thrombin, which method comprises: a) contacting a
sample suspected of containing thrombin with a single-stranded DNA aptamer coupled to a label
under conditions wherein a complex between thrombin and the aptamer is formed; and b)
detecting the presence or absence of said complex indicating the presence or absence of thrombin;
wherein said aptamer comprises the sequence:##STR17## wherein N is A, T or G. The method of
claim 1 wherein N is T. The method of claim 1 wherein said aptamer comprises the
sequence:##STR18## wherein: N is A, T, or G; and Z is an integer from 2 to 5. The method of claim 3
wherein Z is 3. The method of claim 4 wherein said aptamer comprises the sequence:##STR19##
The method of claim 5 wherein N is T (SEQ ID NO: 29).
Probe or primer used in a method claim 06-07-1995 GILEAD SCIENCES INC
US_6323185_B1_53 An oligonucleotide having a nucleotide sequence chosen from the group consisting of SEQ ID NOS 2–27,
29, 31–39, 46–52 and 53–87, wherein said nucleotide sequence is optionally modified at the
3' terminus or 5' terminus by attachment of a substituent moiety selected from the group consisting
of propylamine, poly-L-lysine, cholesterol, fatty acid chains of length 2 to 24 carbons, and vitamin E.
Sequence claimed 17/07/1996 US HEALTH
US_5691145_A_3 An oligonucleotide which forms an intramolecular G-quartet structure, the oligonucleotide being
labeled with a donor fluorophore and an acceptor, the donor fluorophore and the acceptor selected
such that fluorescence of the donor fluorophore is quenched by the acceptor when the
oligonucleotide forms the G-quartet structure and quenching of donor fluorophore fluorescence is
reduced upon unfolding of the G-quartet structure. The oligonucleotide of claim 1 consisting of SEQ
ID NO:3 labeled with the donor fluorophore and the acceptor.
Artificial 27/08/1996 BECTON DICKINSON CO
US_5882870_A_1 A kit for the reversible anticoagulation of blood comprising: a nucleic acid ligand which binds
thrombin, said nucleic acid ligand selected from the group consisting of a nucleic acid ligand having
SEQ ID NO:1, a nucleic acid ligand having SEQ ID NO:2, a bi-directional nucleic acid ligand
comprising two oligonucleotide segments having SEQ ID NO:5 linked at their respective 3' ends to a
phosphodiester group each of which is linked to a hexaethylene glycol chain, and a bi-directional
nucleic acid ligand comprising two oligonucleotide segments having SEQ ID NO:6 linked at their
respective 3' ends to a phosphodiester group each of which is linked to a glycerol derivative; and a
reversing agent which has greater affinity for the nucleic acid ligand than does thrombin, said
reversing agent selected from the group consisting of compositions comprising a nucleic acid
sequence complementary to that of said nucleic acid ligand, single-stranded DNA binding proteins,
copper (II), mercury (II), silver (I) and platinum complexes.
Subpart 14/01/1998 BECTON DICKINSON CO
US_6780850_B1_1 A composition comprising: a nucleic acid, that is derivatized at the 5' or 3' end or at both the 5' and 3'
ends with streptavidin or a variant of streptavidin that retains biotin binding activity, that
specifically binds to thrombin, wherein said nucleic acid is 2'-fluoropyrimidine RNA or
2'-aminopyrimidine RNA. The composition of claim 1, wherein the nucleic acid comprises nucleotides
having the sequence of SEQ ID NO: 1 or SEQ ID NO: 2. The composition of claim 13, wherein the
nucleic acid comprises nucleotides having the RNA sequence corresponding to SEQ ID NO: 1 or SEQ
ID NO: 2.
Subpart 22/06/2000 TRIUMF
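The rows of Table 2 all trace back to one query sequence, the 15-mer GGTTGGTGTGGTTGG. The method used to retrieve them is not spelled out here, so the following is only a minimal sketch of the simplest case, an exact-match lookup on both strands of a disclosed nucleotide sequence. The function names are hypothetical, and plain substring matching stands in for whatever search a production sequence search tool actually uses.

```python
MOTIF = "GGTTGGTGTGGTTGG"  # the 15-mer shown in Table 2


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif(seq, motif=MOTIF):
    """Return (0-based start, strand) for every exact match of the motif."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


# Hypothetical disclosed sequence carrying the motif once on each strand.
example = "AAA" + MOTIF + "TTT" + reverse_complement(MOTIF)
hits = find_motif(example)
```

An exact-match hit says only that the sequence is referenced. Deciding whether it is claimed as such, used as a probe, or merely a subpart of a larger claimed sequence still requires the manual claim analysis recorded in the "Claim category" column.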
Because a single gene patent [29] may disclose anywhere from one to
millions of sequences, yet claim only a selection of them, individually
in some patent family members or in different combinations in other
versions/family members, the volume of redundant data can be
overwhelming. At present, sifting through such information requires
the use of all public databases and/or the costly use of combined
searches from commercially available databases [30].
What is needed is an open public dataset, with incorporated
transparency metrics, that gives access to globally published patent
sequence data, links to the corresponding gene patents, and supports
interrogation through a sequence search facility. Below is a progress
report on building such a public facility.
4. PatSeq data platform
The new open and interactive online platform, PatSeq Data [31],
enables access to patents disclosing genetic sequences and bulk
downloads of disclosed sequence data based on jurisdiction,
document type, and either sequence type or sequence location. It
also serves as a global, open repository for national systems to
enable public sharing of sequence data associated with patents.
Three new transparency metrics were implemented in PatSeq
Data to foster confidence in the quality and quantity of the data,
whenever it is made available: 1) a detailed account of which
jurisdictions provide (or do not provide) sequence listing data, what
they provide, and how their data compare with those of the PatSeq
database; 2) the ability to dynamically monitor and compare the degree
of overlap between the data sources at each release date,
including the public databases and patent
offices; 3) the ability to link from PatSeq Data to the Lens and other
PatSeq tools, such as PatSeq Finder, to conduct sequence searches,
view family members and download relevant information,
including original patent documents (Fig. 1).
Using the second metric, for example, we were able to identify
missing sequence entries from 1990 to 2001 and missing bulk
sequence listings with more than 250 k sequence entries in
GenBank. Moreover, we learned that protein sequence records from EBI
do not match those of the GenBank or DDBJ patent divisions and may
not be synchronized with the EPO data. Table 3 depicts the progress
made so far on the PatSeq database and shows the data from the latest
release, dated May 29, 2015. In summary, the current holdings of the
PatSeq database are 232,590,639 sequences corresponding to more than
425,028 biological patent documents, and the plan is to continue
parsing and adding any data available, especially from the EPO, in the
near future.
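The second metric can be illustrated with the counts in Table 3. The sketch below takes the sequence counts for the three public databases and their pairwise shared counts and expresses each overlap as a fraction of either source and as a Jaccard index. It assumes that each "shares with" figure counts sequences present in both sources; the function and variable names are hypothetical and do not describe how PatSeq Data itself computes or displays overlap.

```python
# Sequence counts copied from Table 3 (May 29, 2015 release).
totals = {
    "EMBL_EBI": 38_020_495,
    "NCBI": 35_921_386,
    "DDBJ": 36_969_064,
}
shared = {
    ("DDBJ", "NCBI"): 35_144_194,
    ("DDBJ", "EMBL_EBI"): 36_123_009,
    ("NCBI", "EMBL_EBI"): 35_806_474,
}


def overlap_report(totals, shared):
    """Express each pairwise shared count relative to both sources."""
    report = {}
    for (a, b), n_shared in shared.items():
        union = totals[a] + totals[b] - n_shared  # inclusion-exclusion
        report[(a, b)] = {
            "fraction_of_first": n_shared / totals[a],
            "fraction_of_second": n_shared / totals[b],
            "jaccard": n_shared / union,
            "only_in_first": totals[a] - n_shared,
            "only_in_second": totals[b] - n_shared,
        }
    return report


report = overlap_report(totals, shared)
for pair, stats in report.items():
    print(pair, f"{stats['jaccard']:.3f}", stats["only_in_first"], stats["only_in_second"])
```

The "only in" columns are the practically useful ones: they size the set of records a user would miss by consulting a single database.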
Considering that patents can be important for domestic and
global policymaking, sequence-filing rules within each jurisdiction
were also tracked, whenever the information was available, and
displayed in the Dossier view of PatSeq Data. Wherever possible, the
location of sequences within a patent document is shown as well,
and sequence types are depicted by publication year and document
type. For the patent offices that act as an International Searching
Authority, we developed timelines of relevant legal changes during
the past 30 years (Fig. 2).
Fig. 1. The Dossier view of an example jurisdiction (United States of America) in PatSeq Data compares biological patent holdings found in the Lens with those in national and regional patent
office databases, shows sequence disclosures across jurisdictions over time, and allows download of sequence collections or links to the Lens for other searches and analyses.
The figure also depicts the newly introduced transparency metrics, which account for sequence listings across public databases and national and regional patent offices.
Fig. 2. Dossier view of an example jurisdiction, United States of America, in PatSeq Data showing the timelines for patentability requirements for sequences disclosed in patent
applications (upper sections) and legal changes (legislative, administrative, and court cases; lower sections). Clicking on an event allows users to view a short
description of that event and to expand the view to check the reference for that information.
Table 3
Shared data between the various data sources based on PatSeq database holdings as of the May 29, 2015 release date. Available sequence/
document counts are shown for each database and as shared between two databases.
Data available/shared between two databases (a)    Sequence count    Patent count
EMBL_EBI                                           38,020,495        209,951
EMBL_EBI shares with USPTO                         11,982,981        71,242
EMBL_EBI shares with WIPO                          5,082,962         13,579
NCBI                                               35,921,386        184,878
NCBI shares with USPTO                             12,084,974        72,534
NCBI shares with WIPO                              4,710,219         10,524
DDBJ                                               36,969,064        201,130
DDBJ shares with USPTO                             11,536,423        61,114
DDBJ shares with WIPO                              4,778,925         11,789
DDBJ shares with NCBI                              35,144,194        170,752
DDBJ shares with EMBL_EBI                          36,123,009        183,885
NCBI shares with EMBL_EBI                          35,806,474        183,338
USPTO                                              157,531,318       214,512
WIPO                                               35,273,473        29,662
CIPO                                               17,040,805        38,284
(a) Sequence listings from EPO full text documents are yet to be included in the database.
As users enter the PatSeq Data site, they are offered a globe, map
or table [32] summary view in which they can monitor the latest
holdings of the PatSeq database as of the release date shown
under the total holdings. Users can brush over the patent or
sequence data by publication year and document type, link to
each year's patent collection in the Lens, where they can explore
other PatSeq tools such as PatSeq Finder, or simply download the
disclosed sequences. Sequences are downloadable by
document type, sequence type, and sequence location in the document.
For example, the title “Grants: Nucleotides (all)” refers to
nucleotide sequences disclosed in granted patent documents regardless
of where they are referenced in the documents, whereas “Grants:
Nucleotides (in Claims)” refers to the subset of that collection
in which the nucleotide sequences are referenced in the claims
of the granted patent documents. The data are available at no cost to
non-commercial users and for a fee to commercial users.
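The relationship between a collection such as “Grants: Nucleotides (all)” and its “(in Claims)” counterpart can be sketched as a filter over sequence records. The record layout below (`doc_type`, `seq_type`, `locations`) and the `select` helper are hypothetical, introduced only to show that the claims collection is the same query with one extra condition; they do not describe the actual schema or download interface of PatSeq Data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqRecord:
    """Hypothetical minimal record for one disclosed sequence."""
    patent: str
    seq_id: int
    doc_type: str         # e.g. "grant" or "application"
    seq_type: str         # e.g. "nucleotide" or "peptide"
    locations: frozenset  # document parts that reference the sequence


def select(records, doc_type, seq_type, location=None):
    """Filter by document type and sequence type, optionally by location."""
    chosen = []
    for record in records:
        if record.doc_type != doc_type or record.seq_type != seq_type:
            continue
        if location is not None and location not in record.locations:
            continue
        chosen.append(record)
    return chosen


# Made-up records for illustration only.
records = [
    SeqRecord("DOC-A", 1, "grant", "nucleotide", frozenset({"claims", "specification"})),
    SeqRecord("DOC-A", 2, "grant", "nucleotide", frozenset({"specification"})),
    SeqRecord("DOC-B", 1, "grant", "peptide", frozenset({"claims"})),
    SeqRecord("DOC-C", 1, "application", "nucleotide", frozenset({"claims"})),
]

grants_nt_all = select(records, "grant", "nucleotide")
grants_nt_claims = select(records, "grant", "nucleotide", location="claims")
```

Keeping the location as a set per record, rather than a single value, reflects that one sequence may be referenced in several parts of the same document.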
By using the graphical globe or map function and hovering over a
jurisdiction, users can view in a floating tooltip the type of
contextual information available from that jurisdiction at the time
of their visit and access it in the dossier view; if sequence
disclosures were shared with us, users can download them.
Once in the dossier view, for instance that of Germany [33], users may
choose to explore and link to all related biological patents available
for Germany in the Lens, view the mechanism by which Germany
publicly shares its data, learn about the data's format and coverage
whenever provided, examine annual biological patent holdings,
and compare these holdings with the declared holdings of the
patent office as they become available.
Under “Sequences”, users can learn more about the nature of the
disclosed sequences. For example, while brushing over the sequence
holdings of a particular year, the proportion of sequence types,
their distribution within the patent documents (in claims, summary,
drawings, examples, and specification), and the sources of the data are
displayed dynamically, allowing a direct comparison of data
sources. The statistics reflect PatSeq data as of the published release
date shown in the summary view and in each jurisdiction's
dossier view, and update automatically with the regular data
feed updates (currently at monthly intervals) or as more
sequence listing data sources are added to the master database. For
example, on April 15, 2015, the contents of the PatSeq database
increased by 25%.
The other relevant contextual information in the dossier view
includes: a) sequence filing rules for either nucleotide or peptide
sequences in that jurisdiction, based on the Cambia 2011 and WIPO 2001
surveys [34]; b) a timeline of relevant legal changes in the
jurisdictions that act as an International Searching Authority
(Fig. 2); and c) contact details of the officer who contributed the
information, or a link to the patent office website for more details
if that information is provided there.
Before this facility was released, more than 50 patent offices were
consulted and some of their requested features were incorporated
in PatSeq Data. Offices such as USPTO, IP Australia, CIPO,
and GPTO have contributed sequence data to the PatSeq
database, and others have promised to do so. Some offices were not
in a position to provide the data, such as the Danish Patent and
Trademark Office, or the Israel patent office, which does not even
publish the sequence data along with the patent (personal
communications). As it is now clear that many patent offices simply do
not have access to the analytic or server capabilities to host their
sequences in a useful manner, the PatSeq Data tool, and the entire
PatSeq facility as a global non-government activity, offers them
such a service. The Lens collaborative project will continue reaching
out to other patent offices to demonstrate the public value of the
PatSeq facility.
5. Conclusion
Many governments face tough policy choices around the
protection or use of IP on biological technologies and materials.
Adding new data from diverse patent offices, and complementing
missing data through patent family associations, will
enable users to compare patenting activity across jurisdictions
and engage in better-informed debates on the appropriate
degree of gene patenting to optimize economic and social impacts.
PatSeq Data allows offices to upload and share their holdings, and
users to download and analyze sequence sets associated with
global patent documents.
Acknowledgments
This work was supported, in part, by the Bill & Melinda Gates
Foundation, Global Health Grant ID 52239; Gordon and Betty
Moore Foundation “Grant GBMF3465”; Queensland University of
Technology “Grant 321121-0023/08”; and Queensland University of
Technology and Syngenta Crop Protection AG “Research collaboration
No: 1400001566”. We thank the Lens team for their
continued support and improvement of the Lens functionalities
and Small Multiples, a private visualization company in Sydney,
Australia, for implementing the open-source globe feature in the
platform design of PatSeq Data. We also appreciate the assistance of
Nina Prasolova and Innokenti Epichev in the research phase of this
project.
References
[1] A. Devlin, The misunderstood function of disclosure, Pat. Law Harv. J. Law
Technol. 23 (2010) 401e446.
[2] P. Drahos, Rethinking the Role of the Patent Office from the Perspective of
Responsive Regulation, Chapter 5 in Emerging Markets and the World Patent
Order: The Forces of Change by F.M. Abbott, C.M. Correa and P. Drahos,
Edward Elgar Publishing, Cheltenham, 2014, pp. 78e99.
[3] J. Kraus, T. Takenaka, Construction of an Efficient and Balanced Patent System:
Patentability and Patent Scope of Isolated DNA Sequence Under US Patent Act
and EU Biotech Directive, Chapter 11 in Constructing European Intellectual
Property : Achievements and New Perspectives by C. Geiger, Edward Elgar
Publishing, Cheltenham, 2013, pp. 255e270.
[4] http://seqdata.uspto.gov/sequence.html?DocID¼20010000241.
[5] R. Wax, J. Coburn, Sequence rule compliance dseparating the wizards from
the muggles, Biotechnol. Law Rep. 22 (2003) 397e400.
[6] http://www.wipo.int/standards/en/pdf/03-25-01.pdf.
[7] www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/pct_wg_5_14.doc.
[8] R. Jones, Errors in patent application sequence listings, Nat. Biotechnol. 21
(2003) 1239e1240.
[9] https://www.stn-international.org/uploads/tx_ptgsarelatedfiles/0210_wipo_
bbm.pdf.
[10] http://www.ncbi.nlm.nih.gov/genbank/.
[11] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, D.L. Wheeler, GenBank,
Nucleic Acids Res. 33 (Database issue) (2005) D34eD38. Available at: http://
www.ncbi.nlm.nih.gov/pmc/articles/PMC540017/.
[12] http://www.ebi.ac.uk.
[13] http://www.ddbj.nig.ac.jp/.
[14] http://www.insdc.org.
[15] I. Karsch-Mizrachi, Y. Nakamura, G. Cochrane, The international nucleotide
sequence database collaboration, Nucleic Acids Res. 40 (Database issue)
(2012) D33eD37. Available at http://www.ncbi.nlm.nih.gov/pmc/articles/
PMC3244996/.
[16] http://www.ebi.ac.uk/ena.
[17] G. Cochrane, P. Aldebert, N. Althorpe, M. Andersson, W. Baker, A. Baldwin,
K. Bates, S. Bhattacharyya, P. Browne, A. van den Broek, et al., EMBL nucleotide
sequence database: developments in 2005, Nucleic Acids Res. 34 (Database
issue) (2006) D10eD15. Available at: http://www.ncbi.nlm.nih.gov/pmc/
articles/PMC1347492/.
[18] http://www.ebi.ac.uk/ebisearch/advancedsearchwizard.ebi?
domain¼patentdb.
[19] W. Li, H. McWilliam, A.R. de la Torre, A. Grodowski, I. Benediktovich,
M. Goujon, S. Nauche, R. Lopez, Non-redundant patent sequence databases
with value added annotations at two levels, Nucleic Acids Res. 38 (Database
issue) (2010) D52eD56. Available at http://www.ncbi.nlm.nih.gov/pmc/
articles/PMC2808894/.
[20] J. McDowall, Prioritizing patent sequence search results using annotation-rich
data, World Pat. Inf. 33 (2011) 236.
[21] K. Okubo, H. Sugawara, T. Gojobori, Y. Tateno, DDBJ in preparation for
overview of research activities behind data submissions, Nucleic Acids Res. 34
(Database issue) (2006) D6–D9. Available at:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347473/.
[22] E. Kaminuma, T. Kosuge, Y. Kodama, H. Aono, J. Mashima, T. Gojobori,
H. Sugawara, O. Ogasawara, T. Takagi, K. Okubo, Y. Nakamura, DDBJ progress report,
Nucleic Acids Res. 39 (Database issue) (2011) D22–D27.
[23] Ibid.
[24] http://verdi.kobic.re.kr/patome_kr_en/.
[25] http://www.intellogist.com/wiki/NASDAP.
[26] P.J. Andree, et al., A comparative study of patent sequence databases, World
Pat. Inf. 30 (2008) 300–308.
[27] O.A. Jefferson, D. Köllhofer, T.H. Ehrich, R.A. Jefferson, Transparency tools in
gene patenting for informing policy and practice, Nat. Biotechnol. 31 (2013)
1086–1093. http://www.nature.com/nbt/journal/v31/n12/full/nbt.2755.html.
[28] https://www.lens.org/lens/bio/patseqanalyzer#psa//homo_sapiens/latest/chromosome/11/11494656-11612692.
[29] We use the term ‘gene patent’ to include patents and patent applications that
disclose and/or claim nucleotide or peptide sequences. Thus not all ‘gene
patents’ in this use have enforceable rights, nor do they necessarily include
sequences as essentially claimed material.
[30] Ibid, Supra note 26.
[31] https://www.lens.org/lens/bio/patseqdata.
[32] https://www.lens.org/lens/bio/patseqdata#globe/;
https://www.lens.org/lens/bio/patseqdata#map/US/; and
https://www.lens.org/lens/bio/patseqdata#table/US/.
[33] http://patseqdev.lens.org/lens/bio/patseqdata#globe/DE/.
[34] WIPO Secretariat 48, WIPO, Geneva, 2001. Available at:
http://www.wipo.int/edocs/mdocs/tk/en/wipo_grtkf_ic_1/wipo_grtkf_ic_1_6.pdf.

  • 1. Public disclosure of biological sequences in global patent practice Osmat A. Jefferson a, b, * , Deniz K€ollhofer a, b , Prabha Ajjikuttira a, b , Richard A. Jefferson a, b a Queensland University of Technology, Brisbane, QLD 4000, Australia b Cambia, P.O Box 3200, Canberra, ACT 2601, Australia a r t i c l e i n f o Article history: Received 5 January 2015 Received in revised form 20 July 2015 Accepted 23 August 2015 Available online xxx Keywords: Patent Biological patent Patent sequence Patent office Sequence listings Patent sequence data Patent sequence download PatSeq tools Patent disclosure a b s t r a c t Biological sequences are an important part of global patenting, with unique challenges for their effective and equitable use in practice and in policy. Because their function can only be determined with computer-aided technology, the form in which sequences are disclosed matters greatly. Similarly, the scope of patent rights sought and granted requires computer readable data and tools for comparison. Critically, the primary data provided to the national patent offices and thence to the public, must be comprehensive, standardized, timely and meaningful. It is not yet. The proposed global Patent Sequence (PatSeq) Data platform can enable national and regional jurisdictions meet the desired standards. © 2015 Cambia. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). 1. Introduction In the traditional working of the patent system, an inventor secures governmental rights to exclude others from making, using, or selling his/her invention for a limited time in exchange for publicly disclosing the full details of the invention - what is called ‘the teachings’. 
The teachings derived from the disclosure and the practice of an invention enable the public to use the invention through licensing, to use the invention freely without license outside the jurisdiction, scope and timeframe of protection, build upon the invention through research and development, improve upon it, or design around it to advance scientific and technological capabilities and ultimately to benefit society. In the contemporary use of patents to secure rights over genetic material, the quality of these teachings has come under public scrutiny and the role of patent offices in the disclosure process has been challenged [1,2]. Within patent documents, genetic sequences have been viewed both legally and practically as either chemical compounds or as information-encoding elements, and within the context of patent eligibility or infringement issues, their structure and function value has gained more importance as various jurisdictions e including the United States and Europe - attempt to balance competing in- terests either in favor of the inventors, as the case in Europe, or the public, as the case in USA [3]. As genetic sequences are made up of combinations of four bases e designated as A, C, G, and T (U), in the case of DNA (RNA) e or 20 amino acids each with different chemical properties - designated with single or triple letter codes - in the case of protein, they can only be interpreted using specialized computer software tools. Such tools clarify the structure, function and similarity of any sequence relevant to other sequences. Therefore, during the disclosure pro- cess, the applicant, the patent office, and upon publication, the public should be able to access the disclosed sequence data and use the computer tools to interrogate it within the context of all known sequence listings to interpret, understand, and value their com- bined effect on biological innovations. 
While some patent offices claim to have internal computer-mediated searching, analysis and visual tools to interpret the contextual value or meaning of patent sequences, public access is still lacking. Moreover, creating patent landscapes that can integrate sequence information with global patent rights and disclosures remain expensive, slow and * Corresponding author. Cambia/QUT, P.O. Box 3200, Canberra ACT 2601, Australia. E-mail address: Osmat@cambia.org (O.A. Jefferson). Contents lists available at ScienceDirect World Patent Information journal homepage: www.elsevier.com/locate/worpatin http://dx.doi.org/10.1016/j.wpi.2015.08.005 0172-2190/© 2015 Cambia. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). World Patent Information 43 (2015) 12e24
  • 2. cumbersome to the public and to those professionals who cannot afford the costly services of commercial providers. Rules for handling of sequences in patent prosecution, imple- mented by United States Patent and Trademark Office (USPTO) and other major patent offices in the 1990s, required the submission of any sequence (nucleotide or peptide) disclosed in any national or foreign application [4]. At that time, the disclosure standard format, known as “Sequence Listing”, was simple and file submissions were accepted either electronically or on paper [5]. As sequence disclo- sures grew exponentially over time, more legal rulings were introduced regarding submissions and with respect to compliance with standard formats. While the major offices recommended in- ternational standards such as ST. 25 [6], for the disclosure of sequence listings in the submitted patent applications, the sub- mitted file formats remained flexible until recently (Table 1). Full compliance with ST. 25 and the inclusion of the associated meta- data such as the origin of the sequence, its length and type, func- tion, and other markup in a computer readable format [7], were actually achieved in only a few offices; variations in the readability of file formats of disclosed sequence data and in its accurate matching when transferred to public databases persist [8,9]. For example, from 2001 until 2007, most international applications did not comply with ST. 25 text format rules and the disclosed se- quences were in tiff or pdf files and contained NON ASCII binary data (Table 1, “Format of published sequences” category at WIPO in 2007). 2. Availability of published patent sequences to the public Each of the major patent offices adopted a strategy regarding the publication and provision of sequence listings to the public. 
Table 1, column “Format of published sequence listings” depicts the prac- tice adopted over time by USPTO, World Intellectual Property Office (WIPO), and European Patent Office (EPO). Throughout the past 25 years, variations have existed among these offices. For example, the published sequence data from US patent documents is available for bulk downloads under various file formats, however USPTO does not offer a sequence search facility to interrogate the data. The office passes its published data to the National Center for Biotechnology Information (NCBI) [10]. This center provides a comprehensive public sequence search facility, BLAST, allowing contextual interrogation of sequence data and a world class data- base, GenBank [11], hosting nucleotide and peptide sequences from primarily large sequencing projects and individual labs as well as over 12 million sequences from US granted patents since 1982. In an effort to enable access and interrogation of larger sets of patent sequence data, NCBI and two other major public databases providers, the European Bioinformatics Institute (EMBL-EBI) [12], and the DNA Databank of Japan (DDBJ) [13] initiated an informal collaboration in the early 1990s; The International Nucleotide Sequence Database Collaboration (INSDC) [14], to exchange nucleotide (DNA or RNA) -not protein-sequences, including those disclosed in patents [15], and allow public access and interrogation of the data. Similarly, the European Patent Office releases their published sequence listings mainly from published patent applications to EMBL-EBI that incorporates into ENA [16] database within the patent data class (PAT). The sequences are served to the public along with other received sequence listings from partner in- stitutions. The EMBL-EBI databases also provide access to protein- based sequences in the Universal Protein Resource (Uniprot) [17]. 
Unlike NCBI, EMBL-EBI parses the received sequence listings and extracts associated metadata before serving it in the ENA database [18]. Furthermore, EMBL-EBI provides non-redundant sequence databases based on patent sequences stored in ENA and protein databases. The non-redundant databases are created at two levels and contain additional annotation, patent family information and links to patent literature [19,20]. Sequence listings disclosed in published patent documents from Japan Patent Office (JPO) and Korean Intellectual Property Office (KIPO) are shared through DDBJ, which is administered by the Center for Information Biology of the National Institute of Genetics in Japan. The Databank includes the nucleotide-based sequence listings from patent documents published in Japan and Korea since 1997 [21]. In 2010, two amendments were introduced into this database. First, the NCBI taxonomy ID was added to each sequence listing based on the original organism declared for that sequence in the patent application and the newly revised entries for nucleotides and proteins were released in May 2010 with a scheduled update once per year [22]. The second amendment included the release of protein sequence listings from JPO and KIPO for ftp downloading and later the availability of a sequence similarity search facility for protein sequence listings from USPTO, EPO, JPO, and KIPO [23]. Other public databases that provide access to and search facility of yet smaller collections of published patent sequences include Patome@Korea database serving nucleotide and protein patent sequences provided by the Korean Intellectual Property Office (KIPO) [24] from 2004 to 2008 and maintained by the Korean Bioinformation Center (KOBIC). Similarly, NASDAP, a semi-public Chinese database, provided free sequence search services to explore Chinese gene patents (applications and grants from 1999eFeb 2006), but it seems it is no longer available in our latest search of May 2015. 
The database covered 123,218 sequence listings from 8563 Chinese patents acquired from State Intellectual Prop- erty Office as hard copies or images [25]. 3. Why do we need a global and transparent patent sequence dataset? As NCBI, EMBL-EBI, and DDBJ decide which sequence listing data to include in their databases and what sequence search facility to provide on what data and when, accurate and comprehensive ac- counting of published sequence data as disclosed in patents is then hard to achieve. Upon reviewing the maze of the available patent sequences from the public or commercial sources, Andree et al. (2008) reported that each public database has still a unique dataset and for any comprehensive searching and analysis, users may need to access and use several databases [26]. Moreover, Cambia's 2011 survey of patent offices reveals that over the past twenty years [27], there has been progress in harmonizing sequence filing rules but sharing that knowledge in a meaningful way and at a global level with the public has lagged, as has ensuring compliance with these rules both by applicants and internally. An optimally functioning patent office embracing such a public disclosure responsibility would meet certain standards. Biological inventions often disclose biological sequences, such as DNA or proteins or portions of them, which may or may not be claimed, and their teaching value depends on obtaining a clear understanding of the nature and function, clear differentiation between what is disclosed and what is claimed, and how such sequences are used in follow-on inventions, and in innovations (products and services) by whom, and where in the world. For example, zooming on GALNT18 gene in PatSeq Analyzer [28] reveals that a 15 mer portion in the 30 end region (GGTTGGTGTGGTTGG) can be/has been used in several patent documents in different contexts. Table 2 lists the issued patents that reference that sequence in the claims and under various SEQ IDs. 
The table also depicts the corresponding claim referencing the SEQ ID, the claim category based on the use of that SEQ ID within each patent, the applicant name, and the filing date.

O.A. Jefferson et al. / World Patent Information 43 (2015) 12–24
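The kind of lookup described above, locating a short sequence such as the GALNT18 15-mer inside a longer disclosed sequence, can be sketched as a plain exact-match search on both strands. This is only an illustration under stated assumptions: the helper names `reverse_complement` and `find_kmer` and the toy target sequence are hypothetical, and the sketch does not represent how PatSeq Analyzer itself searches or maps sequences.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; not PatSeq Analyzer's implementation.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_kmer(kmer: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based position, strand) for every exact occurrence of kmer."""
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", kmer.upper()), ("-", reverse_complement(kmer))):
        start = sequence.find(query)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(query, start + 1)  # allow overlapping hits
    return sorted(hits)


KMER = "GGTTGGTGTGGTTGG"  # the 15-mer quoted in the text
# Toy target: the 15-mer on the forward strand, then its reverse complement.
toy = "ACGT" + KMER + "TTTT" + reverse_complement(KMER) + "A"
print(find_kmer(KMER, toy))
```

An exact-match search like this only finds identical copies; comparing claimed sequences in practice usually also needs similarity search, which this sketch does not attempt.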
Table 1
Changes in patentability requirements for nucleotide or peptide sequences and in submitted and published sequence formats as adopted by the United States Patent and Trademark Office (USPTO), the World Intellectual Property Organization (WIPO), and the European Patent Office (EPO) since the 1990s.
Patent office | Entered into force | Patentability requirements for nucleotide or peptide sequences | Format of submitted sequence listings | Format of published sequence listings/Comments | Reference

USPTO 01-10-1990 Every sequence described as cited art, used in a comparison figure or table, or not claimed or disclosed in the specification, claims, and figures is covered by the sequence rules and must appear in a “Sequence Listing” section. “Sequence Listing” refers to a standard format for the submission of even one “unbranched” nucleotide or amino acid sequence. Branched sequences are excluded. The rules apply to any nucleic acid sequence of ten or more nucleotides or peptide sequence of four or more amino acids. Submission in the standard format “Sequence Listing” was done either on paper or Compact Disk-R. “A copy of the “Sequence Listing” is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/sequence.html?DocID=20010000241). An electronic copy of the “Sequence Listing” will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3) (http://www.uspto.gov/web/offices/pac/mpep/s2435.html) 55 Federal Register No. 84, p. 18230, ROBERT WAX and JAMES COBURN. Sequence Rule Compliance - Separating the Wizards from the Muggles. 22 Biotechnology Law Report 397 Number 4 (August 2003). http://online.liebertpub.com/doi/abs/10.1089/073003103769015915?journalCode=blr
19-11-1996 The requirements for restriction pursuant to 37 CFR 1.141(a) were waived and applicants were permitted to claim, and have examined, in a single application up to ten independent and distinct inventions described by their nucleotide sequences.
And for unity of invention determinations pursuant to 37 CFR 1.475 et seq., up to ten independent and distinct molecules described by their nucleotide sequence in a single patent application can be searched and examined in international applications or national stage applications filed under 35 USC 371, with four additional sequences if applicants paid additional fees for search and/or examination. Submission in the standard format “Sequence Listing” was done either on paper or Compact Disk-R. “Sequence data may also be accessed in a more readily searchable manner from the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov or from a commercial vendor. The USPTO forwards a copy of the sequence data to NCBI when a patent including a “Sequence Listing” is granted, and when an application containing a sequence is published pursuant to 35 U.S.C. 122(b). If NCBI elects to include the sequence data in one of its databases, NCBI indexes the sequence data according to patent or patent application publication number. There is currently no fee for the public to use the NCBI site.” (http://www.uspto.gov/web/offices/pac/mpep/s2435.html) 1192 Off. Gaz. Pat. Office 68 http://www.uspto.gov/web/offices/pac/dapp/opla/preognotice/sequence02212007.pdf
01-07-1998 The requirements for patent applications containing nucleotide sequence and/or amino acid disclosures were published to set an international standard with a language neutral format and using numeric identifiers rather than the current subject headings for “Sequence Listings”. Rules under Title 37 Code of Federal Regulations (CFR) §§ 1.821–1.825 apply ONLY to applications containing sequences that include at least ten nucleotides (four or more of which are specifically defined) or four or more amino acids (of which four or more are specifically defined) or both. The rules were amended to be consistent with the new WIPO standard, ST.25 (https://www.
federalregister.gov/articles/2009/08/11/E9-19179/requirements-for-patent-applications-containing-nucleotide-sequence-andor-amino-acid-sequence#p-27) A Sequence Listing must be submitted “as a computer-readable American Standard Code for Information Interchange (ASCII) file (the CRF) on a diskette (Compact Disk-Recordable (CD-R) for large Sequence Listings), as well as a printed version of the same (or, again, CD-R[RAW2], as 37 CFR § 1.52(e) contains the requirement of filing a second CD-R in lieu of the “paper” copy). Additionally, a statement must accompany the Sequence Listing (“Statement to support…”) verifying that (1) the submission does not contain new matter and (2) the paper and electronic copy of the listing are the same.” (http://www.uspto.gov/web/offices/com/sol/og/con/files/cons082.htm) “37 CFR 1.821(e) requires the submission of a copy of the “Sequence Listing” in computer readable form. The information on the computer readable form will be entered into the Office's database for searching and printing nucleotide and amino acid sequences. This electronic database will also enable the Office to exchange patented sequence data, in electronic form, with the Japanese Patent Office and the European Patent Office. It should be noted that the Office's database complies with the confidentiality requirement imposed by 35 U.S.C. 122. Pending application sequences are maintained in the database separately from published or patented sequences. That is, the Office will not exchange or make public any information on any sequence until the patent application containing that information is 63 Federal Register No. 104, pp. 29,620–29,643 http://www.gpo.gov/fdsys/pkg/FR-1998-06-01/pdf/98-14194.pdf
published or matures into a patent, or as otherwise allowed by 35 U.S.C. 122.” (http://www.uspto.gov/web/offices/pac/mpep/s2422.html)
07-11-2000 37 CFR 1.52(e) was amended to provide for the filing of tables comprising sequence listings on compact discs. The disc must be either a read-only or a write-once disc. The disc must also be ASCII compliant and the specification must contain a cross-reference to it. (http://www.ladas.com/Patents/PatentPractice/USPractice/USPatLawRevisions-9_.html) 37 CFR Section 1.821(c) was also amended to provide “that a ‘Sequence Listing’ must be submitted either: (1) on paper, or (2) on a compact disc, as defined in the amended § 1.52(e) and as further specified in § 1.823(a)(2). For nucleotide and/or amino acid sequences, no change is made to the computer readable form (CRF) practice under § 1.821(e)”. The requirement for a paper copy of the sequences under § 1.821(c) is modified to allow applicants to satisfy that section with either a paper version or a submission on a CD-ROM or CD-R (submitted in duplicate). Submission on compact disc is in addition to, and not a replacement for, the CRF required under § 1.821(e) (http://www.gpo.gov/fdsys/pkg/FR-2000-09-08/pdf/00-22392.pdf) A Sequence Listing may be submitted as “1. Paper and disc (containing an ASCII text computer-readable form on CD) 2. ASCII text uploaded via Electronic Filing System (EFS) 3. “Paperless Submission” consisting of multiple CD submissions, but no paper.” (http://www.seqidno.com/sequence-listing-services/rules-summary/) (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf) US sequence rules were effective as of July 1, 1998 whereas WIPO ST.25 (the Sequence Rules) were effective as of January 1, 1999 (Wax and Coburn's paper 2003).
Standard definitions of “specifically defined” nucleotides and amino acids are used based on the World Intellectual Property Organization (WIPO) Handbook on Industrial Property Information and Documentation, Standard ST.25: Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications (1998), including Tables 1 through 6 in Appendix 2. 65 Federal Register No. 175, pp. 54620–54681 http://www.gpo.gov/fdsys/pkg/FR-2000-09-08/pdf/00-22392.pdf, http://www.seqidno.com/sequence-listing-services/rules-summary/
14-10-2006 With the release of the Electronic Filing System (EFS) version 1.1, filing of sequence listings became easier. Filers need only submit a single .txt file for a sequence listing, provided the file is ASCII compliant (to serve both as the paper copy required by § 1.821(c) and as the CRF required by § 1.821(e)), and 37 C.F.R. § 1.52(e)(5) requires that the specification be amended to contain a reference to the material in the text file in a separate paragraph which identifies the name of the text file, the date of its creation, and the size of the text file in bytes. Sequence listing text files submitted by EFS-Web have a size limit of 100 megabytes. (http://www.patentdocs.org/2009/01/sequence-listing-efiling-options-using-efsweb.html) Electronic submission using EFS-Web version 1.1, preferably as a .txt file (.pdf files are acceptable but discouraged) (http://www.patentdocs.org/2006/11/hasslefree_fili.html) The requirements of US sequence rules are less stringent than the requirements of WIPO Standard ST.25 (1998).
Under ST.25: (1) submissions from Mac computers are not accepted; (2) the answers in fields <221> and <222> must use selections from Tables 5 and 6 of WIPO Standard ST.25; (3) any free text in field <223> will not be translated and thus must appear in the specification; (4) a CRF will not be considered to be part of the disclosure or published if filed after the filing of an application under the PCT; and (5) Paragraphs 24 and 39 of WIPO Standard ST.25 require specific compliance criteria within the sequence listing. MPEP (ROBERT WAX and JAMES COBURN. Sequence Rule Compliance Separating the Wizards from the Muggles. 22 Biotechnology Law Report 397 Number 4 (August 2003)). Sections XIII, XVII, and XVIII of the EFS-Web Legal Framework http://www.patentdocs.org/2009/01/sequence-listing-efiling-options-using-efsweb.html, http://www.patentdocs.org/2006/11/hasslefree_fili.html
27-03-2007 USPTO rescinds the partial waiver of 37 CFR 1.141 et seq. for restriction practice in national applications filed under 35 U.S.C. 111(a), and 37 CFR 1.475 et seq. for unity of invention determinations in both PCT international applications and the resulting national stage applications under 35 U.S.C. 371. “For National applications, polynucleotide inventions will be considered for restriction, rejoinder, and examination practice in accordance with the standards set forth in MPEP Chapter 800 (except for MPEP 803.04 which is superseded by this Notice). Claims to polynucleotide molecules will be considered for independence, relatedness, distinction and burden as for claims to any other type of molecule. For International applications and national stage filings of international applications under 35 U.S.C. 371, unity of invention determination will be made in view of PCT Rule 13.2, 37 CFR 1.475 and Chapter 10 of the ISPE Guidelines.
Unity of invention will exist when the polynucleotide molecules, as claimed, share a general inventive Electronic submission using EFS-Web version 1.1, preferably as a .txt file (.pdf files are acceptable but discouraged) (http://www.patentdocs.org/2006/11/hasslefree_fili.html) There is NO filing fee for submitting a sequence listing as part of a U.S. patent application. There is a filing fee for a sequence listing filed in an international application IF the application is more than 30 pages: a $13 filing fee for each page over 30 pages. There are NO page fees for sequence listings submitted via Electronic Filing System-Web in the proper text format (http://www.gpo.gov/fdsys/pkg/FR-2009-08-11/pdf/E9-19179.pdf) Under 37 CFR 1.16(s) and 1.492(j), both U.S. and international patent applications with paper sequence listings that exceed 100 OG Notices: 27 March 2007 http://www.uspto.gov/web/offices/com/sol/og/2007/week13/patsequ.htm, http://www.patentdocs.org/2006/11/hasslefree_fili.html
concept, i.e., share a technical feature which makes a contribution over the prior art.” (http://www.uspto.gov/web/offices/com/sol/og/2007/week13/patsequ.htm) pages, may be subject to an application size fee of $270 (or $135 for small entities) for each additional 50 pages or fraction thereof (https://www.federalregister.gov/articles/2009/08/11/E9-19179/requirements-for-patent-applications-containing-nucleotide-sequence-andor-amino-acid-sequence#p-27).

WIPO 01-07-1992 International applications with a nucleotide and/or amino acid sequence disclosure need to contain a listing of the sequence in the description that is in a format complying with WIPO Standard ST.23 and in accordance with Annex C of the Administrative Instructions. The International Search Authority may invite applicants to furnish a listing of the sequence in a machine readable form provided for in the Administrative Instructions (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf) Only USPTO mandated submission of sequence listings in machine readable form, with the file encoded in a subset of ASCII. The European Patent Office made the requirement mandatory in January 1993. JPO recommended it with its own character code, and IP Australia did not require it but accepted the submission in ASCII. Patent offices in Austria, Russia, Sweden, and the UK did not require submission of sequence listings in machine readable form. (http://www.uspto.gov/web/offices/pac/mpep/old/E5R16_AI.pdf) (rev 14, November 1992) Optical Character Recognition (OCR) format was not adopted by USPTO, whereas it was adopted by some jurisdictions and complied with either WIPO ST.22 or ST.23.
Rules 5.2 and 13ter of PCT regulations, Sections 208, 513, 610, and Annex C of the Administrative Instructions (AI) as in force from July 1, 1992 http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
28-11-1994 USPTO, EPO, and JPO worked with WIPO to establish the basis of what was to become the WIPO Standard, ST.25 (1998). (http://www.wipo.int/meetings/en/doc_details.jsp?doc_id=4412, PCT/MIA/VI/15) Various formats were entertained. USPTO, EPO, and JPO proposed the development of a single patent sequence database with front and back ends (http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529) PCT/MIA/V/1 and PCT/MIA/V/2 http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529
01-04-1995 If the International Search Authority is prepared to transcribe the sequence listing into a machine readable form, it may request payment for the cost of such transcription (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf) Various formats were entertained. USPTO, EPO, and JPO proposed the development of a single patent sequence database with front and back ends (http://www.wipo.int/meetings/en/details.jsp?meeting_id=2529) The format of sequence listings in paper and electronic form differs based on the requirements of different patent offices, and sequence listings were required to be translated for consideration in the national stage. (http://www.uspto.gov/web/offices/com/sol/notices/fr019819.html) Rule 13ter.1(a) of PCT regulations http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
01-07-1998 A sequence listing would need to be a separate part of the description in accordance with Annex C of the Administrative Instructions and, if that sequence listing contains any free text, that free text would need to appear in the main part of the description as well, in the language thereof.
WIPO's new standard, ST.25, replaced ST.23 and ST.24 and established the international meaning of “Sequence listing” for nucleotide and/or amino acid sequence disclosure and allowed applicants to submit a single sequence listing that is acceptable to all receiving offices, International Search, and Preliminary Examining Authorities (for the international phase) and designated and elected offices (for the national phase). See Annex C in http://www.uspto.gov/web/offices/pac/mpep/old/E7R0_AI.pdf Submission of only one sequence listing in paper and electronic form will be required now and no translation is needed. Computer readable form is only required when a competent authority requires it (http://www.wipo.int/standards/en/pdf/03-25-01.pdf) (http://www.wipo.int/wipostad/en/standards/st25-en/1-0/view#2255) (http://www.wipo.int/standards/en/pdf/archives/03-25-01arc2009.pdf) (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf) International applications in electronic form would have mandatory data elements: 1. Applicant Name, 2. Title of Invention, 3. Number of SEQ ID NOs, 4. SEQ ID NO:, 5. Length (sequence length expressed in number of base pairs or amino acids), 6. Type (type of molecule sequenced in SEQ ID NO: x, either DNA, RNA or PRT; if a nucleotide sequence contains both DNA and RNA fragments, the value shall be “DNA”), 7. Organism (Genus Species (that is, scientific name) or “Artificial Sequence” or “Unknown”) Sequence. Rules 5.2 and 13ter.1(a) of PCT regulations and section 513 of the Administrative Instructions http://www.wipo.int/wipostad/en/standards/st25-en/1-0/view#2255 http://www.wipo.int/standards/en/pdf/archives/03-25-01arc2009.pdf
01-03-2001 New instructions were put in place to deal with the filing, format, fees, preparation, and publications of extremely large international applications containing nucleotide and/or amino acid sequence listings.
Sequence listings will be published on the Internet on the date of publication of the rest of the international application. (http://www.uspto.gov/web/offices/pac/mpep/old/E8R0_AI.pdf) For sequence listings filed as parts of international applications, the new Section 801(a) of the Administrative Instructions allowed the applicant to file the sequence listings (and/or tables) as: “(i) only on an electronic medium in the computer readable form referred to in Annex C; or (ii) both on an Under Section 805, publication of international applications in electronic form is at the discretion of the Director General. “As from 2 August 2001, the sequence listing parts of the international applications filed under Section 801 of the Administrative Part 8 of the Administrative Instructions (Sections 801–806). http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf
electronic medium in that computer readable form and on paper in the written form referred to in Annex C; “Tables filed in computer readable form under Section 801(a) shall comply with one of the following character formats: (i) UTF-8-encoded Unicode 3.0; or (ii) XML format conforming to the “Application-Body” Document Type Definition referred to in Appendix I of Annex F; at the option of the competent Authority.” (http://www.uspto.gov/web/offices/pac/mpep/old/E8R1_AI.pdf) Instructions under the PCT will be published on the Internet on the date of publication of the rest of the international application of which it forms a part. Publication of a given international application containing a sequence listing part filed under Section 801 will thus comprise two elements published on the same day: (i) a paper pamphlet, as now, for all parts other than the sequence listing part, and (ii) a new electronic portion for the sequence listing part only; cross-references between the two elements will be included for the sake of clarity.” (http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf)
01-01-2003 Standard options were introduced for Electronic Filing and Processing of International Applications. Part 7 does not apply to international applications containing sequence listings. Part 8 applies. However, if applicants submit such applications electronically, they will be subject to Part 7 and NOT Part 8 of the Administrative Instructions. All technical Requirements for the Presentation of Tables Related to Nucleotide and Amino Acid Sequence Listings in International Patent Applications under the PCT were provided in a new annex, Annex C-bis. Annex F provided the standard for electronic filing but the details were published in the PCT Gazette Special Issue No. S-04/2001 dated 27 December 2001. This was not available on the WIPO website on 10/4/2012. But a more recent version of Annex F is available, for example, at http://www.wipo.int/pct/en/texts/pdf/ai_anf.
pdf Submission of sequence listings remained in computer readable form and on paper. While PCT charged for electronically submitted sequence listings as txt files, USPTO did not charge. Sequence listings to be available on the internet in multiple formats. Part 7 of the Administrative Instructions, Annex C-bis and Annex F http://www.wipo.int/edocs/pctndocs/en/2001/pct_news_2001_8.pdf
01-04-2005 USPTO implemented the restriction requirement as of November 1996 to limit an applicant's claims to no more than 10 nucleotide sequences in one application. PCT/MIA/VI/9 Administrative Instructions under the PCT were silent on the restriction requirement. Rule 13ter of the PCT regulations was amended to provide consistent procedures before all authorities and to request compliance with either the electronic form or paper filing of sequence listings contained in the international applications in accordance with the Standard established in Annex C (http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf) Sequence listings to be available on the internet in multiple formats. Rule 13ter of PCT regulations http://www.wipo.int/pct/en/texts/pdf/pct_regulations_history.pdf
01-10-2007 A new publication system was in place to provide: “XML daily update files. All SLs [sequence listings] will be included (i.e. including the SLs extracted from the pamphlets). SLs embedded in the description will be gradually removed. A new structure is available as follows: publication/year/week/WO_number for the SL files, updates/year/month for the update files. All subsequently published SLs will be added to the publication week directory and reported in the update file.
All subsequently deleted/replaced/added SLs will trigger the update of the corresponding international application publication content and will be reported in the update file.” (http://www.wipo.int/patentscope/en/news/pctdb/2007/news_0010.html) From 2001 until 2007, most sequence listings were from the “mixed mode” electronic submission (PCT application on paper whereas the sequence listings were filed electronically). From 2001 until 2007, most PCT applications DID NOT comply with ST.25 text format rules. Most sequence listings were in TIF or pdf to TIF files and contained NON ASCII compliant text. (http://www.fiz-karlsruhe.de/uploads/tx_ptgsarelatedfiles/0210_wipo_bbm.pdf) PatentScope, WIPO, and STN International Website http://www.wipo.int/patentscope/en/news/pctdb/2007/news_0010.html
01-07-2009 In view of the practice of electronic submission for sequence listings, Part 8 of the Administrative Instructions (Sections 801–806) and Annex C-bis became irrelevant and were deleted from the Administrative Instructions. (http://www.wipo.int/edocs/pctndocs/en/2009/pct_news_2009_07.pdf) (http://www.wipo.int/pct/en/newslett/2009/06/article_0002.html) A number of other modifications were also introduced to the Administrative Instructions under the PCT in relation to the international filing fees: 1.
mixed mode sequence listing filing (sequence listing and tables are filed in electronic form while the remainder of the international application is filed on paper, when the receiving Office accepts the filing of such “Where a copy of a ST.25-compliant text format sequence listing has been furnished to the ISA under Rule 13ter.1 (for the purposes of international search only), the ISA will forward a copy of such a sequence listing to the International Bureau. The International Bureau will make a Section 707(a-bis) of the Administrative Instructions http://www.wipo.int/export/sites/www/pct/en/texts/pdf/ai_9.pdf
“mixed mode” applications, will no longer be possible, 2. there will no longer be a page fee payable for sequence listings filed in text format as part of an international application filed in electronic form, 3. full page fees will be payable for all pages of a sequence listing filed in image format (for example, PDF format) or on paper, 4. sequence listings, filed only for the purposes of international search, will become publicly available, 5. full page fees will be payable for tables containing sequence listings regardless of the format in which they are submitted (image or paper or electronic) (http://www.wipo.int/export/sites/www/pct/en/texts/pdf/ai_9.pdf) copy of all sequence listings in text format received publicly available on PATENTSCOPE®” (http://www.wipo.int/pct/en/newslett/2009/06/article_0002.html). The 2009 rules did not seem to have an impact on compliance with ST.25 text format rules, and WIPO is still sorting the 1999–2006 backlog of sequence listings (mostly image files) (http://www.fiz-karlsruhe.de/uploads/tx_ptgsarelatedfiles/0210_wipo_bbm.pdf)
01-01-2011 Paragraphs 2, 3bis, 4bis, 38 and 42 of Annex C of the Administrative Instructions under the PCT were amended in relation to the correction, rectification or amendment of sequence listings. These changes are only applicable in respect of international applications filed on or after January 1, 2011. See page 1 in http://www.wipo.int/edocs/pctndocs/en/2012/pct_news_2012_13.pdf Annex C of the Administrative Instructions, Rule 13ter http://www.wipo.int/edocs/pctndocs/en/2012/pct_news_2012_13.pdf
(April 3, 2012) A circular will be sent to all receiving Offices, International Searching Authorities and designated Offices to introduce the new ST.26 XML standard with links to example sequence listings in XML format (and comparing features using the ST.25 standard with those using the new ST.26 XML standard).
The Circular will inquire on when and how implementation of the new standard can be facilitated and accomplished over time. Currently, the sequence listing software tool, PatentIn, is being replaced by BISSAP, which is expected to support both ST.25 and a draft version of the new ST.26 XML standard. BISSAP is being developed by the European Patent Office and will be used across all offices to help in the preparation and processing of sequence listings. (www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/pct_wg_5_14.doc) Thirty percent of the sequence listings downloadable from the WIPO website are in txt format; the rest are in unsearchable image or pdf formats and are difficult to render in a searchable format. PCT/WG/5/14 www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/pct_wg_5_14.doc

EPO 01-01-1993 Rule 27a (1) If nucleotide or amino acid sequences are disclosed in the European patent application the description shall contain a sequence listing conforming to the rules laid down by the President of the European Patent Office for the standardized representation of nucleotide and amino acid sequences. (4) A sequence listing filed after the date of filing shall not form part of the description. European Patent Convention (EPC 1973) Rule 27a (1), (4) (OJ EPO 1992, 342 ff). http://www.epo.org/law-practice/legal-texts/html/epc/1973/e/r27a.html
October 2, 1998 Rule 27a amended (2) The President of the European Patent Office may require that, in addition to the written application documents, a sequence listing in accordance with paragraph 1 be submitted on a data carrier prescribed by him accompanied by a statement that the information recorded on the data carrier is identical to the written sequence listing, (3) If a sequence listing is filed or corrected after the date of filing, the applicant shall submit a statement that the sequence listing so filed or corrected does not include matter which goes beyond the content of the application as filed.
The Sequence is to be submitted on a data carrier EPC 1998, R. 27a(2), (3) (Suppl. No. 2 to OJ EPO 11/1998) http://www.epo.org/law-practice/legal-texts/html/epc/1973/e/r27a.html
13-12-2007 Rule 30 was introduced to meet the requirements of European patent applications relating to nucleotide and amino acid sequences. Art. 56, 57, 80 R. 42 are relevant here. (1) If nucleotide or amino acid sequences are disclosed in the European patent application, the description shall contain a sequence listing conforming to the rules laid down by the President of the European Patent Office for the standardized representation of nucleotide and amino acid sequences, (2) A sequence listing filed after the date of filing shall not form part of the description, Disclosed sequences within the meaning of Rule 30(1) in the European patent application are to be represented in a sequence listing which conforms to WIPO Standard ST.25. They can be filed electronically and on paper. In such a case, a copy of the sequence listing must also be submitted in computer-readable form. (Special edition No. 3, OJ EPO 2007, C.1 and C2) Access to published patent sequence data is via the EBI's website, and patent sequences can be purchased and bulk downloaded from the EPO site (http://www.epo.org/searching/free/publication-server/sequence-listings.html) EPC (1973) to EPC: Rule 30 replaced Rule 27a(1) and (4). Rule 27a(2), (3) was deleted and a new clause (3) was added to Rule 30. http://www.epo.org/law-practice/legal-texts/html/epc/2010/e/r30.html
(3) Where the applicant has not filed a sequence listing complying with the requirements under paragraph 1 at the date of filing, the European Patent Office shall invite the applicant to furnish such a sequence listing and pay the late furnishing fee. If the applicant does not furnish the required sequence listing and pay the required late furnishing fee within a period of two months after such an invitation, the application shall be refused.
28-04-2011 The European Patent Office, in collaboration with national patent offices and the European Bioinformatics Institute, developed BiSSAP, which is a computer program designed to facilitate submission of sequence listings in patent applications (http://archive.epo.org/epo/pubs/oj011/06_11/06_3761.pdf) “BiSSAP can be used to prepare and verify sequences, generate the sequence listing files for submission, import existing sequence listings in WIPO ST.25 and convert between sequence listing formats (WIPO ST.25 and XML proposal). It also contains a “batch verification” module allowing users to verify collections of sequence listings.” Art. 6(2) Dec. of the President of the EPO dated 28 April 2011 on the filing of sequence listings, OJ EPO 2011, 372 requires conversion of sequence listings into a pdf format. If they are not searchable, then public access will be affected. Rule 30 EPC, Rule 5.2 PCT, and the Decision of the President and Notice from the EPO dated 28 April 2011 (OJ EPO 6/2011, 372 ff).
http://www.epo.org/law-practice/legal-texts/html/epc/2010/e/r30.html, http://archive.epo.org/epo/pubs/oj011/06_11/06_3761.pdf
18-10-2013 Filing of sequence listings was amended to: “1.1 If nucleotide or amino acid sequences are disclosed in a European patent application, the description must contain a sequence listing complying with WIPO Standard ST.25 (Standard for the Presentation of Nucleotide and Amino Acid Sequence Listings in Patent Applications - hereinafter referred to as the “Standard”) (Rule 30(1) EPC in conjunction with Article 1 of the decision of the President).” “1.4 Under Article 1(1) of the decision of the President, sequence listings must be submitted in electronic form, i.e. in text format (TXT). Further information about the document format is set out in the Standard. The sequence listing should no longer be filed on paper or, in the case of electronic filing of the application, in PDF format (see Article 1(1) and (2) of the decision of the President). If the applicant also files the sequence listing of his own accord on paper or in PDF format, he must submit a statement that the sequence listings in electronic form and on paper or in PDF format are identical. In this case, the paper or PDF form will be disregarded in the further procedure.” http://archive.epo.org/epo/pubs/oj013/11_13/11_5423.pdf Check user feedback on the proposed XML format ST.26 for public access to sequence listings at (http://documents.epo.org/projects/babylon/eponet.nsf/0/97F67F6DDAF14D59C1257A070032C35B/$File/CL2012-522-0425User%20feedback-on-ST.26-final.pdf) Notice from EPO http://archive.epo.org/epo/pubs/oj013/11_13/11_5423.pdf
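The mandatory data elements that Table 1 lists for ST.25 sequence listings (applicant name, title, number of SEQ ID NOs, SEQ ID NO, length, type, organism) are carried in the text format under numeric identifiers. As a rough sketch of why a compliant text listing is machine readable where an image or PDF is not, the following reads those identifiers and reports missing elements. The function names `parse_st25` and `missing_elements` and the toy listing are hypothetical; the mapping of identifiers <110>, <120>, <160>, <210>, <211>, <212>, <213> to those elements follows our reading of WIPO Standard ST.25 and should be checked against the Standard; the sketch ignores feature fields, multi-line values and the sequence residues themselves.

```python
import re
from typing import Dict, List

# Numeric identifiers assumed (per WIPO ST.25) for the mandatory elements.
GENERAL = {"110": "applicant", "120": "title", "160": "seq_count"}
PER_SEQ = {"210": "seq_id", "211": "length", "212": "type", "213": "organism"}
TAG = re.compile(r"^<(\d{3})>\s*(.*)$")


def parse_st25(text: str) -> Dict:
    """Collect general and per-sequence header fields from an ST.25-style text."""
    general: Dict[str, str] = {}
    sequences: List[Dict[str, str]] = []
    for line in text.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue  # residue lines and blank lines are skipped
        tag, value = match.group(1), match.group(2).strip()
        if tag in GENERAL:
            general[GENERAL[tag]] = value
        elif tag == "210":
            sequences.append({"seq_id": value})  # a new sequence record starts
        elif tag in PER_SEQ and sequences:
            sequences[-1][PER_SEQ[tag]] = value
    return {"general": general, "sequences": sequences}


def missing_elements(listing: Dict) -> List[str]:
    """List the mandatory elements that are absent from a parsed listing."""
    problems = [f"general:{name}" for name in GENERAL.values()
                if name not in listing["general"]]
    for seq in listing["sequences"]:
        for name in PER_SEQ.values():
            if name not in seq:
                problems.append(f"SEQ ID NO {seq['seq_id']}:{name}")
    return problems


# Toy input for illustration only; not a complete, compliant sequence listing.
SAMPLE = """\
<110> Example Applicant
<120> Example title
<160> 1
<210> 1
<211> 15
<212> DNA
<213> Artificial Sequence
<400> 1
ggttggtgtg gttgg                                                       15
"""

listing = parse_st25(SAMPLE)
print(listing["sequences"])
print(missing_elements(listing))
```

A check of this kind is only possible on text-format listings, which is why the share of listings published as images or unsearchable PDF matters for public access.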
Table 2. Issued patents that reference a 15-mer portion of the GALNT18 gene (GGTTGGTGTGGTTGG) in the claims under various SEQ IDs, along with the corresponding claim, the manually analyzed claim category, the filing date, and the applicant name of that patent document.

Patent number_SEQ ID: US_5840867_A_21
Claims: A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G (SEQ ID NO:21), G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G (SEQ ID NO:22)
Claim category: Artificial
Filing date: 05-03-1994
Applicant: GILEAD SCIENCES INC

Patent number_SEQ ID: US_5840867_A_22
Claims: A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G (SEQ ID NO:21), G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G (SEQ ID NO:22)
Claim category: Artificial
Filing date: 05-03-1994
Applicant: GILEAD SCIENCES INC

Patent number_SEQ ID: US_5840867_A_19
Claims: A composition consisting essentially of the aptamer having the formula: GGTTGGTGTGGTTGG (SEQ ID NO:19), GGTTGGTGTGGTTGG.sup.#G.sup.#T (SEQ ID NO:20), GGTTGGTGTGGTT.sup.*G.sup.*G (SEQ ID NO:21), G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G.sup.*T.sup.*G.sup.*T.sup.*G.sup.*G.sup.*T.sup.*T.sup.*G.sup.*G (SEQ ID NO:22)
Claim category: Sequence claimed
Filing date: 05-03-1994
Applicant: GILEAD SCIENCES INC

Patent number_SEQ ID: US_5756291_A_29
Claims: A method to detect the presence or absence of thrombin, which method comprises: a) contacting a sample suspected of containing thrombin with a single-stranded DNA aptamer coupled to a label under conditions wherein a complex between thrombin and the aptamer is formed; and b) detecting the presence or absence of said complex indicating the presence or absence of thrombin; wherein said aptamer comprises the sequence:##STR17## wherein N is A, T or G. The method of claim 1 wherein N is T. The method of claim 1 wherein said aptamer comprises the sequence:##STR18## wherein: N is A, T, or G; and Z is an integer from 2 to 5. The method of claim 3 wherein Z is 3. The method of claim 4 wherein said aptamer comprises the sequence:##STR19## The method of claim 5 wherein N is T (SEQ ID NO: 29).
Claim category: Probe or primer used in a method claim
Filing date: 06-07-1995
Applicant: GILEAD SCIENCES INC

Patent number_SEQ ID: US_6323185_B1_53
Claims: An oligonucleotide having a nucleotide sequence chosen from the group consisting of SEQ ID NOS 2–27, 29, 31–39, 46–52 and 53–87, wherein said nucleotide sequence is optionally modified at the 3′ terminus or 5′ terminus by attachment of a substituent moiety selected from the group consisting of propylamine, poly-L-lysine, cholesterol, fatty acid chains of length 2 to 24 carbons, and vitamin E.
Claim category: Sequence claimed
Filing date: 17/07/1996
Applicant: US HEALTH

Patent number_SEQ ID: US_5691145_A_3
Claims: An oligonucleotide which forms an intramolecular G-quartet structure, the oligonucleotide being labeled with a donor fluorophore and an acceptor, the donor fluorophore and the acceptor selected such that fluorescence of the donor fluorophore is quenched by the acceptor when the oligonucleotide forms the G-quartet structure and quenching of donor fluorophore fluorescence is reduced upon unfolding of the G-quartet structure. The oligonucleotide of claim 1 consisting of SEQ ID NO:3 labeled with the donor fluorophore and the acceptor.
Claim category: Artificial
Filing date: 27/08/1996
Applicant: BECTON DICKINSON CO

Patent number_SEQ ID: US_5882870_A_1
Claims: A kit for the reversible anticoagulation of blood comprising: a nucleic acid ligand which binds thrombin, said nucleic acid ligand selected from the group consisting of a nucleic acid ligand having SEQ ID NO:1, a nucleic acid ligand having SEQ ID NO:2, a bi-directional nucleic acid ligand comprising two oligonucleotide segments having SEQ ID NO:5 linked at their respective 3′ ends to a phosphodiester group each of which is linked to a hexaethylene glycol chain, and a bi-directional nucleic acid ligand comprising two oligonucleotide segments having SEQ ID NO:6 linked at their respective 3′ ends to a phosphodiester group each of which is linked to a glycerol derivative; and a reversing agent which has greater affinity for the nucleic acid ligand than does thrombin, said reversing agent selected from the group consisting of compositions comprising a nucleic acid sequence complementary to that of said nucleic acid ligand, single-stranded DNA binding proteins, copper (II), mercury (II), silver (I) and platinum complexes.
Claim category: Subpart
Filing date: 14/01/1998
Applicant: BECTON DICKINSON CO

Patent number_SEQ ID: US_6780850_B1_1
Claims: A composition comprising: a nucleic acid, that is derivatized at the 5′ or 3′ end or at both the 5′ and 3′ ends with streptavidin or a variant of streptavidin that retains biotin binding activity, that specifically binds to thrombin, wherein said nucleic acid is 2′-fluoropyrimidine RNA or 2′-aminopyrimidine RNA. The composition of claim 1, wherein the nucleic acid comprises nucleotides having the sequence of SEQ ID NO: 1 or SEQ ID NO: 2. The composition of claim 13, wherein the nucleic acid comprises nucleotides having the RNA sequence corresponding to SEQ ID NO: 1 or SEQ ID NO: 2.
Claim category: Subpart
Filing date: 22/06/2000
Applicant: TRIUMF
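One simple way to check whether a claim literally recites the 15-mer of Table 2 is a string scan on both strands. The Python sketch below is illustrative only: it is not how the PatSeq tools search, the function names are hypothetical, and the only claim text used is the opening of the US 5,840,867 claim quoted in Table 2.

```python
import re

MOTIF = "GGTTGGTGTGGTTGG"  # 15-mer portion of GALNT18 from Table 2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def claim_recites(claim_text, motif=MOTIF):
    """True if the claim text spells out the motif on either strand."""
    targets = {motif, reverse_complement(motif)}
    # Pull out runs of nucleotide letters at least as long as the motif.
    pattern = r"[ACGT]{%d,}" % len(motif)
    runs = re.findall(pattern, claim_text.upper())
    return any(target in run for run in runs for target in targets)


if __name__ == "__main__":
    claim = (
        "A composition consisting essentially of the aptamer having "
        "the formula: GGTTGGTGTGGTTGG (SEQ ID NO:19)"
    )
    print(claim_recites(claim))
```

A scan like this misses claims that cite the sequence only by number, such as “consisting of SEQ ID NO:3”. Those require resolving the SEQ ID against the sequence listing, which is why machine-readable listings matter for the analysis in Table 2.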
As a single gene patent [29] may disclose from one to millions of sequences but claim a selection individually in various patent family members, or in different combinations in other versions/family members, the volume of redundant data can be overwhelming. At present, sifting through such information requires the use of all public databases and/or the costly use of combined searches from commercially available databases [30]. What is needed is an open public dataset with incorporated transparency metrics that allows access to global published patent sequence data, with links to the corresponding gene patents and interrogation through a sequence search facility. Below is a progress report towards building such a public facility.

4. PatSeq data platform

The new open and interactive online platform, PatSeq Data [31], enables access to patents disclosing genetic sequences and bulk downloads of disclosed sequence data based on jurisdiction, document type, and either sequence type or sequence location. It also serves as a global, open repository for national systems to enable public sharing of sequence data associated with patents.

Three new transparency metrics were implemented in PatSeq Data to foster confidence in the quality and quantity of the data, whenever it is made available: 1) a detailed account of which jurisdictions provide (or do not provide) sequence listing data, what they provide, and how their data compare with those of the PatSeq database; 2) the ability to dynamically monitor and compare the degree of overlap between each of the data sources with each release date, including the data sources from the public databases and patent offices; 3) the ability to link from PatSeq Data to the Lens and other PatSeq tools, such as PatSeq Finder, to conduct sequence searches, view family members, and download relevant information including original patent documents (Fig. 1).
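As a rough illustration of the second metric, the overlap between two sources can be expressed as a share of one source's holdings. The Python sketch below uses sequence counts reported in Table 3 (May 29, 2015 release). It assumes that a "shares with" count is the number of sequence records present in both sources; the function names are hypothetical and this is not the PatSeq implementation.

```python
# Sequence counts as reported in Table 3 (May 29, 2015 release).
HOLDINGS = {"EMBL_EBI": 38_020_495, "NCBI": 35_921_386, "DDBJ": 36_969_064}

# Sequences each public database shares with the USPTO data (Table 3).
SHARED_WITH_USPTO = {
    "EMBL_EBI": 11_982_981,
    "NCBI": 12_084_974,
    "DDBJ": 11_536_423,
}


def overlap_share(shared, total):
    """Fraction of a source's records that also appear in another source."""
    if total <= 0:
        raise ValueError("total must be positive")
    return shared / total


def uspto_overlap():
    """Overlap share with USPTO for each public database listed above."""
    return {
        name: overlap_share(SHARED_WITH_USPTO[name], HOLDINGS[name])
        for name in HOLDINGS
    }


if __name__ == "__main__":
    for name, fraction in uspto_overlap().items():
        print(f"{name}: {fraction:.1%} of sequence records shared with USPTO")
```

Under that assumption, roughly a third of each public database's patent sequence records are shared with the USPTO data. Tracking the same ratio across release dates is one way to spot the kind of gaps described in the text.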
Using the second metric, for example, we were able to identify missing sequence entries from 1990 to 2001 and missing bulk sequence listings with more than 250 k sequence entries in GenBank. Moreover, we learned that protein sequence records from EBI do not match those of GenBank or the DDBJ patent division and may not be synchronized with the EPO data. Table 3 depicts the progress made so far on the PatSeq database and shows the latest release data from May 29, 2015. In summary, the current holdings of the PatSeq database are 232,590,639 sequences corresponding to more than 425,028 biological patent documents, and the plan is to continue parsing and adding any data available, especially from the EPO in the near future.

Considering that patents can be important for domestic and global policymaking, sequence-filing rules within each jurisdiction were also tracked, whenever the information was available, and displayed in the Dossier view of PatSeq Data. Sequence location within a patent document is shown as well wherever possible, and sequence types are depicted by publication year and document type. For the patent offices that act as an International Search Authority, we developed timelines of relevant legal changes during the past 30 years (Fig. 2).

Fig. 1. Dossier view of an example jurisdiction, United States of America, in PatSeq Data. The view compares biological patent holdings found in the Lens with national and regional patent office databases, shows sequence disclosures across jurisdictions over time, and allows download of sequence collections or a link to the Lens to perform other searches and analyses. This figure also depicts the newly introduced transparency metrics to accurately account for sequence listings across public databases and national and regional patent offices.

As users enter the PatSeq Data site, they are offered a globe, map or table [32] summary view wherein they can monitor the latest holdings of the PatSeq database based on the release date depicted under the total holdings. Users can brush over the patent or sequence data based on publication year and document type, link to each year's patent collection in the Lens where they can explore other PatSeq tools, such as PatSeq Finder, or simply download the disclosed sequences. Sequences are downloadable based on document type, sequence type, and sequence location in the document. For example, the title “Grants: Nucleotides (all)” refers to nucleotide sequences disclosed in granted patent documents regardless of where they are referenced in the documents, whereas “Grants: Nucleotides (in Claims)” refers to the subset of that collection in which the nucleotide sequences are referenced in the claims of the granted patent documents. The data is available at no cost for non-commercial users and for a fee for commercial users.

Fig. 2. Dossier view of an example jurisdiction, United States of America, in PatSeq Data showing the timelines for patentability requirements for sequences disclosed in patent applications (see upper sections) and legal changes (legislative, administrative, and Court cases in lower sections). Clicking on each event allows users to view a short description of that event and to expand the view to check the reference for that information.

Table 3. Shared data between the various data sources based on PatSeq database holdings as of the May 29, 2015 release date. Available sequence/document counts are depicted for each database and as shared between two databases.

Data available/shared between two databases (a)    Sequence count    Patent count
EMBL_EBI                                            38,020,495        209,951
EMBL_EBI shares with USPTO                          11,982,981        71,242
EMBL_EBI shares with WIPO                           5,082,962         13,579
NCBI                                                35,921,386        184,878
NCBI shares with USPTO                              12,084,974        72,534
NCBI shares with WIPO                               4,710,219         10,524
DDBJ                                                36,969,064        201,130
DDBJ shares with USPTO                              11,536,423        61,114
DDBJ shares with WIPO                               4,778,925         11,789
DDBJ shares with NCBI                               35,144,194        170,752
DDBJ shares with EMBL_EBI                           36,123,009        183,885
NCBI shares with EMBL_EBI                           35,806,474        183,338
USPTO                                               157,531,318       214,512
WIPO                                                35,273,473        29,662
CIPO                                                17,040,805        38,284
(a) Sequence listings from EPO full text documents are yet to be included in the database.

By using the graphical globe or map function and hovering over a jurisdiction, users can view in a floating tooltip the type of contextual information available from that jurisdiction at the time of their visit and access it in the dossier view. If sequence disclosures were shared with us, users can also download them. Once in the dossier view, for instance that of Germany [33], users may choose to explore and link to all related biological patents available for Germany in the Lens, view the mechanism by which Germany publicly shares the data, learn about its format and coverage whenever provided, examine annual biological patent holdings, and compare these holdings with the declared holdings of the patent office, as they become available.

Under “Sequences”, users can learn more about the nature of the disclosed sequences. For example, while brushing over sequence holdings in a particular year, the proportion of sequence types, their distribution in the patent documents (in claims, summary, drawings, example, and specification), and the sources of data are displayed dynamically, allowing for a direct comparison of data sources. The statistics reflect PatSeq data as of the published release date shown in the summary view and in each jurisdiction's dossier view, and will update automatically with the regular data feed updates (currently at monthly intervals) or as more sequence listing data sources are added to the master database.
For example, on April 15, 2015, the contents of the PatSeq database increased by 25%.

The other relevant contextual information in the dossier view includes: a) sequence filing rules for either nucleotide or peptide sequences in that jurisdiction, based on the Cambia 2011 and WIPO 2001 surveys [34]; b) a timeline of relevant legal changes in the jurisdictions that act as International Search Authority (Fig. 2); and c) contact details of the officer who contributed the information, or a link to the patent office website for more details if that information is provided on the official website.

Before releasing this facility, more than 50 patent offices were consulted and some of their requested features were incorporated in PatSeq Data. Moreover, offices such as USPTO, IP Australia, CIPO, and GPTO have contributed sequence data to be included in the PatSeq database. While others promised to do so, some offices were not in a position to provide the data, such as the Danish Patent and Trademark Office, or the Israel patent office, which does not even publish the sequence data along with the patent (personal communications). As it is now clear that many patent offices simply do not have access to the analytic or server capabilities to host their sequences in a useful manner, the PatSeq Data tool and the entire PatSeq facility, as a global non-government activity, offer them such a service. The Lens collaborative project will continue reaching out to other patent offices to demonstrate the public value of the PatSeq facility.

5. Conclusion

Many governments face tough policy choices around the protection or use of IP on biological technologies and materials. The addition of new data from diverse patent offices, and complementing the missing data through patent family association, will enable users to compare patenting activity between various jurisdictions and engage in better-informed debates on the appropriate degree of gene patenting to optimize economic and social impacts.
PatSeq Data allows offices to upload and share their holdings, and allows users to download and analyze sequence sets associated with global patent documents.

Acknowledgments

This work was supported, in part, by the Bill & Melinda Gates Foundation, Global Health Grant ID 52239; Gordon and Betty Moore Foundation “Grant GBMF3465”; Queensland University of Technology “Grant 321121-0023/08”; and Queensland University of Technology and Syngenta Crop Protection AG “Research collaboration No: 1400001566”. We thank the Lens team for their continued support and improvement of the Lens functionalities, and Small Multiples, a private visualization company in Sydney, Australia, for implementing the open-source globe feature in the platform design of PatSeq Data. We also appreciate the assistance of Nina Prasolova and Innokenti Epichev in the research phase of this project.

References

[1] A. Devlin, The misunderstood function of disclosure, Pat. Law Harv. J. Law Technol. 23 (2010) 401–446.
[2] P. Drahos, Rethinking the Role of the Patent Office from the Perspective of Responsive Regulation, Chapter 5 in Emerging Markets and the World Patent Order: The Forces of Change by F.M. Abbott, C.M. Correa and P. Drahos, Edward Elgar Publishing, Cheltenham, 2014, pp. 78–99.
[3] J. Kraus, T. Takenaka, Construction of an Efficient and Balanced Patent System: Patentability and Patent Scope of Isolated DNA Sequence Under US Patent Act and EU Biotech Directive, Chapter 11 in Constructing European Intellectual Property: Achievements and New Perspectives by C. Geiger, Edward Elgar Publishing, Cheltenham, 2013, pp. 255–270.
[4] http://seqdata.uspto.gov/sequence.html?DocID=20010000241.
[5] R. Wax, J. Coburn, Sequence rule compliance - separating the wizards from the muggles, Biotechnol. Law Rep. 22 (2003) 397–400.
[6] http://www.wipo.int/standards/en/pdf/03-25-01.pdf.
[7] www.wipo.int/edocs/mdocs/pct/en/pct_wg_5/pct_wg_5_14.doc.
[8] R. Jones, Errors in patent application sequence listings, Nat. Biotechnol. 21 (2003) 1239–1240.
[9] https://www.stn-international.org/uploads/tx_ptgsarelatedfiles/0210_wipo_bbm.pdf.
[10] http://www.ncbi.nlm.nih.gov/genbank/.
[11] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, D.L. Wheeler, GenBank, Nucleic Acids Res. 33 (Database issue) (2005) D34–D38. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC540017/.
[12] http://www.ebi.ac.uk.
[13] http://www.ddbj.nig.ac.jp/.
[14] http://www.insdc.org.
[15] I. Karsch-Mizrachi, Y. Nakamura, G. Cochrane, The international nucleotide sequence database collaboration, Nucleic Acids Res. 40 (Database issue) (2012) D33–D37. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3244996/.
[16] http://www.ebi.ac.uk/ena.
[17] G. Cochrane, P. Aldebert, N. Althorpe, M. Andersson, W. Baker, A. Baldwin, K. Bates, S. Bhattacharyya, P. Browne, A. van den Broek, et al., EMBL nucleotide sequence database: developments in 2005, Nucleic Acids Res. 34 (Database issue) (2006) D10–D15. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347492/.
[18] http://www.ebi.ac.uk/ebisearch/advancedsearchwizard.ebi?domain=patentdb.
[19] W. Li, H. McWilliam, A.R. de la Torre, A. Grodowski, I. Benediktovich, M. Goujon, S. Nauche, R. Lopez, Non-redundant patent sequence databases with value added annotations at two levels, Nucleic Acids Res. 38 (Database issue) (2010) D52–D56. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808894/.
[20] J. McDowall, Prioritizing patent sequence search results using annotation-rich data, World Pat. Inf. 33 (2011) 236.
[21] K. Okubo, H. Sugawara, T. Gojobori, Y. Tateno, DDBJ in preparation for overview of research activities behind data submissions, Nucleic Acids Res. 34 (Database issue) (2006) D6–D9. Available at: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1347473/.
[22] Eli Kaminuma, Takehide Kosuge, Yuichi Kodama, Hideo Aono, Jun Mashima, Takashi Gojobori, Hideaki Sugawara, Osamu Ogasawara, Toshihisa Takagi, Kousaku Okubo, Yasukazu Nakamura, DDBJ progress report, Nucleic Acids Res. 39 (Database issue) (2011) D22–D27.
[23] Ibid.
[24] http://verdi.kobic.re.kr/patome_kr_en/.
[25] http://www.intellogist.com/wiki/NASDAP.
[26] P.J. Andree, et al., A comparative study of patent sequence databases, World Pat. Inf. 30 (2008) 300–308.
[27] O.A. Jefferson, D. Köllhofer, T.H. Ehrich, R.A. Jefferson, Transparency tools in gene patenting for informing policy and practice, Nat. Biotechnol. 31 (2013) 1086–1093. http://www.nature.com/nbt/journal/v31/n12/full/nbt.2755.html.
[28] https://www.lens.org/lens/bio/patseqanalyzer#psa//homo_sapiens/latest/chromosome/11/11494656-11612692.
[29] We use the term ‘gene patent’ to include patents and patent applications that disclose and/or claim nucleotide or peptide sequences. Thus not all ‘gene patents’ in this use have enforceable rights, nor do they necessarily include sequences as essentially claimed material.
[30] Ibid, Supra note 26.
[31] https://www.lens.org/lens/bio/patseqdata.
[32] https://www.lens.org/lens/bio/patseqdata#globe/; https://www.lens.org/lens/bio/patseqdata#map/US/; and https://www.lens.org/lens/bio/patseqdata#table/US/.
[33] http://patseqdev.lens.org/lens/bio/patseqdata#globe/DE/.
[34] WIPO Secretariat 48, WIPO, Geneva, 2001. Available at: http://www.wipo.int/edocs/mdocs/tk/en/wipo_grtkf_ic_1/wipo_grtkf_ic_1_6.pdf.