I don’t know anything about saving our searches – we just look at the front page;)
Be aware that the term “patent family” is a little difficult for scientists to understand. I mainly use grouping to get an overview of the families; it will be interesting to see what else grouping can be used for.
We need to explain why there are e.g. 148 results in the first line…
You can do WHAT with GenomeQuest? (Almost) 101 Things You May Not Know
1 Company Confidential Do Not Distribute
GQ Life Sciences
Sr Product Manager
Search: Query Design
In text searching we try to allow for all possibilities: alternate spellings
(flavor/flavour) or misspellings, synonyms, other possibilities
N76D, N-76-D, N 76 D, N/76/D, asn76asp, ASN-76-ASP, “position
76 may be asp (aspartic acid) or asn (asparagine) or it may be
We do the same for sequence searching, but consider the ways a
sequence can be represented, in both query design and result
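The text-search idea above can be sketched with a regular expression. This is only an illustration in Python, not anything GenomeQuest does internally. The pattern name N76D and the separator set are assumptions, and the three-letter form is assumed to follow the same wild-type-first order as N76D (Asn76Asp). Free-text descriptions such as “position 76 may be asp or asn” are not covered.

```python
import re

# Hypothetical pattern: asparagine (N/Asn) at position 76 replaced by
# aspartic acid (D/Asp), with an optional separator around the number.
SEP = r"[\s/\-]?"
N76D = re.compile(rf"\b(?:N|Asn){SEP}76{SEP}(?:D|Asp)\b", re.IGNORECASE)

samples = ["N76D", "N-76-D", "N 76 D", "N/76/D", "Asn76Asp", "ASN-76-ASP", "S87R"]
matches = [s for s in samples if N76D.search(s)]
print(matches)  # every sample except "S87R"
```

Each extra notation has to be anticipated in the pattern, which is the same discipline sequence queries need.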
First the Basics -
Where do Sequences Come From?
How was a sequence described by the inventor?
• In generalities? “Any protease from a micro-organism”
• As a cross-reference “Genbank accession number ABC12345”
• Was a listing filed? Part of a table? Shown in
an alignment in an image?
• Or is the whole thing very difficult to decipher?
• Are there multiple Markush positions, represented by Xaa and
described in words?
• Are subsequences called out?
If it’s not in the listing, or at least written out as a sequence somewhere, it’s
probably not in any sequence database!
In one aspect, an automatic dishwashing detergent composition
comprising a variant protease of a parent protease, said parent
protease amino acid sequence being identical to the amino acid
sequence of SEQ ID NO:1, said variant protease of said parent protease
mutations consisting of one of the following sets of mutations versus
said parent protease:
(i) N76D + S87R + G118R + S128L + P129Q + S130A
GQ Motif algorithm can find sequences IF they are present in
the sequence listing…most variants ARE NOT!
• Index the sequences by writing out each sequence as an explicit
in the database
• That can work for one or two positions with a limited number of
substitutions; however, a sequence with six positions x 2
possibilities/position = 64 (2^6) possible sequences.
• Increase the variability to three variations per position by
adding X as an option and we now have 729 possible
sequences! Four variations? 4096!
Variant Sequence Representation
Question to ponder – if it’s impractical to write out all the explicits, what percentage of variants from
any patent are present in sequence listings OR in ANY sequence database?
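The counts on this slide can be checked with a short Python sketch. It takes mutation set (i) from the claim shown earlier and assumes the two possibilities per position are wild type and variant. The function name and the tuple layout are illustrative only.

```python
from itertools import product

# Mutation set (i) from the claim: (wild type, position, variant).
MUTATIONS = [("N", 76, "D"), ("S", 87, "R"), ("G", 118, "R"),
             ("S", 128, "L"), ("P", 129, "Q"), ("S", 130, "A")]

def explicit_variants(mutations, extra=()):
    """Return every combination of residues across the mutated positions."""
    choices = [(wt, var, *extra) for wt, _pos, var in mutations]
    return list(product(*choices))

two = explicit_variants(MUTATIONS)            # wild type or variant
three = explicit_variants(MUTATIONS, ("X",))  # plus X at each position
print(len(two), len(three), 4 ** 6)           # 64 729 4096
```

The growth is exponential in the number of positions, which is why writing out every explicit sequence quickly becomes impractical.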
Retrieves records and/or sequences by patent number, so you can:
• Create a saved, searchable virtual database;
• Obtain sequence(s) for download and ultimate use in
sequence listing, IDS prep, other molecular biology
• Review through GUI or download or both;
• Link out to public patent sources;
• With Platinum subscription, download full text PDF
Patent Number Searches
GenomeQuest Keyword Interface
Save Your Preferred Search Settings
Be aware of maximum results setting
Make Your Own Database
Three Different Ways!
1. Upload your own sequences into GQ
2. Through Keyword Search>browse protein (or
DNA), filter as desired, and make into database
3. From your search results via
Analyses>Extract sequences>Subject sequences
These Virtual Databases (vDBs) can then be selected
and searched like any regular database, such as
GQ-Pat.
1. Upload Your Own Sequences into GQ
• Any standard format – GENBANK, EMBL,
FASTA file containing one or many sequences
• Must be just protein or just nucleotide.
• Will show up under MY DATA
2. Browse Database, Save as vDB
Keyword Result View
3. Make vDB from Search Results
Results must be just DNA or just protein; don’t use
CDR Query Setup
• You can design a query to look for CDRs in isolation, or all three in a
single subject, or both!
• If you want the exact CDR sequence (or sequence with specified
variations) MOTIF is the best option, and is the only algorithm that
works for a single query, linking all three CDRs.
• If you want non-specific variability, then GenePast should be used,
limiting number of differences.
• Later on we’ll see how to GROUP results to detect one subject
comprising multiple queries.
• This method is not limited to CDRs; it’s applicable for any group of
subsequences contained in a single, longer sequence.
MOTIF on full length – Direct Strike
The long sequence gives hits comprising all three CDRs in the specific order
provided. “.*” represents “any number of unspecified residues, including zero”. If
there is even a single mismatch, or the order is incorrect, it will not be found. If a
CDR in the database uses Xaa, it will not be found unless specified in the query
sequence as an alternative to the wt amino acid.
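The behaviour described above can be imitated with Python regular expressions, where `.*` also means “any number of unspecified characters, including zero”. This is an analogy only: it is not the GQ MOTIF implementation, and the exact MOTIF query syntax may differ. The CDR and framework sequences below are hypothetical.

```python
import re

# Hypothetical CDR sequences, for illustration only.
CDRS = ["GFTFSSYA", "ISGSGGST", "AKDRGYFDY"]

def linked_motif(cdrs):
    """Join CDRs with '.*' so they must all appear, in the given order."""
    return re.compile(".*".join(re.escape(c) for c in cdrs))

subject_ok = "QVQ" + "GFTFSSYA" + "MSWVR" + "ISGSGGST" + "YYADSVKG" + "AKDRGYFDY" + "WGQ"
subject_mismatch = subject_ok.replace("AKDRGYFDY", "AKDRGYFDF")  # one residue changed
subject_xaa = subject_ok.replace("ISGSGGST", "ISGSXGST")          # X in the subject

motif = linked_motif(CDRS)
# Same query, but X is listed as an alternative to the wild-type G.
with_x = re.compile(".*".join(["GFTFSSYA", "ISGS[GX]GST", "AKDRGYFDY"]))

print(bool(motif.search(subject_ok)))        # True
print(bool(motif.search(subject_mismatch)))  # False
print(bool(motif.search(subject_xaa)))       # False
print(bool(with_x.search(subject_xaa)))      # True
```

A single changed residue or an unanticipated X is enough to lose the hit, which is why the exact-match query should not be the only one run.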
Note the relationship between
the two long sequences. There
were 27 patents hit, and both
sequences are present in all 27.
LC and HC perhaps?
Methodology – Searching CDRs
All 3 CDRs in subject or patent
Here we are searching the three CDRs in
isolation. This can be done with either MOTIF
or GenePast. Click on the intersection to see
all 27 patents that contain all three queries.
Extra credit – how many results will you get on
the results view?
The 27 patents will
contain all three
CDRs; however, are
they present in
isolation, in a specific
subject, or both?
81 minimum (3 x 27)
CDR Query Comments
• We recommend searching all three (or six) CDRs
as individual sequences.
• The concatenated query is very useful for a direct
strike, but shouldn’t be used exclusively, as you will
miss hits to individual CDRs.
• This accomplishes the same thing as grouping by
subject, but it’s more specific and you get a smaller
number of results (1/3 as many in this case).
CDR1 anything CDR2 anything CDR3
• [KX] is equivalent to anything: it will retrieve K or anything else, including X,
in that position.
• Degeneracy characters in the subject are not found automatically; they have to be
specified in the query:
– [KV] will find either K or V, but not X.
– [GA] will find either G or A, but not R.
• Degeneracy characters in the query are interpreted as what they represent:
[NACGTURYKMSWBDHV][RGA][YTUC][SGC][WATU]
• Always consider how an inventor might represent a sequence in the
listing, and consider either using degeneracy characters (nucleotide) or
including an explicit X in protein queries.
• There’s a special way to search for that explicit X.
• Tip: look at the query sequence in MOTIF results, it will be written out with
the degeneracy characters expanded (e.g. N will be written as AGCT)
Degeneracy Characters are Difficult!
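The query-side expansion can be sketched in Python. This is an assumption-laden illustration, not the GQ algorithm: each code is expanded to itself, the bases it stands for under the standard IUPAC table, and U wherever T is allowed, which reproduces the five character sets shown on the slide. The rule for the remaining codes is extrapolated from those five. BASES, char_class and expand_query are hypothetical names.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}

def char_class(code):
    """Expand one query character into a bracketed set of subject characters."""
    if code == "N":
        members = "N" + "ACGTU" + "RYKMSWBDHV"  # N stands for anything
    else:
        bases = BASES[code]
        members = code + bases + ("U" if "T" in bases else "")
    return "[" + "".join(dict.fromkeys(members)) + "]"  # drop duplicates

def expand_query(query):
    """Expand a degenerate nucleotide query into a regular expression."""
    return "".join(char_class(ch) for ch in query.upper())

pattern = expand_query("NRYSW")
print(pattern)
print(bool(re.fullmatch(pattern, "TRCGA")))  # R in the query finds R in the subject
print(bool(re.fullmatch("[GA]", "R")))       # explicit [GA] does not find R
```

Writing the expansion out like this is a quick way to see what a degenerate query will and will not retrieve before running it.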
Motif Search Methods
• Use the MOTIF algorithm to search for 100% identity to either allele.
• Reminder: with a single mismatch anywhere, MOTIF will not find the
• GenePast/Blast are also good choices; use coordinate filters to select
only those results crossing the SNP region(s).
Lots of Good Information Here
Count of sequences
that didn’t have hits
Is your total hit
count < your max?
Be sure to
Venn – it is on the
PATENT level, not
For >3 queries, you
can use the
Intermediate Page Analysis
Validate results, look for fundamental issues:
• Do I have at least one hit for each of my query sequences?
• Repeat this overview after applying each filter set
• Did I “max out” my results?
• Example: my max is set to 500 (default) and at least one query has exactly 500 results
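The validation checks above can be written as a small Python helper. It assumes the per-query hit counts have been copied by hand from the intermediate page; the function name and the sample counts are hypothetical, and 500 is the default max mentioned on the slide.

```python
def review_hit_counts(hit_counts, max_results=500):
    """Flag queries with no hits and queries that reached the max."""
    no_hits = [q for q, n in hit_counts.items() if n == 0]
    maxed_out = [q for q, n in hit_counts.items() if n >= max_results]
    return no_hits, maxed_out

# Hypothetical per-query hit counts read off the intermediate page.
counts = {"CDR1": 148, "CDR2": 500, "CDR3": 0}
no_hits, maxed_out = review_hit_counts(counts)
print("No hits:", no_hits)                      # ['CDR3']
print("Possibly truncated at max:", maxed_out)  # ['CDR2']
```

A query sitting exactly at the max is a sign that results may have been cut off, so the max should be raised or the filters tightened before drawing conclusions.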
Are All My Queries Present?
You Can Have Many Analysis Views
• Multiple views can be saved and switching between them is as simple
as a mouse click
Customized Views for Analysis
Views are Created by
• The heart of GQ’s power
• Full Boolean with nesting capabilities
• Just like views, you can save multiple filters
• Very flexible combinations
• GUI allows on-the-fly changes: try it, and if you don’t like it, try something else
– I often use this capability to narrow down large result sets by finding a
cutoff that affects the majority of results
• The AUDIT TRAIL page (found in exported Excel files) includes the applied filters
– I take a screenshot of my filters and paste it into the report for readability.
My Starting Point Nested Boolean Filter
In order to get the most out of GQ, you need to really
understand the different percent identities: query, subject
GenePast Gap Filters
• Huge improvement! Converted me from Blast.
• The prior issue was that Query % ID ignored gaps, so hits with
multiple gaps could show up as 100% Query ID.
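The gap issue can be illustrated with a toy calculation in Python. The definition used here (matching residues divided by query length, gap columns not counted) is an assumption made for the example, not the documented GenePast formula, and the alignment is hypothetical.

```python
def query_identity(aligned_query, aligned_subject):
    """Return (query % identity ignoring gaps, number of gap columns)."""
    pairs = list(zip(aligned_query, aligned_subject))
    query_len = sum(1 for q, _ in pairs if q != "-")
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return 100.0 * matches / query_len, gaps

# Hypothetical alignment: the subject carries a two-residue insertion.
pct, gaps = query_identity("MKT--AYIAK", "MKTGGAYIAK")
print(pct, gaps)  # 100.0 2
```

Under that definition the hit reports 100% query identity despite the insertion; a separate filter on the gap count is what separates it from a true full-length match.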
InDel Detection with Gap Filters
INDEL Type Query
One of each 1 1
Additional use – INDEL detection!
Can Also Display for InDel Analysis
SNP Detection – Coordinate Filter
• SNP analysis often focuses on specific position(s), therefore the overall
% identities are frequently irrelevant.
• Only those alignments that cover the region of interest will pass
• Use coordinates to narrow to these regions
SNP Detection – Coordinate
Example: SNP is at position 1501
Filter for Query Start <=1500 and Query Stop >=1502
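The coordinate filter in this example amounts to a simple range test. The Python sketch below applies it to a few hypothetical alignments; the dictionary keys and the covers_snp helper are illustrative names, not GQ export fields.

```python
def covers_snp(alignment, snp_pos, flank=1):
    """True if the alignment spans the SNP plus `flank` positions on each side."""
    return (alignment["query_start"] <= snp_pos - flank
            and alignment["query_stop"] >= snp_pos + flank)

# Hypothetical alignments with query coordinates.
alignments = [
    {"id": "hit1", "query_start": 1, "query_stop": 3000},
    {"id": "hit2", "query_start": 1501, "query_stop": 2200},
    {"id": "hit3", "query_start": 900, "query_stop": 1501},
    {"id": "hit4", "query_start": 1500, "query_stop": 1502},
]

passing = [a["id"] for a in alignments if covers_snp(a, 1501)]
print(passing)  # ['hit1', 'hit4']
```

With snp_pos=1501 and flank=1 the test is exactly Query Start <= 1500 and Query Stop >= 1502; alignments that start or stop on the SNP itself are excluded.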
• Results can be grouped for immediate feedback
How many families/patents/sequences pass these filters?
Are there any hits (including SIDs) that contain multiple query
Which subjects contain:
My three CDRs?
My unique promoter and gene?
Variation 1 but not Variation 2?
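The grouping questions above boil down to set operations on (query, subject) pairs. The Python sketch below uses made-up hits and hypothetical helper names to show the idea; it works on data outside GQ and is not how the GQ grouping feature is implemented.

```python
from collections import defaultdict

# Hypothetical (query, subject) pairs from a filtered result set.
HITS = [
    ("CDR1", "subjA"), ("CDR2", "subjA"), ("CDR3", "subjA"),
    ("CDR1", "subjB"), ("CDR3", "subjB"),
    ("Variation 1", "subjC"),
    ("Variation 1", "subjD"), ("Variation 2", "subjD"),
]

def group_by_subject(hits):
    """Map each subject to the set of queries that hit it."""
    groups = defaultdict(set)
    for query, subject in hits:
        groups[subject].add(query)
    return groups

def containing(groups, required, excluded=()):
    """Subjects hit by all `required` queries and none of the `excluded` ones."""
    required, excluded = set(required), set(excluded)
    return sorted(s for s, q in groups.items()
                  if required <= q and not (excluded & q))

groups = group_by_subject(HITS)
print(containing(groups, ["CDR1", "CDR2", "CDR3"]))          # ['subjA']
print(containing(groups, ["Variation 1"], ["Variation 2"]))  # ['subjC']
```

The same grouping key could be a patent or a family instead of a subject, which changes the question from “one sequence contains all queries” to “one document contains all queries”.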
Find queries (or patents or families) with a disproportionate hit count
A New Way to Analyze Data
– GQ’s New Result Browser
• Simplified Interface
• Very different viewing and analysis paradigm
• YOU CAN EXPORT ALIGNMENTS (coming right up!)
Single Step Analysis of
Patent, Family, UFS Distribution
Unique Family Sequence
A Special Beast
• Purpose is to show the distribution of the identical sequence within a given
family
• The identical sequence may have many different UFS values. Any given
UFS value is unique only within a single family.
• THERE IS NO GUARANTEE THAT THE SEQ ID NO AND/OR PSL IS
IDENTICAL FOR A GIVEN UFS throughout the family.
• It is extremely useful for studying the distribution of a sequence hit of
interest throughout a family
UFS PSL, SID variability
Reporting Tips & Tricks
Reporting Tricks & Tips
• Share results with other GQ users
• Visualize subjects, patents or both containing multiple query
• View alignments adjacent to full text of claims through LifeQuest
• IDS and ST25 Preparation
• Export alignments
• Family portrait, result analysis
• Excel tips:
– Freeze top row
– Link back from Excel to each alignment and make that link
available to other licensed GQ users
– Prepare Excel pivot tables summarizing search results which
can easily be changed to summarize results by many different
Share Your Results
1. Create a folder for
2. Make it a shared folder
3. Set Permissions
4. Move Results to Folder
Visualize Multiple Queries
Aligned to a Single Subject
Multiple Sequence Alignments
View Claims Adjacent to Alignments
• Export as FASTA and import into Excel (requires a little
• May want second tabular export for organism and molecule type.
• Be sure to have sequences properly ordered;
• Use Excel formulae to clean up and error check, then convert
into PatentIn import format.
• This does take some Excel skill to do right!
GQ Can Help Prepare Sequence Listings
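As an alternative to the Excel clean-up step, the FASTA export can be turned into a table with a few lines of Python. This is a sketch only: the column names are hypothetical, the sample records are made up, and the PatentIn import format itself is not reproduced here.

```python
import csv
import io

def fasta_to_rows(text):
    """Parse FASTA text into (header, sequence) pairs, keeping file order."""
    rows, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                rows.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        rows.append((header, "".join(chunks)))
    return rows

def rows_to_csv(rows):
    """Write an Excel-friendly table with order, header, length and sequence."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["order", "header", "length", "sequence"])
    for i, (header, seq) in enumerate(rows, start=1):
        writer.writerow([i, header, len(seq), seq])
    return buf.getvalue()

# Hypothetical two-record export.
sample = ">seq1 example protein\nMKTAY\nIAKQR\n>seq2 example protein\nGSHMA\n"
print(rows_to_csv(fasta_to_rows(sample)))
```

The length column gives a quick error check, and the order column makes it easy to confirm the sequences are in the intended order before conversion.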
Generating Sequence Documents for IDS
• IDS (information disclosure statements) may be filed during
prosecution, either on the initiative of the patent practitioner, or in
response to an Office Action.
• These are essentially citations, and may be journal references, patent
filings, sequence documents, or any combination.
• The sequence documents are really easy to prepare from GQ and,
with minimal training, may be done by clerical workers or other
assistants. No knowledge of sequences is needed.
Sample Genbank-Formatted Export
• Uses standard sequence export interface.
• Sequences can be obtained from regular search results or by keyword search.
• Can export multiple sequences but they will need to be broken out into individual
NRB - Excel Export of Alignments
Family Portrait Report
Click on a family to see the list of patents matching your
Analysis Report for > 3 Queries
To 400 Million and Beyond Contest!
Upcoming – QUARTERLY LIVE WEBINARS ON
SPECIAL TOPICS – Stay tuned! You are also
invited to submit a topic for consideration. Email
with your suggestions.
New offering – Consulting and Custom Training
Stop by our booth at PIUG for further information!
Thank You for Attending