1 Company Confidential Do Not Distribute
You can do WHAT with GenomeQuest?
(Almost) 101 things you may not know
Ellen Sherin
Sr. Product Manager
GQ Life Sciences
Ellen.Sherin@aptean.com
Overview
Query Design
Anticipatory Searching
In text searching we try to allow for all possibilities: alternate spellings (flavor/flavour), misspellings, synonyms, and other variants:
N76D, N-76-D, N 76 D, N/76/D, asn76asp, ASN-76-ASP, “position 76 may be asp (aspartic acid) or asn (asparagine) or it may be deleted”
We do the same for sequence searching, considering the ways a sequence can be represented in both query design and result analysis.
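As a rough illustration of anticipatory text searching, the sketch below enumerates common spellings of a single point mutation. The function name `mutation_variants` and the choice of separators are assumptions made for this example; it is not a GenomeQuest feature, and three-letter forms are written wild type first (asn76asp for N76D).

```python
# Three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys",
    "Q": "gln", "E": "glu", "G": "gly", "H": "his", "I": "ile",
    "L": "leu", "K": "lys", "M": "met", "F": "phe", "P": "pro",
    "S": "ser", "T": "thr", "W": "trp", "Y": "tyr", "V": "val",
}

# Separators seen in the slide's examples: none, hyphen, space, slash.
SEPARATORS = ["", "-", " ", "/"]


def mutation_variants(wild_type, position, mutant):
    """Return common textual spellings of one point mutation (hypothetical helper)."""
    wt3 = THREE_LETTER[wild_type]
    mut3 = THREE_LETTER[mutant]
    variants = []
    for sep in SEPARATORS:
        # One-letter form, e.g. N76D or N-76-D.
        variants.append(f"{wild_type}{sep}{position}{sep}{mutant}")
        # Three-letter forms in lower, upper, and capitalized case.
        variants.append(f"{wt3}{sep}{position}{sep}{mut3}")
        variants.append(f"{wt3.upper()}{sep}{position}{sep}{mut3.upper()}")
        variants.append(f"{wt3.capitalize()}{sep}{position}{sep}{mut3.capitalize()}")
    return variants


if __name__ == "__main__":
    for term in mutation_variants("N", 76, "D"):
        print(term)
```

The resulting terms could be OR-ed together in a text query. Free-text descriptions such as “position 76 may be asp or asn” still need separate handling.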
First the Basics: Where Do Sequences Come From?
How was a sequence described by the inventor?
• In generalities? “Any protease from a micro-organism”
• As a cross-reference? “GenBank accession number ABC12345”
• Was a listing filed? Was the sequence part of a table? Shown in an alignment in an image?
• Or is the whole thing very difficult to decipher?
• Are there multiple Markush positions, represented by Xaa and described in words?
• Are subsequences called out?
If it’s not in the listing, or at least written out as a sequence somewhere, it’s probably not in any sequence database!
Markush Sequences
US 20150087572:
In one aspect, an automatic dishwashing detergent composition comprising a variant protease of a parent protease, said parent protease amino acid sequence being identical to the amino acid sequence of SEQ ID NO:1, said variant protease of said parent protease mutations consisting of one of the following sets of mutations versus said parent protease:
(i) N76D + S87R + G118R + S128L + P129Q + S130A
The GQ Motif algorithm can find such sequences IF they are present in the sequence listing, but most variants ARE NOT!
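Because variants described only as mutation sets are usually absent from the listing, one option is to write the variant out yourself before searching. The sketch below applies a mutation set such as the one above to a parent sequence. The parent here is a made-up toy sequence, not SEQ ID NO:1, and the code assumes the positions count directly along the parent, starting at 1. A real patent may use a different reference numbering.

```python
import re

# One-letter point mutation such as "N76D": wild type, 1-based position, mutant.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(parent, mutations):
    """Return the parent sequence with a '+'-separated mutation set applied."""
    residues = list(parent)
    for token in mutations.split("+"):
        token = token.strip()
        match = MUTATION.match(token)
        if match is None:
            raise ValueError(f"Unrecognized mutation: {token!r}")
        wild_type, mutant = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert 1-based position to 0-based index
        if index < 0 or index >= len(residues):
            raise ValueError(f"{token}: position is outside the parent sequence")
        if residues[index] != wild_type:
            raise ValueError(
                f"{token}: parent has {residues[index]} at this position, not {wild_type}"
            )
        residues[index] = mutant
    return "".join(residues)


def toy_parent(length=140):
    """Build a hypothetical parent: all alanine except the mutated positions."""
    residues = ["A"] * length
    for position, residue in {76: "N", 87: "S", 118: "G", 128: "S", 129: "P", 130: "S"}.items():
        residues[position - 1] = residue
    return "".join(residues)


if __name__ == "__main__":
    parent = toy_parent()
    variant = apply_mutations(parent, "N76D + S87R + G118R + S128L + P129Q + S130A")
    print(variant)
```

The wild-type check is deliberate: if the residue at a position does not match, the numbering assumption is probably wrong, and it is safer to stop than to search with a wrong sequence.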
Patent Number Searches
GenomeQuest Keyword Interface
Retrieves records and/or sequences by patent number, so you can:
• Create a saved, searchable virtual database;
• Obtain sequence(s) for download and ultimate use in sequence listings, IDS prep, or other molecular biology programs;
• Review through the GUI, download, or both;
• Link out to public patent sources;
• With a Platinum subscription, download the full-text PDF.
Search Setup
Save Your Preferred Search Settings
Make Your Own Database
Three Different Ways!
1. Upload your own sequences into GQ
2. Through Keyword Search>browse protein (or
DNA), filter as desired, and make into database
3. From your search results via
Analyses>Extract sequences>Subject sequences
These Virtual Databases (vDBs) can then be selected and searched like any regular database, such as GQ-Pat.
1. Upload Your Own Sequences into GQ
• Any standard format (GenBank, EMBL, FASTA); the file may contain one or many sequences.
• Each upload must be just protein or just nucleotide.
• Uploaded sequences will show up under MY DATA.
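Since an upload must be all protein or all nucleotide, a quick pre-upload check can catch mixed files. The sketch below is a heuristic under stated assumptions: it treats a sequence as nucleotide when it uses only IUPAC nucleotide letters, so a short peptide made only of letters such as A, C, G would be misclassified. The helper names are hypothetical and unrelated to GQ's own validation.

```python
# IUPAC nucleotide letters, including degeneracy codes.
NUCLEOTIDE_LETTERS = set("ACGTUNRYKMSWBDHV")


def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the header text as the name, or a placeholder if it is empty.
            name = line[1:].strip() or f"unnamed_{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def guess_type(sequence):
    """Heuristic: only nucleotide letters means nucleotide, otherwise protein."""
    return "nucleotide" if set(sequence) <= NUCLEOTIDE_LETTERS else "protein"


def is_single_type(records):
    """True when every record is guessed to be the same molecule type."""
    return len({guess_type(seq) for seq in records.values()}) <= 1


if __name__ == "__main__":
    mixed = ">prot1\nMKTAYIAKQR\n>dna1\nACGTACGTAA\n"
    print(is_single_type(read_fasta(mixed)))  # False: split this file before uploading
```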
2. Browse Database, Save as vDB
• Just DNA or just protein
• The best way is to “browse” the DNA or protein databases, then filter for your desired criteria
• Although a vDB can be created from text search results, it won’t show up in your search database choices unless you have “hard-coded” JUST protein or JUST DNA
Result View
3. Make vDB from Search Results
Results must be just DNA or just protein; don’t use
mixed results.
CDR Query Setup
• You can design a query to look for CDRs in isolation, or all three in a
single subject, or both!
• If you want the exact CDR sequence (or sequence with specified
variations) MOTIF is the best option, and is the only algorithm that
works for a single query, linking all three CDRs.
• If you want non-specific variability, then GenePast should be used,
limiting number of differences.
• Later on we’ll see how to GROUP results to detect one subject
comprising multiple queries.
• This method is not limited to CDRs; it’s applicable for any group of
subsequences contained in a single, longer sequence.
MOTIF on full length – Direct Strike
The long sequence gives hits comprising all three CDRs in the specific order provided. “.*” represents “any number of unspecified residues, including zero”. If there is even a single mismatch, or the order is incorrect, the hit will not be found. If a CDR in the database uses Xaa, it will not be found unless Xaa is specified in the query sequence as an alternative to the wild-type (wt) amino acid.
>37-motif
DLSIH.*GFDPQDGETIYAQKFQG.*GSSSSWFDP
>9-motif
RASQGISSWLA.*GASNLES.*QQANSFPWT
Note the relationship between the two long sequences. There were 27 patents hit, and both sequences are present in all 27. Perhaps they are the light chain (LC) and heavy chain (HC)?
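The “.*” notation in the motif queries above behaves much like a regular expression, so the all-or-nothing matching can be tried out locally. The sketch below uses Python's `re` module as a stand-in; GQ's MOTIF implementation may differ in its details. The framework residues between the CDRs in the test subject are invented filler.

```python
import re

# The heavy-chain style motif from the slide: three CDRs joined by ".*".
MOTIF_37 = "DLSIH.*GFDPQDGETIYAQKFQG.*GSSSSWFDP"


def motif_hit(motif, subject):
    """True when the subject contains the motif (regex used as a stand-in for MOTIF)."""
    return re.search(motif, subject) is not None


# Hypothetical subject: invented framework filler around the three CDRs.
SUBJECT = (
    "QVQLVQ" + "DLSIH" + "WVRQAPG" + "GFDPQDGETIYAQKFQG"
    + "RVTMTEDT" + "GSSSSWFDP" + "WGQGTLV"
)

if __name__ == "__main__":
    print(motif_hit(MOTIF_37, SUBJECT))                            # all CDRs, in order
    print(motif_hit(MOTIF_37, SUBJECT.replace("DLSIH", "DLSIY")))  # one mismatch
```

A single substitution in any CDR, or the CDRs appearing in a different order, makes the pattern fail. This is why individual CDR queries are still worth running alongside the concatenated one.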
Methodology – Searching CDRs
All 3 CDRs in subject or patent
Here we are searching the three CDRs in isolation. This can be done with either MOTIF or GenePast. Click on the intersection to see all 27 patents that contain all three queries.
The 27 patents will contain all three CDRs; however, are they present in isolation, in a specific subject, or both?
Extra credit: how many results will you get on the results view? 81 minimum (3 x 27).
CDR Query Comments
• We recommend searching all three (or six) CDRs as individual sequences.
• The concatenated query is very useful for a direct strike, but shouldn’t be used exclusively, as you will miss hits to individual CDRs.
• The concatenated query accomplishes the same thing as grouping by subject, but it’s more specific and you get a smaller number of results (1/3 as many in this case).
Concatenated query layout: CDR1, anything, CDR2, anything, CDR3
Degeneracy Characters are Difficult!
• [KX] is equivalent to “anything”: it will retrieve K or anything else, including X, in that position.
• Degeneracy characters in the subject are not found automatically; they have to be searched for explicitly.
– [KV] will find either K or V, but not X.
– [GA] will find either G or A, but not R.
• Degeneracy characters in the query are interpreted as what they represent:
[NACGTURYKMSWBDHV][RGA][YTUC][SGC][WATU]
• Always consider how an inventor might represent a sequence in the listing, and consider either using degeneracy characters (nucleotide) or including an explicit X in protein queries.
• There’s a special way to search for that explicit X.
• Tip: look at the query sequence in MOTIF results; it will be written out with the degeneracy characters expanded (e.g. N will be written as AGCT).
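To make the query-side expansion concrete, the sketch below expands standard IUPAC nucleotide degeneracy codes into bracketed character classes and tests them with Python's `re` module. The mapping is the standard IUPAC table with U treated as T. The expansion GQ actually displays may include additional letters, so treat this as an approximation rather than the MOTIF algorithm itself.

```python
import re

# Standard IUPAC nucleotide codes mapped to the plain bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_query(query):
    """Rewrite a degenerate nucleotide query as a regex of character classes."""
    parts = []
    for code in query.upper():
        bases = IUPAC[code]
        # Plain bases stay as-is; degenerate codes become a bracketed class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


if __name__ == "__main__":
    pattern = expand_query("ARYN")
    print(pattern)
    print(re.fullmatch(pattern, "AGCT") is not None)  # plain bases in the subject
    print(re.fullmatch(pattern, "AGCN") is not None)  # degenerate N in the subject
```

The second check illustrates the slide's warning: a degenerate character in the subject is not matched unless the query explicitly allows it.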
For More on Variable Sequences
SNP Queries
• Use the MOTIF algorithm to search for 100% identity to either allele.
• Reminder: with a single mismatch anywhere, MOTIF will not find the
hit!
• GenePast/Blast are also good choices; use coordinate filters to select
only those results crossing the SNP region(s).
Results Overview
Intermediate Page
Lots of Good Information Here
• Correct query sequence count
• Count of sequences that didn’t have hits
• Is your total hit count < your max?
• Be sure to understand the Venn – it is on
the PATENT level, not subject
• For >3 queries, you can use the Statistics
Report functionality
Intermediate Page
Analysis Report for > 3 Queries
Intermediate Page Analysis
Validate results and look for fundamental issues:
• Do I have at least one hit for each of my query sequences?
• Did I “max out” my results? For example, my max is set to 500 (default) and at least one query has 500 results.
• Repeat this overview after applying each filter set.
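The two checks above can also be run on an exported per-query hit summary. This is a minimal sketch that assumes you already have a mapping of query name to hit count. The function name and input shape are hypothetical, and 500 is the default maximum mentioned on the slide.

```python
def check_hit_counts(hit_counts, max_results=500):
    """Return (queries with no hits, queries that reached the result maximum)."""
    no_hits = [query for query, count in hit_counts.items() if count == 0]
    maxed_out = [query for query, count in hit_counts.items() if count >= max_results]
    return no_hits, maxed_out


if __name__ == "__main__":
    counts = {"CDR1": 500, "CDR2": 212, "CDR3": 0}
    missing, truncated = check_hit_counts(counts)
    print("No hits:", missing)
    print("Possibly truncated:", truncated)
```

A query that reaches the maximum may have had further hits cut off, so it is a candidate for a rerun with a higher limit or tighter settings.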
Analysis
Are All My Queries Present?
You Can Have Many Analysis Views
• Multiple views can be saved and switching between them is as simple
as a mouse click
Customized Views for Analysis
Numeric
Bibliographic
Views are Created by DEFINE COLUMNS
GenomeQuest Filters
• The heart of GQ’s power
• Full Boolean with nesting capabilities
• Just like views, you can save multiple filters
• Very flexible combinations
• The GUI allows on-the-fly changes: try it, and if you don’t like it, try something else!
– I often use this capability to narrow down large result sets by finding a cutoff that affects the majority of results
• The AUDIT TRAIL page (found in exported Excel files) includes the applied filters.
– I take a screenshot of my filters and paste it into the report for readability.
My Starting Point: Nested Boolean Filter
In order to get the most out of GQ, you need to really understand the different percent identities: query, subject, and alignment!
Wildcarding Works!
Multiple Values are “OR”
Text ANDing
GenePast Gap Filters
• Huge improvement! Converted me from Blast.
• The prior issue was that Query % ID ignored gaps, so hits with multiple gaps could show up as 100% Query ID.
InDel Detection with Gap Filters
Additional use: INDEL detection! (thanks Bjarne!)
INDEL Type         Query Gaps   Subject Gaps   Alignment
Insertion mutant   1            0              (image)
Deletion mutant    0            1              (image)
One of each        1            1              (image)
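The gap-filter logic in the table can be sketched as a small classifier. It assumes that a gap is counted per run of “-” characters in an aligned sequence and that any count above zero is treated like the table's value of 1. Both are assumptions for illustration, and GQ's own gap counting may differ.

```python
import re


def count_gaps(aligned):
    """Count gap openings (runs of '-') in one aligned sequence."""
    return len(re.findall(r"-+", aligned))


def classify_indel(query_gaps, subject_gaps):
    """Map query/subject gap counts to the INDEL types in the table."""
    if query_gaps > 0 and subject_gaps > 0:
        return "one of each"
    if query_gaps > 0:
        return "insertion mutant"  # the subject has residues the query lacks
    if subject_gaps > 0:
        return "deletion mutant"   # the subject lacks residues the query has
    return "no indel"


if __name__ == "__main__":
    query_aligned = "ACGT--ACGT"
    subject_aligned = "ACGTTTACGT"
    print(classify_indel(count_gaps(query_aligned), count_gaps(subject_aligned)))
```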
Can Also Display for InDel Analysis
SNP Detection – Coordinate Filter
• SNP analysis often focuses on specific position(s), therefore the overall
% identities are frequently irrelevant.
• Only those alignments that cover the region of interest will pass
screening
• Use coordinates to narrow to these regions
SNP Detection – Coordinate Filters
Example: SNP is at position 1501
Filter for Query Start <=1500 and Query Stop >=1502
Viewing Alignments
Grouping
• Results can be grouped for immediate feedback:
– How many families/patents/sequences pass these filters?
– Are there any hits (including SIDs) that contain multiple query sequences?
– Which subjects contain my three CDRs? My unique promoter and gene? Variation 1 but not Variation 2?
Grouping
Find queries (or patents or families) with a disproportionate hit count
A New Way to Analyze Data
– GQ’s New Result Browser
• Simplified Interface
• Very different viewing and analysis paradigm
• YOU CAN EXPORT ALIGNMENTS (coming right up!)
Single Step Analysis of
Patent, Family, UFS Distribution
Universal Family Sequence (UFS): A Special Beast
• Its purpose is to show the distribution of the identical sequence within a given family.
• The identical sequence may have many different UFS values. Any given UFS value is only unique for a single family.
• THERE IS NO GUARANTEE THAT THE SEQ ID NO AND/OR PSL IS IDENTICAL FOR A GIVEN UFS throughout the family.
• It is extremely useful for studying the distribution of a sequence hit of interest throughout a family.
UFS PSL, SID variability
Reporting Tips & Tricks
Share Your Results
1. Create a folder for results
2. Make it a shared folder
3. Set Permissions
4. Move Results to Folder
Reporting Tricks & Tips
• Share results with other GQ users
• Visualize subjects, patents or both containing multiple query
sequences
• View alignments adjacent to full text of claims through LifeQuest
• IDS and ST25 Preparation
• Export alignments
• Family portrait, result analysis
• Excel tips:
– Freeze top row
– Link back from Excel to each alignment and make that link
available to other licensed GQ users
– Prepare Excel pivot tables summarizing search results which
can easily be changed to summarize results by many different
parameters
Visualize Multiple Queries
Aligned to a Single Subject
View Claims Adjacent to Alignments
GQ Can Help Prepare Sequence Listings
• Export as FASTA and import into Excel (requires a little manipulation).
• You may want a second, tabular export for organism and molecule type.
• Be sure to have the sequences properly ordered.
• Use Excel formulae to clean up and error check, then convert into PatentIn import format:
<my-seq-name;moltype;organism>
sequence
• This does take some Excel skill to do right!
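For readers more comfortable with a script than with Excel formulae, the same conversion can be sketched in Python. The header layout is taken from the slide (`<my-seq-name;moltype;organism>` followed by the sequence). The record fields, the helper name, and the molecule-type values in the example (such as "PRT") are assumptions to verify against the PatentIn documentation before use.

```python
FORBIDDEN = set(";<>\n")


def to_import_format(records):
    """Render records as '<name;moltype;organism>' headers, each followed by its sequence."""
    blocks = []
    for record in records:
        for field in ("name", "moltype", "organism"):
            # A stray delimiter would corrupt the header, so stop early.
            if FORBIDDEN & set(record[field]):
                raise ValueError(f"{field} contains a reserved character: {record[field]!r}")
        header = f"<{record['name']};{record['moltype']};{record['organism']}>"
        sequence = "".join(record["sequence"].split()).upper()  # drop whitespace
        blocks.append(header + "\n" + sequence)
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical records, already in the order required for the listing.
    example = [
        {"name": "my-seq-1", "moltype": "PRT", "organism": "Homo sapiens",
         "sequence": "mkt ayi"},
        {"name": "my-seq-2", "moltype": "DNA", "organism": "Artificial Sequence",
         "sequence": "ACGTACGT"},
    ]
    print(to_import_format(example))
```

Records are written in the order given, so sort them first if the listing needs a particular sequence order.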
Generating Sequence Documents for IDS Prep
• IDSs (information disclosure statements) may be filed during prosecution, either on the initiative of the patent practitioner or in response to an Office Action.
• These are essentially citations, and may be journal references, patent filings, sequence documents, or any combination.
• The sequence documents are really easy to prepare from GQ and, with minimal training, may be prepared by clerical workers or other assistants. No knowledge of sequences is needed.
Sample GenBank-Formatted Export for IDS
• Uses the standard sequence export interface.
• Sequences can be obtained from regular search results or by keyword search.
• Multiple sequences can be exported, but they will need to be broken out into individual files.
NRB - Excel Export of Alignments
nrb export.xls
Family Portrait Report
Click on a family to see the list of patents matching your
sequence
Analysis Report for > 3 Queries
Excel Link Backs & Pivot Table
Sample export for link back.xlsx
Acknowledgments
• Bjarne Due Larsen
• Mary Jane Reeve
• Steven Altman
• Bob March
• Bill Perkins
• Henk Heus
• Danyu Wu
• Stephen Allen

More Related Content

Similar to You can do WHAT with GenomeQuest? (Almost) 101 Things You May Not Know

Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Finding a good development partner
Finding a good development partnerFinding a good development partner
Finding a good development partnerKevin Poorman
 
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...Alex Pinto
 
Testing 2 - Thinking Like A Tester
Testing 2 - Thinking Like A TesterTesting 2 - Thinking Like A Tester
Testing 2 - Thinking Like A TesterArleneAndrews2
 
Psy 870 module 3 problem set answers
Psy 870  module 3 problem set answersPsy 870  module 3 problem set answers
Psy 870 module 3 problem set answersbestwriter
 
Search Engine Results: The Best Measure?
Search Engine Results: The Best Measure? Search Engine Results: The Best Measure?
Search Engine Results: The Best Measure? Fan Foundry
 
Effective Searching: Part 3 - Narrow your search (Generic Web)
Effective Searching: Part 3 - Narrow your search (Generic Web)Effective Searching: Part 3 - Narrow your search (Generic Web)
Effective Searching: Part 3 - Narrow your search (Generic Web)Jamie Bisset
 
Leveraging artificial intelligence to build algorithmic trading strategies
Leveraging artificial intelligence to build algorithmic trading strategiesLeveraging artificial intelligence to build algorithmic trading strategies
Leveraging artificial intelligence to build algorithmic trading strategiesQuantInsti
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUsCarol McDonald
 
SAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William NadolskiSAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William NadolskiWilliam Nadolski
 
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxMACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxVijayalakshmi171563
 
Modern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discoveryModern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discoveryAll Things Open
 
Boundary and equivalnce systematic test design
Boundary and equivalnce   systematic test designBoundary and equivalnce   systematic test design
Boundary and equivalnce systematic test designIan McDonald
 
Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...
Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...
Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...Graematter Inc
 
136 latest dot net interview questions
136  latest dot net interview questions136  latest dot net interview questions
136 latest dot net interview questionssandi4204
 
AI Pharma Summit Keynote Boston 7-26-17
AI Pharma Summit Keynote Boston 7-26-17AI Pharma Summit Keynote Boston 7-26-17
AI Pharma Summit Keynote Boston 7-26-17Brandon Allgood
 
Data Mining with SQL Server 2008
Data Mining with SQL Server 2008Data Mining with SQL Server 2008
Data Mining with SQL Server 2008Peter Gfader
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Greg Makowski
 

Similar to You can do WHAT with GenomeQuest? (Almost) 101 Things You May Not Know (20)

Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Finding a good development partner
Finding a good development partnerFinding a good development partner
Finding a good development partner
 
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
Determining the Fit and Impact of CTI Indicators on Your Monitoring Pipeline ...
 
Testing 2 - Thinking Like A Tester
Testing 2 - Thinking Like A TesterTesting 2 - Thinking Like A Tester
Testing 2 - Thinking Like A Tester
 
Psy 870 module 3 problem set answers
Psy 870  module 3 problem set answersPsy 870  module 3 problem set answers
Psy 870 module 3 problem set answers
 
Search Engine Results: The Best Measure?
Search Engine Results: The Best Measure? Search Engine Results: The Best Measure?
Search Engine Results: The Best Measure?
 
Effective Searching: Part 3 - Narrow your search (Generic Web)
Effective Searching: Part 3 - Narrow your search (Generic Web)Effective Searching: Part 3 - Narrow your search (Generic Web)
Effective Searching: Part 3 - Narrow your search (Generic Web)
 
Leveraging artificial intelligence to build algorithmic trading strategies
Leveraging artificial intelligence to build algorithmic trading strategiesLeveraging artificial intelligence to build algorithmic trading strategies
Leveraging artificial intelligence to build algorithmic trading strategies
 
Introduction to machine learning with GPUs
Introduction to machine learning with GPUsIntroduction to machine learning with GPUs
Introduction to machine learning with GPUs
 
Code coverage
Code coverageCode coverage
Code coverage
 
SAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William NadolskiSAS AX 2018 - Manufacturing Insights by William Nadolski
SAS AX 2018 - Manufacturing Insights by William Nadolski
 
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxMACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
 
Modern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discoveryModern Search: Using ML & NLP advances to enhance search and discovery
Modern Search: Using ML & NLP advances to enhance search and discovery
 
클린 테스트
클린 테스트클린 테스트
클린 테스트
 
Boundary and equivalnce systematic test design
Boundary and equivalnce   systematic test designBoundary and equivalnce   systematic test design
Boundary and equivalnce systematic test design
 
Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...
Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...
Regulatory Intelligence Series - How to find Predicate Devices SOFIE compared...
 
136 latest dot net interview questions
136  latest dot net interview questions136  latest dot net interview questions
136 latest dot net interview questions
 
AI Pharma Summit Keynote Boston 7-26-17
AI Pharma Summit Keynote Boston 7-26-17AI Pharma Summit Keynote Boston 7-26-17
AI Pharma Summit Keynote Boston 7-26-17
 
Data Mining with SQL Server 2008
Data Mining with SQL Server 2008Data Mining with SQL Server 2008
Data Mining with SQL Server 2008
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09
 

Recently uploaded

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Recombination DNA Technology (Microinjection)
Recombination DNA Technology (Microinjection)Jshifa
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 

Recently uploaded (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids

You can do WHAT with GenomeQuest? (Almost) 101 Things You May Not Know

  • 1. 1 Company Confidential Do Not Distribute You can do WHAT with GenomeQuest? (Almost) 101 things you may not know Ellen Sherin Sr. Product Manager GQ Life Sciences Ellen.Sherin@aptean.com
  • 2. Overview
  • 3. Query Design
  • 4. Anticipatory Searching. In text searching we try to allow for all possibilities: alternate spellings (flavor/flavour), misspellings, synonyms, and other variants, for example N76D, N-76-D, N 76 D, N/76/D, asp76asn, ASP-76-ASN, “position 76 may be asp (aspartic acid) or asn (asparagine) or it may be deleted”. We do the same for sequence searching, but consider the ways a sequence can be represented, in both query design and result analysis.
  • 5. First the Basics – Where do Sequences Come From? How was a sequence described by the inventor?
      • In generalities? “Any protease from a micro-organism”
      • As a cross-reference? “Genbank accession number ABC12345”
      • Was a listing filed? Part of a table? Shown in an alignment in an image?
      • Or is the whole thing very difficult to decipher?
      • Are there multiple Markush positions, represented by Xaa and described in words?
      • Are subsequences called out?
    If it’s not in the listing, or at least written out as a sequence somewhere, it’s probably not in any sequence database!
  • 6. Markush Sequences. US 20150087572: “In one aspect, an automatic dishwashing detergent composition comprising a variant protease of a parent protease, said parent protease amino acid sequence being identical to the amino acid sequence of SEQ ID NO:1, said variant protease of said parent protease mutations consisting of one of the following sets of mutations versus said parent protease: (i) N76D + S87R + G118R + S128L + P129Q + S130A”. The GQ MOTIF algorithm can find sequences IF they are present in the sequence listing…most variants ARE NOT!
  • 7. Patent Number Searches (GenomeQuest Keyword Interface). Retrieves records and/or sequences by patent number, so you can:
      • Create a saved, searchable virtual database;
      • Obtain sequence(s) for download and ultimate use in sequence listing, IDS prep, or other molecular biology programs;
      • Review through the GUI, download, or both;
      • Link out to public patent sources;
      • With a Platinum subscription, download the full-text PDF.
  • 8.
  • 9. Search Setup
  • 10. Save Your Preferred Search Settings
  • 11. Make Your Own Database Three Different Ways!
      1. Upload your own sequences into GQ.
      2. Through Keyword Search>browse protein (or DNA), filter as desired, and make into a database.
      3. From your search results via Analyses>Extract sequences>Subject sequences.
    These Virtual Databases (vDBs) can be selected and searched as if they were any regular database like GQ-Pat.
  • 12. 1. Upload Your Own Sequences into GQ
      • Any standard format: a GENBANK, EMBL, or FASTA file containing one or many sequences.
      • Must be just protein or just nucleotide.
      • Will show up under MY DATA.
  • 13. 2. Browse Database, Save as vDB
      • Just DNA or just protein.
      • The best way is to “browse” the DNA or protein databases, then filter for your desired criteria.
      • Although a vDB can be created from text search results, if you haven’t “hard-coded” JUST protein or DNA, it won’t show up in your search database choices.
  • 14. Result View
  • 15. 3. Make vDB from Search Results. Results must be just DNA or just protein; don’t use mixed results.
  • 16. CDR Query Setup
      • You can design a query to look for CDRs in isolation, or all three in a single subject, or both!
      • If you want the exact CDR sequence (or the sequence with specified variations), MOTIF is the best option, and it is the only algorithm that works for a single query linking all three CDRs.
      • If you want non-specific variability, then GenePast should be used, limiting the number of differences.
      • Later on we’ll see how to GROUP results to detect one subject comprising multiple queries.
      • This method is not limited to CDRs; it’s applicable to any group of subsequences contained in a single, longer sequence.
  • 17. MOTIF on full length – Direct Strike. The long sequence gives hits comprising all three CDRs in the specific order provided. “.*” represents “any number of unspecified residues, including zero”. If there is even a single mismatch, or the order is incorrect, the subject will not be found. If a CDR in the database uses Xaa, it will not be found unless Xaa is specified in the query sequence as an alternative to the wild-type (wt) amino acid.
      >37-motif
      DLSIH.*GFDPQDGETIYAQKFQG.*GSSSSWFDP
      >9-motif
      RASQGISSWLA.*GASNLES.*QQANSFPWT
    Note the relationship between the two long sequences. There were 27 patents hit, and both sequences are present in all 27. LC and HC perhaps?
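The direct-strike behaviour described on slide 17 can be illustrated with a small sketch. It uses Python's `re` module as a stand-in, on the assumption that `.*` in a MOTIF query behaves like the same token in a regular expression; it is not GenomeQuest's implementation. The subject sequences and the framework residues between the CDRs are made up for illustration; only the motif string comes from the slide.

```python
import re

# Motif copied from slide 17; ".*" means any number of unspecified residues, including zero.
HEAVY_MOTIF = "DLSIH.*GFDPQDGETIYAQKFQG.*GSSSSWFDP"


def motif_hits(motif: str, subject: str) -> bool:
    """Return True if the subject contains the motif (regex stand-in for MOTIF)."""
    return re.search(motif, subject) is not None


# Hypothetical subjects: framework residues between the CDRs are invented.
exact = ("QVQLVQSG" + "DLSIH" + "WVRQAPGK" + "GFDPQDGETIYAQKFQG"
         + "RVTMTEDT" + "GSSSSWFDP" + "WGQGT")
adjacent = "DLSIH" + "GFDPQDGETIYAQKFQG" + "GSSSSWFDP"    # zero residues between CDRs
one_mismatch = exact.replace("GSSSSWFDP", "GSSSAWFDP")     # single substitution in CDR3
wrong_order = "GFDPQDGETIYAQKFQG" + "AAAA" + "DLSIH" + "AAAA" + "GSSSSWFDP"

if __name__ == "__main__":
    for label, seq in [("exact", exact), ("adjacent", adjacent),
                       ("one_mismatch", one_mismatch), ("wrong_order", wrong_order)]:
        print(label, motif_hits(HEAVY_MOTIF, seq))
```

Only the first two subjects are found; the single mismatch and the reordered CDRs are both missed, which is why the slides also recommend searching the CDRs individually.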
  • 18. Methodology – Searching CDRs: All 3 CDRs in subject or patent. Here we are searching the three CDRs in isolation. This can be done with either MOTIF or GenePast. Click on the intersection to see all 27 patents that contain all three queries. The 27 patents will contain all three CDRs; however, are they present in isolation, in a specific subject, or both? Extra credit – how many results will you get on the results view? At least 81 (3 queries x 27 patents), because each patent contributes at least one hit per query.
  • 19. CDR Query Comments
      • We recommend searching all three (or six) CDRs as individual sequences.
      • The concatenated query is very useful for a direct strike, but shouldn’t be used exclusively, as you will miss hits to individual CDRs.
      • The concatenated query accomplishes the same thing as grouping by subject, but it’s more specific and you get a smaller number of results (1/3 as many in this case).
    Concatenated query layout: CDR1 – anything – CDR2 – anything – CDR3
  • 20. Degeneracy Characters are Difficult!
      • [KX] is equivalent to anything: it will retrieve K or anything else, including X, in that position.
      • Degeneracy characters in the subject are not found automatically; they have to be searched explicitly.
          – [KV] will find either K or V, but not X.
          – [GA] will find either G or A, but not R.
      • Degeneracy characters in the query are interpreted as what they represent: [NACGTURYKMSWBDHV][RGA][YTUC][SGC][WATU]
      • Always consider how an inventor might represent a sequence in the listing, and consider either using degeneracy characters (nucleotide) or including an explicit X in protein queries.
      • There’s a special way to search for that explicit X.
      • Tip: look at the query sequence in MOTIF results; it will be written out with the degeneracy characters expanded (e.g. N will be written as AGCT).
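The asymmetry on slide 20 (a degenerate character in the query is expanded, but an explicit class does not find a degenerate character in the subject) can be sketched as follows. Python's `re` module is again only a stand-in for MOTIF. The expansion table holds just the five classes printed on the slide, and the function name and test subjects are hypothetical.

```python
import re

# Expansions exactly as printed on slide 20 (degenerate code -> characters it retrieves).
SLIDE_EXPANSIONS = {
    "N": "NACGTURYKMSWBDHV",
    "R": "RGA",
    "Y": "YTUC",
    "S": "SGC",
    "W": "WATU",
}


def query_to_regex(query: str) -> str:
    """Expand degenerate characters in a nucleotide query into regex classes.

    Characters not listed in SLIDE_EXPANSIONS are matched literally.
    """
    parts = []
    for ch in query.upper():
        if ch in SLIDE_EXPANSIONS:
            parts.append("[" + SLIDE_EXPANSIONS[ch] + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    pattern = query_to_regex("ARG")            # R in the query is expanded
    print(pattern)
    # Subject written with an explicit base, and subject written with R itself:
    print(bool(re.search(pattern, "TTAGGTT")), bool(re.search(pattern, "TTARGTT")))
    # An explicit [GA] class does not find R in the subject:
    print(bool(re.search("A[GA]G", "TTARGTT")))
```

When a listing may use a degenerate character at a position, that character has to appear in the query class explicitly, as in `A[GAR]G`.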
  • 21. For More on Variable Sequences
  • 22. SNP Queries
      • Use the MOTIF algorithm to search for 100% identity to either allele.
      • Reminder: with a single mismatch anywhere, MOTIF will not find the hit!
      • GenePast/Blast are also good choices; use coordinate filters to select only those results crossing the SNP region(s).
  • 23. Results Overview: Intermediate Page
  • 24. Lots of Good Information Here
      • Correct query sequence count
      • Count of sequences that didn’t have hits
      • Is your total hit count < your max?
      • Be sure to understand the Venn diagram: it is on the PATENT level, not the subject level.
      • For >3 queries, you can use the Statistics Report functionality.
  • 25. Intermediate Page
  • 26. Analysis Report for > 3 Queries
  • 27. Intermediate Page Analysis. Validate results and look for fundamental issues:
      • Do I have at least one hit for each of my query sequences?
      • Repeat this overview after applying each filter set.
      • Did I “max out” my results?
          – For example, the max is set to 500 (the default) and at least one query has 500 results.
  • 28. Analysis
  • 29. Are All My Queries Present?
  • 30. You Can Have Many Analysis Views
      • Multiple views can be saved, and switching between them is as simple as a mouse click.
  • 31. Customized Views for Analysis: Numeric, Bibliographic
  • 32. Views are Created by DEFINE COLUMNS
  • 33. GenomeQuest Filters
      • The heart of GQ’s power
      • Full Boolean with nesting capabilities
      • Just like views, you can save multiple filters
      • Very flexible combinations
      • The GUI allows on-the-fly changes: try it, and if you don’t like it, try something else!
          – I often use this capability to narrow down large result sets by finding a cutoff that affects the majority of results.
      • The AUDIT TRAIL page (found in exported Excel files) includes the applied filters.
          – I add a screenshot of my filters and paste it into the report for readability.
  • 34. My Starting Point: Nested Boolean Filter. To get the most out of GQ, you need to really understand the different percent identities: query, subject and alignment!
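A small worked example may help separate the three percent identities named on slide 34. The definitions below are assumptions made for illustration (identical positions divided by query length, subject length, and alignment length including gap columns); they are not taken from GenomeQuest documentation, and the sequences are invented.

```python
def percent_identities(aligned_query: str, aligned_subject: str,
                       query_len: int, subject_len: int) -> dict:
    """Compute three percent identities for one pairwise alignment.

    ASSUMED definitions (not confirmed by the slides):
      query     = identical positions / full query length
      subject   = identical positions / full subject length
      alignment = identical positions / alignment length (gap columns included)
    Gaps are written as "-" in the aligned strings.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have the same length")
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject)
                  if q == s and q != "-")
    return {
        "query": 100.0 * matches / query_len,
        "subject": 100.0 * matches / subject_len,
        "alignment": 100.0 * matches / len(aligned_query),
    }


if __name__ == "__main__":
    # Hypothetical case: the subject carries one extra residue (W) relative to the query.
    result = percent_identities("ACDE-FGHIK", "ACDEWFGHIK", query_len=9, subject_len=10)
    print(result)
```

Under these assumed definitions, every query residue is matched, so the query identity is 100% even though the alignment contains a gap. That is the kind of hit the gap filters on slide 37 are meant to expose.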
  • 35. Wildcarding Works! Multiple values are “OR”.
  • 36. Text ANDing
  • 37. GenePast Gap Filters
      • Huge improvement! Converted me from Blast.
      • The prior issue was that Query % ID ignored gaps, so hits with multiple gaps could show up as 100% Query ID.
  • 38. InDel Detection with Gap Filters. Additional use – INDEL detection! (thanks Bjarne!)
      INDEL Type          Query Gaps   Subject Gaps   Alignment
      Insertion mutant    1            0
      Deletion mutant     0            1
      One of each         1            1
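The gap-count table on slide 38 amounts to a simple classification rule, sketched below. The function name is hypothetical, and treating any count above zero the same way as a count of one is a generalization beyond what the slide shows.

```python
def classify_indel(query_gaps: int, subject_gaps: int) -> str:
    """Classify a hit from its gap counts, following the table on slide 38.

    Gaps in the query mean the subject has extra residues (insertion mutant);
    gaps in the subject mean the subject lacks residues (deletion mutant).
    Counts above 1 are handled like 1, which is an assumption.
    """
    if query_gaps < 0 or subject_gaps < 0:
        raise ValueError("gap counts cannot be negative")
    if query_gaps == 0 and subject_gaps == 0:
        return "no indel"
    if subject_gaps == 0:
        return "insertion mutant"
    if query_gaps == 0:
        return "deletion mutant"
    return "insertion and deletion"   # "One of each" on the slide


if __name__ == "__main__":
    for gaps in [(1, 0), (0, 1), (1, 1), (0, 0)]:
        print(gaps, classify_indel(*gaps))
```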
  • 39. Can Also Display for InDel Analysis
  • 40. SNP Detection – Coordinate Filter
      • SNP analysis often focuses on specific position(s), so the overall % identities are frequently irrelevant.
      • Only those alignments that cover the region of interest will pass screening.
      • Use coordinates to narrow to these regions.
  • 41. SNP Detection – Coordinate Filters. Example: the SNP is at position 1501. Filter for Query Start <=1500 and Query Stop >=1502.
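The coordinate filter on slide 41 can be written as a one-line predicate. In this sketch the function and parameter names are hypothetical; the one-residue flank on each side follows the slide's example (start at or before 1500, stop at or after 1502 for a SNP at 1501).

```python
def covers_snp(query_start: int, query_stop: int, snp_pos: int, flank: int = 1) -> bool:
    """Return True if an alignment spans the SNP plus `flank` positions on each side.

    With snp_pos=1501 and flank=1 this reproduces the slide's filter:
    Query Start <= 1500 and Query Stop >= 1502.
    """
    return query_start <= snp_pos - flank and query_stop >= snp_pos + flank


if __name__ == "__main__":
    # Hypothetical alignments given as (query start, query stop).
    alignments = [(1400, 1600), (1500, 1502), (1501, 1600), (1, 1501)]
    passing = [a for a in alignments if covers_snp(a[0], a[1], snp_pos=1501)]
    print(passing)
```

Alignments that merely end or begin at the SNP position are rejected, so only hits that actually cross the SNP remain.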
  • 42. Viewing Alignments
  • 43. Grouping. Results can be grouped for immediate feedback:
      • How many families/patents/sequences pass these filters?
      • Are there any hits (including SIDs) that contain multiple query sequences?
      • Which subjects contain:
          – My three CDRs?
          – My unique promoter and gene?
          – Variation 1 but not Variation 2?
  • 44. Grouping. Find queries (or patents or families) with a disproportionate hit count.
  • 45. A New Way to Analyze Data – GQ’s New Result Browser
      • Simplified interface
      • Very different viewing and analysis paradigm
      • YOU CAN EXPORT ALIGNMENTS (coming right up!)
  • 46. Single Step Analysis of Patent, Family, UFS Distribution
  • 47. Universal Family Sequence – A Special Beast
      • Its purpose is to show the distribution of the identical sequence within a given family.
      • The identical sequence may have many different UFS values. Any given UFS value is only unique for a single family.
      • THERE IS NO GUARANTEE THAT THE SEQ ID NO AND/OR PSL IS IDENTICAL FOR A GIVEN UFS throughout the family.
      • It is extremely useful for studying the distribution of a sequence hit of interest throughout a family.
  • 48. UFS PSL, SID variability
  • 49. Reporting Tips & Tricks
  • 50. Share Your Results
      1. Create a folder for results
      2. Make it a shared folder
      3. Set permissions
      4. Move results to the folder
  • 51. Reporting Tricks & Tips
      • Share results with other GQ users
      • Visualize subjects, patents or both containing multiple query sequences
      • View alignments adjacent to the full text of claims through LifeQuest
      • IDS and ST.25 preparation
      • Export alignments
      • Family portrait, result analysis
      • Excel tips:
          – Freeze the top row
          – Link back from Excel to each alignment and make that link available to other licensed GQ users
          – Prepare Excel pivot tables summarizing search results, which can easily be changed to summarize results by many different parameters
  • 52. Visualize Multiple Queries Aligned to a Single Subject
  • 53. View Claims Adjacent to Alignments
  • 54. GQ Can Help Prepare Sequence Listings
      • Export as FASTA and import into Excel (requires a little manipulation).
      • You may want a second, tabular export for organism and molecule type.
      • Be sure to have the sequences properly ordered.
      • Use Excel formulae to clean up and error check, then convert into the PatentIn import format:
          <my-seq-name;moltype;organism>
          sequence
      • This does take some Excel skill to do right!
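For readers who prefer a script to Excel formulae, the conversion on slide 54 might look like the sketch below. It rests on several assumptions: the only target format known here is the `<my-seq-name;moltype;organism>` header followed by the sequence, as shown on the slide; placing the sequence on the line after the header is a guess at the layout; and the molecule-type and organism values, the function names, and the example records are all hypothetical.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA text into a list of (name, sequence) tuples, preserving order."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_import_format(records: list, annotations: dict) -> str:
    """Render records as '<name;moltype;organism>' header lines followed by the sequence.

    `annotations` maps sequence name -> (moltype, organism), e.g. taken from a
    second tabular export. Basic error checks replace the manual Excel clean-up.
    """
    lines = []
    for name, seq in records:
        if not seq:
            raise ValueError(f"empty sequence for {name}")
        if name not in annotations:
            raise ValueError(f"no molecule type/organism for {name}")
        moltype, organism = annotations[name]
        if any(";" in field for field in (name, moltype, organism)):
            raise ValueError(f"';' not allowed in header fields for {name}")
        lines.append(f"<{name};{moltype};{organism}>")
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    fasta = ">seq1 example heavy chain fragment\nDLSIH\nGFDP\n>seq2\nACGT\n"
    annotations = {"seq1": ("PRT", "Homo sapiens"), "seq2": ("DNA", "Artificial Sequence")}
    print(to_import_format(parse_fasta(fasta), annotations))
```

Records keep the order of the FASTA export, which matters because the slide asks for the sequences to be properly ordered before conversion.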
  • 55. Generating Sequence Documents for IDS Prep
      • IDSs (information disclosure statements) may be filed during prosecution, either on the initiative of the patent practitioner or in response to an Office Action.
      • These are essentially citations, and may be journal references, patent filings, sequence documents, or any combination.
      • The sequence documents are really easy to prepare from GQ and, with minimal training, may be prepared by clerical workers or other assistants. No knowledge of sequences is needed.
  • 56. Sample GenBank-Formatted Export for IDS
      • Uses the standard sequence export interface.
      • Sequences can be obtained from regular search results or by keyword search.
      • You can export multiple sequences, but they will need to be broken out into individual files.
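Breaking a multi-sequence export into individual files, as slide 56 requires, can be scripted. This sketch relies on the standard GenBank flat-file convention that each record ends with a line containing only `//`; whether a GQ export follows that convention exactly is an assumption, and the function names and example records are hypothetical.

```python
def split_genbank_records(text: str) -> list:
    """Split GenBank flat-file text into one string per record.

    Each record is assumed to end with a line containing only '//'.
    """
    records, current = [], []
    for line in text.splitlines():
        if not current and not line.strip():
            continue                      # skip blank lines between records
        current.append(line)
        if line.strip() == "//":
            records.append("\n".join(current) + "\n")
            current = []
    if current:                           # trailing record without a terminator
        records.append("\n".join(current) + "\n")
    return records


def locus_name(record: str, fallback: str) -> str:
    """Return the name on the LOCUS line, for use as a file name."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                return fields[1]
    return fallback


if __name__ == "__main__":
    export = (
        "LOCUS       AAA111\nORIGIN\n        1 acgt\n//\n\n"
        "LOCUS       BBB222\nORIGIN\n        1 ttga\n//\n"
    )
    for i, rec in enumerate(split_genbank_records(export), start=1):
        print(locus_name(rec, f"record_{i}"))
```

Each returned string can then be written to its own file, for example named after the value from `locus_name`.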
  • 57. NRB – Excel Export of Alignments (nrb export.xls)
  • 58. Family Portrait Report. Click on a family to see the list of patents matching your sequence.
  • 59. Analysis Report for > 3 Queries
  • 60. Excel Link Backs & Pivot Table (Sample export for link back.xlsx)
  • 61. Acknowledgments
      • Bjarne Due Larsen
      • Mary Jane Reeve
      • Steven Altman
      • Bob March
      • Bill Perkins
      • Henk Heus
      • Danyu Wu
      • Stephen Allen

Editor's Notes

  1. I don’t know anything about saving our searches – we just look at the front page;)
  2. Be aware that the term “patent family” is a little difficult for the scientist to understand. I mainly use grouping to get an overview of the families – it will be interesting to see what else grouping can be used for
  3. We need to explain why there are e.g. 148 results in the first line…