GenomeQuest 101


Published in: Science

  1. Sequence Search/Comparison/Analysis: Stephen Allen, Solutions Consultant
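  2. Country Coverage: World’s Largest Sequence Database
     • USA: 320,873 documents, 215,305,722 sequences (Gold+)
     • EPO: 108,362 documents, 37,883,488 sequences (Gold+)
     • WIPO: 144,292 documents, 74,293,342 sequences (Gold+)
     • Japan: 104,108 documents, 27,355,841 sequences (Gold+)
     • China: 78,683 documents, 1,029,562 sequences (Platinum)
     • India: 6,446 documents, 69,071 sequences (Platinum)
     • Canada: 57,671 documents, 24,026,839 sequences (Gold+)
     • Brazil: 2,134 documents, 39,001 sequences (Platinum)
     • Others: 81,148 documents, 3,913,808 sequences (Gold+)
     • Total: 903,717 documents, 383,916,674 sequences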
  3. GQ Gold+ vs Platinum
     • Traditional (Gold+ and Platinum): all patent ST.25 listings from US, EPO, WIPO, Korea, Japan
     • Traditional and manual curation (Gold+ and Platinum): GQ-Pat sequences (including non-ST.25) from US, EPO, WIPO, Korea, Japan plus the following authorities: AT, AU, BE, CA, CH, DE, ES, FR, GB, LU, NL, NO, TW
     • Platinum only: BRIC country documents (CN, BR, IN, RU) plus emerging-country documents
     • Features (Gold+ and Platinum): Extended Legal Status (ELS); Normalized Patent Assignee and Parent; Unique Family Sequence (UFS)
     • Platinum-only features: access to PDF downloads; Family Portrait Report
  4. Results
  5. Results Pre- & Post-Filtering: 560K sequences → filter → 2K sequences
  6. Getting to Your Results • Algorithms: searches can be done in a broad, inclusive manner by selecting the correct algorithm and a few basic settings • Filters: broad searches can be narrowed quickly based on homology data, legal status, and many other criteria • Views: views allow you to tailor the display to your liking, with specific columns and intelligent grouping
  7. Search Setup
  8. Filters • Filter your search based on specific legal status, homology, authority, or many other categories • Save your favorite, frequently used filters • Save multiple filters: different filters for different searches • Filters are categorized for fast access • Categories include alignment properties, subject text, subject dates, subject properties, etc. • Filters reduce reported hits based on your criteria
  9. Views & Grouping • Choose how to display your data on the Results page • Tailored views are also used for Excel table export • Add columns to a view with the Display List • Display fields are similar to filter fields • Display categories are similar to filter categories • Save favorite, frequently used views • Save multiple views: different views for different searches • Group based on specific criteria: patent ID, patent family, patent assignee • Display all records in a group, or a subset for streamlined analysis
  10. Details and Alignments
  11. LifeQuest: Consolidated Sequence & Text Searching
  12. Filter with LQ Markup • Filter by stars • Filter by color
  13. Sample Workflow • Sequence search • Filter • Export results to LQ • Mark to distinguish sequence searches • LQ text search: (ttl_abst_clm:IL-17*^5 OR ttl_abst_clm:IL17*^5) AND antibod* • Mark to distinguish text search • Unite! • Filter • Highlight key hits • Export • Filter within Excel
  14. Non-Sequence IP
  15. Claims & Alignments: quickly add columns
  16. Post-Filtering: post-filter sequence searches, text searches, or combined searches
  17. Additional Linkouts
  18. Questions? Contact us at: Bill
  19. LifeQuest • Unite sequence-based and text-based searches • Create a virtual sequence database from LQ results
  20. Nested, Savable Filters • Complex Boolean filters • Nested filters for fine-tuning • Save standard filters for easy application
  21. Alerts: See what’s new
  22. Questions? Contact us at: Bill
  23. Supplementary Slides: please contact us with any questions
  24. Alignment Types • Local alignment: part of the query matches part of the subject (BLAST, FASTA, and Smith-Waterman) • Global alignment: all of the query matches all of the subject (Needleman-Wunsch and similar algorithms) • Best-fit alignment: all of the query is fitted into the subject (GenePAST); ideal for patent sequence searching
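  25. Query/Subject % Identity Definition
     Query/Subject % identity is the alignment % identity, corrected for the ratio of the alignment length to either the query or subject length. The examples assume 100% alignment identity unless noted; the longer sequences are 100 residues and the shorter ones 50 residues.
     • Identical sequences, full-length alignment: subject %ID 100%, query %ID 100%, subject coverage 100%, query coverage 100%
     • 50-residue subject aligned within a 100-residue query: subject %ID 100%, query %ID 50%, subject coverage 100%, query coverage 50%
     • 50-residue query aligned within a 100-residue subject: subject %ID 50%, query %ID 100%, subject coverage 50%, query coverage 100%
     • Two 100-residue sequences overlapping by 50 residues: subject %ID 50%, query %ID 50%, subject coverage 50%, query coverage 50%
     • Two 100-residue sequences with 5 mismatches: subject %ID 95%, query %ID 95%, subject coverage 100%, query coverage 100%
     • By filtering for 100% subject coverage you can capture CDR-to-CDR matches
     • With variability, % ID can drop, so % coverage is the preferable filter
     • This is a key feature to understand: these filters are very powerful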
  26. Key Fields • Legal status • Extended legal status and national phase legal status • US PAIR legal status: updates from US PAIR occur monthly • Live links to reports and alignments • Links on the analysis page carry over to Excel reports • Simple, easy sharing among groups
  27. Short Sequences Need GenePAST or Motif Searches (BLAST May Miss Patents) • For short query sequences, or for easy analysis of variants, GenePAST is the preferred algorithm
  28. MOTIF on Full Length: Direct Strike • The long sequence gives hits comprising all three CDRs in the specific order provided • “.*” represents “any number of unspecified residues, including zero” • Motif searches require a 100% match in “defined” residues
     >37-motif
     DLSIH.*GFDPQDGETIYAQKFQG.*GSSSSWFDP
     >9-motif
     RASQGISSWLA.*GASNLES.*QQANSFPWT
  29. Unique Family Sequence (UFS): Normalized Sequence/Patent Family • Merge all identical sequences within a family • Based on strict criteria: identical sequence, patent family, sequence length • Examine a sequence’s status across authorities • Group by UFS can replace group by family for finer resolution of unique hits • UFS identifier = MD5 sum + sequence length + family ID • UFS IDs can be transient
  30. Methodology: Searching CDRs • All 3 CDRs (or primer/amplicon sets) in subject or patent • MOTIF: exact match • GenePAST: variations • By requiring a group size equal to three in the post-search grouping, we show patents that contain all three CDRs • FASTA-formatted queries let you run multiple queries at once • GenePAST lets you view patent hits with variability in the CDRs
  31. Conservative Substitutions • Subjects comprising all 3 CDRs • Up to 1 substitution • Subject and query gaps: gaps in CDRs and primers can be ignored using the Query/Subject gap filter • Variations (i.e., the number of differences) can be adjusted without calculating % identity
  32. Database Selection: Tree Structure and Virtual Databases • Tree structure allows easy database search setup • Multiple virtual databases can be chosen • Virtual databases can be shared among teams • Save your own databases from keyword or IP searches, and search within the results
  33. Patent Statistics Report • For multiple queries, quickly display patents that contain all or a subset of the queries
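Slide 8 (Filters) describes narrowing hits by legal status, authority, and homology, and saving frequently used filters. The sketch below models that idea with plain predicates combined by logical AND. The `Hit` record, its field names, and the "saved filter" dictionary are hypothetical illustrations, not GenomeQuest's schema or API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Hit:
    # Hypothetical hit record; field names are illustrative only.
    patent_id: str
    authority: str          # e.g. "US", "EP", "WO"
    legal_status: str       # e.g. "granted", "pending", "lapsed"
    query_id_pct: float     # Query % identity
    subject_cov_pct: float  # Subject % coverage


Filter = Callable[[Hit], bool]


def by_authority(*codes: str) -> Filter:
    allowed = set(codes)
    return lambda hit: hit.authority in allowed


def by_legal_status(*statuses: str) -> Filter:
    allowed = set(statuses)
    return lambda hit: hit.legal_status in allowed


def min_query_identity(pct: float) -> Filter:
    return lambda hit: hit.query_id_pct >= pct


def apply_filters(hits: Iterable[Hit], filters: List[Filter]) -> List[Hit]:
    """Keep only the hits that pass every filter (logical AND)."""
    return [hit for hit in hits if all(f(hit) for f in filters)]


# A "saved" filter set, modeled here as a named list of predicates.
saved_filters = {
    "granted_US_EP_90pct": [
        by_authority("US", "EP"),
        by_legal_status("granted"),
        min_query_identity(90.0),
    ],
}

hits = [
    Hit("PAT1", "US", "granted", 98.0, 100.0),
    Hit("PAT2", "EP", "pending", 99.0, 100.0),
    Hit("PAT3", "WO", "granted", 95.0, 80.0),
    Hit("PAT4", "US", "granted", 60.0, 40.0),
]
kept = apply_filters(hits, saved_filters["granted_US_EP_90pct"])
print([hit.patent_id for hit in kept])  # ['PAT1']
```

Keeping each criterion as a small function makes a saved filter just a reusable list, which mirrors the "save multiple filters for different searches" idea.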
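Slide 9 (Views & Grouping) describes grouping results by patent ID, family, or assignee, showing all records or a subset per group, and choosing display columns. A minimal sketch of that logic, assuming hits are plain dictionaries; `group_hits`, `apply_view`, and the field names are hypothetical.

```python
from collections import defaultdict


def group_hits(hits, key, top_n=None, rank_by="query_id_pct"):
    """Group hit dicts by hit[key]. Within each group, sort by rank_by
    (highest first) and optionally keep only the first top_n records."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit[key]].append(hit)
    for members in groups.values():
        members.sort(key=lambda h: h[rank_by], reverse=True)
        if top_n is not None:
            del members[top_n:]  # keep a subset for streamlined analysis
    return dict(groups)


def apply_view(hit, columns):
    """Project a hit onto the chosen display columns, like a saved view."""
    return {column: hit[column] for column in columns}


hits = [
    {"patent_id": "PAT1", "family": "F1", "assignee": "Acme", "query_id_pct": 92.0},
    {"patent_id": "PAT2", "family": "F1", "assignee": "Acme", "query_id_pct": 99.0},
    {"patent_id": "PAT3", "family": "F2", "assignee": "Beta", "query_id_pct": 85.0},
]
view = ["patent_id", "query_id_pct"]
for family, members in group_hits(hits, key="family", top_n=1).items():
    print(family, [apply_view(h, view) for h in members])
```

Passing `top_n=None` displays every record in each group; a small `top_n` gives one representative hit per family.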
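Slide 13 (Sample Workflow) marks sequence-search and text-search results separately and then unites them. LifeQuest performs this in its own interface; the sketch below only illustrates the underlying set logic on patent IDs. The `unite` function and the IDs are hypothetical, and the text query itself is assumed to have been run already.

```python
def unite(sequence_hits, text_hits):
    """Label each patent ID with the searches that found it."""
    labels = {}
    for patent_id in sorted(set(sequence_hits) | set(text_hits)):
        marks = set()
        if patent_id in sequence_hits:
            marks.add("sequence")
        if patent_id in text_hits:
            marks.add("text")
        labels[patent_id] = marks
    return labels


sequence_hits = {"PAT1", "PAT2", "PAT3"}  # e.g. from a sequence search
text_hits = {"PAT2", "PAT3", "PAT4"}      # e.g. from an LQ text search
labels = unite(sequence_hits, text_hits)
both = [pid for pid, marks in labels.items() if marks == {"sequence", "text"}]
print(both)  # ['PAT2', 'PAT3']
```

Keeping the marks, rather than only the intersection, lets a later filter step show sequence-only, text-only, or combined hits.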
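Slide 20 (Nested, Savable Filters) mentions complex Boolean filters that can be nested. One way to picture this is an expression tree of AND/OR/NOT nodes whose leaves are simple predicates. The `And`/`Or`/`Not` classes and `evaluate` function below are a hypothetical sketch, not GenomeQuest's filter engine.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class And:
    children: List[Any]


@dataclass
class Or:
    children: List[Any]


@dataclass
class Not:
    child: Any


def evaluate(node, hit) -> bool:
    """Recursively evaluate a filter tree; leaves are plain predicates."""
    if isinstance(node, And):
        return all(evaluate(child, hit) for child in node.children)
    if isinstance(node, Or):
        return any(evaluate(child, hit) for child in node.children)
    if isinstance(node, Not):
        return not evaluate(node.child, hit)
    return node(hit)  # leaf: a callable taking a hit


# (authority is US OR EP) AND NOT lapsed AND subject coverage >= 100%
standard_filter = And([
    Or([lambda h: h["authority"] == "US", lambda h: h["authority"] == "EP"]),
    Not(lambda h: h["legal_status"] == "lapsed"),
    lambda h: h["subject_cov_pct"] >= 100.0,
])

hits = [
    {"id": "PAT1", "authority": "US", "legal_status": "granted", "subject_cov_pct": 100.0},
    {"id": "PAT2", "authority": "EP", "legal_status": "lapsed", "subject_cov_pct": 100.0},
    {"id": "PAT3", "authority": "WO", "legal_status": "pending", "subject_cov_pct": 100.0},
    {"id": "PAT4", "authority": "EP", "legal_status": "pending", "subject_cov_pct": 90.0},
]
print([h["id"] for h in hits if evaluate(standard_filter, h)])  # ['PAT1']
```

Because a saved filter is just a tree, a "standard" filter can be stored once and nested inside larger filters for fine-tuning.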
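Slide 24 (Alignment Types) contrasts local, global, and best-fit alignment. The sketch below scores all three with one dynamic-programming routine. GenePAST's own algorithm is not described on the slide, so the "fit" mode here is the textbook fitting (semi-global) alignment, used only to illustrate "all of the query fitted into the subject". The scoring values (+1 match, -1 mismatch, -1 per gap) are assumptions, and only the score is returned, not the alignment itself.

```python
def align_score(query, subject, mode, match=1, mismatch=-1, gap=-1):
    """Best alignment score of query vs subject with a linear gap penalty.

    mode="global": Needleman-Wunsch; both sequences aligned end to end.
    mode="local":  Smith-Waterman; best-scoring pair of substrings.
    mode="fit":    the whole query aligned inside the subject; unaligned
                   subject residues at either end cost nothing.
    """
    if mode not in ("global", "local", "fit"):
        raise ValueError(f"unknown mode: {mode}")
    m = len(subject)
    # Row 0 (empty query prefix): only global mode charges for skipping
    # leading subject residues.
    prev = [j * gap for j in range(m + 1)] if mode == "global" else [0] * (m + 1)
    best_local = 0
    for i in range(1, len(query) + 1):
        # Column 0: query residues placed before the subject starts.
        # Only local alignment may restart at 0 here.
        cur = [0 if mode == "local" else i * gap] + [0] * m
        for j in range(1, m + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            if mode == "local":
                cur[j] = max(cur[j], 0)
                best_local = max(best_local, cur[j])
        prev = cur
    if mode == "local":
        return best_local
    if mode == "global":
        return prev[m]
    return max(prev)  # fit: the alignment may end before the subject does


for q, s in [("ACGT", "TTACGTTT"), ("GGGACGT", "ACGTCC")]:
    print(q, s, {mode: align_score(q, s, mode) for mode in ("global", "local", "fit")})
```

In the first pair the short query sits wholly inside the subject: local and fit both score 4, while global is penalized (0) for the unmatched subject ends. In the second pair only part of the query matches: local still scores 4, but fit drops to 1 because every query residue must be accounted for, which is why a fit-style alignment is stricter about the whole query.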
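Slide 25 defines Query/Subject % identity as the alignment % identity corrected by the ratio of alignment length to sequence length. Since alignment % identity is identical residues divided by alignment length, the correction reduces to identical residues divided by the sequence length. The sketch below computes that and % coverage, assuming an ungapped alignment so that coverage is simply alignment length over sequence length; the function name is hypothetical.

```python
def identity_and_coverage(query_len, subject_len, aligned_len, identical):
    # Alignment % identity = identical / aligned_len. Multiplying by
    # aligned_len / length leaves identical / length, computed directly.
    return {
        "subject_id": 100 * identical / subject_len,
        "query_id": 100 * identical / query_len,
        # Coverage assumes an ungapped alignment: aligned_len residues of
        # each sequence take part in the alignment.
        "subject_cov": 100 * aligned_len / subject_len,
        "query_cov": 100 * aligned_len / query_len,
    }


# (query_len, subject_len, aligned_len, identical), matching the slide's cases
examples = {
    "identical, 100 vs 100": (100, 100, 100, 100),
    "subject 50 inside query 100": (100, 50, 50, 50),
    "query 50 inside subject 100": (50, 100, 50, 50),
    "100 vs 100, 50-residue overlap": (100, 100, 50, 50),
    "100 vs 100, 5 mismatches": (100, 100, 100, 95),
}
for label, args in examples.items():
    print(label, identity_and_coverage(*args))
```

The last case shows the slide's point: mismatches lower % identity (95%) while % coverage stays at 100%, so coverage is the more tolerant filter when variants are expected.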
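Slide 28 (MOTIF on Full Length) uses “.*” for “any number of unspecified residues, including zero” and requires exact matches in the defined residues, in the given order. That happens to be the meaning of `.*` in Python regular expressions, so the slide's two motifs can be tested with the standard `re` module. This sketch covers only the `.*` wildcard shown on the slide; any other motif syntax GenomeQuest supports is not modeled, and the subject sequences are made up.

```python
import re

# Motifs from the slide. ".*" = any number of residues, including zero;
# every other character is a literal residue that must match exactly.
MOTIFS = {
    "37-motif": "DLSIH.*GFDPQDGETIYAQKFQG.*GSSSSWFDP",
    "9-motif": "RASQGISSWLA.*GASNLES.*QQANSFPWT",
}


def motif_hits(motif, subjects):
    """Return IDs of subjects that contain the motif."""
    pattern = re.compile(motif)
    return [sid for sid, seq in subjects.items() if pattern.search(seq)]


# Made-up subject sequences for illustration only.
subjects = {
    "SEQ_A": "MK" + "DLSIH" + "WVRQ" + "GFDPQDGETIYAQKFQG" + "RVT" + "GSSSSWFDP" + "WGQ",
    "SEQ_B": "GSSSSWFDP" + "DLSIH" + "GFDPQDGETIYAQKFQG",  # CDRs out of order
    "SEQ_C": "DLSIH" + "GFDPQDGETIYAQKFQA" + "GSSSSWFDP",  # one substitution
}
for name, motif in MOTIFS.items():
    print(name, motif_hits(motif, subjects))
```

Only SEQ_A matches the 37-motif: SEQ_B has the right CDRs in the wrong order, and SEQ_C has a single substitution, which an exact motif rejects.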
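Slide 29 states that a UFS identifier combines an MD5 sum, the sequence length, and the family ID. The sketch below builds a key of that shape with `hashlib.md5` and groups records by it. The whitespace stripping, upper-casing, and ":" separator are assumptions; GenomeQuest's exact normalization and ID format are not given on the slide (and the slide notes UFS IDs can be transient).

```python
import hashlib
from collections import defaultdict


def ufs_id(sequence, family_id):
    """Hypothetical UFS-style key: MD5(sequence) + length + family ID.

    Whitespace removal, upper-casing, and the ":" separator are assumptions.
    """
    seq = "".join(sequence.split()).upper()
    digest = hashlib.md5(seq.encode("ascii")).hexdigest()
    return f"{digest}:{len(seq)}:{family_id}"


def group_by_ufs(records):
    """Merge records whose sequence, length, and family are identical."""
    groups = defaultdict(list)
    for record in records:
        groups[ufs_id(record["sequence"], record["family"])].append(record)
    return dict(groups)


records = [
    {"patent_id": "PAT1", "authority": "US", "family": "F1", "sequence": "MKTAYIAK", "status": "granted"},
    {"patent_id": "PAT2", "authority": "EP", "family": "F1", "sequence": "mktayiak", "status": "pending"},
    {"patent_id": "PAT3", "authority": "WO", "family": "F1", "sequence": "MKTAYIAQ", "status": "lapsed"},
    {"patent_id": "PAT4", "authority": "US", "family": "F2", "sequence": "MKTAYIAK", "status": "granted"},
]
for key, members in group_by_ufs(records).items():
    # One line per UFS: the sequence's status across authorities.
    print(key[:8], {m["authority"]: m["status"] for m in members})
```

The same sequence in a different family (PAT4) gets its own key because the family ID is part of it; MD5 serves here only as a deduplication fingerprint, not for security.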
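Slide 30 (Methodology: Searching CDRs) finds patents containing all three CDRs by requiring a group size of three after grouping hits by patent. A minimal sketch of that post-search step, assuming each hit is a `(patent_id, query_id)` pair from one multi-query search; the function name and IDs are hypothetical. It counts distinct queries rather than raw rows, an assumption made so that several hits to the same CDR in one patent do not inflate the group size.

```python
from collections import defaultdict


def patents_with_all_queries(hits, query_ids):
    """hits: (patent_id, query_id) pairs from one multi-query search.
    Return patents whose group covers every query ID."""
    required = set(query_ids)
    matched = defaultdict(set)
    for patent_id, query_id in hits:
        if query_id in required:
            matched[patent_id].add(query_id)
    return sorted(pid for pid, found in matched.items() if found == required)


hits = [
    ("PAT1", "CDR1"), ("PAT1", "CDR2"), ("PAT1", "CDR3"),
    ("PAT2", "CDR1"), ("PAT2", "CDR1"), ("PAT2", "CDR3"),  # 3 hits, only 2 CDRs
    ("PAT3", "CDR2"),
]
print(patents_with_all_queries(hits, ["CDR1", "CDR2", "CDR3"]))  # ['PAT1']
```

Passing a subset of query IDs answers the related question on slide 33 of which patents contain a given subset of the queries.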
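Slide 31 filters for subjects containing all three CDRs with up to one substitution, counting differences instead of computing % identity. The sketch below does this with an ungapped sliding window. It is a simplification under stated assumptions: gaps are not considered, CDR order is not enforced, and every mismatch counts as one difference (no substitution matrix, so "conservative" substitutions are not distinguished). The subject sequence is made up, and the helper names are hypothetical.

```python
from typing import Optional, Tuple


def best_ungapped_window(query, subject) -> Optional[Tuple[int, int]]:
    """Slide query along subject without gaps. Return (fewest differences,
    start position), or None if the query is longer than the subject."""
    best = None
    for start in range(len(subject) - len(query) + 1):
        window = subject[start:start + len(query)]
        diffs = sum(1 for a, b in zip(query, window) if a != b)
        if best is None or diffs < best[0]:
            best = (diffs, start)
    return best


def has_all_cdrs(subject, cdrs, max_differences=1):
    """True if every CDR occurs with at most max_differences substitutions."""
    for cdr in cdrs:
        best = best_ungapped_window(cdr, subject)
        if best is None or best[0] > max_differences:
            return False
    return True


cdrs = ["RASQGISSWLA", "GASNLES", "QQANSFPWT"]  # the 9-motif CDRs
# Made-up subject: CDR2 carries one substitution (GASNLES -> GASNLQS).
subject = "EIV" + "RASQGISSWLA" + "WYQ" + "GASNLQS" + "GVP" + "QQANSFPWT" + "FGQ"
print(has_all_cdrs(subject, cdrs, max_differences=1))  # True
print(has_all_cdrs(subject, cdrs, max_differences=0))  # False
```

Raising or lowering `max_differences` is the "number of differences" knob from the slide; no % identity calculation is involved.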