Experiences and adventures with noSQL and its applications to cheminformatics data

The Royal Society of Chemistry hosts a growing number of chemistry-related databases, and we have generally used SQL-based technologies for our development platforms. In recent years interest in noSQL databases has exploded as the associated technologies have matured and shown great promise for enhanced performance. We have collaborated with GGA Software Services to implement their noSQL technologies and have integrated them into the compound repository currently being developed as part of the underpinning architecture for compound data management at the RSC. This presentation provides an overview of the reasons why we integrated a noSQL solution, a quantitative analysis of the benefits of its inclusion, and our thoughts on further approaches to optimizing search performance for the chemical compound repository.

Published in: Science

  • Information typically associated with reactions
  • Change to add more database, rearrange
  • That’s the world we live in

    1. Experiences and adventures with noSQL and its applications to cheminformatics data. Valery Tkachenko, Antony Williams, Ken Karapetyan, Alexey Pshenichnov, Mikhail Rybalkin. ACS 248th National Meeting, San Francisco, CA, August 14th 2014
    2. Chemistry research
    3. Standard designs
    4. Scientific article: Compounds, Reaction, Analytical Data, Text and References
    5. Compounds model
    6. Reaction data model
    7. Analytical Data Model
    8. Research Knowledge Model
    9. RSC Databases: RSC Compounds, RSC Reactions, RSC Spectra, RSC Crystals, RSC Polymers, RSC Materials, RSC Assays, RSC Algorithms, RSC Models …and on…
    10. Compounds domain
    11. Reactions domain
    12. Reactions domain
    13. Analytical data domain
    14. Crystallography domain
    15. APIs, endpoints and widgets
    16. Technical view - unification
    17. Chemistry Validation and Standardization Platform
    18. Input pipeline
    19. Output pipeline
    20. Scaling approaches
    21. Federated linked system
    22. Federated repositories: Privacy, Security, Authenticity, Safety, Deployment, Access, etc.
    23. SQL database issues: Scalability
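    24. noSQL cartridge performance. Bingo Standalone Module compared with Bingo for SQL Server; the time difference shows how much faster the standalone version works.

        | #  | SMILES                 | Standalone hits | Standalone time, first 10000 hits (sec) | Standalone time, all hits (sec) | SQL Server hits | SQL Server time (sec) | Time difference (sec) |
        |----|------------------------|-----------------|------------------------------------------|---------------------------------|-----------------|-----------------------|-----------------------|
        | 1  | C(CN)C(F)F             | 53684           | 4.51                                     | 14.91                           | 53684           | 55.56                 | 40.65                 |
        | 2  | [SiH](C)(C)C[SiH2]C    | 2740            | 7.59                                     | 7.59                            | 2740            | 26.99                 | 19.41                 |
        | 3  | C1(=O)CCCC1C           | 57397           | 7.69                                     | 25.86                           | 57397           | 39.59                 | 13.73                 |
        | 4  | C1CCCC=1N=C            | 1652            | 6.21                                     | 6.21                            | 1652            | 27.62                 | 21.40                 |
        | 5  | C(OCC)/C=CC            | 672278          | 1.75                                     | 77.88                           | 672278          | 94.25                 | 16.37                 |
        | 6  | C1N(C=NC1)CC           | 21986           | 11.92                                    | 14.50                           | 21986           | 43.16                 | 28.66                 |
        | 7  | C1(C)=CCC(C)=C1        | 8982            | 7.16                                     | 7.16                            | 8982            | 10.11                 | 2.95                  |
        | 8  | P(=O)(O)CCCC           | 28403           | 7.34                                     | 9.25                            | 28403           | 12.18                 | 2.93                  |
        | 9  | C1(I)=CN=CC=C1         | 3012            | 8.36                                     | 8.36                            | 3012            | 9.18                  | 0.82                  |
        | 10 | C1(CC)=CC=CO1          | 291005          | 1.48                                     | 24.67                           | 291005          | 39.79                 | 15.13                 |
        | 11 | SC1N=CC=CC=1           | 130485          | 2.08                                     | 17.92                           | 130485          | 24.09                 | 6.17                  |
        | 12 | C1N=NSC=1Cl            | 2348            | 5.61                                     | 5.61                            | 2348            | 8.33                  | 2.72                  |
        | 13 | C(/C1CC1)=NN           | 4769            | 7.82                                     | 7.82                            | 4769            | 11.43                 | 3.62                  |
        | 14 | N1=CC=CN=C1            | 1166342         | 0.82                                     | 70.29                           | 1166342         | 141.77                | 71.48                 |
        | 15 | C1=NON=C1C             | 15718           | 7.30                                     | 7.79                            | 15718           | 34.17                 | 26.38                 |
        | 16 | C(CCC)CC               | 6541041         | 0.46                                     | 421.05                          | 6541041         | 664.98                | 243.94                |
        | 17 | C1(=NN=CN1)C           | 864513          | 1.08                                     | 56.80                           | 864513          | 137.24                | 80.45                 |
        | 18 | [Cr](=O)O[Cr]([O-])=O  | 127             | 3.36                                     | 3.36                            | 127             | 6.94                  | 3.58                  |
        | 19 | [BH2-]1[NH2+]C=CCN1    | 1               | 2.36                                     | 2.36                            | 1               | 6.70                  | 4.35                  |
        | 20 | P(O)(OC)CC             | 37364           | 4.09                                     | 7.22                            | 37364           | 9.95                  | 2.72                  |
        | 21 | C(CN)S(=O)=O           | 301168          | 2.99                                     | 29.10                           | 301168          | 41.71                 | 12.61                 |
        | 22 | C(C)COC=O              | 1236803         | 0.82                                     | 123.43                          | 1236803         | 149.82                | 26.39                 |
        | 23 | N1(NNCN1)C             | 122             | 4.68                                     | 4.68                            | 122             | 8.13                  | 3.45                  |
        | 24 | C(=C)/C=NC=N           | 23916           | 9.93                                     | 12.21                           | 23916           | 18.50                 | 6.29                  |
        | 25 | CNCCOC                 | 4503264         | 0.62                                     | 262.43                          | 4503264         | 441.03                | 178.60                |
        | 26 | C(CO)CCS               | 148233          | 11.49                                    | 128.54                          | 148233          | 34.09                 | -94.45                |
        | 27 | C1(S)C=CNC=1           | 35624           | 7.49                                     | 10.81                           | 35624           | 23.02                 | 12.21                 |
        | 28 | S([O-])(=O)(=O)CC      | 17442           | 7.79                                     | 9.11                            | 17442           | 15.67                 | 6.56                  |
        | 29 | P(OPN)(N)N             | 75              | 4.73                                     | 4.73                            | 75              | 15.05                 | 10.32                 |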
    25. Thank you. Email: tkachenkov@rsc.org. Slides: http://www.slideshare.net/valerytkachenko16
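
The "Time Difference" figures on the noSQL cartridge performance slide appear to be the Bingo for SQL Server time minus the Bingo Standalone Module time for all hits. The following is a minimal Python sketch of that arithmetic, using four rows copied from the slide. The function names `time_difference` and `speedup` are hypothetical and not from the presentation, and the speedup ratio is an extra derived figure the slide does not report. Recomputed differences can be off by 0.01 from the slide, presumably because the slide values were rounded from unrounded timings.

```python
# Rows copied from the noSQL cartridge performance slide:
# (SMILES query, standalone time for all hits, Bingo for SQL Server time), in seconds.
ROWS = [
    ("C(CN)C(F)F", 14.91, 55.56),
    ("C1(=O)CCCC1C", 25.86, 39.59),
    ("C(CCC)CC", 421.05, 664.98),
    ("C(CO)CCS", 128.54, 34.09),  # the one query where the standalone module was slower
]


def time_difference(standalone_sec, sql_server_sec):
    """Seconds saved by the standalone module; negative means it was slower."""
    return round(sql_server_sec - standalone_sec, 2)


def speedup(standalone_sec, sql_server_sec):
    """Ratio of SQL Server time to standalone time (assumed metric, not on the slide)."""
    return sql_server_sec / standalone_sec


if __name__ == "__main__":
    for smiles, standalone, sql_server in ROWS:
        diff = time_difference(standalone, sql_server)
        ratio = speedup(standalone, sql_server)
        print(f"{smiles:14s} difference={diff:8.2f} s  ratio={ratio:.2f}x")
```

A negative difference, as for `C(CO)CCS`, marks a query where the SQL Server cartridge finished first.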
