The application of cloud computing to Royal Society of Chemistry data platforms

Cloud computing offers significant advantages for hosting RSC chemistry databases in terms of reliability, performance and access to large-scale computational power. The ChemSpider database contains almost 30 million unique chemical compounds, and access to compute power to regenerate properties and add new ones is essential for delivering them on a manageable timescale. Using cloud-based facilities reduces the need for internal infrastructure and generally improves performance, at the cost of significant recoding of the platforms. This presentation will review the move of our ChemSpider-related projects to the cloud, the associated challenges, and both the obvious and unforeseen benefits. We will also discuss our use of parallelization technologies, specifically Hadoop, for mass calculation.
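The abstract mentions Hadoop for mass property calculation. The pattern behind Hadoop jobs is MapReduce: a mapper computes properties for each compound independently, the framework groups the emitted values by key, and a reducer merges them. The sketch below is a minimal, single-machine illustration of that pattern in plain Python, not the RSC pipeline. The record format, the compound IDs, the two example properties and the small atomic-weight table are all assumptions made for illustration, and a thread pool merely stands in for the cluster.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Standard atomic weights for a small, illustrative subset of elements (assumption).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# One element symbol followed by an optional count, e.g. "C2", "H6", "O".
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a simple formula such as 'C2H6O' into element counts (no parentheses or charges)."""
    counts = defaultdict(int)
    for element, digits in FORMULA_TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def mapper(record):
    """Map step: one (compound_id, formula) record in, a list of (key, value) pairs out."""
    compound_id, formula = record
    counts = parse_formula(formula)
    mw = sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())
    heavy = sum(n for el, n in counts.items() if el != "H")
    return [(compound_id, ("mw", round(mw, 3))),
            (compound_id, ("heavy_atoms", heavy))]


def shuffle(mapped):
    """Group every emitted value by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: merge all property values for one compound into a single dict."""
    return key, dict(values)


def run_job(records, workers=4):
    # Mappers share no state, so they can run concurrently; on a cluster
    # each one could run on a different node.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, records))
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())
```

For example, `run_job([("csid-1", "C2H6O"), ("csid-2", "CH4")])` gives a molecular weight of about 46.069 and 3 heavy atoms for the first (hypothetical) record. The formula parser is deliberately naive and raises `KeyError` for elements outside the table; a production job would call a cheminformatics toolkit inside the mapper instead.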


  1. The application of cloud computing to Royal Society of Chemistry data platforms
     Valery Tkachenko, Ken Karapetyan, Jon Steele, Alexey Pshenichnov, Antony J. Williams
     ACS 247th National Meeting, Dallas, TX, March 18th, 2014
  2. Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
  3. • ~30 million chemicals and growing
     • Data sourced from >500 different sources
     • Live depositions
     • Live crowd curation and annotation
     • A structure-centric hub for web searching
  4. ChemSpider – user view
  5. ChemSpider – under the hood
  6. ChemSpider – load over years
     2007
     • 1 visitor (there is always the first one)
     2009
     • 3,000–7,000 visits/day
     2014
     • 50,000 visits/day
     • 40,000 unique visitors/day
     • 150,000 page views/day
     • 100–400 real-time visitors
  7. ChemSpider – bottlenecks analysis
     • “Live” database
       o Read-only is easier to scale out
     • Application server(s)
       o Standard ways to scale
       o Session persistence
     • SQL server(s)
       o Expensive, but not all data are relational: NoSQL
       o Overhead for replication
       o Alternatives do not work well for “live” databases
     • Backend (processing) server(s)
       o Use of grid computing
     • UI technology
       o ASP.NET Web Forms
       o MVC/REST
     • Software as a Service (SaaS)
       o API
       o Widgets
       o High scalability
  8. ChemSpider – scaling out
  9. ChemSpider – geography
     • Globalization
     • Localization
     • CDN
  10. Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
  11. RSC Archive – since 1841
  12. Published article example
      • Compounds
      • Reaction
      • Analytical data
      • Text and references
  13. New navigation style
      • What’s the structure?
      • Are they in our file?
      • What’s similar?
      • What’s the target?
      • Pharmacology data?
      • Known pathways?
      • Working on now?
      • Connections to disease?
      • Expressed in right cell type?
      • Competitors?
      • IP?
  14. Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
  15. New architecture
      • Data tier: Compounds, Reactions, Spectra, Crystals, Documents
      • Data access tier: Compounds API, Reactions API, Spectra API, Crystals API, Documents API
      • User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Crystals Widgets, Documents Widgets
      • User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, paid 3rd-party integrations (various platforms: SharePoint, Google, etc.)
  16. Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
  17. We are a part of a much larger world
  18. APIs, endpoints and widgets
  19. Challenges of Big Data: indexing, navigation, visualization
  20. Managing Big Data
  21. Consuming Big Data
  22. Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
  23. Chemistry Validation and Standardization Platform
  24. Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
  25. Cloud continuum
  26. Cloud services from major players
  27. Big Data in a Cloud whoops…
  28. Summary
      • The definition of “cloud” is foggy
      • Demand for computing resources is growing tremendously as we move into a Big Data world
      • Moving into the Cloud is not an “if” question, it’s a “when” question
      • It’s also a question of timing, budgets and resources
  29. Thank you
      Email: tkachenkov@rsc.org
      Slides: http://www.slideshare.net/valerytkachenko16
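Slide 7 notes that a read-only database is easier to scale out than a “live” one. A common way to exploit that is read/write splitting: writes go to a single primary SQL server, while reads are spread across read-only replicas. The following is a minimal, hypothetical Python sketch of such a router; the class name, the target names and the keyword-based detection of writes are assumptions for illustration, not ChemSpider's actual implementation.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the primary and spread reads across read-only replicas.

    Hypothetical sketch: targets are plain names standing in for real
    database connections.
    """

    # Leading SQL keywords treated as writes (naive, illustrative list).
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER", "DROP"}

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = cycle(list(replicas) or [primary])

    def is_write(self, sql):
        """True if the statement's first keyword is one of WRITE_VERBS."""
        words = sql.split()
        return bool(words) and words[0].upper() in self.WRITE_VERBS

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        if self.is_write(sql):
            return self.primary
        return next(self._replicas)
```

With `ReadWriteRouter("sql-primary", ["replica-a", "replica-b"])`, successive `SELECT` statements alternate between `replica-a` and `replica-b`, while an `UPDATE` goes to `sql-primary`. The keyword check is naive (a `WITH ... INSERT` statement would be treated as a read). Replicas also lag behind the primary, so a user who has just deposited a record may not see it on the next read; this replication overhead is one plausible reason the slide singles out “live” databases as harder to scale.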