The application of cloud computing to Royal Society of Chemistry data platforms
 

Cloud computing offers significant advantages for hosting RSC chemistry databases in terms of reliability, performance, and access to large-scale computational power. The ChemSpider database contains almost 30 million unique chemical compounds, and access to compute power to regenerate existing properties and add new ones is essential for delivery on a manageable timescale. Cloud-based facilities reduce the need for internal infrastructure and generally improve performance, at the cost of significant recoding of the platforms. This presentation reviews the move of our ChemSpider-related projects to the cloud, the associated challenges, and both the obvious and the unforeseen benefits. We also discuss our use of parallelization technologies, specifically Hadoop, for mass calculation.
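The map/reduce pattern that Hadoop applies to bulk property calculation can be sketched in plain Python. This is an illustration only, not the RSC pipeline: the record layout, the `heavy_atom_count` property, and the use of the standard-library `multiprocessing.Pool` in place of a Hadoop cluster are all assumptions made for the example.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Hypothetical records: (compound id, molecular formula). Not real ChemSpider data.
RECORDS = [
    ("c1", "C6H6"),
    ("c2", "C2H6O"),
    ("c3", "H2O"),
    ("c4", "C8H10N4O2"),
]

# Matches an element symbol followed by an optional count, e.g. "C8" or "O".
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def heavy_atom_count(formula):
    """Count non-hydrogen atoms in a simple formula such as 'C2H6O'."""
    total = 0
    for symbol, count in ELEMENT.findall(formula):
        if symbol != "H":
            total += int(count) if count else 1
    return total


def map_record(record):
    """Map step: compute one property for one compound, independently of the rest."""
    compound_id, formula = record
    return compound_id, heavy_atom_count(formula)


def reduce_histogram(mapped):
    """Reduce step: aggregate per-compound results into a histogram of values."""
    return Counter(value for _, value in mapped)


def run(records, workers=2):
    """Run the map step across worker processes, then reduce in the parent."""
    with Pool(workers) as pool:
        mapped = pool.map(map_record, records)
    return dict(mapped), reduce_histogram(mapped)


if __name__ == "__main__":
    properties, histogram = run(RECORDS)
    print(properties)
    print(histogram)
```

Because each compound is processed independently in the map step, the work can be split across as many workers as are available, which is what makes this kind of calculation a natural fit for rented cloud capacity.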

    Presentation Transcript

    • The application of cloud computing to Royal Society of Chemistry data platforms. Valery Tkachenko, Ken Karapetyan, Jon Steele, Alexey Pshenichnov, Antony J. Williams. ACS 247th National Meeting, Dallas, TX, March 18th 2014
    • Outline: ChemSpider; RSC Archive; RSC Chemistry Platform; Big Data world and chemistry; Data quality; Cloud Computing considerations
    • ~30 million chemicals and growing; data sourced from >500 different sources; live depositions; live crowd curation and annotation; a structure-centric hub for web searching
    • ChemSpider – user view
    • ChemSpider – under the hood
    • ChemSpider – load over years
        • 2007
            o 1 visitor (there is always the first one)
        • 2009
            o 3000 – 7000 visits/day
        • 2014
            o 50000 visits/day
            o 40000 unique visitors/day
            o 150000 page views/day
            o 100 – 400 real-time visitors
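To put the 2014 figures in perspective, the short calculation below converts them into per-visit and per-second averages. It assumes traffic is spread evenly over 24 hours, which is a simplification: real traffic has peaks, so the true peak rate will be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted on the slide for 2014.
visits_per_day = 50_000
page_views_per_day = 150_000


def average_per_second(events_per_day):
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


pages_per_visit = page_views_per_day / visits_per_day
avg_page_views_per_second = average_per_second(page_views_per_day)

if __name__ == "__main__":
    print(f"pages per visit: {pages_per_visit:.1f}")
    print(f"average page views per second: {avg_page_views_per_second:.2f}")
```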
    • ChemSpider – bottlenecks analysis
        • “Live” database
            o Read-only is easier to scale out
        • Application server(s)
            o Standard ways to scale
            o Session persistence
        • SQL server(s)
            o Expensive, but not all data are relational - NoSQL
            o Overhead for replication
            o Alternatives do not work well for “live” databases
        • Backend (processing) server(s)
            o Use of grid computing
        • UI technology
            o ASP.NET Forms
            o MVC/REST
        • Software as a Service (SaaS)
            o API
            o Widgets
            o High scalability
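The point that read-only access is easier to scale out can be illustrated with a small read/write router: reads are spread across read-only replicas while writes go to a single primary. This is a minimal sketch, not the ChemSpider implementation. The class name `ReplicaRouter`, the server names, and the rule that only `SELECT` statements count as reads are all assumptions, and no real database connection is made.

```python
import itertools


class ReplicaRouter:
    """Send writes to one primary and spread reads over read-only replicas.

    Server names are placeholders; no real database connection is made.
    """

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one read replica is required")
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)

    def route(self, statement):
        """Return the name of the server that should run the statement."""
        parts = statement.split(None, 1)
        if not parts:
            raise ValueError("empty statement")
        if parts[0].upper() == "SELECT":
            return next(self._replica_cycle)  # round-robin across replicas
        return self.primary  # INSERT/UPDATE/DELETE and anything else


if __name__ == "__main__":
    router = ReplicaRouter("primary", ["replica-1", "replica-2"])
    statements = [
        "SELECT * FROM compounds",
        "UPDATE compounds SET name = 'x'",
        "SELECT 1",
    ]
    for sql in statements:
        print(router.route(sql), "<-", sql)
```

Adding replicas raises read capacity, but every write still has to be copied to each replica. With a “live” database that accepts depositions and curation, replicas may briefly serve stale data, which is one form of the replication overhead listed on the slide.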
    • ChemSpider – scaling out
    • ChemSpider – geography: globalization, localization, CDN
    • RSC Archive – since 1841
    • Published article example: compounds, reaction, analytical data, text and references
    • New navigation style
        • What’s the structure?
        • Are they in our file?
        • What’s similar?
        • What’s the target?
        • Pharmacology data?
        • Known pathways?
        • Working on now?
        • Connections to disease?
        • Expressed in right cell type?
        • Competitors?
        • IP?
    • New architecture
        • Data tier
            o Compounds, Reactions, Spectra, Crystals, Documents
        • Data access tier
            o Compounds API, Reactions API, Spectra API, Crystals API, Documents API
        • User interface components tier
            o Compounds Widgets, Reactions Widgets, Spectra Widgets, Crystals Widgets, Documents Widgets
        • User interface tier (examples)
            o Analytical Laboratory application
            o Electronic Laboratory Notebook
            o Chemical Inventory application
            o Paid 3rd party integrations (various platforms – SharePoint, Google, etc.)
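The layering on this slide can be sketched for a single data type, compounds: a data tier, an API over it, and a widget that consumes only the API. This is a hedged illustration, not the RSC codebase. All class and function names, the in-memory store, the dictionary response shape, and the example record with its ID are hypothetical.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    compound_id: int
    name: str
    formula: str


class CompoundStore:
    """Data tier: an in-memory stand-in for a compounds database."""

    def __init__(self, compounds):
        self._by_id = {c.compound_id: c for c in compounds}

    def get(self, compound_id):
        return self._by_id.get(compound_id)


class CompoundsAPI:
    """Data access tier: returns plain dictionaries, as a JSON API might."""

    def __init__(self, store):
        self._store = store

    def get_compound(self, compound_id):
        compound = self._store.get(compound_id)
        if compound is None:
            return {"error": "not found", "id": compound_id}
        return {
            "id": compound.compound_id,
            "name": compound.name,
            "formula": compound.formula,
        }


def compound_widget(api, compound_id):
    """User interface components tier: render an API response as an HTML fragment."""
    data = api.get_compound(compound_id)
    if "error" in data:
        return f'<div class="compound missing">Compound {data["id"]} not found</div>'
    name = html.escape(data["name"])
    formula = html.escape(data["formula"])
    return f'<div class="compound"><h3>{name}</h3><p>{formula}</p></div>'


if __name__ == "__main__":
    # Hypothetical record; the ID is not a real ChemSpider identifier.
    api = CompoundsAPI(CompoundStore([Compound(1, "benzene", "C6H6")]))
    print(compound_widget(api, 1))
    print(compound_widget(api, 2))
```

Because the widget talks only to the API, the same component could be embedded in any of the applications in the user interface tier without knowing how the data tier is stored or hosted.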
    • We are a part of a much larger world
    • APIs, endpoints and widgets
    • Challenges of Big Data: indexing, navigation, visualization
    • Managing Big Data
    • Consuming Big Data
    • Chemistry Validation and Standardization Platform
    • Cloud continuum
    • Cloud services from major players
    • Big Data in a Cloud whoops…
    • Summary
        • Cloud definition is foggy
        • Demand for computing resources is growing tremendously as we move into a Big Data world
        • Moving into the Cloud is not an “if” question, it’s a “when” question
        • It’s also a question of timing, budgets and resources
    • Thank you. Email: tkachenkov@rsc.org; slides: http://www.slideshare.net/valerytkachenko16