Biomatters and Amazon Web Services

Steven Stones-Havas spoke about how the Biomatters WebApp Development Group creates visualisations of biological data at the AWS Summit in Auckland, May 2013: http://aws.amazon.com/aws-summit-2013/auckland/

  • Biomatters has been around since 2003, specialising in visualisation and interpretation of digital biological data. The volumes of digitised biological data have exploded in recent years, creating a unique opportunity as genetic analysis has become cost-effective to use in the clinic for the first time. Our software brings targeted genetic analyses to the cloud, coupled with intuitive visualisations of complex data. We combine the results of analysis with data from other relevant sources (e.g. patient data, knowledge databases) to provide actionable reports for clinicians.
  • This is an example of one of our visualisations. It allows a clinician to compare a patient's DNA to a 'reference' human genome and look at the differences. It works like Google Maps, but for the human genome. We overlay data from a number of external data sources to put information at the clinician's fingertips. Let's say that the clinician notices the patient has a particular variation in their DNA. We can quickly bring up information about the variation; we can see that it has been associated with baldness. We can see publications related to the variation. We can even look at the structure of proteins produced by the genes around the variation.
  • Mobile friendly means that only a small amount of data can be stored on the client device at any one time; we need to be able to rapidly retrieve more information from the server as required. Tile rendering: we can cache tiles on either side of the viewport, but we need really fast lookups on the database to make the app feel smooth (a sketch of such a lookup follows after these notes). Secure: if we're dealing with medical data, it has to be absolutely secure. Local deployment: some organisations (particularly medical ones) don't have clear guidelines about how to deal with the cloud, and regulations and policy prevent data from leaving their site.
  • One private and one public subnet for each Availability Zone. The ELB (and a bastion host) sit in the public subnets. Web nodes and the database sit in private subnets and connect to the internet through NAT. The setup is multi-AZ (the database stack is covered later). Setup: we used the VPC wizard and then customised the result (an illustrative sketch follows after these notes). It is very important to configure this correctly: get the routing tables and security groups right.
  • For the Amazon configuration we had some trouble using the command-line tools, so we used the Java API and wrote custom Ant tasks (a sketch of such a task follows after these notes).
  • The database is a MongoDB cluster; we can't use DynamoDB or RDS because of the local deployment requirement. Mongo is a highly scalable NoSQL database that supports advanced features like automatic sharding. Our base unit of storage is a pair of 50GB EBS volumes in RAID 0. Database nodes are spread across three Availability Zones, so MongoDB still has at least two nodes running in the event of an outage. A logical volume (using LVM) allows us to scale up the size as required, but scaling is a manual process. The file system is encrypted at the logical-volume level, and the XFS file system allows us to perform online resizing of the volume.
  • A job is inserted into the incoming job queue (status in the database = NEW). The job is picked up by a Melanoma job processor node (status updated to PROCESSING). Output is written to S3. The completed job is inserted into the completed job queue (status updated to COMPLETE). The job is picked up by the emailNotifier service, which sends an email and updates the job status to USER_NOTIFIED. (A sketch of this queue flow follows after these notes.)
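
The sketches below expand on the notes above; the names, hosts and schemas in them are illustrative assumptions rather than Biomatters' actual code. First, the tile-rendering note: a rough Java sketch (2.x MongoDB Java driver) of an indexed tile lookup that also prefetches the tiles either side of the viewport, connecting to a replica set whose members sit in different Availability Zones.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import com.mongodb.MongoClient;
    import com.mongodb.ServerAddress;

    import java.net.UnknownHostException;
    import java.util.Arrays;
    import java.util.List;

    public class TileStore {
        private final DBCollection tiles;

        public TileStore() throws UnknownHostException {
            // Seed list spanning replica set members in different AZs (hypothetical hosts).
            MongoClient mongo = new MongoClient(Arrays.asList(
                    new ServerAddress("mongo-az1.internal"),
                    new ServerAddress("mongo-az2.internal"),
                    new ServerAddress("mongo-az3.internal")));
            DB db = mongo.getDB("genomebrowser");
            tiles = db.getCollection("tiles");
            // Compound index so lookups by (chromosome, zoom level, tile x) stay fast as data grows.
            tiles.createIndex(new BasicDBObject("chr", 1).append("zoom", 1).append("x", 1));
        }

        /** Fetch the requested tile plus `prefetch` tiles either side of the viewport. */
        public List<DBObject> getTiles(String chr, int zoom, int x, int prefetch) {
            DBObject query = new BasicDBObject("chr", chr)
                    .append("zoom", zoom)
                    .append("x", new BasicDBObject("$gte", x - prefetch).append("$lte", x + prefetch));
            return tiles.find(query).toArray();
        }
    }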
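
The VPC described above was built with the VPC wizard and then customised by hand. Purely as an illustration of driving a similar setup from the Java API, the following sketch creates a VPC with one public and one private subnet per Availability Zone; the credentials, region endpoint, CIDR ranges and AZ names are placeholders, and the routing tables, NAT and security groups still have to be configured afterwards.

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.CreateSubnetRequest;
    import com.amazonaws.services.ec2.model.CreateVpcRequest;

    public class VpcSetup {
        public static void main(String[] args) {
            // Credentials and region endpoint are placeholders.
            AmazonEC2Client ec2 = new AmazonEC2Client(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            ec2.setEndpoint("ec2.ap-southeast-2.amazonaws.com");

            String vpcId = ec2.createVpc(new CreateVpcRequest().withCidrBlock("10.0.0.0/16"))
                    .getVpc().getVpcId();

            // One public and one private subnet in each Availability Zone.
            String[] azs = {"ap-southeast-2a", "ap-southeast-2b"};
            for (int i = 0; i < azs.length; i++) {
                ec2.createSubnet(new CreateSubnetRequest()
                        .withVpcId(vpcId)
                        .withAvailabilityZone(azs[i])
                        .withCidrBlock("10.0." + (i * 2) + ".0/24"));      // public subnet
                ec2.createSubnet(new CreateSubnetRequest()
                        .withVpcId(vpcId)
                        .withAvailabilityZone(azs[i])
                        .withCidrBlock("10.0." + (i * 2 + 1) + ".0/24"));  // private subnet
            }
            // Route tables, the NAT path and security groups still need to be set up;
            // getting those right is the critical part.
        }
    }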
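
Because the command-line tools proved awkward, the configuration was scripted through custom Ant tasks around the Java API. A minimal sketch of what such a task might look like, here taking the AMI snapshot used in the automated deployment; the task name, attributes and hard-coded credentials are assumptions.

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.CreateImageRequest;
    import org.apache.tools.ant.BuildException;
    import org.apache.tools.ant.Task;

    /** Hypothetical usage: <createImage instanceId="i-12345678" imageName="webnode-2013-05-01"/> */
    public class CreateImageTask extends Task {
        private String instanceId;
        private String imageName;

        public void setInstanceId(String instanceId) { this.instanceId = instanceId; }
        public void setImageName(String imageName)   { this.imageName = imageName; }

        @Override
        public void execute() throws BuildException {
            if (instanceId == null || imageName == null) {
                throw new BuildException("instanceId and imageName are required");
            }
            // Placeholder credentials; a real task would load these from a property file.
            AmazonEC2Client ec2 = new AmazonEC2Client(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            String amiId = ec2.createImage(new CreateImageRequest()
                    .withInstanceId(instanceId)
                    .withName(imageName))
                    .getImageId();
            log("Created AMI " + amiId + " from " + instanceId);
        }
    }

In the build file the task would be registered with a <taskdef> and then invoked like any other Ant task.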
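
Finally, the job-processing note maps onto two SQS queues. A simplified sketch of the web-app and processing-node ends of the incoming queue, assuming a placeholder queue URL; the real services also write output to S3, push to the completed-job queue and send the notification email.

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.sqs.AmazonSQSClient;
    import com.amazonaws.services.sqs.model.DeleteMessageRequest;
    import com.amazonaws.services.sqs.model.Message;
    import com.amazonaws.services.sqs.model.ReceiveMessageRequest;
    import com.amazonaws.services.sqs.model.SendMessageRequest;

    public class JobQueue {
        private static final String INCOMING_QUEUE =
                "https://sqs.ap-southeast-2.amazonaws.com/123456789012/incoming-jobs"; // placeholder

        private final AmazonSQSClient sqs =
                new AmazonSQSClient(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        /** Web app side: record the job as NEW in the database, then queue its id. */
        public void submit(String jobId) {
            // e.g. set {status: "NEW"} on the job document in the MongoDB cluster
            sqs.sendMessage(new SendMessageRequest(INCOMING_QUEUE, jobId));
        }

        /** Processing node side: pull a job, process it, then delete the message. */
        public void poll() {
            for (Message m : sqs.receiveMessage(new ReceiveMessageRequest(INCOMING_QUEUE)).getMessages()) {
                String jobId = m.getBody();
                // update status to PROCESSING, run the analysis, write output to S3,
                // update status to COMPLETE and push jobId onto the completed-job queue
                sqs.deleteMessage(new DeleteMessageRequest(INCOMING_QUEUE, m.getReceiptHandle()));
            }
        }
    }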

    1. • Founded in 2003
       • Sophisticated, intuitive Visualisation and Interpretation of Genetic data
       • Targeted Analysis Workflows
       • Actionable Results
       • We're Hiring!
    2. Genome Browser - Requirements
       • Smooth, intuitive experience in the browser
         - JavaScript/HTML5
         - Mobile friendly
       • Tile Rendering
         - Like Google Maps
         - Requires fast database lookups
       • Secure
         - Data must be encrypted at rest and in transit
       • Local-deployable
         - Some customers not ready for cloud
    3. Architecture
       • Initial Architecture
         - On EC2
         - One autoscaling group (and ELB)
         - One Availability Zone
       • Revised Architecture
         - VPC across two Availability Zones
         - Private subnets for security
    4. VPC Architecture (diagram): ELB in the public subnets; web nodes (a master node plus autoscaling groups, MIN=1 MAX=3 and MIN=0 MAX=2, one per AZ) and the DB cluster in the private subnets; two Availability Zones (AZ 1 and AZ 2); outbound internet access via NAT.
    5. Web Stack
       • Tomcat behind Apache
       • Session info stored in ElastiCache
       • Monitoring
         - Healthcheck ping URL for the load balancer
         - CloudWatch CPU alarms for autoscaling
       • Autoscaling
         - Scales from 2 to 6 machines depending on load
         - For > 6 machines, the database becomes the bottleneck
       • Deployment
         - Automatic deployment with no downtime
    6. Automatic Deployment
       1. Deploy latest code to master web node (through Tomcat manager)
       2. Shut down master Tomcat
       3. Take AMI snapshot
       4. Restart master web node, and wait for ping URL to respond
       5. Tear down existing autoscaling config
       6. Set up new autoscaling config
    7. Database
       • Local Deployment Requirement
         - Can't use RDS or Dynamo
       • MongoDB
         - Highly scalable NoSQL
         - Supports advanced features
    8. Database
       • Base unit - pair of 50GB volumes in RAID 0
       • 100GB Logical Volume (LVM)
       • Encryption Layer
       • XFS File System
         - Can grow without unmounting
       • Scaling
         - Storage scaling is manual
         - Performance scaling could be automatic
       • Need to scale preemptively
    9. Job Processing (diagram): the Web App puts jobs on the Incoming Job Queue (status=NEW); a Processing Node picks them up (status=PROCESSING), writes output to S3 and puts them on the Completed Job Queue (status=COMPLETE); a Notification Node then notifies the user (status=NOTIFIED). Job status is tracked in the DB Cluster.
    10. Overview
        • Multi-Availability Zone VPC with public and private subnets
        • ELB in front of Auto-Scaling web nodes
        • Statically scaled MongoDB Cluster
        • Encrypted volumes
        • Simple Queue Service for job processing
        • We're Hiring!
    11. Thank you!
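
Slide 5 mentions a healthcheck ping URL that the load balancer polls. A minimal sketch of such an endpoint for the Tomcat stack; the class name and mapping are assumptions, and a real check might also verify database and session-store connectivity.

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    /** Hypothetical ping endpoint for the ELB healthcheck, e.g. mapped to /ping. */
    public class PingServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            // A real check might also touch the database or session store before answering.
            resp.setStatus(HttpServletResponse.SC_OK);
            resp.setContentType("text/plain");
            resp.getWriter().write("OK");
        }
    }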
