Cluster Filesystems and the next 1000 Human genomes
Guy Coates, Wellcome Trust Sanger Institute

Details of our first attempts to deal with our next-generation sequencing machines.

Talk given at International Supercomputing, 2008.

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
882
On SlideShare
0
From Embeds
0
Number of Embeds
18
Actions
Shares
0
Downloads
28
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Cluster Filesystems and the next 1000 human genomes

  1. 1. Cluster Filesystems and the next 1000 Human genomes Guy Coates Wellcome Trust Sanger Institute

Introduction

About the Institute
- Funded by the Wellcome Trust, the 2nd largest research charity in the world.
- ~700 employees.
- Large-scale genomic research: we sequenced 1/3 of the human genome (largest single contributor).
- Active cancer, malaria, pathogen and genomic variation studies.
- All data is made publicly available: websites, FTP, direct database access, programmatic APIs.

New technology Sequencing

Sequencing projects at the Sanger
- “The” Human Genome Project:
  - Worldwide collaboration: 6 countries, 5 major centres, many smaller labs.
  - 13 years.
- 1000 Genomes Project:
  - Study variation in human populations.
  - 1000 genomes over 3 years by 5 centres.
  - We have agreed to do 200 genomes.
- And the rest: cancer, malaria, pathogen, worm, human variation (WTCCC2), etc.

How is this achievable?
- Moore's Law of sequencing:
  - The cost of sequencing halves every 2 years.
  - Driven by multiple factors.
- Economies of scale:
  - Human Genome Project: 13 years, 23 labs, $500 million.
  - Cost today: $10 million, several months in a single large genome centre.
- New sequencing technologies:
  - Illumina/Solexa machines.
  - $100,000 for a human genome.
  - Single machine, 3 days.

New sequencing technologies
- Capillary sequencing:
  - 96 sequencing reactions carried out per run.
  - 0.5-1 hour run time.
- Illumina sequencing:
  - 52 million reactions per run.
  - 3 day run time.
- Machines are cheap(ish) and small: we can buy lots of them.

Data centre
- 4 x 250 m² data centres:
  - 2-4 kW/m² cooling.
  - 3.4 MW power draw.
- Overhead aircon, power and networking:
  - Allows counter-current cooling; more efficient.
- Technology refresh:
  - One data centre is an empty shell; rotate into the empty room every 4 years.
  - Refurbish one of the in-use rooms with the current state of the art.
  - The “fallow field” principle.

Highly Disruptive
- The sequencing centre runs 24x7.
- Peak capacity of capillary sequencing: 3.5 Gbases/month.
- Current Illumina sequencing: 262 Gbases/month in April; 1 Tbase/month predicted for September.
- Total sequence deposited in GenBank for all time: 200 Gbases.
- A 75x increase in sequencing output (checked in the sketch below).
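
A quick back-of-envelope check of the growth figures quoted above, using only the numbers on this slide; this is an illustrative sketch, not part of the original talk.

```python
# Sanity-check of the sequencing-output growth quoted on the slide above.
capillary_peak_gbases_per_month = 3.5     # peak capillary capacity
illumina_april_gbases_per_month = 262.0   # Illumina output in April
illumina_sept_gbases_per_month = 1000.0   # 1 Tbase/month predicted for September

print(f"April vs capillary peak: "
      f"{illumina_april_gbases_per_month / capillary_peak_gbases_per_month:.0f}x")
print(f"September prediction vs capillary peak: "
      f"{illumina_sept_gbases_per_month / capillary_peak_gbases_per_month:.0f}x")
```

The first ratio comes out at ~75x, which is the figure on the slide; the September prediction would push it well past 250x.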

Gigabase != Gigabyte
- We store ~8 bytes of data per base: quality, error and experimental information.
- ~10 TB/month of permanent archival storage.
- Raw data from the machines is much larger:
  - June 2007: 15 machines, 1 TB every 6 days.
  - Sept 2007: 30 machines, 1 TB every 3 days.
  - Jan 2008: 30 machines, 2 TB every 3 days.
- The compute pipeline crunches 2 TB of raw data into 30 GB of sequence data.
- We need to capture ~120 TB of data per week before we can analyse it to produce the final sequence (rough arithmetic below).
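
The figures above can be reproduced with simple arithmetic. This sketch assumes the per-run raw-data rates are per machine (which is what makes the weekly total come out near ~120 TB); the 8 bytes/base and 2 TB → 30 GB figures are taken from the slide.

```python
# Rough arithmetic behind the "Gigabase != Gigabyte" slide (assumptions noted inline).
BYTES_PER_BASE = 8                 # sequence + quality + error + experimental metadata

# Archival storage at ~1 Tbase/month of finished sequence.
tbases_per_month = 1.0
archive_tb_per_month = tbases_per_month * 1e12 * BYTES_PER_BASE / 1e12
print(f"Archival storage: ~{archive_tb_per_month:.0f} TB/month")  # slide rounds this to ~10 TB/month

# Raw-data capture, Jan 2008: 30 machines, ~2 TB every 3 days per machine (assumption).
machines = 30
tb_per_machine_per_day = 2 / 3
raw_tb_per_week = machines * tb_per_machine_per_day * 7
print(f"Raw data to capture: ~{raw_tb_per_week:.0f} TB/week")

# Data reduction through the pipeline: 2 TB of raw data -> 30 GB of sequence.
print(f"Raw-to-sequence ratio: ~{2e12 / 30e9:.0f}:1")
```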

IT for new technology sequencing

Compute Infrastructure
- Problem 1: How do we capture the data coming off the sequencing machines?
- Problem 2: How do we analyse the data coming off the sequencing machines?
- Problem 3: How do we do this from scratch in 8 weeks?

Problem 1: Build a big file-system
- 3 x 100 TB file-systems to dump data to:
  - Multiple file-systems in order to protect against catastrophic hardware failures.
- Hold data for 2 weeks only:
  - This should give us enough space to store ~2 weeks' worth of raw data (buffer arithmetic below).
  - Once a run has passed QC, the raw data can be deleted.
- Use Lustre (HP SFS, based on Lustre 1.4):
  - Sustained write rate of 1.6 Gbit/s (not huge).
  - Reads will have to be much faster, so we can analyse data on our compute cluster faster than we capture it.
  - We used it already; low risk for us.
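
A minimal sketch of the buffer-sizing reasoning behind the 3 x 100 TB file-systems. The two-week retention policy is from the slide; the ~140 TB/week ingest rate uses the same per-machine assumption as the earlier arithmetic.

```python
# How long does 3 x 100 TB of scratch last at the raw-data ingest rate?
filesystems = 3
capacity_tb_per_fs = 100
usable_tb = filesystems * capacity_tb_per_fs

ingest_tb_per_week = 140     # ~30 machines x 2 TB per 3 days (assumption, see earlier sketch)
weeks_of_buffer = usable_tb / ingest_tb_per_week
print(f"Buffer: {usable_tb} TB ~= {weeks_of_buffer:.1f} weeks of raw data")
# Roughly two weeks -- hence the policy of deleting raw data once a run has passed QC.
```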

Problem 2: Build a compute cluster
- Compute was the easy part:
  - The analysis pipeline is an embarrassingly parallel workload (a toy sketch of this per-lane parallelism follows this slide).
  - It scales well on commodity clusters (after the bugs had been fixed).
- 8 chassis of HP BL460c blades: 128 nodes / 640 cores.
- We use blade systems already:
  - Excellent manageability.
  - Fits into our existing machine management structure.
  - Once physically installed, we can deploy the cluster in a few hours.
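
The "embarrassingly parallel" point above can be illustrated with a toy sketch: each lane of a run is analysed independently, so the work farms out with no inter-job communication. The function names, directory layout and paths here are hypothetical stand-ins, not the Sanger pipeline; in production the same pattern runs as independent LSF jobs across the blades rather than local processes.

```python
# Toy illustration of the embarrassingly parallel analysis pattern:
# every lane of a sequencing run can be processed independently.
# Names and paths are hypothetical, not the real pipeline.
from multiprocessing import Pool
from pathlib import Path

def analyse_lane(lane_dir: Path) -> str:
    """Stand-in for the real per-lane analysis (image analysis, base calling, QC)."""
    # Real work would read raw intensity files and write sequence + quality data.
    return f"{lane_dir.name}: done"

def analyse_run(run_dir: Path, workers: int = 8) -> list:
    lane_dirs = sorted(run_dir.glob("lane*"))     # assumed one directory per lane
    with Pool(processes=workers) as pool:         # on the real farm: one LSF job per lane
        return pool.map(analyse_lane, lane_dirs)

if __name__ == "__main__":
    print(analyse_run(Path("/lustre/sf1/runs/run0001")))  # hypothetical staging path
```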

Add lots of networking
- Black Diamond 8810 chassis: 360 GigE ports.
- Trunked GigE links (rough bandwidth arithmetic follows this slide):
  - 8x per blade chassis (16 machines) for the Lustre network.
  - 8x links to the sequencing centre.
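
A rough sketch of what the trunked links buy: 8 bonded GigE links give about 1 GB/s of aggregate bandwidth per chassis in the ideal case, which can be set against the raw-data rates quoted earlier. Protocol overhead and real trunking efficiency are ignored here.

```python
# Ideal-case bandwidth of an 8-way trunked GigE link (ignores protocol overhead).
links = 8
gbit_per_link = 1.0
aggregate_gbit_s = links * gbit_per_link
aggregate_gbyte_s = aggregate_gbit_s / 8
print(f"8x GigE trunk: ~{aggregate_gbyte_s:.0f} GB/s aggregate")

# Time to move one 2 TB raw run over the trunk at line rate.
run_tb = 2
seconds = run_tb * 1e12 / (aggregate_gbyte_s * 1e9)
print(f"One 2 TB run: ~{seconds / 3600:.1f} hours at line rate")
```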

Data pull
- LSF reconfiguration allows processing capacity to be interchanged between real-time primary (1°) analysis and offline secondary (2°) analysis.
- [Diagram: sequencer 1 ... sequencer 30 feed "sucker" hosts; staging area: 320 TB Lustre (EVA); scratch area: 25 TB Lustre (SFS20); final repository: 100 TB/year.]
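
A minimal sketch of what a "sucker"-style data pull could look like: poll each sequencer's export area and copy finished runs into the Lustre staging area. Every path, the completion-marker filename and the polling interval are assumptions for illustration, not the actual Sanger implementation.

```python
# Hypothetical "sucker": poll sequencer export directories and pull finished
# runs into the Lustre staging area. Paths and marker file are illustrative only.
import shutil
import time
from pathlib import Path

SEQUENCERS = [Path(f"/seq/sequencer{i:02d}/runs") for i in range(1, 31)]  # assumed export dirs
STAGING = Path("/lustre/staging")                                         # assumed staging area

def pull_finished_runs() -> None:
    for export_dir in SEQUENCERS:
        for run in export_dir.glob("run_*"):
            if not (run / "Run.completed").exists():   # assumed completion marker
                continue
            target = STAGING / run.name
            if target.exists():                        # already pulled
                continue
            shutil.copytree(run, target)               # a real system would also verify checksums

if __name__ == "__main__":
    while True:                                        # runs as a simple polling daemon
        pull_finished_runs()
        time.sleep(300)                                # poll every 5 minutes (arbitrary)
```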

Problem 3: How do we do it quickly?
- Plan for more than you actually need:
  - Make an estimate and add 1/3.
  - That still was not enough in our case.
- Go with technologies you know:
  - Nothing works "out of the box" at this scale.
  - There will inevitably be problems even with kit you do know: firmware/hardware skews, delivery.
  - Other technologies might have been better on paper (e.g. Lustre 1.6, a big NAS box?), but might not have worked.
- Good automated systems-management infrastructure:
  - Machine and software configs are all held in cfengine.
  - Easy to add new hardware and "make it the same as that".

Problems
- A Lustre file-system is striped across a number of OSS servers (a box with some disk attached).
- The original plan was for 6 EVA arrays (50 TB each) and 12 OSS servers, as server failover pairs.
- A limit in the SFS failover code means that one OSS can only serve 8 LUNs; we were looking at 13 LUNs per server (26 in the case of failover).
- This required increasing the number of OSSs from 6 to 28, plus increased SAN / networking infrastructure.

More Problems
- Golden rule of storage systems: all disks go to 96% full and stay there.
- The increase in data production rates reduced the time we could buffer data for: keep data for 2 weeks rather than 3.
- We need to add another 100 TB / 6 OSSs; expansion is currently ongoing.

df -h
Filesystem   Size  Used  Avail  Use%  Mounted on
XXX           97T   93T     4T   96%  /lustre/sf1
XXX          116T  111T     5T   96%  /lustre/sf2
XXX           97T   93T     4T   96%  /lustre/sf3

Even More Problems
- Out of memory:
  - Changes in the analysis code plus new sequencing machines meant we were filling up our compute farm.
  - The code's memory requirement jumped from 1 GB/core to 2 GB/core.
  - Under-commit machines with jobs to prevent memory exhaustion; this reduces overall capacity (capacity arithmetic follows this slide).
  - A retrofit is underway to increase memory.
- Out of machines:
  - Changes in downstream analysis mean we need twice as much CPU as we had.
  - Installed 560 cores of IBM HS21 blades.
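
The cost of under-committing is easy to quantify. This sketch assumes 8 GB of RAM per BL460c node, which is an assumption for illustration rather than a figure from the talk; the 1 GB → 2 GB per-core jump and the node/core counts are from the slides.

```python
# Effect of the 1 GB -> 2 GB per-job memory jump on usable farm capacity.
# The 8 GB-per-node figure is an assumption for illustration, not from the talk.
nodes = 128
total_cores = 640            # from the "Problem 2" slide
ram_per_node_gb = 8          # assumed node memory

cores_per_node = total_cores // nodes
for gb_per_job in (1, 2):
    jobs_per_node = min(cores_per_node, ram_per_node_gb // gb_per_job)
    print(f"{gb_per_job} GB/job: {jobs_per_node} jobs/node, "
          f"{nodes * jobs_per_node} concurrent jobs")
# At 2 GB/job the farm becomes memory-bound: machines run fewer jobs than they have cores.
```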

Can we do it better?
- Rapidly changing environment:
  - The sequencing machines are already here.
  - We have to do something, and quickly.
- Agile software development:
  - The sequencing software team uses agile development to cope with change in sequencing science and process.
  - Very fast, incremental development (weekly releases).
  - Get usable code into production very quickly, even if it is not feature complete.
- Can we do "agile" systems?
  - Software is ephemeral; hardware is not.
  - You cannot magic 320 TB of disk and 1000 cores out of thin air... or can you?

Possible Future directions
- Virtualisation:
  - Should help us if we have excess capacity in our data centre.
  - We are not talking about single machines with a bit of local disk: cluster file-systems, non-trivial networking.
  - Requires over-provisioning of network and storage infrastructure. Is this the price of agility?
- Grid / cloud / elastic computing:
  - Can we use someone else's capacity instead?
  - Can we find a sucker / valued partner to take a wedge of our data?
  - Can we get data in and out of the grid quickly enough? (Transfer-time arithmetic follows this slide.)
  - Do the machines inside the cloud have fast data paths between them and the storage?
  - Supercomputer, not web services.
- We are starting work to look at all of these.
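
The "can we get data in and out quickly enough?" question comes down to bandwidth arithmetic: at ~120 TB of raw data a week, even a dedicated fast link struggles. The link speeds below are illustrative, not a description of Sanger's actual connectivity.

```python
# How long does a week's worth of raw data (~120 TB) take to push over a WAN link?
# Link speeds are illustrative; real sustained throughput would be lower still.
weekly_data_tb = 120

for link_gbit_s in (1, 10):
    seconds = weekly_data_tb * 1e12 * 8 / (link_gbit_s * 1e9)
    print(f"{link_gbit_s} Gbit/s link: {seconds / 86400:.1f} days per week of data")
```

At 1 Gbit/s a week of data takes longer than a week to move, so it can never catch up; even a dedicated 10 Gbit/s link spends over a day per week just on transfer.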

There is no end in sight!
- We already have exponential growth in storage and compute:
  - Storage doubles every 12 months.
  - We crossed the 2 PB barrier last week (a naive growth projection follows this slide).
- Sequencing technologies are constantly evolving.
- Known unknowns:
  - Higher data output from our current machines.
  - More machines.
- Unknown unknowns:
  - New big-science projects are just a "good idea" away...
  - Gen 3 sequencing technologies.
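
A simple extrapolation of the "storage doubles every 12 months" observation, starting from the 2 PB just crossed; a naive projection for illustration, not a plan from the talk.

```python
# Naive extrapolation of storage demand doubling every 12 months, from ~2 PB in mid-2008.
petabytes = 2.0
for year in range(2008, 2013):
    print(f"{year}: ~{petabytes:.0f} PB")
    petabytes *= 2
```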

Acknowledgements
- Sanger:
  - Systems, network, SAN and storage teams.
  - Sequencing pipeline development team: code performance, testing.
- Cambridge Online:
  - Dawn Johnson: systems integration.
- HP Galway:
  - Eamonn O'Toole, Gavin Brebner: Lustre planning.
