Bio-IT for Core Facility Managers

This is a massive slide deck I used as the starting point for a 1.5 hour talk at the 2012 www.nerlscd.org conference. Mixture of old and (some) new slides from my usual stuff.

Usage Rights

© All Rights Reserved

    Bio-IT for Core Facility Managers: Presentation Transcript

    • Bio-IT For Core Facility Leaders: Tips, Tricks & Trends. 2012 NERLSCD Meeting - www.nerlscd.org (Wednesday, October 31, 2012)
    • 1. Intro 2. Meta-Issues (The Big Picture) 3. Infrastructure Tour 4. Compute & HPC 5. Storage 6. Cloud & Big Data
    • I’m Chris. I’m an infrastructure geek. I work for the BioTeam. @chris_dag
    • BioTeam Who, what & why ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 12+ years bridging the “gap” between science, IT & high performance computing ‣ www.bioteam.net
    • Listen to me at your own risk Seriously. ‣ Clever people find multiple solutions to common issues ‣ I’m fairly blunt, burnt-out and cynical in my advanced age ‣ Significant portion of my work has been done in demanding production Biotech & Pharma environments ‣ Filter my words accordingly
    • 1. Intro 2. Meta-Issues (The Big Picture) 3. Infrastructure Tour 4. Compute & HPC 5. Storage 6. Cloud & Big Data
    • Meta-Issues Why you need to track this stuff ...
    • Big Picture Why this stuff matters ... ‣ HUGE revolution in the rate at which lab instruments are being redesigned, improved & refreshed • Example: CCD sensor upgrade on that confocal microscopy rig just doubled your storage requirements • Example: That 2D ultrasound imager is now a 3D imager • Example: Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes. Massive downstream increase in storage, compute & data movement needs
    • The Central Problem Is ... ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • The science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
    • The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago you could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; IT needs are too extreme ‣ 1000 CPU Linux clusters and petascale storage are the new normal; try fitting THAT in a closet!
    • The Take Home Lesson What core facility leadership needs to understand ‣ The incredible rate of cost decreases & capability gains seen in the lab instrumentation space is not mirrored everywhere ‣ As gear gets cheaper/faster, scientists will simply do more work and ask more questions. Nobody simply banks the financial savings when an instrument gets 50% cheaper -- they just buy two of them! ‣ IT technology is not improving at the same rate; we also can’t change our IT infrastructures all that rapidly
    • If you get it wrong ... ‣ Lost opportunity ‣ Frustrated & very vocal researchers ‣ Problems in recruiting ‣ Publication problems
    • 1. Intro 2. Meta-Issues (The Big Picture) 3. Infrastructure Tour 4. Compute & HPC 5. Storage 6. Cloud & Big Data
    • Infrastructure Tour What does this stuff look like?
    • Self-contained single-instrument infrastructure
    • Illumina GA
    • Instrument Control Workstation
    • SOLiD Sequencer ...
    • sits on top of a 24U server rack...
    • Another lab-local HPC cluster + storage
    • More lab-local servers & storage
    • Small core w/ multiple instrument support
    • Small cluster; large storage
    • Mid-sized core facility
    • Large Core Facility
    • Large Core Facility
    • Large Core Facility
    • Colocation Cages
    • Inside a colo cage
    • Linux Cluster + In-row chillers (front)
    • Linux Cluster + In-row chillers (rear)
    • 1U “Pizza Box” Style Server Chassis
    • Pile of “pizza boxes”
    • 4U Rackmount Servers
    • “Blade” Servers & Enclosure
    • Hybrid Modular Server
    • Integrated: Blades + Hypervisor + Storage
    • Petabyte-scale Storage
    • Real world screenshot from earlier this month: 16 monster compute nodes + 22 GPU nodes. Cost? 30 bucks an hour via AWS Spot Market. Yep. This counts.
    • Physical data movement station
    • Physical data movement station
    • “Naked” Data Movement
    • “Naked” Data Archive
    • The cliche image
    • Backblaze Pod: 100 terabytes for $12,000
    • 1. Intro 2. Meta-Issues (The Big Picture) 3. Infrastructure Tour 4. Compute & HPC 5. Storage 6. Cloud & Big Data
    • Compute Actually the easy bit ...
    • Compute Power Not a big deal in 2012 ... ‣ Compute power is largely a solved problem ‣ It’s just a commodity ‣ Cheap, simple & very easy to acquire ‣ Let’s talk about what you need to know ...
    • Compute Trends Things you should be tracking ... ‣ Facility Issues ‣ “Fat Nodes” replacing Linux Clusters ‣ Increasing presence of serious “lab-local” IT
    • Facility Stuff ‣ Compute & storage requirements are getting larger and larger ‣ We are packing more “stuff” into smaller spaces ‣ This increases (radically) electrical and cooling requirements
    • Facility Stuff - Core issue ‣ Facility & power issues can take many months or years to address ‣ Sometimes it may be impossible to address (new building required ...) ‣ If research IT footprint is growing fast; you must be well versed in your facility planning/upgrade process
    • Facility Stuff - One more thing ‣ Sometimes central IT will begin facility upgrade efforts without consulting with research users • This was the reason behind one of our more ‘interesting’ projects in 2012 ‣ ... a client was weeks away from signing off on a $MM datacenter which would not have had enough electricity to support current research & faculty recruiting commitments
    • “Fat” Nodes Replacing Clusters
    • Fat Nodes - 1 box replacing a cluster ‣ This server has 64 CPU Cores ‣ .. and up to 1TB of RAM ‣ Fantastic Genomics/Chemistry system • A 256GB RAM version only costs $13,000 ‣ These single systems are replacing small clusters in some environments
    • Fat Nodes - Clever Scale-out Packaging ‣ This 2U chassis contains 4 individual servers ‣ Systems like this get near “blade” density without the price premium seen with proprietary blade packaging ‣ These “shrink” clusters in a major way or replace small ones
    • The other trend ...
    • “Serious” IT now in your wet lab ... ‣ Instruments used to ship with a Windows PC “instrument control workstation” ‣ As instruments get more powerful the “companion” hardware is starting to scale-up ‣ End result: very significant stuff that used to live in your datacenter is now being rolled into lab environments
    • “Serious” IT now in your wet lab ... ‣ You may be surprised what you find in your labs in ’12 ‣ ... can be problematic for a few reasons ... 1. IT support & backup 2. Power & cooling 3. Noise 4. Security
    • Networking Also not particularly worrisome ...
    • Networking ‣ Networking is also not super complicated ‣ It’s also fairly cheap & commoditized in ’12 ‣ There are three core uses for networks: 1. Communication between servers & services 2. Message passing within a single application 3. Sharing files and data between many clients
    • Networking 1 - Servers & Services ‣ Ethernet. Period. Enough said. ‣ Your only decision is between 10-Gig and 1-Gig ethernet ‣ 1-Gig Ethernet is pervasive and dirt cheap ‣ 10-Gig Ethernet is getting cheaper and on its way to becoming pervasive
    • Networking 1 - Ethernet ‣ Everything speaks ethernet ‣ 1-Gig is still the common interconnect for most things ‣ 10-Gig is the standard now for the “core” ‣ 10-Gig is the standard for top-of-rack and “aggregation” ‣ 10-Gig connections to “special” servers are the norm
    • Networking 2 - Message Passing ‣ Parallel applications can span many servers at once ‣ Communicate/coordinate via “message passing” ‣ Ethernet is fine for this but has a somewhat high latency between message packets ‣ Many apps can tolerate Ethernet-level latency; some applications clearly benefit from a message passing network with lower latency ‣ There used to be many competing alternatives ‣ Clear 2012 winner is “Infiniband”
    • Networking 2 - Message Passing ‣ The only things you need to know ... ‣ Infiniband is an expensive networking alternative that offers much lower latency than Ethernet ‣ You would only pay for and deploy an IB fabric if you had an application or use case that requires it. ‣ No big deal. It’s just “another” network.
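Why latency matters for message passing can be sketched with a very simple cost model: time per message is a fixed latency plus the payload size divided by bandwidth. The latency and bandwidth figures below are illustrative assumptions, not measurements from the talk; plug in numbers from your own fabric.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbit):
    """Simple cost model: fixed per-message latency plus time on the wire."""
    bits = message_bytes * 8
    wire_time_us = bits / (bandwidth_gbit * 1e9) * 1e6
    return latency_us + wire_time_us

# Assumed, illustrative per-message latencies and link speeds.
NETWORKS = {
    "1-Gig Ethernet": {"latency_us": 50.0, "bandwidth_gbit": 1.0},
    "Infiniband (assumed)": {"latency_us": 2.0, "bandwidth_gbit": 32.0},
}

# Small messages are dominated by latency; large ones by bandwidth.
for size in (64, 1_000_000):
    for name, net in NETWORKS.items():
        t = transfer_time_us(size, **net)
        print(f"{size:>9} bytes over {name}: {t:10.1f} us")
```

Under this model, an application that exchanges many tiny messages sees almost the whole latency gap, while one that ships a few large buffers mostly sees the bandwidth gap. That is the question to ask before paying for a low-latency fabric.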
    • Networking 3 - File Sharing ‣ For ‘Omics this is the primary focus area ‣ Overwhelming need for shared read/write access to files and data between instruments, HPC environment and researcher desktops ‣ In HPC environments you will often have a separate network just for file sharing traffic
    • Networking 3 - File Sharing ‣ Generic file sharing uses familiar NFS or Windows fileshare protocols. No big deal ‣ Always implemented over Ethernet although often a mixture of 10-Gig and 1-Gig connections • 10-Gig connections to the file servers, storage and edge switches; 1-gig connections to cluster nodes and user desktops ‣ Infiniband also has a presence here • Many “parallel” or “cluster” filesystems may talk to the clients via NFS-over-ethernet but internally the distributed components may use a private Infiniband network for metadata and coordination.
    • Storage. (the hard bit ...)
    • Storage Setting the stage ... ‣ Life science is generating torrents of data ‣ Size and volume often dwarf all other research areas - particularly with Bioinformatics & Genomics work ‣ Big/Fast storage is not cheap and is not commodity ‣ There are many vendors and many ways to spectacularly waste tons of money ‣ And we still have an overwhelming need for storage that can be shared concurrently between many different users, systems and clients
    • Life Science “Data Deluge” ‣ Scare stories and shocking graphs getting tiresome ‣ We’ve been dealing with terabyte-scale lab instruments & data movement issues since 2004 • And somehow we’ve managed to survive ... ‣ Next few slides • Try to explain why storage does not stress me out all that much in 2012 ...
    • The sky is not falling. 1. You are not the Broad Institute or Sanger Center ‣ Overwhelming majority of us do not operate at Broad/Sanger levels • These folks add 200+ TB a week in primary storage ‣ We still face challenges but the scale/scope is well within the bounds of what traditional IT technologies can handle ‣ We’ve been doing this for years • Many vendors, best practices, “war stories”, proven methods and just plain “people to talk to…”
    • The sky is not falling. 2. Instrument Sanity Beckons ‣ Yesteryear: Terascale .TIFF Tsunami ‣ Yesterday: RTA, in-instrument data reduction ‣ Today: Basecalls, BAMs & Outsourcing ‣ Tomorrow: Write directly to the cloud
    • The sky is not falling. 3. Peta-scale storage is not really exotic or unusual any more. ‣ Peta-scale storage has not been a risky exotic technology gamble for years now • A few years ago you’d be betting your career ‣ Today it’s just an engineering & budget exercise • Multiple vendors don’t find petascale requirements particularly troublesome and can deliver proven systems within weeks • $1M (or less in ’12) will get you 1PB from several top vendors ‣ However, still HARD to do BIG, FAST & SAFE • Hard but solvable; many resources & solutions out there
    • On the other hand ...
    • OMG! The Sky Is Falling! Maybe a little panic is appropriate ...
    • The sky IS falling! 1. Those @!*#&^@ Scientists ... ‣ As instrument output declines … ‣ Downstream storage consumption by end-user researchers is increasing rapidly ‣ Each new genome generates new data mashups, experiments, data interchange conversions, etc. ‣ MUCH harder to do capacity planning against human beings vs. instruments
    • The sky IS falling! 2. @!*#&^@ Scientific Leadership ... ‣ Sequencing is already a commodity ‣ NOBODY simply banks the savings ‣ EVERYBODY buys or does more
    • The sky IS falling! Gigabases vs. Moore’s Law: OMG!! BIG SCARY GRAPH (x-axis: 2007 through 2012)
    • The sky IS falling! 3. Uncomfortable truths ‣ Cost of acquiring data (genomes) falling faster than rate at which industry is increasing drive capacity ‣ Human researchers downstream of these datasets are also consuming more storage (and less predictably) ‣ High-scale labs must react or potentially have catastrophic issues in 2012-2013
    • The sky IS falling! 5. Something will have to break ... ‣ This is not sustainable • Downstream consumption exceeding instrument data reduction • Commoditization yielding more platforms • Chemistry moving faster than IT infrastructure • What the heck are we doing with all this sequence?
    • CRAM it.
    • The sky IS falling! CRAM it in 2012 ... ‣ Minor improvements are useless; order-of-magnitude needed ‣ Some people are talking about radical new methods: compressing against reference sequences and only storing the diffs • With a variable compression “quality budget” to spend on lossless techniques in the areas you care about ‣ http://biote.am/5v - Ewan Birney on “Compressing DNA” ‣ http://biote.am/5w - The actual CRAM paper ‣ If CRAM takes off, storage landscape will change
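The "compress against a reference and only store the diffs" idea can be shown with a toy sketch. This is not the CRAM format or its codec; the record layout and function names are made up for illustration, and it assumes each read aligns fully inside the reference with no insertions or deletions.

```python
def encode_read(reference, read, position):
    """Store a read as its alignment position plus only the bases that differ."""
    window = reference[position:position + len(read)]
    diffs = [(i, base)
             for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return {"pos": position, "length": len(read), "diffs": diffs}

def decode_read(reference, record):
    """Rebuild the original read from the reference and the stored diffs."""
    start = record["pos"]
    bases = list(reference[start:start + record["length"]])
    for offset, base in record["diffs"]:
        bases[offset] = base
    return "".join(bases)

reference = "ACGTACGTACGTACGT"
read = "GTACCTAC"  # aligns at position 2 with a single mismatch
record = encode_read(reference, read, 2)
print(record)
assert decode_read(reference, record) == read
```

A read that matches the reference perfectly costs only a position and a length, which is where the order-of-magnitude hope comes from. The hard part in practice is the per-base quality scores, which is what the "quality budget" bullet is about.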
    • What comes next? Next 18 months will be really fun...
    • What comes next. The same rules apply for 2012 and beyond ... ‣ Accept that science changes faster than IT infrastructure ‣ Be glad you are not Broad/Sanger ‣ Flexibility, scalability and agility become the key requirements of research informatics platforms • Tiered storage is in your future ... ‣ Shared/concurrent access is still the overwhelming storage use case • We’ll still continue to use clustered, parallel and scale-out NAS solutions
    • What comes next. In the following year ... ‣ Many peta-scale capable systems deployed • Most will operate in the hundreds-of-TBs range ‣ Far more aggressive “data triage” • “.BAM only!” ‣ Genome compression via CRAM ‣ Even more data will sit untouched & unloved ‣ Growing need for tiers, HSM & even tape
    • What comes next. In the following year ... ‣ Broad, Sanger and others will pave the way with respect to metadata-aware & policy driven storage frameworks • And we’ll shamelessly copy a year or two later ‣ I’m still on my cloud storage kick • Economics are inescapable; Will be built into storage platforms, gateways & VMs • Amazon S3 is only a HTTP RESTful call away • Cloud will become “just another tier”
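To make "only a HTTP RESTful call away" concrete, here is a minimal sketch that builds (but does not send) an HTTP PUT for an S3 object using only the Python standard library. The bucket and key names are hypothetical, and a real request would also need a signed `Authorization` header, which is left out here.

```python
import urllib.request

def s3_put_request(bucket, key, payload):
    """Build an unsigned PUT request for an S3 object (virtual-hosted-style URL)."""
    url = f"https://{bucket}.s3.amazonaws.com/{key}"
    return urllib.request.Request(url, data=payload, method="PUT")

# Hypothetical bucket and object key, for illustration only.
req = s3_put_request("example-lab-archive", "runs/run001/sample.bam", b"...")
print(req.get_method(), req.full_url)
```

The point is that the storage tier is addressed with ordinary HTTP verbs and URLs, which is why gateways, storage platforms and VMs can treat it as "just another tier".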
    • What comes next. Expect your storage to be smarter & more capable ... ‣ What do DDN, Panasas, Isilon, BlueArc, etc. have in common? • Under the hood they all run Unix or Unix-like OS’s on x86_64 architectures ‣ Some storage arrays can already run applications natively • More will follow • Likely a big trend for 2012
    • But what about today?
    • Still trying to avoid this. (100TB scientific data, no RAID, unsecured on lab benchtops)
    • Flops, Failures & Freakouts Common storage mistakes ...
    • Flops, Failures & Freakouts #1 - Unchecked Enterprise Storage Architects ‣ Scientist: “My work is priceless, I must be able to access it at all times” ‣ Corporate/Enterprise Storage Guru: “Hmmm …you want high availability, huh?” ‣ System delivered: • 40TB Enterprise SAN • Asynchronous replication to remote site • Can’t scale, can’t do NFS easily • ~$500K per year in operational & maintenance costs
    • Flops, Failures & Freakouts #2 - Unchecked User Requirements ‣ Scientist: “I do bioinformatics, I am rate limited by the speed of file IO operations. Faster disk means faster science.” ‣ System delivered: • Budget blown on top tier fastest-possible ‘Cadillac’ system ‣ Outcome: • System fills to capacity in 9 months; zero budget left.
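A back-of-the-envelope projection is often enough to spot a "fills in 9 months" outcome before the purchase order goes out. The sketch below is a hypothetical helper, not something from the talk: it assumes a starting monthly intake that grows by a fixed percentage each month, and every number passed in is illustrative.

```python
def months_until_full(capacity_tb, used_tb, added_first_month_tb, monthly_growth_rate):
    """Count months until usage reaches capacity, with intake growing each month."""
    if added_first_month_tb <= 0:
        raise ValueError("monthly intake must be positive")
    months = 0
    added = added_first_month_tb
    while used_tb < capacity_tb:
        used_tb += added
        added *= 1 + monthly_growth_rate  # intake compounds month over month
        months += 1
    return months

# Illustrative numbers only: 100 TB system, 10 TB already used, 10 TB/month intake.
print(months_until_full(100, 10, 10, 0.0))   # flat intake
print(months_until_full(100, 10, 10, 0.10))  # intake growing 10% per month
```

Running the same projection for a cheaper, larger tier next to the "Cadillac" option makes the capacity-versus-speed trade-off explicit for the people asking for the fastest possible disk.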
    • Flops, Failures & Freakouts #3 - D.I.Y Cluster & Parallel Filesystems ‣ Common source of storage unhappiness ‣ Root cause: • Not enough pre-sales time spent on design and engineering • Choosing Open Source over Common Sense ‣ System as built: • Not enough metadata controllers • Issues with interconnect fabric • Poor selection & configuration of key components ‣ End result: • Poor performance or availability • High administrative/operational burden
    • Hard Lessons Learned What these tales tell us ...
    • Flops, Failures & Freakouts Hard Lessons Learned ‣ End-users are not precise with storage terms • “Extremely reliable” means no data loss; Not millions spent on 99.99999% high availability ‣ When true costs are explained: • Many research users will trade a small amount of uptime or availability for more capacity or capabilities • … will also often trade some level of performance in exchange for a huge win in capacity or capability
    • Flops, Failures & Freakouts Hard Lessons Learned ‣ End-users demand the world but are willing to compromise • Necessary for IT staff to really talk to them and understand work, needs and priorities • Also essential to explain true costs involved ‣ People demanding the “fastest” storage often don’t have actual metrics to back their assertions
    • Flops, Failures & Freakouts Hard Lessons Learned ‣ Software-based parallel or clustered file systems are non-trivial to correctly implement • Essential to involve experts in the initial design phase • Even if using ‘open source’ version … ‣ Commercial support is essential • And I say this as an open source zealot …
    • The road ahead My $.02 for 2012...
    • The Road Ahead Storage Trends & Tips for 2012 ‣ Peta-capable platforms required ‣ Scale-out NAS still the best fit ‣ Customers will no longer build one big scale-out NAS tier ‣ My ‘hack’ of using nearline spec storage as primary science tier is probably obsolete in ’12 ‣ Not everything is worth backing up ‣ Expect disruptive stuff
    • The Road Ahead Trends & Tips for 2012 ‣ Monolithic tiers no longer cut it • Changing science & instrument output patterns are to blame • We can’t get away with biasing towards capacity over performance any more ‣ pNFS should go mainstream in ’12 • { fantastic news } ‣ Tiered storage IS in your future • Multiple vendors & types
    • The Road Ahead Trends & Tips for 2012 ‣ Your storage will be able to run apps • Dedupe, cloud gateways & replication • ‘CRAM’ or similar compression • Storage Resource Brokers (iRODS) & metadata servers • HDFS/Hadoop hooks? • Lab, Data management & LIMS applications Drobo Appliance running BioTeam MiniLIMS internally...
    • The Road Ahead Trends & Tips for 2012 ‣ Hadoop / MapReduce / BigData • Just like GRID and CLOUD back in the day you’ll need a gas mask to survive the smog of hype and vendor press releases. • You still need to think about it • ... and have a roadmap for doing it • Deep, deep ties to your storage • Your users want/need it • My $.02? Fantastic cloud use case
    • Disruptive Technology Example
    • Backblaze Pod For Biotech
    • Backblaze: 100TB for $12,000
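The two price points quoted in this deck ($12,000 for a 100 TB Backblaze pod, and roughly $1M for 1PB from top vendors) can be put on the same footing with a one-line calculation. This sketch takes 1 PB as 1,000 TB and compares purchase price only; it ignores usable versus raw capacity, performance, support and operating costs, so treat it as a rough yardstick rather than a like-for-like comparison.

```python
def cost_per_tb(total_cost_usd, capacity_tb):
    """Purchase price per terabyte, ignoring support and operating costs."""
    return total_cost_usd / capacity_tb

backblaze_pod = cost_per_tb(12_000, 100)        # figures from the Backblaze slide
vendor_petabyte = cost_per_tb(1_000_000, 1_000)  # "$1M will get you 1PB", 1 PB = 1,000 TB

print(f"Backblaze pod:   ${backblaze_pod:,.0f} per TB")
print(f"Vendor petabyte: ${vendor_petabyte:,.0f} per TB")
print(f"Ratio: {vendor_petabyte / backblaze_pod:.1f}x")
```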
    • 1. Intro 2. Meta-Issues (The Big Picture) 3. Infrastructure Tour 4. Compute & HPC 5. Storage 6. Cloud & Big Data
    • The ‘C’ word Does a Bio-IT talk exist if it does not mention “the cloud”?
    • Defining the “C-word” ‣ Just like “Grid Computing” the “cloud” word has been diluted to almost uselessness thanks to hype, vendor FUD and lunatic marketing minions ‣ Helpful to define terms before talking seriously ‣ There are three types of cloud ‣ “IAAS”, “SAAS” & “PAAS”
    • Cloud Stuff ‣ Before I get nasty ... ‣ I am not an Amazon shill ‣ I am a jaded, cynical, zero-loyalty consumer of IT services and products that let me get #%$^ done ‣ Because I only get paid when my #%$^ works, I am picky about what tools I keep in my toolkit ‣ Amazon AWS is an infinitely cool tool
    • Cloud Stuff - SAAS ‣ SAAS = “Software as a Service” ‣ Think: ‣ gmail.com
    • Cloud Stuff - PAAS ‣ PAAS = “Platform as a Service” ‣ Think: ‣ https://basespace.illumina.com/ ‣ salesforce.com ‣ MS office365.com, Apple iCloud, etc.
    • Cloud Stuff - IAAS ‣ IAAS = “Infrastructure as a Service” ‣ Think: ‣ Amazon Web Services ‣ Microsoft Azure
    • Cloud Stuff - IAAS ‣ When I talk “cloud” I mean IAAS ‣ And right now in 2012 Amazon IS the IAAS cloud ‣ ... everyone else is a pretender
    • Cloud Stuff - Why IAAS ‣ IAAS clouds are the focal point for life science informatics • Although some vendors are now offering PAAS and SAAS options ... ‣ The “infrastructure” clouds give us the “building blocks” we can assemble into useful stuff ‣ Right now Amazon has the best & most powerful collection of “building blocks” ‣ The competition is years behind ...
    • A message for the cloud pretenders…
    • No APIs? Not a cloud.
    • No self-service? Not a cloud.
    • Installing VMWare & excreting a press release? Not a cloud.
    • I have to email a human? Not a cloud.
    • ~50% failure rate when launching new servers? Stupid cloud.
    • Block storage and virtual servers only? (barely) a cloud;
    • Private Clouds My $.02
    • Private Clouds in 2012: ‣ I’m no longer dismissing them as “utter crap” ‣ Usable & useful in certain situations ‣ Hype vs. Reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ There are easier ways
    • Private Clouds: My Advice for ’12 ‣ Remain cynical (test vendor claims) ‣ Due Diligence still essential ‣ I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility
    • Private Clouds: My Advice for ’12 Most people are better off: 1. Adding VM platforms to existing HPC clusters & environments 2. Extending enterprise VM platforms to allow user self-service & server catalogs
    • Cloud Advice My $.02
    • Cloud Advice Don’t get left behind ‣ Research IT Organizations need a cloud strategy today ‣ Those that don’t will be bypassed by frustrated users ‣ IaaS cloud services are only a departmental credit card away ... and some senior scientists are too big to be fired for violating IT policy :)
    • Cloud Advice Design Patterns ‣ You actually need three tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics
    • Cloud Advice Legacy HPC on the Cloud ‣ MIT StarCluster • http://web.mit.edu/star/cluster/ ‣ This is your baseline ‣ Extend as needed
    • Cloud Advice “Cloudy” HPC ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Amazon Simple Workflow Service (SWF) looks sweet ‣ Good commercial options: Cycle Computing, etc.
    • Hadoop & “Big Data” ‣ Hadoop and “big data” need to be on your radar ‣ Be careful though, you’ll need a gas mask to avoid the smog of marketing and vapid hype ‣ The utility is real and this does represent the “future path” for analysis of large data sets
    • Cloud Advice - Hadoop & Big Data Big Data HPC ‣ It’s gonna be a MapReduce world, get used to it ‣ Little need to roll your own Hadoop in 2012 ‣ ISV & commercial ecosystem already healthy ‣ Multiple providers today; both onsite & cloud-based ‣ Often a slam-dunk cloud use case
    • Hadoop & “Big Data” What you need to know ‣ “Hadoop” and “Big Data” are now general terms ‣ You need to drill down to find out what people actually mean ‣ We are still in the period where senior mgmt. may demand “hadoop” or “big data” capability without any actual business or scientific need
    • Hadoop & “Big Data” What you need to know ‣ In broad terms you can break “Big Data” down into two very basic use cases: 1. Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets. The google search term here is “map reduce” 2. Data Stores: Hadoop is driving the development of very sophisticated “no-SQL” “non-Relational” databases and data query engines. The google search terms include “nosql”, “couchdb”, “hive”, “pig” & “mongodb”, etc. ‣ Your job is to figure out which type applies for the groups requesting “hadoop” or “big data” capability
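For the "compute" use case, the map reduce pattern itself fits in a few lines. The sketch below is a single-process toy in plain Python, not Hadoop code: it counts k-mers across reads using explicit map, shuffle and reduce steps. The function names and the k-mer example are illustrative choices; Hadoop's contribution is running the same three phases across many machines, close to where the data lives.

```python
from collections import defaultdict

def map_phase(read, k):
    """Map: emit a (k-mer, 1) pair for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def kmer_count(reads, k=3):
    pairs = (pair for read in reads for pair in map_phase(read, k))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = kmer_count(["ACGTAC", "GTACGT"])
print(counts)
```

Asking a group whether their problem looks like this (independent map steps, then aggregation by key) is a quick way to tell a real "compute" request from a data-store request.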
    • High Throughput Science Hadoop vs traditional Linux Clusters ‣ Hadoop is a very complex beast ‣ It’s also the way of the future so you can’t ignore it ‣ Very tight dependency on moving the ‘compute’ as close as possible to the ‘data’ ‣ Hadoop clusters are just different enough that they do not integrate cleanly with traditional Linux HPC systems ‣ Often treated as separate silo or punted to the cloud
    • Hadoop & “Big Data” What you need to know ‣ Hadoop is being driven by a small group of academics writing and releasing open source life science hadoop applications; ‣ Your people will want to run these codes ‣ In some academic environments you may find people wanting to develop on this platform
    • Cloud Data Movement My $.02
    • Cloud Data Movement ‣ We’ve slung a ton of data in and out of the cloud ‣ We used to be big fans of physical media movement ‣ Remember these pictures? ‣ ...
    • Physical data movement station 1
    • Physical data movement station 2
    • “Naked” Data Movement
    • “Naked” Data Archive
    • Cloud Data Movement ‣ We’ve got a new story for 2012 ‣ And the next image shows why ...
    • March 2012
    • Cloud Data Movement Wow! ‣ With a 1GbE internet connection ... ‣ and using Aspera software .... ‣ We sustained 700 Mb/sec for more than 7 hours freighting genomes into Amazon Web Services ‣ This is fast enough for many use cases, including genome sequencing core facilities* ‣ Chris Dwan’s webinar on this topic: http://biote.am/7e
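A quick worked calculation shows what that sustained rate means in data volume. The sketch takes the rate as 700 megabits per second (a 1GbE link tops out at 1,000 megabits per second, so megabytes would not fit) and uses decimal units throughout; the function name is just for illustration.

```python
def terabytes_moved(rate_megabits_per_sec, hours):
    """Data volume in decimal terabytes moved at a sustained rate."""
    bytes_per_sec = rate_megabits_per_sec * 1_000_000 / 8
    return bytes_per_sec * hours * 3600 / 1e12

moved = terabytes_moved(700, 7)
print(f"{moved:.2f} TB in 7 hours")
print(f"{terabytes_moved(700, 24):.2f} TB per day at the same rate")
```

Comparing the per-day figure against what your instruments produce per day is the quickest sanity check on whether network transfer can keep up with a sequencing core.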
    • Cloud Data Movement Wow! ‣ Results like this mean we now favor network-based data movement over physical media movement ‣ Large-scale physical data movement carries a high operational burden and consumes non-trivial staff time & resources
    • Cloud Data Movement There are three ways to do network data movement ... ‣ Buy software from Aspera and be done with it ‣ Attend the annual SuperComputing conference & see which student group wins the bandwidth challenge contest; use their code ‣ Get GridFTP from the Globus folks • Trend: At every single “data movement” talk I’ve been to in 2011 it seemed that any speaker who was NOT using Aspera was a very happy user of GridFTP. #notCoincidence
    • Putting it all together
    • Wrapping up IT may just be a means to an end but you need to get your head wrapped around it ‣ (1) So you use/buy/request the correct ‘stuff’ ‣ (2) So you don’t get cheated by a vendor ‣ (3) Because you need to understand your tools ‣ (4) Because trends in automation and orchestration are blurring the line between scientist & sysadmin
    • Wrapping up - Compute & Servers ‣ Servers and compute power are pretty straightforward ‣ You just need to know roughly what your preferred compute building blocks look like ‣ ... and what special purpose resources you require (GPUs, Large Memory, High Core Count, etc.) ‣ Some of you may also have to deal with sizing, cost and facility (power, cooling, space) issues as well
    • Wrapping up - Networking ‣ Networking is also not a hugely painful thing ‣ Ethernet rules the land; you might have to pick and choose between 1-Gig and 10-Gig Ethernet ‣ Understand that special networking technologies like Infiniband offer advantages but they are expensive and need to be applied carefully (if at all) ‣ Knowing if your MPI apps are latency sensitive will help ‣ And remember that networking is used for multiple things (server communication, application message passing & file and data sharing)
    • Wrapping up - Storage ‣ If you are going to focus on one IT area, this is it ‣ It’s incredibly important for genomics and also incredibly complicated. Many ways to waste money or buy the ‘wrong’ stuff ‣ You may only have one chance to get it correct and may have to live with your decision for years ‣ Budget is finite. You have to balance “speed” vs “size” vs “expansion capacity” vs “high availability” and more ... ‣ “Petabyte-capable Scale-out NAS” is usually the best starting point. You deviate away from NAS when scientific or technical requirements demand “something else”.
    • Wrapping up - Hadoop / Big Data ‣ Probably the way of the future for big-data analytics. It’s worth spending time to study; especially if you intend to develop software in the future ‣ Popular target for current and emerging high-scale genomics tools. If you want to use those tools you need to deploy Hadoop ‣ It’s complicated and still changing rapidly. It can be difficult to integrate into existing setups ‣ Be cynical about hype & test vendor claims
    • Wrapping up - Cloud ‣ Cloud is the future. The economics are inescapable and the advantages are compelling. ‣ The main obstacle holding back genomics is terabyte scale data movement. The cloud is horrible if you have to move 2TB of data before you can run 2Hrs of compute! ‣ Your future core facility may involve a comp bio lab without a datacenter at all. Some organizations are already 100% virtual and 100% cloud-based
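The "2TB of data before 2Hrs of compute" complaint is easy to quantify. This sketch estimates transfer time for 2 TB at a few assumed sustained link rates and compares it with the 2 hours of compute; the rates are illustrative, and the helper name is hypothetical.

```python
def transfer_hours(terabytes, rate_megabits_per_sec):
    """Hours needed to move a dataset at a sustained network rate."""
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    return bits / (rate_megabits_per_sec * 1e6) / 3600

COMPUTE_HOURS = 2  # the "2Hrs of compute" from the slide

for rate in (100, 700, 1000):  # assumed sustained rates in megabits/sec
    hours = transfer_hours(2, rate)
    ratio = hours / COMPUTE_HOURS
    print(f"{rate:>5} Mb/s: {hours:5.1f} h of transfer, {ratio:4.1f}x the compute time")
```

Whenever the ratio is well above 1, the data should probably already live in the cloud (or be written there directly by the instrument) before the compute is scheduled.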
    • The NGS cloud clincher. 700 Mb/sec sustained for ~7 hours, West Coast to East Coast USA
    • Wrapping up - Cloud, continued ‣ Understand that for the foreseeable future there are THREE distinct cloud architectures and design patterns. ‣ Vendors who push “100% hadoop” or “legacy free” solutions are idiots and should be shoved out the door. We will be running legacy codes and workflows for many years to come ‣ Your three design patterns on the cloud: • Legacy HPC systems (replicate traditional clusters in the cloud) • Hadoop • Cloudy (when you rewrite something to fully leverage cloud capability)
    • Thanks! Slides online at: http://slideshare.net/chrisdag/