2012: Trends from the Trenches
 


Talk slides as delivered at the 2012 Bio-IT World Conference in Boston, MA


This is my annual "state of the state" address that has become somewhat popular.




Presentation Transcript

  • Trends from the Trenches - 2012 Bio-IT World Expo, Boston MA
  • I’m Chris. I’m an infrastructure geek. I work for the BioTeam.
  • BioTeam: Who, what & why ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 10+ years bridging the “gap” between science, IT & high performance computing ‣ PS: We are hiring.
  • BioTeam: Why we get invited to these sorts of talks ... ‣ Lots of people hire us across a wide range of project types • Pharma, Biotech, EDU, Nonprofit, .Gov, .Mil, etc. ‣ We get to see how groups of smart people approach similar problems ‣ We can speak honestly & objectively about what we see “in the real world”
  • Disclaimer.
  • Listen to me at your own risk. Seriously. ‣ I’m not an expert, pundit, visionary or “thought leader” ‣ All career success entirely due to shamelessly copying what actual smart people do ‣ I’m biased, burnt-out & cynical ‣ Filter my words accordingly
  • 1 Introduction ‣ 2 Business & Marketplace ‣ 3 Datacenter, Facility & Infrastructure ‣ 4 Storage ‣ 5 Cloud ‣ 6 Hot for ’12 ...
  • Business Landscape: So far 2012 feels a lot like 2011 ...
  • Business & Meta Observations: More of the same in ’12 ... ‣ ~4 staff full time on issues involving data handling, data management and multi-instrument Next-Gen sequencing/analysis ‣ ~2 staff full time on infrastructure, storage and facility related projects • Dwan: Big infrastructure & facility projects for Fortune 20 companies, research consortia & .GOV customers • Dag: 40% infrastructure, 20% storage, 20% cloud ‣ ~1 staff full time on Amazon Cloud projects
  • What that tells us ‣ Same problem(s) as last year ‣ Next-gen sequencing still causing a lot of pain when it comes to data handling, storage, organization & integration ‣ As sequencing continues to be commoditized, this will likely only get worse
  • Business & Meta Observations ‣ Companies are still spending • On people, software, infrastructure, facility & cloud ‣ Pharma may be contracting • ... but more and more startups are popping up and other companies are simply continuing sane & sensible growth ‣ .GOV is of some concern • Stimulus funding winding down or already gone; same with ‘BioDefense’ funding & project efforts • Grant funding organizations tightening belts
  • 1 Introduction ‣ 2 Business & Marketplace ‣ 3 Datacenter, Facility & Infrastructure ‣ 4 Storage ‣ 5 Cloud ‣ 6 Hot for ’12 ...
  • Facility Observations
  • Facility & Infrastructure: Less frenetic this year ‣ No clients breaking ground on major new datacenters this year • Slight change from 2011 • A few electrical/cooling refresh projects in the works ‣ Multiple clients of all sizes securing additional colo • Often for power density reasons • Small shops & startups are going with colo+cloud • Large shops expanding into Tier-1 colos
  • Facility & Infrastructure: Power problems are fading or less critical ... ‣ Last year we had serious power density problems ‣ Friction between facility & research staff ‣ Arguments over density vs. power envelope vs. rack space & physical footprint ‣ No such issues (so far) in 2012
  • Facility & Infrastructure: HPC + Virtualization ‣ Still deploying HPC Linux Clusters w/ Scale-out NAS ‣ However, every HPC system since 2011 has also intentionally included a VM environment integrated into the HPC cluster
  • Facility & Infrastructure: HPC + Virtualization ‣ HPC + Virtualization solves a lot of problems ‣ Deals with valid biz/scientific need for researchers to run/own/manage their own servers ‘near’ HPC stack ‣ Solves a ton of research IT support issues • Or at least leaves us a clear boundary line ‣ Lets us obtain useful “cloud” features without choking on endless BS shoveled at us by “private cloud” vendors • Example: Server Catalogs + Self-service Provisioning
  • Storage
  • Science-centric Storage: Current State Assessment ‣ Storage still making me crazy in ’12
  • Science-centric Storage: Why I’m not worried ‣ Peta-capable storage is trivial to acquire in 2012 ‣ Scale-out NAS has won the battle ‣ It’s simply not as hard/risky as it used to be
  • On the other hand ...
  • OMG! The Sky Is Falling! Maybe a little panic is appropriate ...
  • The sky IS falling! OMG!! [BIG SCARY GRAPH, 2007-2012]
  • The sky IS falling! Uncomfortable truths ‣ Cost of acquiring data (genomes) is falling faster than the rate at which industry is increasing drive capacity ‣ Human researchers downstream of these datasets are also consuming more storage (and less predictably) ‣ High-scale labs must react or potentially face catastrophic issues in 2012-2013 (the toy model below makes the growth-rate arithmetic concrete)
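A toy model of the growth-rate mismatch described above. The yearly growth factors here are hypothetical and purely illustrative (not figures from the talk); the point is only that two diverging exponentials compound into an ever-widening storage budget gap.

```python
# Hypothetical growth factors, purely for illustration (not data from the talk).
seq_growth_per_year   = 4.0   # data generated grows 4x per year (assumed)
drive_growth_per_year = 1.5   # capacity per dollar grows 1.5x per year (assumed)

data, capacity_per_dollar = 1.0, 1.0
for year in range(2012, 2016):
    gap = data / capacity_per_dollar   # relative storage spend needed to keep up
    print(f"{year}: relative storage spend needed = {gap:.1f}x")
    data *= seq_growth_per_year
    capacity_per_dollar *= drive_growth_per_year
```

Even with modest assumed rates, the required spend grows several-fold within a couple of years, which is the "catastrophic issues in 2012-2013" argument in numeric form.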
  • The sky IS falling! Current Practices Are Not Sustainable ‣ FACT: Chemistry changing faster than we can refresh our datacenters and research IT infrastructure ‣ FACT: Rate at which we can cheaply acquire interesting data exceeds rate at which storage companies can increase the capacity of their products ‣ FACT: We suck at managing, tagging, valuing & curating our data. Few scientists really understand the true cost/complexity involved with keeping data safe, online & accessible ‣ FACT: In 2012 people still think “keep everything online, forever” is a viable demand to be making of IT staff ‣ FACT: Something is going to break. Soon.
  • CRAM it.
  • The sky IS falling! CRAM it in 2012 ... ‣ Minor improvements are useless; order-of-magnitude gains are needed ‣ Some people are talking about radical new methods: compressing against reference sequences and only storing the diffs (a toy sketch of the idea follows below) • With a variable compression “quality budget” to spend on lossless techniques in the areas you care about ‣ http://biote.am/5v - Ewan Birney on “Compressing DNA” ‣ http://biote.am/5w - The actual CRAM paper ‣ If CRAM takes off, the storage landscape will change
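A minimal sketch of the idea behind reference-based compression, assuming an exact alignment position is already known. This is not the CRAM format or its API, just an illustration of "store only the diffs against a reference":

```python
# Toy reference-based encoding: store (start, length, mismatches) instead of the
# full read. Illustrative only; real CRAM also handles insertions, deletions,
# quality scores and much more.
def encode_against_reference(reference, read, start):
    """Return (start, length, diffs), where diffs lists (offset, base) mismatches."""
    diffs = [(i, b) for i, b in enumerate(read) if reference[start + i] != b]
    return (start, len(read), diffs)

def decode_against_reference(reference, record):
    start, length, diffs = record
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)

if __name__ == "__main__":
    ref  = "ACGTACGTACGTACGT"
    read = "ACGTACGAACGT"                     # differs from the reference at offset 7
    rec  = encode_against_reference(ref, read, start=0)
    print(rec)                                # (0, 12, [(7, 'A')])
    assert decode_against_reference(ref, rec) == read
```

The "quality budget" mentioned on the slide would govern how much of the (lossy) quality-score information is kept alongside these diffs.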
  • Storage: What comes next? Next 18 months will be really fun...
  • What comes next. The same rules apply for 2012 and beyond ... ‣ Accept that science changes faster than IT infrastructure ‣ Be glad you are not Broad/Sanger/BGI/NCBI ‣ Flexibility, scalability and agility become the key requirements of research informatics platforms • Tiered storage is in your future ... ‣ Shared/concurrent access is still the overwhelming storage use case
  • What comes next. In the following year ... ‣ Many peta-scale capable systems deployed • Most will operate in the hundreds-of-TBs range ‣ Far more aggressive “data triage” ‣ Genome compression via CRAM ‣ Even more data will sit untouched & unloved ‣ Growing need for tiers, HSM & even tape
  • What comes next. In the following year ... ‣ Broad and others are paving the way with respect to metadata-aware & policy driven storage frameworks • And we’ll shamelessly copy a year or two later ‣ I’m still on my cloud storage kick • Economics are inescapable; will be built into storage platforms, gateways & VMs • Amazon S3 is only an HTTP RESTful call away (see the minimal example below) • Cloud will become “just another tier”
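A minimal sketch of what "S3 is only an HTTP call away" means in practice, using boto3 (the current AWS SDK for Python; in 2012 the equivalent would have been boto). Bucket, key and path names here are hypothetical:

```python
import boto3

# Credentials are assumed to come from the environment or an IAM role.
s3 = boto3.client("s3")

def archive_to_cloud_tier(local_path, bucket, key):
    """Push one finished file to S3, treating the cloud as just another storage tier."""
    s3.upload_file(local_path, bucket, key)

if __name__ == "__main__":
    archive_to_cloud_tier(
        "/data/runs/run_0423/reads.bam",   # hypothetical local path
        "example-lab-archive",             # hypothetical bucket
        "runs/run_0423/reads.bam",
    )
```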
  • What comes next. Expect your storage to be smarter & more capable ... ‣ What do DDN, Panasas, Isilon, BlueArc, etc. have in common? • Under the hood they all run Unix or Unix-like OS’s on x86_64 architectures ‣ Some storage arrays can already run applications natively • More will follow • Likely a big trend for 2012
  • Storage: The road ahead. My $.02 for 2012...
  • The Road Ahead: Trends & Tips for 2012 ‣ Peta-capable platforms required ‣ Scale-out NAS still the best fit ‣ Customers will no longer build one big scale-out NAS tier ‣ My ‘hack’ of using nearline spec storage as primary science tier is obsolete in ’12 ‣ Not everything is worth backing up ‣ Expect disruptive stuff
  • The Road Ahead: Trends & Tips for 2012 ‣ Monolithic tiers no longer cut it • Changing science & instrument output patterns are to blame • We can’t get away with biasing towards capacity over performance any more ‣ pNFS should go mainstream in ’12 • { fantastic news } ‣ Tiered storage IS in your future • Multiple vendors & types
  • The Road Ahead: Trends & Tips for 2012 ‣ Your storage will be able to run apps • Dedupe, cloud gateways & replication • ‘CRAM’ or similar compression • Storage Resource Brokers (iRODS) & metadata servers • HDFS/Hadoop hooks? • Lab, data management & LIMS applications (image: Drobo appliance running BioTeam MiniLIMS internally)
  • The Road Ahead: Trends & Tips for 2012 ‣ Hadoop / MapReduce / BigData • Just like GRID and CLOUD back in the day, you’ll need a gas mask to survive the smog of hype and vendor press releases • You still need to think about it • ... and have a roadmap for doing it • Deep, deep ties to your storage • Your users want/need it • My $.02? Fantastic cloud use case (a minimal Hadoop Streaming sketch follows below)
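To make the Hadoop/MapReduce point concrete, here is an illustrative example (not from the talk) of a Hadoop Streaming job in Python: Streaming runs any pair of programs that read stdin and write tab-separated key/value lines, so the same script can be tested locally with a shell pipeline before it ever touches a cluster.

```python
# kmer_mr.py: count 8-mers in sequence lines. With Hadoop Streaming this runs as
#   -mapper "python kmer_mr.py map" -reducer "python kmer_mr.py reduce"
# and locally as:
#   cat reads.txt | python kmer_mr.py map | sort | python kmer_mr.py reduce
import sys
from itertools import groupby

K = 8  # k-mer length, an arbitrary choice for the example

def mapper():
    for line in sys.stdin:
        seq = line.strip().upper()
        for i in range(len(seq) - K + 1):
            print(f"{seq[i:i + K]}\t1")

def reducer():
    # Hadoop (or the local sort) delivers keys in sorted order, so identical
    # k-mers arrive adjacent and can be summed with groupby.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for kmer, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{kmer}\t{sum(int(v) for _, v in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```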
  • Disruptive Storage Example
  • Backblaze Pod For Biotech
  • Backblaze: 100TB for $12,000 - http://bioteam.net/tag/backblaze/
  • Storage Future Feels Like This ... Multiple Tiers, Multiple Vendors, Multiple Products
  • The ‘C’ word: Does a Bio-IT talk exist if it does not mention “the cloud”?
  • Cloud Stuff ‣ Before I get nasty ... ‣ I am not an Amazon shill ‣ I am a jaded, cynical, zero-loyalty consumer of IT services and products that let me get #%$^ done ‣ Because I only get paid when my #%$^ works, I am picky about what tools I keep in my toolkit ‣ Amazon AWS is an infinitely cool tool
  • A message for the cloud pretenders…
  • No APIs? Not a cloud.
  • No self-service? Not a cloud.
  • Installing VMware & excreting a press release? Not a cloud.
  • I have to email a human? Not a cloud.
  • ~50% failure rate when launching new servers? Stupid cloud.
  • Block storage and virtual servers only? (barely) a cloud.
  • Private Clouds: My $.02
  • Private Clouds in 2012: ‣ I’m no longer dismissing them as “utter crap” ‣ Usable & useful in certain situations ‣ Hype vs. Reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ There are easier ways
  • Private Clouds: My Advice for ’12 ‣ Remain cynical (test vendor claims) ‣ Due diligence still essential ‣ I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility (the sketch below shows what that compatibility buys you)
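One concrete reading of "Amazon API compatibility", sketched with boto3 and a hypothetical private endpoint: if the private cloud speaks the same S3 API, identical client code works against both targets by switching only the endpoint URL.

```python
import boto3

def make_s3_client(endpoint_url=None):
    """S3 client for AWS (default) or for an S3-API-compatible private cloud."""
    if endpoint_url:
        return boto3.client("s3", endpoint_url=endpoint_url)
    return boto3.client("s3")

public_s3  = make_s3_client()                                    # Amazon S3
private_s3 = make_s3_client("https://objects.internal.example")  # hypothetical endpoint

# The calls are identical either way, e.g.:
# public_s3.list_buckets()
# private_s3.list_buckets()
```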
  • Private Clouds: My Advice for ’12 Most people are better off: 1. Adding VM platforms to existing HPC clusters & environments 2. Extending enterprise VM platforms to allow user self-service & server catalogs
  • Cloud Advice: My $.02
  • Cloud Advice: Don’t get left behind ‣ Research IT organizations need a cloud strategy today ‣ Those that don’t will be bypassed by frustrated users ‣ IaaS cloud services are only a departmental credit card away ... and some senior scientists are too big to be fired for violating IT policy :)
  • Cloud Advice: Design Patterns ‣ You actually need three tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics
  • Cloud Advice: Legacy HPC on the Cloud ‣ MIT StarCluster • http://web.mit.edu/star/cluster/ ‣ This is your baseline ‣ Extend as needed
  • Cloud Advice: “Cloudy” HPC ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Amazon Simple Workflow Service (SWF) looks sweet ‣ Good commercial options: Cycle Computing, etc.
  • Cloud Advice: Big Data HPC ‣ It’s gonna be a MapReduce world, get used to it ‣ Little need to roll your own Hadoop in 2012 ‣ ISV & commercial ecosystem already healthy ‣ Multiple providers today, both onsite & cloud-based ‣ Often a slam-dunk cloud use case
  • Cloud Data Movement: My $.02
  • Cloud Data Movement ‣ We’ve slung a ton of data in and out of the cloud ‣ We used to be big fans of physical media movement ‣ Remember these pictures? ‣ ...
  • Physical data movement station 1
  • Physical data movement station 2
  • “Naked” Data Movement
  • “Naked” Data Archive
  • Cloud Data Movement ‣ We’ve got a new story for 2012 ‣ And the next image shows why ...
  • March 2012
  • Cloud Data Movement: Wow! ‣ With a 1GbE internet connection ... ‣ and using Aspera software ... ‣ We sustained ~700 Mb/sec for more than 7 hours freighting genomes into Amazon Web Services ‣ This is fast enough for many use cases, including genome sequencing core facilities* (rough arithmetic below) ‣ Chris Dwan’s webinar on this topic: http://biote.am/7e
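A quick back-of-the-envelope check of why that rate is "fast enough for many use cases", assuming the sustained rate is roughly 700 megabits/second (consistent with a 1GbE link):

```python
# Rough arithmetic only; assumes ~700 Mb/s sustained, as on the slide above.
rate_bits_per_sec = 700e6       # ~700 megabits/second
seconds = 7 * 3600              # the 7-hour transfer window

bytes_moved = rate_bits_per_sec / 8 * seconds
print(f"Moved in 7 hours: {bytes_moved / 1e12:.1f} TB")          # ~2.2 TB

one_tb_seconds = 1e12 * 8 / rate_bits_per_sec
print(f"Time to move 1 TB: {one_tb_seconds / 3600:.1f} hours")   # ~3.2 hours
```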
  • Cloud Data Movement: Wow! ‣ Results like this mean we now favor network-based data movement over physical media movement ‣ Large-scale physical data movement carries a high operational burden and consumes non-trivial staff time & resources
  • Cloud Data Movement: There are three ways to do network data movement ... ‣ Buy software from Aspera and be done with it ‣ Attend the annual SuperComputing conference & see which student group wins the bandwidth challenge contest; use their code ‣ Get GridFTP from the Globus folks • Trend: At every single “data movement” talk I’ve been to in 2011, it seemed that any speaker who was NOT using Aspera was a very happy user of GridFTP. #notCoincidence
  • Cloud Data Movement: Final thoughts ‣ GridFTP has a booth on the show floor; pay them a visit ‣ Michelle Munson from Aspera speaking today in Track 2 on “High-Speed Data Movement for Effective Global Collaboration in Genomic Research”
  • Hot topics for 2012 ...
  • Hot for ’12: BioTeam side projects & research interests ‣ We’d like to wrap up with some topics we think are interesting ‣ Who knows? These might be trends for 2013!
  • Siri Voice Control of Instruments/Pipelines ‣ BioTeam revealed our work with BT and Accelrys yesterday morning @ BioIT ‣ We demonstrated Siri voice control of a Pipeline Pilot experiment running in the BT Compute Cloud ‣ http://biote.am/7h ‣ We expect to continue doing cool things with Siri in ’12
  • Smart Storage & Lab-local Appliances ‣ I firmly expect the “storage arrays running apps & VMs” trend to go mainstream ‣ This has beneficial implications for life science informatics ‣ We’ll be hitting this topic hard on systems ranging from Drobo to DataDirect ‣ Also working with the Intel Modular Server concept
  • Lab-Local Appliances: Intel Modular Server ‣ Interesting hardware combination; storage + servers + native hypervisor ‣ VM Pool 1: MiniLIMS + other useful lab software ‣ VM Pool 2: Amazon Storage Gateway Appliance http://biote.am/7i ‣ Server Blade 3: BrightCluster HPC Stack
  • Cloud, Community & Orchestration ‣ We love Opscode & Chef ‣ We’ll be doing more with systems orchestration in ’12 ‣ And hopefully expanding our community collection of useful Chef cookbooks for life science informatics ‣ We also still love MIT StarCluster and will hopefully be contributing plugins and enhancements back to Justin
  • Phew. Think I’m done now.
  • Thanks! Slides online at: http://slideshare.net/chrisdag/