2013: Trends from the Trenches


Slides from the 2013 "trends talk" as delivered annually at Bio-IT World Boston.


  1. Trends from the Trenches. 2013 Bio-IT World, Boston
  2. Some less aspirational title slides ...
  3. Trends from the Trenches. 2013 Bio-IT World, Boston
  4. Trends from the Trenches. 2013 Bio-IT World, Boston
  5. I’m Chris. I’m an infrastructure geek. I work for the BioTeam. www.bioteam.net - Twitter: @chris_dag
  6. BioTeam: Who, What, Why ... ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 10+ years bridging the “gap” between science, IT & high performance computing
  7. If you have not heard me speak ... apologies in advance ‣ “Infamous” for speaking very fast and carrying a huge slide deck • ~70 slides for 25 minutes is about average for me • Let me mention what happened after my Pharma HPC best practices talk yesterday ... By the time you see this slide I’ll be on my ~4th espresso
  8. Why I do this talk every year ... ‣ BioTeam works for everyone • Pharma, Biotech, EDU, Nonprofit, .Gov, etc. ‣ We get to see how groups of smart people approach similar problems ‣ We can speak honestly & objectively about what we see “in the real world”
  9. Standard Dag Disclaimer: listen to me at your own risk ‣ I’m not an expert, pundit, visionary or “thought leader” ‣ Any career success is entirely due to shamelessly copying what actual smart people do ‣ I’m biased, burnt-out & cynical ‣ Filter my words accordingly
  10. So why are you here? And before 9am!
  11. It’s a risky time to be doing Bio-IT
  12. Big Picture / Meta Issue ‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed • Example: a CCD sensor upgrade on that confocal microscopy rig just doubled storage requirements • Example: the 2D ultrasound imager is now a 3D imager • Example: an Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes, with a massive downstream increase in storage, compute & data movement needs ‣ For the above examples, do you think IT was informed in advance?
  13. The Central Problem Is ... science progressing way faster than IT can refresh/change ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • Bench science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
  14. The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago we could toss inexpensive storage and servers at the problem, even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; real solutions are required
  15. The new normal.
  16. And a related problem ... ‣ It has never been easier to acquire vast amounts of data cheaply ‣ The growth rate of data creation/ingest exceeds the rate at which the storage industry is improving disk capacity ‣ Not just a storage lifecycle problem: this data *moves* and often needs to be shared among multiple entities and providers • ... ideally without punching holes in your firewall or consuming all available internet bandwidth
  17. If you get it wrong ... ‣ Lost opportunity ‣ Missing capability ‣ Frustrated & very vocal scientific staff ‣ Problems in recruiting, retention, publication & product development
  18. Enough groundwork. Let’s Talk Trends*
  19. Topic: DevOps & Org Charts
  20. The social contract between scientist and IT is changing forever
  21. You can blame “the cloud” for this
  22. DevOps & Scriptable Everything ‣ On (real) clouds, EVERYTHING has an API ‣ If it’s got an API you can automate and orchestrate it ‣ “Scriptable datacenters” are now a very real thing (a minimal API-provisioning sketch appears after the transcript)
  23. DevOps & Scriptable Everything ‣ Incredible innovation in the past few years ‣ Driven mainly by companies with massive internet ‘fleets’ to manage ‣ ... but the benefits trickle down to us little people
  24. DevOps will conquer the enterprise ‣ Over the past few years cloud automation/orchestration methods have been trickling down into our local infrastructures ‣ This will have significant impact on careers, job descriptions and org charts
  25. Scientist/SysAdmin/Programmer. 2013: Continue to blur the lines between all these roles ‣ Radical change in how IT is provisioned, delivered, managed & supported (www.opscode.com) • Technology Driver: Virtualization & Cloud • Ops Driver: Configuration Mgmt, Systems Orchestration & Infrastructure Automation ‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant
  26. Scientist/SysAdmin/Programmer. 2013: Continue to blur the lines between all these roles ‣ When everything has an API ... ‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely ‣ And by the way ... ‣ The APIs (‘knobs & buttons’) are accessible to all, not just the bearded practitioners sitting in that room next to the datacenter
  27. Scientist/SysAdmin/Programmer. 2013: Continue to blur the lines between all these roles ‣ IT jobs, roles and responsibilities are going to change significantly ‣ SysAdmins must learn to program in order to harness automation tools ‣ Programmers & Scientists can now self-provision and control sophisticated IT resources
  28. Scientist/SysAdmin/Programmer. 2013: Continue to blur the lines between all these roles ‣ My take on the future ... • SysAdmins (Windows & Linux) who can’t code will have career issues • Far more control is going into the hands of the research end user • IT support roles will radically change -- no longer owners or gatekeepers ‣ IT will “own” policies, procedures, reference patterns, identity mgmt, security & best practices ‣ Research will control the “what”, “when” and “how big”
  29. Topic: Facility Observations
  30. Facility 1: Enterprise vs. Shadow IT ‣ Marked difference in the types of facilities we’ve been working in ‣ Discovery Research systems are firmly embedded in the enterprise datacenter ‣ ... moving away from “wild west” unchaperoned locations and mini-facilities
  31. Facility 2: Colo Suites for R&D ‣ Marked increase in use of commercial colocation facilities for R&D systems • And they’ve noticed! - Markley Group (One Summer) has a booth - Sabey is on this afternoon’s NYGenome panel ‣ Potential reasons: • Expensive to build high-density hosting at small scale • Easier metro networking to link remote users/sites • Direct connect to cloud provider(s) • High-speed research nets only a cross-connect away
  32. Facility 3: Some really old stuff ... ‣ Final facility observation ‣ Average age of the infrastructure we work on seems to be increasing ‣ ... very few aggressive 2-year refresh cycles these days ‣ Potential reasons • Recession & consolidation still affecting or deferring major technology upgrades and changes • Cloud: local upgrades deferred pending strategic cloud decisions • Cloud: economic analysis showing the stark truth that local setups need to be run efficiently and at high utilization in order to justify their existence
  33. Facility 3: Virtualization ‣ Every HPC environment we’ve worked on since 2011 has included (or plans to include) a local virtualization environment • True for big systems: 2K cores / 2 petabytes of disk • True for small systems: a 96-core CompChem cluster ‣ Unlikely to change; too many advantages
  34. Facility 3: Virtualization ‣ HPC + virtualization solves a lot of problems • Deals with the valid business/scientific need for researchers to run/own/manage their own servers ‘near’ the HPC stack ‣ Solves a ton of research IT support issues • Or at least leaves us a clear boundary line ‣ Lets us obtain useful “cloud” features without choking on the endless BS shoveled at us by “private cloud” vendors • Example: server catalogs + self-service provisioning
  35. Topic: Compute
  36. Compute ‣ Still feels like a solved problem in 2013 ‣ Compute power is a commodity • Inexpensive relative to other costs • Far less vendor differentiation than storage • Easy to acquire; easy to deploy
  37. Compute: Fat Nodes. Fat nodes are wiping out small and midsized clusters ‣ This box has 64 CPU cores • ... and up to 1TB of RAM ‣ Fantastic genomics/chemistry system • A 256GB RAM version only costs $13,000* ‣ BioIT homework: • Go visit the Silicon Mechanics booth and find out the current cost of a box with 1TB of RAM
  38. Possibly the most significant ’13 compute trend
  39. Compute: Local Disk is Back. A defensive hedge against Big Data / HDFS ‣ We’ve started to see organizations move away from blade servers and 1U pizza-box enclosures for HPC ‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available ‣ Why? Hadoop & Big Data ‣ This is a defensive hedge against future HDFS or similar requirements • Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play. ‣ Hardcore Hadoop rigs sometimes operate at a 1:1 ratio between core count and disk count
  40. Topic: Network
  41. Network ‣ 10 Gigabit Ethernet is still the standard • ... although not as pervasive as I predicted in prior trend talks ‣ Non-Cisco options are attractive • BioIT homework: listen to the Arista talks and visit their booth ‣ SDN is still more hype than reality in our market • May not see it until the next round of large private cloud rollouts or new facility construction (if even)
  42. Network ‣ InfiniBand for message passing is in decline • Still see it for comp chem, modeling & structure work; we started building such a system last week • Still see it for parallel and clustered storage • The decline seems to match the decreasing popularity of MPI for the latest generation of informatics and ‘omics tools ‣ Hadoop / HDFS seems to favor throughput and bandwidth over latency
  43. Topic: Storage
  44. Storage ‣ Still the biggest expense, biggest headache and scariest systems to design in modern life science informatics environments ‣ Most of my slides for last year’s trends talk focused on storage & data lifecycle issues • Check http://slideshare.net/chrisdag/ if you want to see what I’ve said in the past • Dag accuracy check: it was great yesterday to see DataDirect talking about the KVM hypervisor running on their storage shelves! I’m convinced more and more apps will run directly on storage in the future ‣ ... not doing that this year. The core problems and common approaches are largely unchanged and don’t need to be restated
  45. It’s 2013; we know what questions to ask of our storage
  46. NGS new data generation: 6-month window. Data like this lets us make realistic capacity planning and purchase decisions
  47. Storage: 2013 ‣ Advice: stay on top of the “compute nodes with many disks” trend ‣ HDFS, if suddenly required by your scientists, can be painful to deploy in a standard scale-out NAS environment
  48. Storage: 2013 ‣ Object storage is getting interesting
  49. Storage: 2013. Object Storage + Commodity Disk Pods ‣ Object storage is far more approachable (a short code sketch of the S3-style interface appears after the transcript) • ... we used to see it in proprietary solutions for specific niche needs • potentially on its way to the mainstream now ‣ Why? • Benefits are compelling across a wide variety of interesting use cases • Amazon S3 showed what a globe-spanning general-purpose object store could do; this is starting to convince developers & ISVs to modify their software to support it • www.swiftstack.com and others are making local object stores easy, inexpensive and approachable on commodity gear • Most of your Tier 1 storage and server vendors have a fully supported object store stack they can sell to you (or simply enable in a product you already have deployed in-house)
  50. Remember this disruptive technology example from last year?
  51. 100 Terabytes for $12,000 (more info: http://biote.am/8p )
  52. Storage: 2013 ‣ There are MANY reasons why you should not build that $12K Backblaze pod • ... done wrong, you will potentially inconvenience researchers, lose critical scientific information and (probably) lose your job ‣ Inexpensive or open source object storage software makes the ultra-cheap storage pod concept viable
  53. Storage: 2013 ‣ A single unit like this is risky and should only be used for well-known and well-scoped use cases; the risks generally outweigh the disruptive price advantage ‣ However ... ‣ What if you had 3+ of these units running an object store stack with automatic triple-location replication, recovery and self-healing? • Then things get interesting • This is one of the ‘lab’ projects I hope to work on in ’13
  54. Storage: 2013 ‣ Caveat/warning • The 2013 editions of “Backblaze-like” enclosures mitigate many of the earlier availability, operational and reliability concerns • Still an aggressive play that carries risk in exchange for a disruptive price point ‣ There is a middle ground • Lots of action in the ZFS space with safer & more mainstream enclosures • BioIT homework: visit the Silicon Mechanics booth and check out what they are doing with Nexenta’s Open Storage stuff
  55. Topic: Cloud
  56. Can you do a Bio-IT talk without using the ‘C’ word?
  57. Cloud: 2013 ‣ Our core advice remains the same ‣ What’s changed
  58. Cloud: 2013. Core Advice ‣ Research organizations need a cloud strategy today • Those that don’t will be bypassed by frustrated users ‣ IaaS cloud services are only a departmental credit card away ... and some senior scientists are too big to be fired for violating IT policy
  59. Cloud Advice: Design Patterns ‣ You actually need three tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics
  60. Cloud Advice: Legacy HPC on the Cloud ‣ MIT StarCluster • http://web.mit.edu/star/cluster/ ‣ This is your baseline ‣ Extend as needed
  61. Cloud Advice: “Cloudy” HPC ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Warning: cloud vendor lock-in potential is strongest here
  62. Hadoop & “Big Data”: What you need to know ‣ “Hadoop” and “Big Data” are now general terms ‣ You need to drill down to find out what people actually mean ‣ We are still in the period where senior leadership may demand “Hadoop” or “Big Data” capability without any actual business or scientific need
  63. Hadoop & “Big Data”: What you need to know ‣ In broad terms you can break “Big Data” down into two very basic use cases: 1. Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets; the Google search term here is “map reduce” (a minimal map-reduce sketch appears after the transcript) 2. Data stores: Hadoop is driving the development of very sophisticated “NoSQL”, non-relational databases and data query engines; the Google search terms include “nosql”, “couchdb”, “hive”, “pig”, “mongodb”, etc. ‣ Your job is to figure out which type applies to the groups requesting “Hadoop” or “Big Data” capability
  64. Cloud: 2013. What has changed ... ‣ Let’s revisit some of my bile from prior years ‣ “... private clouds: still utter crap” ‣ “... some AWS competitors are delusional pretenders” ‣ “... AWS has a multi-year lead on the competition”
  65. Private Clouds in 2013 ‣ I’m no longer dismissing them as “utter crap” ‣ Usable & useful in certain situations ‣ BioTeam has had positive experiences with OpenStack ‣ Hype vs. reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ Still important to remain cynical and perform proper due diligence
  66. Non-AWS IaaS in 2013 ‣ Three main drivers for BioTeam’s evolving IaaS practices and thinking for 2013: ‣ (1) Real-world success with OpenStack & BT ‣ (2) Real-world success with Google Compute ‣ (3) Real-world multi-cloud DevOps ‣ Just to remain honest, though: • AWS still has a multi-year lead in product, service and features • ... and many novel capabilities • But some of the competition has interesting benefits that AWS can’t match
  67. BioTeam, BT & OpenStack ‣ We’ve been working with BT for a while now on various projects ‣ BT Cloud uses OpenStack under the hood with some really nice architecture and operational features ‣ BioTeam developed a Chef-based HPC clustering stack and other tools that are currently being used by BT customers • ... some of whom have spoken openly at this meeting
  68. BioTeam & Google Compute Engine ‣ We can’t even get into the preview program ‣ But one of our customers did ‣ ... and we’ve been able to do some successful and interesting stuff • Without changing operations or DevOps tools, our client is capable of running on both AWS and Google Compute (a multi-cloud provisioning sketch appears after the transcript) • For this client and a few other use cases we believe we can span both clouds or construct architectures that would enable fast and relatively friction-free transitions
  69. Chef, AWS, OpenStack & Google: Wrapping up ... ‣ 2012 was the first year we did real work spanning multiple IaaS cloud platforms, or at least replicating workloads on multiple platforms ‣ We’ve learned a lot - I think this may result in some interesting talks at next year’s Bio-IT meeting - by BioTeam and actual end users ‣ What makes this all possible is the DevOps / orchestration stuff mentioned at the beginning of this presentation.
  70. end; Thanks! Slides: http://slideshare.net/chrisdag/
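
A few short code sketches follow for the points flagged in the transcript above. All credentials, IDs and names in them are placeholders, and the specific libraries are illustrative choices rather than anything prescribed in the talk. First, slide 22's "everything has an API" point: a minimal sketch of requesting a compute node programmatically with the boto library, which is only one of several client libraries that can drive an IaaS API this way.

```python
# Minimal sketch of the "scriptable datacenter" idea: ask an IaaS API for a
# compute node instead of filing a ticket. Assumes the boto library and AWS
# credentials in the environment; the AMI ID, instance type, key pair and
# security group below are placeholders.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

reservation = conn.run_instances(
    "ami-00000000",            # placeholder machine image
    instance_type="m1.large",  # placeholder size
    key_name="my-keypair",     # placeholder SSH key pair
    security_groups=["hpc"],   # placeholder security group
)

instance = reservation.instances[0]
instance.add_tag("project", "bioit-demo")  # tagging is just another API call
print("Requested instance %s, current state: %s" % (instance.id, instance.state))
```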
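
For slide 49: the S3-style object interface that is convincing developers and ISVs to adapt their software is small enough to show directly. This is a minimal sketch with the boto library; the bucket name, object key and credentials are placeholders, and the same put/get-by-key pattern is what S3-compatible local object stores expose.

```python
# Sketch of the S3-style object interface: data is written and read by
# bucket + key rather than by file path or block address. Assumes the boto
# library; credentials, bucket and object names are placeholders. An
# S3-compatible local object store can be targeted by pointing the
# connection at its endpoint instead of AWS.
from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection("ACCESS_KEY", "SECRET_KEY")   # placeholder credentials
bucket = conn.create_bucket("bioit-demo-bucket")  # placeholder bucket name

# Store an object under a key of our choosing.
key = Key(bucket, "run42/sample.fastq.gz")
key.set_contents_from_filename("sample.fastq.gz")

# Retrieve it later, from anywhere that can reach the endpoint.
key.get_contents_to_filename("sample-copy.fastq.gz")
print("Stored %d bytes as %s" % (key.size, key.name))
```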
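
For slide 63's "Compute" use case: the map-reduce model behind Hadoop boils down to a mapper that emits key/value pairs and a reducer that aggregates them by key. The sketch below is a generic streaming-style counter over tab-delimited input; the data layout is invented for illustration and this is not a pipeline from the talk.

```python
# Streaming-style map-reduce sketch: count how often each value appears in
# the first column of tab-delimited input read from stdin. Works standalone
# with Unix sort, or under Hadoop Streaming, which pipes records through
# scripts exactly like this one. The input format is invented for illustration.
import sys

def mapper():
    # Emit "key<TAB>1" for every input record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print("%s\t1" % fields[0])

def reducer():
    # Input arrives sorted by key, so counts accumulate over runs of equal keys.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

if __name__ == "__main__":
    reducer() if (len(sys.argv) > 1 and sys.argv[1] == "reduce") else mapper()
```

Run locally as `cat samples.tsv | python mr_count.py map | sort | python mr_count.py reduce` (the script and data file names are whatever you choose); Hadoop Streaming runs the same two phases across a cluster, with HDFS supplying the data locality that the earlier "local disk is back" slides hedge for.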
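
For slide 68: running the same operations against AWS and Google Compute without changing tooling is the kind of thing a provider-neutral library makes possible. Here is a hedged sketch with Apache Libcloud; the credentials, key path and project name are placeholders, and Libcloud itself is an illustrative choice, not necessarily the stack used in the engagements described in the talk.

```python
# Provider-neutral inventory across two IaaS clouds using Apache Libcloud.
# Credentials, key path and project name are placeholders; Libcloud is an
# illustrative choice, not necessarily the tooling BioTeam used.
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def build_drivers():
    ec2_cls = get_driver(Provider.EC2)
    gce_cls = get_driver(Provider.GCE)
    return {
        "aws": ec2_cls("AWS_ACCESS_KEY", "AWS_SECRET_KEY", region="us-east-1"),
        "gce": gce_cls("service-account@example-project.iam.gserviceaccount.com",
                       "/path/to/key.pem", project="example-project"),
    }

def inventory(drivers):
    # The same list_nodes() call works against both providers, which is what
    # lets one set of DevOps tooling span clouds.
    for cloud, driver in drivers.items():
        for node in driver.list_nodes():
            print("%s: %s (%s)" % (cloud, node.name, node.state))

if __name__ == "__main__":
    inventory(build_drivers())
```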