
2014 BioIT World - Trends from the trenches - Annual presentation

Talk slides from the annual "trends from the trenches" address at BioITWorld Expo. 2014 Edition.

### Email chris@bioteam.net if you'd like a PDF copy of this deck ###


  1. 1. 1 Trends from the trenches: 2014 slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14 Wednesday, April 30, 14
  2. 2. 2 I’m Chris. I’m an infrastructure geek. I work for the BioTeam. Wednesday, April 30, 14
  3. 3. Apologies in advance 3 If you have not heard me speak ... ‣ ‘Infamous’ for speaking very fast and carrying a huge slide deck ‣ In 2014 CHI finally gave up and just gave me a 60min talk slot ‣ Aiming to end with enough time for questions & discussions By the time you see this slide I’ll be on my ~4th espresso Wednesday, April 30, 14
  4. 4. Who, What, Why ... 4 BioTeam ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 12+ years bridging the “gap” between science, IT & high performance computing ‣ Our wide-ranging work is what gets us invited to speak at events like this ... Wednesday, April 30, 14
  5. 5. 5 Why I do this talk every year ... ‣ Bioteam works for everyone • Pharma, Biotech, EDU, Nonprofit, .Gov, etc. ‣ We get to see how groups of smart people approach similar problems ‣ We can speak honestly & objectively about what we see “in the real world” Wednesday, April 30, 14
  6. 6. Listen to me at your own risk 6 Standard Disclaimer ‣ I’m not an expert, pundit, visionary or “thought leader” ‣ There are ~2000 smart people at this event; I don’t presume to speak for us as a whole ‣ All career success entirely due to shamelessly copying what actual smart people do ‣ I’m biased, burnt-out & cynical ‣ Filter my words accordingly Wednesday, April 30, 14
  7. 7. 7 What’s new? What’s new? I’ve seen your slides before. <yawn> Wednesday, April 30, 14
  8. 8. aka ‘spreading the blame ...’ 8 What’s new 1: Acknowledgements ‣ This talk used to be made in a vacuum each year • ... often mere minutes before the scheduled talk time ‣ Not this year • Heavily influenced by peer group of smarter people who get chatty when given beer ‣ Non-comprehensive blame gang: • Ari Berman • Aaron Gardner • Adam Kraut • Chris Botka (Harvard) • Chris Dwan (Broad) • James Cuff (Harvard) • ... many more ... Wednesday, April 30, 14
  9. 9. What has not changed in recent talks Not new 2: Recycled Content ‣ The core Bio-IT ‘meta’ issue remains unchanged ‣ Minor updates to report on the cloud landscape ‣ Compute landscape largely unchanged • ... a few updates to share in this space but nothing earth-shattering 9 Wednesday, April 30, 14
  10. 10. 10 Why are we all here? Wednesday, April 30, 14
  11. 11. 11 The #1 ‘meta issue’ is unchanged in 2014 Wednesday, April 30, 14
  12. 12. 12 It’s a risky time to be doing Bio-IT Wednesday, April 30, 14
  13. 13. 13 Meta: Science evolving faster than IT can refresh infrastructure & practices Wednesday, April 30, 14
  14. 14. This is what keeps Bio-IT folks up at night The Central Problem Is ... ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • Bench science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ Our job is to design systems TODAY that can support unknown research requirements & workflows over multi-year spans (gulp ...) 14 Wednesday, April 30, 14
  15. 15. The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago we could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; real solutions required 15 Wednesday, April 30, 14
  16. 16. 16 This is our “new normal” for informatics Wednesday, April 30, 14
  17. 17. 17 The Central Problem Is ... ‣ Lab technology is being refreshed, upgraded and replaced at an astonishing rate • Bigger, faster, parallel • Requiring increasingly sophisticated IT support • Cheap and easily obtainable Wednesday, April 30, 14
  18. 18. 18 The Central Problem Is ... ‣ ... and IT still being caught by surprise in 2014 • Procurement practices and cheaper instrument prices result in situations where IT is bypassed or not consulted in advance Wednesday, April 30, 14
  19. 19. True Story - 48 Hours Ago 19 Wednesday, April 30, 14
  20. 20. A conversation with a client Just 48 hours ago ... ‣ Scientists tell IT that they are getting a new PacBio sequencing platform • Gave IT a 5-node cluster quote that PacBio provided as blueprint for SMRT Portal • Wanted confirmation that everything was cool with IT support 20 Wednesday, April 30, 14
  21. 21. A conversation with a client Just 48 hours ago ... ‣ Partial “Minor” Issue List: • Scientists had no clue about power requirements. A pair of 60-amp, 220-volt power outlets = multi-month facility project • ... assumed IT would be cool accepting and supporting a one-off HPC system sized for 1 instrument & 1 workgroup • ... also appeared to believe that storage was infinite and free. At least that is what their budget assumed. 21 Wednesday, April 30, 14
  22. 22. One more thing ... 22 Wednesday, April 30, 14
  23. 23. We can’t blame the science/lab side for everything One more thing ... ‣ Can’t blame the lab-side for all our woes ‣ IT innovation is causing headaches in research and program management ‣ Grant funding agencies, regulatory rules and internal risk/program management practices not updated to reflect current and emerging IT capabilities, architectures & practices • Rules & policies often simply do not cover what we are capable of doing right now 23 Wednesday, April 30, 14
  24. 24. 24 A related problem ... Wednesday, April 30, 14
  25. 25. This also hurts ... ‣ It has never been easier or cheaper to acquire vast amounts of data ‣ Growth rate of data creation/ingest exceeds the rate at which the storage industry is improving disk capacity ‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers • ... ideally without punching holes in your firewall or consuming all available internet bandwidth 25 Wednesday, April 30, 14
  26. 26. The future is not looking pretty for the ill prepared 26 Wednesday, April 30, 14
  27. 27. High Costs For Getting It Wrong ‣ Lost opportunity ‣ Missing capability ‣ Frustrated & very vocal scientific staff ‣ Problems in recruiting, retention, publication & product development 27 Wednesday, April 30, 14
  28. 28. 28 Enough groundwork. Let’s Talk Trends Wednesday, April 30, 14
  29. 29. 29 Trends: DevOps & Org Charts Wednesday, April 30, 14
  30. 30. 30 The social contract between scientist and IT is changing forever Wednesday, April 30, 14
  31. 31. 31 You can blame “the cloud” for this Wednesday, April 30, 14
  32. 32. 32 DevOps & Scriptable Everything ‣ On (real) clouds, EVERYTHING has an API ‣ If it’s got an API you can automate and orchestrate it ‣ “scriptable datacenters” are now a very real thing Wednesday, April 30, 14
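To make the “everything has an API” point concrete, a minimal sketch of launching and tearing down a compute node entirely from code, using the AWS SDK for Python (boto3; the 2014-era equivalent was boto). The region, AMI ID, instance type and tag values are placeholders, not anything from the deck:

```python
# Minimal sketch: treating compute as something you script, not rack.
# Assumes AWS credentials are already configured; AMI ID and instance type are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

# Launch a small analysis node entirely from code -- the "scriptable datacenter" idea.
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.medium",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "project", "Value": "bioit-demo"}],
    }],
)
print("launched:", [i.id for i in instances])

# Because it is all API calls, teardown is just as scriptable.
for i in instances:
    i.terminate()
```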
  33. 33. 33 DevOps & Scriptable Everything ‣ Incredible innovation in the past few years ‣ Driven mainly by companies with massive internet ‘fleets’ to manage ‣ ... but the benefits trickle down to us mere mortals Wednesday, April 30, 14
  34. 34. 34 DevOps will conquer the enterprise ‣ Over the past few years cloud automation/ orchestration methods have been trickling down into our local infrastructures ‣ This will have significant impact on careers, job descriptions and org charts Wednesday, April 30, 14
  35. 35. 2014: Continue to blur the lines between all these roles 35 Scientist/SysAdmin/Programmer ‣ Radical change in how IT is provisioned, delivered, managed & supported • Technology Driver: Virtualization & Cloud • Ops Driver: Configuration Mgmt, Systems Orchestration & Infrastructure Automation ‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant www.opscode.com Wednesday, April 30, 14
  36. 36. 2014: Continue to blur the lines between all these roles 36 Scientist/SysAdmin/Programmer ‣ When everything has an API ... ‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely ‣ And by the way ... ‣ The APIs (‘knobs & buttons’) are accessible to all, not just the expert practitioners sitting in that room next to the datacenter Wednesday, April 30, 14
  37. 37. 2014: Continue to blur the lines between all these roles 37 Scientist/SysAdmin/Programmer ‣ IT jobs, roles and responsibilities are changing ‣ SysAdmins must learn to program in order to harness automation tools ‣ Programmers & Scientists can now self-provision and control sophisticated IT resources Wednesday, April 30, 14
  38. 38. 2014: Continue to blur the lines between all these roles 38 Scientist/SysAdmin/Programmer ‣ My take on the future ... • SysAdmins (Windows & Linux) who can’t code will have career issues • Far more control is going into the hands of the research end user • IT support roles will radically change -- no longer owners or gatekeepers ‣ IT will “own” policies, procedures, reference patterns, identity mgmt, security & best practices ‣ Research will control the “what”, “when” and “how big” Wednesday, April 30, 14
  39. 39. 2014 Summary Trend: DevOps & Automation ‣ Almost every HPC project (all sizes) BioTeam worked on in 2014 included • A bare-metal OS provisioning service (Cobbler, etc.) • A ‘next-gen’ configuration management service (Chef, Puppet, Saltstack, etc.) ‣ Gut feeling: This is going to be very useful for regulated environments • Not BS or empty hype: IT infrastructure and server/OS/service configuration encoded as text files • Easy to version control, audit, revert, rebuild, verify and fold into existing change management & documentation systems 39 Wednesday, April 30, 14
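As a toy illustration of the “configuration encoded as text files” idea (this is not Chef, Puppet or Salt syntax, just a sketch of the underlying pattern): desired state lives in a version-controllable file and an idempotent script converges the machine toward it. The package list and file contents are hypothetical:

```python
# Toy convergence loop: desired state as data, applied idempotently.
# Real tools (Chef, Puppet, Salt) do this with far more rigor; this just shows
# why plain-text state is easy to version, diff, audit and revert.
import shutil
import subprocess

DESIRED_PACKAGES = ["samtools", "bwa"]          # hypothetical package list
DESIRED_FILES = {"/etc/motd": "HPC node - managed by config mgmt\n"}

def ensure_packages(packages):
    for pkg in packages:
        if shutil.which(pkg) is None:           # only act if the tool is missing
            subprocess.run(["yum", "-y", "install", pkg], check=True)

def ensure_files(files):
    for path, content in files.items():
        try:
            current = open(path).read()
        except FileNotFoundError:
            current = None
        if current != content:                  # idempotent: write only on drift
            with open(path, "w") as fh:
                fh.write(content)

if __name__ == "__main__":
    ensure_packages(DESIRED_PACKAGES)
    ensure_files(DESIRED_FILES)
```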
  40. 40. 40 Trends: Compute Wednesday, April 30, 14
  41. 41. Compute related design patterns largely static 41 Core Compute ‣ Linux compute clusters are still the baseline compute platform ‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers ‣ Compute is not hard. It’s a commodity that is easy to acquire & deploy in 2014 Wednesday, April 30, 14
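For a flavor of how an instrument or pipeline hands work to a cluster, a sketch of programmatic submission to a Grid Engine-style scheduler by shelling out to qsub. The script name, resource flags and analysis command are assumptions about a local site setup, not anything specified in the talk:

```python
# Sketch: instrument-side (or pipeline-side) job submission to a cluster scheduler.
# Assumes a Grid Engine-style `qsub` on PATH; resource flags vary by site.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #$ -N run_qc
    #$ -cwd
    #$ -pe smp 4
    ./run_qc.sh /data/incoming/run_2014_04_30   # hypothetical analysis step
""")

with open("run_qc.sge", "w") as fh:
    fh.write(job_script)

result = subprocess.run(["qsub", "run_qc.sge"],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())   # scheduler acknowledgement, e.g. job ID
```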
  42. 42. Defensive hedge against Big Data / HDFS 42 Compute: Local Disk Matters ‣ This slide is from 2013; the trend is continuing ‣ The “new normal” may be 4U enclosures with room for many local disk spindles - not necessarily populated, just available ‣ Why? Hadoop & Big Data ‣ This is a defensive hedge against future HDFS or similar requirements • Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play. ‣ Hardcore Hadoop rigs sometimes operate at a 1:1 ratio between core count and disk count Wednesday, April 30, 14
  43. 43. Faster networks are driving compute config changes 43 Compute: NICs and Disks ‣ One pain point for me in 2013-2014: • Network links to my nodes are getting faster • It’s embarrassing that my disks are slower than the network feeding them • Need to be careful about selecting and configuring high-speed NICs - Example: that dual-port 10Gig card may not actually be able to drive both ports if the card was engineered for an active:passive link failover scenario • Also need to revisit local disk configurations Wednesday, April 30, 14
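The back-of-envelope arithmetic behind the “disks slower than the network” complaint, using illustrative (not measured) numbers:

```python
# Illustrative arithmetic only: compare line-rate network ingest to local spindles.
nic_gbps = 2 * 10                        # dual-port 10GbE, both ports active
nic_bytes_per_sec = nic_gbps * 1e9 / 8   # ~2.5 GB/s of potential ingest

spindle_mb_per_sec = 150                 # rough sequential rate for one SATA spindle
spindles_needed = nic_bytes_per_sec / (spindle_mb_per_sec * 1e6)

print(f"Roughly {spindles_needed:.0f} spindles streaming flat out are needed "
      f"to keep up with {nic_gbps} Gbit/s of ingest")
# => roughly 17 spindles, which is why fast-NIC nodes force a re-think of local disk configs.
```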
  44. 44. New and refreshed HPC systems running many node types 44 Compute: Huge trend in ‘diversity’ ‣ Accelerated trend since at least 2012 ... • HPC compute resources no longer homogeneous; many types and flavors now deployed in single HPC stacks ‣ Newer clusters mix and match node types to fit the known use cases: • GPU nodes for compute • GPU nodes for visualization • Large memory nodes (512GB+) • Very large memory nodes (1TB+) • ‘Fat’ nodes with many CPU cores • ‘Thin’ nodes with super-fast CPUs • Analytic nodes with SSD, FusionIO, flash or large local disk for ‘big data’ tasks Wednesday, April 30, 14
  45. 45. GPUs, Coprocessors & FPGAs 45 Compute: Hardware Acceleration ‣ Specialized hardware acceleration has its place but will not take over the world • “... the activation energy required for a scientist to use this stuff is generally quite high ...” ‣ GPU, Phi and FPGA are best used in large-scale pipelines or as a specific solution to a singular pain point Wednesday, April 30, 14
  46. 46. Compute: Big Data & Analytics ‣ BioTeam is starting to build “Big Data” labs and environments for clients ‣ The most interesting trend: • We are not designing for specific analytic use cases; in most projects we are adding basic “capabilities” with the expectation that the apps and users will come later • ... defensive IT hedge against rapidly changing science requirements, remember? 46 Wednesday, April 30, 14
  47. 47. Compute: Big Data & Analytics ‣ This translates to infrastructure designed to support certain capabilities rather than specific software or applications. ‣ Example: • Beefy HDFS-friendly servers • 100% bare-metal provisioning and dynamic system reconfiguration • Systems for ingest • Very large RAM systems • Big PCIx bus systems • Memory-resident database systems • Mix of very fast and capacity-optimized storage • Very fast core, top-of-rack and server networking 47 Wednesday, April 30, 14
  48. 48. Also known as hybrid clouds Emerging Trend: Hybrid HPC ‣ No longer “utter crap” or “cynical vendor-supported reference case” • small local footprint • large, dynamic, scalable, orchestrated public cloud component ‣ DevOps is key to making this work ‣ High-speed network to public cloud required ‣ Software interface layer acting as the mediator between local and public resources ‣ Good for tight budgets, has to be done right to work ‣ Still best approached very carefully 48 Wednesday, April 30, 14
  49. 49. BioIT World Homework ‣ We’ve got interesting hardware vendors on the show floor this week; check them out • Silicon Mechanics, Thinkmate, Microway: cool commodity • Intel, IBM, Dell, SGI: Large & enterprise • Timelogic: hardware acceleration • ... 49 Wednesday, April 30, 14
  50. 50. 50 Trends: Network Wednesday, April 30, 14
  51. 51. 51 Big trouble ahead ... Wednesday, April 30, 14
  52. 52. 52 Network: Speed @ Core and Edge ‣ Huge potential pain point ‣ May surpass storage as our #1 infrastructure headache ‣ Petascale data is useless if you can’t move it or access it fast enough ‣ Don’t be smug about 10 Gigabit - folks need to start thinking *now* about 40 and even 100 Gigabit Ethernet ‣ You may need 10Gig to some desktops for data ingest/export Wednesday, April 30, 14
  53. 53. 53 Network: Speed @ Core and Edge ‣ Remember ~2004 when research storage requirements started to dwarf what the enterprise was using? ‣ Same thing is happening now for networking ‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on Wednesday, April 30, 14
  54. 54. Massive data movement needs are driving innovation pain This is going to be painful ‣ Enterprise networking folks are even more aloof than storage admins we battled in ’04 ‣ Often used to driving requirements and methods; unhappy when science starts to drive them out of their comfort zones ‣ Research needs to start pushing harder and faster for network speeds above 10GbE • This will take a long time so start now! 54 Wednesday, April 30, 14
  55. 55. Not sure how this will play out ‣ It will be interesting to see what large-scale data movement does to our local infrastructure and desktop experience ‣ Especially with other trends like BYOD ‣ My $.02 • Speeds to our desktops are going to get very fast, or • We give up on growing massive bandwidth to the client and embrace a full VDI model where the users just “remote desktop” into a well-networked scientific informatics environment 55 Wednesday, April 30, 14
  56. 56. BioIT World Homework ‣ Visit the Internet2 booth to chat about high-speed networking • Ask about their free or low-cost training events and technical workshops; start thinking about how you can get your internal networking teams/leadership to attend • Ask them about the new trend of private/corporate links into I2 and other fast research networks ‣ Arista is here. Talking and exhibiting. They are not Cisco. Listen, visit & talk to them. 56 Wednesday, April 30, 14
  57. 57. Significant new trend in networking Science DMZs 57 Wednesday, April 30, 14
  58. 58. It’s real and becoming necessary Network: ‘ScienceDMZ’ ‣ BioTeam is building them in 2014 and beyond ‣ Central premise: • Legacy firewall, network and security methods were architected for “many small data flows” use cases • Not built to handle small numbers of massive data flows • Also very hard to deploy ‘traditional’ security gear on 10 Gigabit and faster networks ‣ More details, background & documents at http://fasterdata.es.net/science-dmz/ 58 [Slide diagram: DTN traffic with wire-speed bursts vs. background/competing traffic bursts across 10GE links] Wednesday, April 30, 14
  59. 59. Network: ‘ScienceDMZ’ ‣ Start thinking/discussing this sooner rather than later ‣ Just like “the cloud” this may fundamentally change internal operations and technology ‣ Will also require conscious buy-in and support from senior network, security and risk management professionals • ... these talks take time. Best to plan ahead 59 Wednesday, April 30, 14
  60. 60. Network: ‘ScienceDMZ’ ‣ A Science DMZ has 3 required components: 1. Very fast “low-friction” network links and paths with security policy and enforcement specific to scientific workflows 2. Dedicated, high performance data transfer nodes (“DTNs”) highly optimized for high speed data xfer 3. Dedicated network performance/measurement nodes 60 Wednesday, April 30, 14
  61. 61. Network: ‘ScienceDMZ’ ‣ Implementation specifics are complex; the basic concept is not: 1. The research need to move scientific data at high speed is already being negatively affected by networks not designed for this requirement 2. Likely to force fundamental changes in core enterprise architectures on a disruptive scale similar to what genome data storage forced in ~2004 3. Firewalls/IDS and security in particular will be affected 61 Wednesday, April 30, 14
  62. 62. 62 Simple Science DMZ: Image source: “The Science DMZ: Introduction & Architecture” -- esnet Wednesday, April 30, 14
  63. 63. Network: ‘ScienceDMZ’ ‣ My gut feeling: 1. The fanciest and most complex Science DMZ architectures in the literature right now are not suitable for our world • Expensive specialized equipment; Expensive specialist staff expertise required • Often still experimental, not something enterprise IT would want to drop into a production environment 2. Science DMZ concepts are sound and simple implementations are possible today 3. Start small: • Incorporate these sorts of concepts/ideas into long term planning ASAP • Start adding network performance monitoring nodes to research networks, DMZs and external circuit connections now; this entire concept falls over without actionable flow and performance data • Start work on policies and procedures for manual bypass of firewall/IDS rules when known sender/receivers are freighting high speed data; automation comes later! 63 Wednesday, April 30, 14
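One low-effort way to start collecting the actionable performance data mentioned above: a sketch that wraps iperf3 (assumed installed on both endpoints, with a server already listening on the target host; the hostname is a placeholder) and logs path throughput on a schedule:

```python
# Sketch: periodic path-throughput measurement between two research network endpoints.
# Assumes iperf3 is installed on both ends and an iperf3 server is running on the target.
import json
import subprocess
import time

TARGET = "dtn1.example.org"   # placeholder data transfer node

def measure_throughput(host):
    out = subprocess.run(["iperf3", "-c", host, "-J"],   # -J = JSON report
                         capture_output=True, text=True, check=True)
    report = json.loads(out.stdout)
    bps = report["end"]["sum_received"]["bits_per_second"]   # field names per iperf3's JSON
    return bps / 1e9   # Gbit/s

while True:
    gbps = measure_throughput(TARGET)
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  {TARGET}  {gbps:.2f} Gbit/s")
    time.sleep(3600)   # hourly; a production deployment would use perfSONAR-style tooling
```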
  64. 64. BioIT World Homework ‣ Bookmark http://fasterdata.es.net and check out the published materials and advice ‣ Monitor http://www.oinworkshop.com/ to see when a workshop/event may be coming near you (send your networking people ...) ‣ Both ESNet and Internet2 run training and technical workshops that deliver far more value for price than the usual training junkets 64 Wednesday, April 30, 14
  65. 65. Check out this talk BioIT World Homework ‣ Track 1 - 3:10pm today: • Christian Todorov talks “Accelerating Biomedical Research Discovery: The 100G Internet2 Network – Built and Engineered for the Most Demanding Big Data Science Collaborations” 65 Wednesday, April 30, 14
  66. 66. Not very significant trend in 2014: Software Defined Networking (“SDN”) 66 Wednesday, April 30, 14
  67. 67. More hype than useful reality at the moment 67 Network: SDN Hype vs. Reality ‣ Software Defined Networking (“SDN”) is the new buzzword ‣ It WILL become pervasive and will change how we build and architect things ‣ But ... ‣ Not hugely practical at the moment for most environments • We need far more than APIs that control port forwarding behavior on switches • More time needed for all of the related technologies and methods to coalesce into something broadly useful and usable Wednesday, April 30, 14
  68. 68. More hype than useful reality at the moment 68 Network: SDN ‣ My gut feeling: • It is the future but right now we are still in the “mostly empty hype” phase if you wanna be cynical about it; best to wait and watch • Production enterprise use: OpenFlow and similar stuff does not provide value relative to implementation effort right now • Best bang for the buck in ’14 will be getting ‘SDN’ features as part of some other supported stack - OpenStack, VMWare, Cloud, etc. Wednesday, April 30, 14
  69. 69. 69 Trends: Storage Wednesday, April 30, 14
  70. 70. 70 Storage ‣ Still the biggest expense, biggest headache and scariest systems to design in modern life science informatics environments ‣ Many of the pain points we’ve talked about for years are still in place: • Explosive growth forcing us to trade performance for capacity • Lots of monolithic single tiers of storage • Critical need to actively manage data through its full life cycle (just storing data is not enough ...) • Need for post-POSIX solutions such as iRODS and other metadata-aware data repositories Wednesday, April 30, 14
  71. 71. 71 Storage Trends ‣ The large but monolithic storage platforms we’ve built up over the years are no longer sufficient • Do you know how many people are running a single large scale-out NAS or parallel filesystem? Most of us! ‣ Tiered storage is now an absolute requirement • At a minimum we need an active storage tier plus something far cheaper/deeper for cold files ‣ Expect the tiers to involve multiple vendors, products and technologies • The Tier1 storage vendors tend to have higher-end pricing for their “all in one” tiered data management solutions Wednesday, April 30, 14
  72. 72. 72 Storage - The Old Way ‣ Single tier of scale-out NAS or parallel FS ‣ Why? • Suitable for broadest set of use cases • Easy to procure/integrate • Lowest administrative & operational burden ‣ Example: • 400TB - 1PB of ‘something’ stores ‘everything’ Wednesday, April 30, 14
  73. 73. 73 Storage - The New Way ‣ Multiple tiers; potentially from multiple vendors ‣ Why? • Way more cost efficient (size the tier to the need) • Single tier no longer capable of supporting all use cases and workflow patterns • Single tiers waste incredible money at large scale ‣ Example: • 10-40 TB SSD/Flash for ingest & IOPS-sensitive workloads • 50-400 TB tier (SATA/SAS/SSD mix) for active processing • Multi-petabyte tier (Cloud, Object, SATA) for cost and operationally efficient long term (yet reachable) storage of scientific data at rest Wednesday, April 30, 14
  74. 74. Sticking 100% with Tier 1 vendors gets expensive 74 Storage: Disruptive stuff ahead ‣ BioTeam has built 1-petabyte ZFS-based storage pools from commodity whitebox kit for roughly $100,000 in direct hardware costs (engineering effort & admin not included in this price ...) ‣ There are many storage vendors in the middle tier who can provide storage systems that are less ‘risky’ than DIY homebuilt setups yet far less expensive than the traditional Tier 1 enterprise storage options • Several of these vendors are here at the show! ‣ Companies like Avere Systems are producing boxes that unify disparate storage tiers and link them to cloud and object stores • This is a route to unifying “tier 1” storage with the “cheap & deep” storage Wednesday, April 30, 14
  75. 75. Infinidat aka http://izbox.com The new thumper. ‣ 1 petabyte usable NAS shipped as a single integrated rack • List price: $500 per usable terabyte ‣ More expensive than DIY ZFS on commodity chassis but less expensive than current mainstream products ‣ Lots of interesting use cases for ‘cheap & deep’ 75 Wednesday, April 30, 14
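The two price points above work out roughly as follows (figures as quoted on these slides; the DIY number is hardware only):

```python
# Cost-per-terabyte comparison using the figures quoted on the slides.
diy_zfs_cost = 100_000          # ~$100k hardware for ~1 PB usable (engineering not included)
diy_cost_per_tb = diy_zfs_cost / 1000
print(f"DIY ZFS whitebox: ~${diy_cost_per_tb:.0f} per usable TB (hardware only)")

infinidat_cost_per_tb = 500     # quoted list price per usable TB for the 1 PB rack
print(f"Integrated 1 PB NAS: ~${infinidat_cost_per_tb} per usable TB, "
      f"or ~${infinidat_cost_per_tb * 1000:,} per PB")
# The gap between the two is what you pay to avoid the DIY engineering and support burden.
```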
  76. 76. Avere Systems Wait, I can DO that? ‣ These folks caught my eye in late 2013 for one very specific use case ‣ Since then I keep them in mind for 4-5 common problems I regularly face ‣ It can: • Add performance layer on top of storage bought to be “cheap & deep” • Virtualize many NAS islands into a single namespace • Replicate & move data between tiers and sites • Act as CIFS/NFS gateway to on-premise or offsite object stores *** • Treat Amazon S3 and Glacier as simply another storage tier fully integrated into your environment 76 Wednesday, April 30, 14
  77. 77. Object Storage ‣ Object storage is the future for scientific data at rest • Total no-brainer; makes more sense than the “files and folders” paradigm, especially for automated analysis • Plus Amazon does it for super cheap ‣ But ... There will be a long transition period due to all of our legacy codes and workflows • This is where gateway devices can play a role ‣ It can: • Provide a much better workflow design pattern than assuming “files and folders” data storage • Save millions of dollars via efficiencies of erasure coding • Provide a much more robust and resilient peta-scale storage framework • Hide behind a metadata-aware layer such as iRODS to provide very interesting capabilities 77 Wednesday, April 30, 14
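A minimal sketch of the object paradigm versus “files and folders”, using Amazon S3 through boto3; the bucket, key and metadata values are placeholders. Data is addressed by key and descriptive metadata travels with the object:

```python
# Sketch: objects addressed by key, with metadata attached -- no directory tree to manage.
# Assumes an existing S3 bucket and configured credentials; names are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-lab-archive"

# Store a result file as an object, tagging it with scientific metadata up front.
s3.upload_file(
    "sample123.bam", BUCKET, "runs/2014-04/sample123.bam",
    ExtraArgs={"Metadata": {"instrument": "pacbio", "project": "demo"}},
)

# Retrieval is by key, and the metadata comes back with the object.
head = s3.head_object(Bucket=BUCKET, Key="runs/2014-04/sample123.bam")
print(head["Metadata"])   # {'instrument': 'pacbio', 'project': 'demo'}
```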
  78. 78. Object Storage ‣ Erasure-coded distributed object stores are very interesting at peta-scale ... ‣ Think about how you would handle & replicate 20 petabytes of data the “traditional way” • Purchase 2x or 3x storage capacity to handle replication overhead • Ignore the nightmare scenario of having to restore from one of the distributed replicas 78 Wednesday, April 30, 14
  79. 79. Object Storage ‣ Efficiencies of erasure coding allow for LESS raw disk to be distributed across MORE geographic sites ‣ End result is a “single” usable system that is tolerant of the failure of an entire datacenter/site ‣ For the 20-petabyte problem, instead of purchasing 2x disk you buy ~1.8x and use the capex savings to add an extra colo facility or increase WAN link speed 79 Wednesday, April 30, 14
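The ~1.8x figure falls straight out of the erasure-coding parameters. A worked example with a hypothetical 10-data / 8-parity layout spread across three sites:

```python
# Worked example of the raw-vs-usable overhead behind the "buy ~1.8x instead of 2x" claim.
usable_pb = 20            # the 20 PB problem from the slide

# Full replication: every byte stored twice.
replicated_raw = usable_pb * 2

# Hypothetical erasure code: 10 data fragments + 8 parity fragments per object,
# spread across 3 sites (6 fragments per site) -- any single site can be lost entirely.
k, m = 10, 8
ec_raw = usable_pb * (k + m) / k       # 1.8x overhead

print(f"2x replication: {replicated_raw} PB raw")
print(f"{k}+{m} erasure coding: {ec_raw:.0f} PB raw "
      f"({(k + m) / k:.1f}x), saving {replicated_raw - ec_raw:.0f} PB of disk")
```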
  80. 80. Exercise BioIT World Homework ‣ Pick a storage size that makes sense for you (100TB or 1PB suggested) ‣ Visit the various storage vendors on the show floor and price out what 100TB or 1PB would cost ‣ You will see an awesome diversity of products, performance, features and capabilities at various price points • DO NOT fixate on price alone. This is a mistake. ‣ This is REALLY worth doing - there is incredible diversity in the mix of price/features/performance/capability out there 80 Wednesday, April 30, 14
  81. 81. Check out these booths BioIT World Homework ‣ Object storage: • Amplidata & CleverSafe ‣ Glue/Gateway/Acceleration: • Avere Systems ‣ Enterprise: • EMC Isilon, IBM, Dell, SGI, Hitachi, Panasas ‣ Mid-tier/Commodity: • Silicon Mechanics, Thinkmate, RAID Inc., Xyratex 81 Wednesday, April 30, 14
  82. 82. Check out these talks BioIT World Homework ‣ Track 5 - noon today: • Aaron Gardner talks “Taming big scientific data growth with converged infrastructure” ‣ Track 1 - 2:55pm today: • Jacob Farmer talks “Bridging the Worlds of Files, Objects, NAS, and Cloud: A Blazing Fast Crash Course in Object Storage” ‣ Track 1 - 4:30pm today: • Dirk Petersen talks “Deploying Very Low Cost Cloud Storage Technology in a Traditional Research HPC Environment” 82 Wednesday, April 30, 14
  83. 83. 83 Can you do a Bio-IT talk without using the ‘C’ word? Wednesday, April 30, 14
  84. 84. 84 Cloud: 2014 ‣ Core advice remains the same ‣ A few new permutations ... Wednesday, April 30, 14
  85. 85. Core Advice 85 Cloud: 2014 ‣ Research organizations need a cloud strategy today (make that yesterday) • Those that don’t will be bypassed by frustrated users or sneaky “cloud-aware” devices ‣ IaaS cloud services are only a departmental credit card away ... some senior scientists are too big to be fired for violating IT policy ‣ Instrument vendors are forcing the issue ‣ Storage vendors are forcing the issue Wednesday, April 30, 14
  86. 86. Design Patterns 86 Cloud Advice ‣ We actually need several tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics ‣ ... and maybe (4) Regulated/sensitive efforts... ‣ ...and maybe (5) a way to evaluate Commercial solutions Wednesday, April 30, 14
  87. 87. Legacy HPC on the Cloud 87 Cloud Advice ‣ MIT StarCluster • http://star.mit.edu/cluster/ • This is your baseline • Extend as needed ‣ Also check out Univa • Commercially supported Grid Engine stack with compelling roadmap and native cloud capabilities Wednesday, April 30, 14
  88. 88. “Cloudy” HPC 88 Cloud Advice ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Warning: Cloud vendor lock-in potential is strongest here Wednesday, April 30, 14
  89. 89. What has changed ... Cloud: 2014 ‣ Let’s revisit some of my bile from prior years ‣ “... private clouds: still utter crap” ‣ “... some AWS competitors are delusional pretenders” ‣ “... AWS has a multi-year lead on the competition” 89 Wednesday, April 30, 14
  90. 90. Private Clouds in 2014: ‣ I’m no longer dismissing them as “utter crap” • However it is a lot of work and money to build a system that only has 5% of the features that AWS can deliver today (for a cheaper price). Need to be careful about the use case, justification and operational/development burden. ‣ Usable & useful in certain situations ‣ BioTeam positive experiences with OpenStack ‣ Starting to see OpenStack pilots among our clients ‣ Hype vs. Reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ Still important to remain cynical and perform proper due diligence Wednesday, April 30, 14
  91. 91. Not all AWS competitors are delusional ‣ Google Compute is viable in 2014 for scientific workflows • Compute/Memory: Late start into IaaS means CPUs and memory are current generation; we have ‘war stories’ from AWS users who probe /proc/cpuinfo on EC2 servers so they can instantly kill any instance running on older chipsets • Price: Competitive on price although the shooting war between IaaS providers means it is hard to pin down the current “winner”; The “sustained use” pricing is easier to navigate than AWS Reserved Instances. Overall AWS pricing algorithms for various services seem more complicated than Google equivalents. • Network performance: Fantastic networking and excellent performance/latency figures between regions and zones. VPC type features are baked into the default resource set • Ops: Priced in 1min increments; no more need to hunt and kill servers at 55 min past the hour. Google has a concept of “Projects” with assigned collaborators and quotas. Quite different from the AWS account structure and IAM-based access control model. Project-based paradigm easier to think about for scientific use case. • IaaS Building Blocks: Still far fewer features than AWS but the core building blocks that we need for science and engineering workflows are present. ‣ My $.02 • AWS is still the clear leader but Google Compute is now a viable option and it is worth ‘kicking the tires’ in 2014 and beyond ... to me AWS has had no serious competition until now Wednesday, April 30, 14
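The /proc/cpuinfo “war story” amounts to something like the sketch below, run at instance boot: read the CPU model string and bail out if the instance landed on an older chipset than the pipeline was tuned for. The acceptable-model list is purely illustrative:

```python
# Sketch of the instance-probing trick described above: inspect /proc/cpuinfo at boot
# and refuse to run on older chipsets. The "acceptable" list here is illustrative only.
import re
import sys

ACCEPTABLE = ("E5-2670", "E5-2680")   # hypothetical models the pipeline was tuned for

with open("/proc/cpuinfo") as fh:
    cpuinfo = fh.read()

match = re.search(r"model name\s*:\s*(.+)", cpuinfo)
model = match.group(1).strip() if match else "unknown"
print("CPU model:", model)

if not any(tag in model for tag in ACCEPTABLE):
    # In the war story, this is where the instance gets terminated and re-requested.
    sys.exit("Landed on an older chipset; kill this instance and try again")
```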
  92. 92. Cloud Science Facilitators ‣ Cycle Computing is legit • They’ve proven themselves on some of the largest IaaS HPC grids ever built • Experience with hybrid systems (cloud & on-premises) ‣ Smart people. Nice people. ‣ They have a booth, stop by and chat them up ... Wednesday, April 30, 14
  93. 93. 93 The road ahead ... Wednesday, April 30, 14
  94. 94. This has been a slow-moving trend for years now ... 94 POSIX Alternatives Coming ‣ The set of organizations running into the limitations of POSIX filesystems will continue to expand ‣ We desperately need some sort of “metadata-aware” data management solution in life science ‣ Nobody has an easy solution yet; several bespoke installations but no clear mass-market options ‣ iRODS front-ending “cheap & deep” storage tiers or object stores appears to be gaining significant interest out in our community Wednesday, April 30, 14
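For a flavor of what “metadata-aware” means in iRODS terms, a sketch that registers a file and attaches queryable attributes by shelling out to the standard icommands (iput, imeta). It assumes an iRODS client environment has already been configured with iinit; the zone path and attribute values are placeholders:

```python
# Sketch: attach queryable scientific metadata to data objects via iRODS icommands.
# Assumes the icommands are installed and `iinit` has already been run.
import subprocess

def irods(*args):
    subprocess.run(list(args), check=True)

# Put the file into the iRODS-managed namespace (backed by whatever tier is cheapest).
irods("iput", "sample123.bam", "/tempZone/home/rods/sample123.bam")

# Attach attribute/value metadata to the data object.
irods("imeta", "add", "-d", "/tempZone/home/rods/sample123.bam", "instrument", "pacbio")
irods("imeta", "add", "-d", "/tempZone/home/rods/sample123.bam", "project", "demo")

# Later, find data by metadata rather than by path -- the point of going post-POSIX.
irods("imeta", "qu", "-d", "instrument", "=", "pacbio")
```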
  95. 95. Application Containers are getting interesting 95 Watch out for: Containerization ‣ Application containerization via methods like http://docker.io is gaining significant attention • Docker support now in native RHEL kernel • AWS Elastic Beanstalk recently added Docker support ‣ If broadly adopted, these techniques will stretch research IT infrastructures in interesting directions • This is far more interesting to me than moving virtual machines around a network or into the cloud ‣ ... with a related impact on storage location, features & capability ‣ Major news and progress expected in 2014 Wednesday, April 30, 14
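A sketch of why containerization is attractive for research pipelines: the tool, its dependencies and its runtime travel as one image, and the host only needs Docker. The image name, mount paths and command are hypothetical:

```python
# Sketch: run one step of an analysis pipeline inside a container instead of
# installing the full tool tree on every node. Image name and paths are placeholders.
import subprocess

subprocess.run([
    "docker", "run", "--rm",
    "-v", "/data/run_2014_04_30:/data",   # bind-mount the dataset into the container
    "biotools/aligner:1.0",               # hypothetical pre-built tool image
    "align", "--threads", "4", "/data/sample123.fastq",
], check=True)
# The same image runs unchanged on a laptop, a cluster node or a cloud VM,
# which is what makes this more interesting than moving VMs around a network.
```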
  96. 96. 96 Keep an eye on: Storage ‣ Data generation out-pacing technology ‣ Really interesting disruptive stuff on the market now ‣ Cheap/easy laboratory assays taking over • Researchers largely don’t know what to do with it all • Holding on to the data until someone figures it out • This will cause some interesting headaches for IT • Huge need for real “Big Data” applications to be developed Wednesday, April 30, 14
  97. 97. 97 Keep an eye on: Networking ‣ Unless there’s an investment in ultra-high-speed networking, we need to change how we think about analysis ‣ Data commons are becoming a precedent • Need to minimize the movement of data • Include compute power and an analysis platform with data commons ‣ Move the analysis to the data, don’t move the data • Requires sharing / large, core institutional resources Wednesday, April 30, 14
  98. 98. 98 Long term trends ... ‣ Compute continues to become easier ‣ Data movement and ingest (physical & network) gets harder ‣ Cost of storage will be dwarfed by “cost of managing stored data” ‣ We can see end-of-life for our current IT architecture and design patterns; new patterns will start to appear over next 2-5 years Wednesday, April 30, 14
  99. 99. 99 Wrap-up: Final Advice & Tips Wednesday, April 30, 14
  100. 100. Embrace The Innovation 100 Ending Advice: 1 of 5 ‣ Understand the ‘interesting times’ we are in • Science is changing faster than we can refresh IT • This is not going to change any time soon ‣ Advice: • Spend as much time thinking about future flexibility as you spend on actual current needs & requirements • Design for agility & responsiveness Wednesday, April 30, 14
  101. 101. Capacity 101 Ending Advice: 2 of 5 ‣ Many of us will need ‘petabyte-capable’ storage ‣ However: • Only some of us will ever have 1PB+ under management • The hard part is knowing who that will be Wednesday, April 30, 14
  102. 102. Tiers are in your future 102 Ending Advice: 3 of 5 ‣ Tiers are now a requirement, at least long-term • At a minimum we need an ‘active’ tier for processing & ingest • ... and some sort of inexpensive cold/nearline/archive option as well ‣ Advice: • It’s OK to buy a single block/tier of disk • ... but always have a strategy for diversification Wednesday, April 30, 14
  103. 103. 103 Ending Advice: 4 of 5 ‣ Above a certain scale, inefficient data management & simple storage practices are hugely wasteful ‣ Advice: • A new-hire “data manager” or curator role may be cheaper and far more beneficial to your organization than continuing to throw CapEx dollars at keeping a badly run storage platform under its capacity limit • Many opportunities to get clever & recapture efficiency & capability: tiers, replication, cloud, dedupe, CRAM compression, iRODS • BROADEN YOUR PERSPECTIVE Wednesday, April 30, 14
  104. 104. 104 Ending Advice: 5 of 5 ‣ You need a cloud strategy. Yesterday. - Users, instrument makers & IT vendors are forcing the issue - Economic trends indicate cloud storage is inescapable - 90% of cloud is “easy”. Remaining 10% takes time & effort ‣ Advice: • The technical aspects of using “the cloud” are trivial • The political, policy and risk management aspects are difficult and time consuming; start these ASAP Wednesday, April 30, 14
  105. 105. 105 end; Thanks! slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14 Wednesday, April 30, 14
