Trends from the trenches.
2013 Bio IT World - Boston
                             1
Some less aspirational title slides ...




                                          2
Trends from the trenches.
2013 Bio IT World Boston
                            3
I’m Chris.

I’m an infrastructure geek.

I work for the BioTeam.

             www.bioteam.net - Twitter: @chris_dag   5
BioTeam
Who, What, Why ...



‣ Independent consulting shop
‣ Staffed by scientists forced to
  learn IT, SW & HPC to get our
  own research done
‣ 10+ years bridging the “gap”
  between science, IT & high
  performance computing



                                    6
If you have not heard me speak ...
Apologies in advance


 ‣ “Infamous” for speaking
   very fast and carrying a
   huge slide deck
   •   ~70 slides for 25 minutes
       about average for me
   •   Let me mention what
       happened after my Pharma
       HPC best practices talk
       yesterday ...
                                   By the time you see this slide
                                   I’ll be on my ~4th espresso

                                                                    7
Why I do this talk every year ...

 ‣ BioTeam works for
   everyone
   •   Pharma, Biotech, EDU,
       Nonprofit, .Gov, etc.
 ‣ We get to see how
   groups of smart people
   approach similar
   problems
 ‣ We can speak honestly &
   objectively about what
   we see “in the real
   world”
                                    8
Standard Dag Disclaimer
Listen to me at your own risk




 ‣ I’m not an expert, pundit,
   visionary or “thought leader”
 ‣ Any career success entirely due
   to shamelessly copying what
   actual smart people do
 ‣ I’m biased, burnt-out & cynical
 ‣ Filter my words accordingly


                                     9
So why are you here?
And before 9am!




                       10
It’s a risky time to be doing Bio-IT



                                       11
Big Picture / Meta Issue

‣       HUGE revolution in the rate at which
        lab platforms are being redesigned,
        improved & refreshed
    •     Example: CCD sensor upgrade on that
          confocal microscopy rig just doubled
          storage requirements
    •     Example: The 2D ultrasound imager is
          now a 3D imager
    •     Example: Illumina HiSeq upgrade just
          doubled the rate at which you can acquire
          genomes. Massive downstream increase
          in storage, compute & data movement
          needs
‣       For the above examples, do you
        think IT was informed in advance?
                                                      12
The Central Problem Is ...
Science progressing way faster than IT can refresh/change


 ‣ Instrumentation & protocols are changing FAR
   FASTER than we can refresh our Research-IT &
   Scientific Computing infrastructure
   •   Bench science is changing month-to-month ...
   •   ... while our IT infrastructure only gets refreshed every
       2-7 years
 ‣ We have to design systems TODAY that can
   support unknown research requirements &
   workflows over many years (gulp ...)
                                                                   13
The Central Problem Is ...

‣ The easy period is over
‣ 5 years ago we could toss
  inexpensive storage and
  servers at the problem;
  even in a nearby closet or
  under a lab bench if
  necessary
‣ That does not work any
  more; real solutions
  required

                               14
The new normal.



                  15
And a related problem ...

‣       It has never been easier to
        acquire vast amounts of data
        cheaply and easily
‣       Growth rate of data creation/
        ingest exceeds rate at which
        the storage industry is
        improving disk capacity
‣       Not just a storage lifecycle
        problem. This data *moves*
        and often needs to be shared
        among multiple entities and
        providers
    •     ... ideally without punching holes in
          your firewall or consuming all
          available internet bandwidth
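
A rough sketch of the compounding gap described above. The starting sizes and growth rates are hypothetical placeholders, not figures from this talk; the only point is that a faster-compounding ingest rate eventually overtakes any fixed head start in capacity.

```python
def years_until_shortfall(data_tb, capacity_tb, data_growth, capacity_growth,
                          max_years=50):
    """Return the first whole year in which data exceeds capacity, else None.

    data_growth and capacity_growth are yearly fractions (1.0 means +100%).
    All inputs are illustrative assumptions, not measured values.
    """
    for year in range(max_years + 1):
        if data_tb > capacity_tb:
            return year
        data_tb *= 1 + data_growth
        capacity_tb *= 1 + capacity_growth
    return None


# Hypothetical site: 100 TB of data on 400 TB of disk, data doubling yearly,
# affordable capacity growing 30% yearly.
print(years_until_shortfall(100, 400, 1.0, 0.3))  # -> 4
```

Swap in your own measured ingest and procurement numbers; if data grows more slowly than capacity the function returns None within the horizon.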
                                                  16
If you get it wrong ...



‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention,
  publication & product development


                                            17
Enough groundwork. Let’s Talk Trends*

                                       18
Topic: DevOps & Org Charts



                             19
The social contract between
scientist and IT is changing forever




                                       20
You can blame “the cloud” for this



                                     21
DevOps & Scriptable Everything


‣ On (real) clouds,
  EVERYTHING has an
  API
‣ If it’s got an API you can
  automate and
  orchestrate it
‣ “scriptable datacenters”
  are now a very real thing
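
To make "if it's got an API you can automate it" concrete, here is a minimal sketch. FakeCloud is a hypothetical in-memory stand-in, not a real provider SDK, and reconcile() is an illustrative name. The pattern it shows is the desired-state loop that most orchestration tools are built around: declare how many nodes you want, and let a script launch or terminate until reality matches.

```python
import itertools


class FakeCloud:
    """In-memory stand-in for a provider API (hypothetical, not a real SDK)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.nodes = {}  # node id -> role

    def launch(self, role):
        node_id = f"node-{next(self._ids)}"
        self.nodes[node_id] = role
        return node_id

    def terminate(self, node_id):
        del self.nodes[node_id]


def reconcile(cloud, role, desired):
    """Launch or terminate nodes until exactly `desired` nodes have `role`."""
    current = sorted(n for n, r in cloud.nodes.items() if r == role)
    for _ in range(desired - len(current)):  # empty range if already enough
        cloud.launch(role)
    for node_id in current[desired:]:        # empty slice if not too many
        cloud.terminate(node_id)
    return sorted(n for n, r in cloud.nodes.items() if r == role)


cloud = FakeCloud()
print(len(reconcile(cloud, "compute", 3)))  # scale up to 3 nodes
print(len(reconcile(cloud, "compute", 1)))  # scale back down to 1 node
```

Running reconcile() twice with the same target changes nothing the second time, which is what makes this style safe to put in a cron job or a pipeline.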

                                 22
DevOps & Scriptable Everything


‣ Incredible innovation in
  the past few years
‣ Driven mainly by
  companies with
  massive internet
  ‘fleets’ to manage
‣ ... but the benefits
  trickle down to us little
  people
                                 23
DevOps will conquer the enterprise

 ‣ Over the past few years
   cloud automation/
   orchestration methods
   have been trickling
   down into our local
   infrastructures
 ‣ This will have
   significant impact on
   careers, job
   descriptions and org
   charts
                                     24
Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles

                                         www.opscode.com
 ‣ Radical change in how IT
   is provisioned, delivered,
   managed & supported
    •   Technology Driver:
        Virtualization & Cloud
    •   Ops Driver:
        Configuration Mgmt, Systems
        Orchestration & Infrastructure
        Automation

 ‣ SysAdmins & IT staff need
   to re-skill and retrain to
   stay relevant
                                                           25
Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles


 ‣ When everything has an
   API ...
 ‣ ... anything can be
   ‘orchestrated’ or
   ‘automated’ remotely
 ‣ And by the way ...
 ‣ The APIs (‘knobs &
   buttons’) are accessible to
   all, not just the bearded
   practitioners sitting in that
   room next to the datacenter
                                                           26
Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles


 ‣ IT jobs, roles and
   responsibilities are going
   to change significantly
 ‣ SysAdmins must learn to
   program in order to
   harness automation tools
 ‣ Programmers &
   Scientists can now self-
   provision and control
   sophisticated IT
   resources
                                                           27
Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles


 ‣       My take on the future ...
     •     SysAdmins (Windows & Linux) who
           can’t code will have career issues
     •     Far more control is going into the
           hands of the research end user
     •     IT support roles will radically change
           -- no longer owners or gatekeepers
 ‣       IT will “own” policies,
         procedures, reference patterns,
         identity mgmt, security & best
         practices
 ‣       Research will control the
         “what”, “when” and “how big”
                                                           28
Topic: Facility Observations



                               29
Facility 1: Enterprise vs Shadow IT

‣ Marked difference in the
  types of facilities we’ve
  been working in
‣ Discovery Research
  systems are firmly
  embedded in the
  enterprise datacenter
‣ ... moving away from “wild
  west” unchaperoned
  locations and mini-
  facilities
                                      30
Facility 2: Colo Suites for R&D

 ‣ Marked increase in use of commercial colocation
   facilities for R&D systems
   •       And they’ve noticed!
       -     Markley Group (One Summer) has a booth
       -     Sabey is on this afternoon’s NYGenome panel

 ‣ Potential reasons:
   •       Expensive to build high-density hosting at small scale
   •       Easier metro networking to link remote users/sites
   •       Direct connect to cloud provider(s)
   •       High-speed research nets only a cross-connect away

                                                                    31
Facility 3: Some really old stuff ...

 ‣ Final facility observation
 ‣ Average age of infrastructure we work on seems to be
   increasing
 ‣ ... very few aggressive 2-year refresh cycles these days
 ‣ Potential reasons
   •   Recession & consolidation still affecting or deferring major
       technology upgrades and changes
   •   Cloud: local upgrades deferred pending strategic cloud decisions
   •   Cloud: economic analysis showing stark truth that local setups
       need to be run efficiently and at high utilization in order to justify
       existence
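
The utilization argument can be sketched as simple arithmetic. Every number and function name here is a hypothetical assumption for illustration (annual cost, core count, cloud price); none of it comes from a real quote or from this talk.

```python
HOURS_PER_YEAR = 24 * 365


def local_cost_per_core_hour(annual_cost, cores, utilization):
    """Effective cost of each *used* core-hour on a local cluster."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return annual_cost / (cores * HOURS_PER_YEAR * utilization)


def break_even_utilization(annual_cost, cores, cloud_price_per_core_hour):
    """Utilization above which local beats the given cloud price.

    A result above 1.0 means the local cluster can never break even.
    """
    return annual_cost / (cores * HOURS_PER_YEAR * cloud_price_per_core_hour)


# Hypothetical cluster: 100 cores costing $87,600 per year all-in
# (hardware amortization, power, cooling, staff).
for utilization in (1.0, 0.5, 0.25):
    cost = local_cost_per_core_hour(87_600, 100, utilization)
    print(f"{utilization:.0%} busy: ${cost:.2f} per used core-hour")

print(f"break-even vs $0.25 cloud: {break_even_utilization(87_600, 100, 0.25):.0%}")
```

With these made-up figures the cluster costs about $0.10 per used core-hour when fully busy and about $0.40 at 25% utilization, so it only beats a $0.25 cloud price when kept roughly 40% busy or more.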

                                                                               32
Facility 3: Virtualization

 ‣ Every HPC environment
   we’ve worked on since
   2011 has included (or
   plans to include) a local
   virtualization environment
   •   True for big systems: 2k
       cores / 2 petabyte disk
   •   True for small systems: 96
       core CompChem cluster

 ‣ Unlikely to change; too
   many advantages

                                    33
Facility 3: Virtualization

 ‣ HPC + Virtualization solves a lot of problems
  •   Deals with valid biz/scientific need for researchers to
      run/own/manage their own servers ‘near’ HPC stack
 ‣ Solves a ton of research IT support issues
  •   Or at least leaves us a clear boundary line
 ‣ Lets us obtain useful “cloud” features without
   choking on endless BS shoveled at us by
   “private cloud” vendors
  •   Example: Server Catalogs + Self-service Provisioning

                                                               34
Topic: Compute



                 35
Compute:

‣ Still feels like a solved
  problem in 2013
‣ Compute power is a
  commodity
  •   Inexpensive relative to other
      costs
  •   Far less vendor differentiation
      than storage
  •   Easy to acquire; easy to
      deploy
                                        36
Compute: Fat Nodes
Fat nodes are wiping out small and midsized clusters

 ‣ This box has 64 CPU Cores
   •   ... and up to 1TB of RAM
 ‣ Fantastic Genomics/
   Chemistry system
   •   A 256GB RAM version only
       costs $13,000*
 ‣ BioIT Homework:
   •   Go visit the Silicon Mechanics
       booth and find out the current
       cost of a box with 1TB RAM
                                                       37
Possibly the most significant ’13 compute trend

                                                 38
Compute: Local Disk is Back
Defensive hedge against Big Data / HDFS

 ‣       We’ve started to see organizations move
         away from blade servers and 1U pizza box
         enclosures for HPC
 ‣       The “new normal” may be 4U enclosures
         with massive local disk spindles - not
         occupied, just available
 ‣       Why? Hadoop & Big Data
 ‣       This is a defensive hedge against future
         HDFS or similar requirements
     •     Remember the ‘meta’ problem - science is
           changing far faster than we can refresh IT. This
           is a defensive future-proofing play.
 ‣       Hardcore Hadoop rigs sometimes operate
         at 1:1 ratio between core count and disk
         count
                                                              39
Topic: Network



                 40
Network:

‣ 10 Gigabit Ethernet still the
  standard
  •   ... although not as pervasive as I
      predicted in prior trend talks

‣ Non-Cisco options attractive
  •   BioIT homework: listen to the Arista
      talks and visit their booth.

‣ SDN still more hype than reality
  in our market
  •   May not see it until next round of
      large private cloud rollouts or new
      facility construction (if even)
                                             41
Network:

‣ InfiniBand for message passing
  in decline
  •   Still see it for comp chem, modeling &
      structure work; started building such
      a system last week
  •   Still see it for parallel and clustered
      storage
  •   Decline seems to match decreasing
      popularity of MPI for latest generation
      of informatics and ‘omics tools

‣ Hadoop / HDFS seems to favor
  throughput and bandwidth over
  latency
                                                42
Topic: Storage



                 43
Storage

‣       Still the biggest expense, biggest headache and scariest
        systems to design in modern life science informatics
        environments
‣       Most of my slides for last year’s trends talk focused on
        storage & data lifecycle issues
    •     Check http://slideshare.net/chrisdag/ if you want to see what I’ve said
          in the past
    •     Dag accuracy check: It was great yesterday to see DataDirect talking
          about the KVM hypervisor running on their storage shelves! I’m
          convinced more and more apps will run directly on storage in the future
‣       ... not doing that this year. The core problems and common
        approaches are largely unchanged and don’t need to be
        restated
                                                                                    44
It’s 2013, we know what questions to ask of our storage


                                                          45
NGS new data generation: 6-month window




Data like this lets us make realistic capacity planning and purchase decisions
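
One way to turn a 6-month window of ingest data into a purchase date, as a sketch only. The monthly figures are invented for illustration (the chart on this slide is not reproduced here), and a straight-line trend is an assumption that will not hold if an instrument upgrade lands mid-year.

```python
def fit_line(ys):
    """Least-squares slope and intercept for points (0, ys[0]), (1, ys[1]), ...

    Needs at least two observations.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_full(monthly_new_tb, used_tb, capacity_tb, horizon=120):
    """Months until used space exceeds capacity if the fitted trend continues.

    Returns None if the storage does not fill within `horizon` months.
    """
    slope, intercept = fit_line(monthly_new_tb)
    month = len(monthly_new_tb)  # index of the first forecast month
    for elapsed in range(1, horizon + 1):
        used_tb += max(0.0, slope * month + intercept)
        if used_tb > capacity_tb:
            return elapsed
        month += 1
    return None


# Hypothetical window: new NGS data per month, in TB.
window = [10, 12, 14, 16, 18, 20]
print(fit_line(window))                                    # -> (2.0, 10.0)
print(months_until_full(window, used_tb=90, capacity_tb=200))  # -> 5
```

Five months of runway on a system with a multi-month procurement cycle is the kind of answer this data is meant to surface early.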



                                                                                 46
Storage: 2013


‣ Advice: Stay on top of the
  “compute nodes with
  many disks” trends.
‣ HDFS, if suddenly required
  by your scientists, can be
  painful to deploy in a
  standard scale-out NAS
  environment

                               47
Storage: 2013




‣ Object Storage is
  getting interesting




                        48
Storage: 2013
Object Storage + Commodity Disk Pods

 ‣       Object storage is far more approachable
     •     ... used to see it in proprietary solutions for specific niche needs
     •     potentially on its way to the mainstream now
 ‣       Why?
     •     Benefits are compelling across a wide variety of interesting use cases
     •     Amazon S3 showed what a globe-spanning general purpose object
           store could do; this is starting to convince developers & ISVs to modify
           their software to support it
     •     www.swiftstack.com and others are making local object stores easy,
           inexpensive and approachable on commodity gear
     •     Most of your Tier1 storage and server vendors have a fully supported
           object store stack they can sell to you (or simply enable in a product
           you already have deployed in-house)
                                                                                      49
Remember this disruptive technology example from last year?



                                                              50
100 Terabytes for $12,000
(more info: http://biote.am/8p )


                                   51
Storage: 2013


‣ There are MANY reasons why you should
  not build that $12K Backblaze pod
  •   ... done wrong you will potentially inconvenience
      researchers, lose critical scientific information and
      (probably) lose your job
‣ Inexpensive or open source object storage
  software makes the ultra-cheap storage
  pod concept viable

                                                             52
Storage: 2013

‣ A single unit like this is risky and should only
  be used for well known and scoped use cases.
  Risks generally outweigh the disruptive price
  advantage
‣ However ...
‣ What if you had 3+ of these units running an
  object store stack with automatic triple
  location replication, recovery and self-healing?
  •   Then things get interesting
  •   This is one of the ‘lab’ projects I hope to work on in ’13
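
Back-of-envelope arithmetic for the replicated-pod idea, as a sketch. It assumes the "100 Terabytes for $12,000" figure from the earlier slide is raw capacity per unit, and it ignores filesystem overhead, networking, power and staff time; the function name is illustrative.

```python
def replicated_pool(units, raw_tb_per_unit, cost_per_unit, replicas=3):
    """Return (usable TB, hardware cost per usable TB) for a replicated pool.

    Every object is stored `replicas` times, so usable space is raw space
    divided by the replica count.
    """
    if units < replicas:
        raise ValueError("need at least one unit per replica")
    usable_tb = units * raw_tb_per_unit / replicas
    total_cost = units * cost_per_unit
    return usable_tb, total_cost / usable_tb


usable, cost_per_tb = replicated_pool(units=3, raw_tb_per_unit=100,
                                      cost_per_unit=12_000)
print(usable, cost_per_tb)  # -> 100.0 360.0
```

Triple replication triples the hardware cost per usable terabyte (from $120 to $360 under these assumptions), which is the price paid for removing the single-unit risk.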
                                                                   53
Storage: 2013

‣ Caveat/Warning
  •   The 2013 editions of “backblaze-like” enclosures mitigate
      many of the earlier availability, operational and reliability
      concerns
  •   Still an aggressive play that carries risk in exchange for a
      disruptive price point
‣ There is a middle ground
  •   Lots of action in the ZFS space with safer & more mainstream
      enclosures
  •   BioIT Homework: Visit the Silicon Mechanics booth and
      check out what they are doing with Nexenta’s Open Storage
      stuff.
                                                                      54
Topic: Cloud



               55
Can you do a Bio-IT talk without using the ‘C’ word?



                                                       56
Cloud: 2013




‣ Our core advice remains the same
‣ What’s changed




                                     57
Cloud: 2013
Core Advice


 ‣ Research Organizations need a cloud
   strategy today
   •   Those that don’t will be bypassed by frustrated
       users
 ‣ IaaS cloud services are only a departmental
   credit card away ... and some senior
   scientists are too big to be fired for violating
   IT policy

                                                         58
Cloud Advice
Design Patterns




 ‣ You actually need three tested cloud design
   patterns:

 ‣ (1) To handle ‘legacy’ scientific apps & workflows
 ‣ (2) The special stuff that is worth re-architecting
 ‣ (3) Hadoop & big data analytics

                                                         59
Cloud Advice
Legacy HPC on the Cloud




 ‣ MIT StarCluster
   •   http://web.mit.edu/star/cluster/
 ‣ This is your baseline
 ‣ Extend as needed



                                          60
Cloud Advice
“Cloudy” HPC



 ‣ Some of our research workflows are important
   enough to be rewritten for “the cloud” and the
   advantages that a truly elastic & API-driven
   infrastructure can deliver
 ‣ This is where you have the most freedom
 ‣ Many published best practices you can borrow
 ‣ Warning: Cloud vendor lock-in potential is
   strongest here
                                                    61
Hadoop & “Big Data”
What you need to know



 ‣ “Hadoop” and “Big Data” are now general
   terms
 ‣ You need to drill down to find out what people
   actually mean
 ‣ We are still in the period where senior
   leadership may demand “Hadoop” or “BigData”
   capability without any actual business or
   scientific need
                                                   62
Hadoop & “Big Data”
What you need to know

 ‣ In broad terms you can break “Big Data” down into two
   very basic use cases:
 1. Compute: Hadoop can be used as a very powerful
    platform for the analysis of very large data sets. The
    Google search term here is “map reduce”
 2. Data Stores: Hadoop is driving the development of very
    sophisticated “no-SQL” “non-Relational” databases and
    data query engines. The Google search terms include
    “nosql”, “couchdb”, “hive”, “pig” & “mongodb”, etc.
 ‣ Your job is to figure out which type applies for the
   groups requesting “Hadoop” or “BigData” capability
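
For the "Compute" use case, here is the map/reduce pattern in miniature, in plain Python with no Hadoop involved. It is a single-process sketch of the programming model only; the k-mer counting task and the function names are illustrative choices, and a real framework would spread the map and reduce steps across many machines and disks.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k=3):
    """Map step: emit (kmer, 1) for every k-length substring of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k=3):
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


print(count_kmers(["ACGTAC", "CGTA"]))
# -> {'ACG': 1, 'CGT': 2, 'GTA': 2, 'TAC': 1}
```

If a group's workload looks like this (independent per-record work followed by a grouped summary), they probably mean the compute use case; if they are asking about query engines or document stores, they mean the second.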

                                                             63
Cloud: 2013
What has changed ...




 ‣ Let’s revisit some of my bile from prior years
 ‣ “... private clouds: still utter crap”
 ‣ “... some AWS competitors are delusional
   pretenders”
 ‣ “... AWS has a multi-year lead on the
   competition”

                                                   64
Private Clouds in 2013:


 ‣       I’m no longer dismissing them as “utter crap”
 ‣       Usable & useful in certain situations
 ‣       BioTeam has had positive experiences with OpenStack
 ‣       Hype vs. Reality ratio still wacky
 ‣       Sensible only for certain shops
     •     Have you seen what you have to do
           to your networks & gear?
 ‣       Still important to remain cynical and perform proper due diligence
Non-AWS IaaS in 2013

‣       Three main drivers for BioTeam’s evolving IaaS practices and thinking
        for 2013:
‣       (1) Real world success with OpenStack & BT
‣       (2) Real world success with Google Compute
‣       (3) Real world multi-cloud DevOps
‣       Just to remain honest though:
    •     AWS still has multi-year lead in product, service and features
    •     ... and many novel capabilities
    •     But some of the competition has some interesting benefits that AWS can’t match
BioTeam, BT & OpenStack


‣ We’ve been working with BT for a while now on
  various projects
‣ BT Cloud uses OpenStack under the hood with some
  really nice architecture and operational features
‣ BioTeam developed a Chef-based HPC clustering
  stack and other tools that are currently being used by
  BT customers
  •   ... some of whom have spoken openly at this meeting
BioTeam & Google Compute Engine


‣ We can’t even get into the preview program
‣ But one of our customers did
‣ ... and we’ve been able to do some successful and
  interesting stuff
  •   Without changing operations or DevOps tools our client is capable of
      running both on AWS and Google Compute
  •   For this client and a few other use cases we believe we can span both
      clouds or construct architectures that would enable fast and relatively
      friction-free transitions
Chef, AWS, OpenStack & Google
Wrapping up ...


 ‣ 2012 was the 1st year we did real work spanning multiple
   IaaS cloud platforms or at least replicating workloads on
   multiple platforms
 ‣ We’ve learned a lot - I think this may result in some
   interesting talks at next year’s Bio-IT meeting
     -   By BioTeam and actual end-users

 ‣ What makes this all possible is the DevOps / Orchestration
   stuff mentioned at the beginning of this presentation.
end; Thanks!
Slides: http://slideshare.net/chrisdag/
                                          70

More Related Content

What's hot

Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersChris Dagdigian
 
2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the TrenchesChris Dagdigian
 
2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentationChris Dagdigian
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019Chris Dagdigian
 
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome MeetingBio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome MeetingChris Dagdigian
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Chris Dagdigian
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the TrenchesChris Dagdigian
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Chris Dagdigian
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte PushingChris Dagdigian
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?mark madsen
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogiesmark madsen
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehousemark madsen
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)mark madsen
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Chris Dagdigian
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except usmark madsen
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionmark madsen
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerMicrosoft
 
IT Performance Management Handbook for CIOs
IT Performance Management Handbook for CIOsIT Performance Management Handbook for CIOs
IT Performance Management Handbook for CIOsVikram Ramesh
 
BioTeam Trends from the Trenches - NIH, April 2014
BioTeam Trends from the Trenches - NIH, April 2014BioTeam Trends from the Trenches - NIH, April 2014
BioTeam Trends from the Trenches - NIH, April 2014Ari Berman
 
Scaling Face Recognition with Big Data
Scaling Face Recognition with Big DataScaling Face Recognition with Big Data
Scaling Face Recognition with Big DataBogdan Bocse
 

What's hot (20)

Multi-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC ClustersMulti-Tenant Pharma HPC Clusters
Multi-Tenant Pharma HPC Clusters
 
2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the Trenches
 
2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation2014 BioIT World - Trends from the trenches - Annual presentation
2014 BioIT World - Trends from the trenches - Annual presentation
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019
 
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome MeetingBio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
2012: Trends from the Trenches
2012: Trends from the Trenches2012: Trends from the Trenches
2012: Trends from the Trenches
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
 
Practical Petabyte Pushing
Practical Petabyte PushingPractical Petabyte Pushing
Practical Petabyte Pushing
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehouse
 
Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)Bi isn't big data and big data isn't BI (updated)
Bi isn't big data and big data isn't BI (updated)
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)
 
Everything has changed except us
Everything has changed except usEverything has changed except us
Everything has changed except us
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collection
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
IT Performance Management Handbook for CIOs
IT Performance Management Handbook for CIOsIT Performance Management Handbook for CIOs
IT Performance Management Handbook for CIOs
 
BioTeam Trends from the Trenches - NIH, April 2014
BioTeam Trends from the Trenches - NIH, April 2014BioTeam Trends from the Trenches - NIH, April 2014
BioTeam Trends from the Trenches - NIH, April 2014
 
Scaling Face Recognition with Big Data
Scaling Face Recognition with Big DataScaling Face Recognition with Big Data
Scaling Face Recognition with Big Data
 

Similar to Trends from DevOps to Blurring IT Roles

Trends from DevOps to Blurring IT Roles

  • 1. Trends from the trenches. 2013 Bio IT World - Boston 1
  • 2. Some less aspirational title slides ... 2
  • 3. Trends from the trenches. 2013 Bio IT World Boston 3
  • 4. Trends from the trenches. 2013 Bio IT World Boston 4
  • 5. I’m Chris. I’m an infrastructure geek. I work for the BioTeam. www.bioteam.net - Twitter: @chris_dag 5
  • 6. BioTeam Who, What, Why ... ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 10+ years bridging the “gap” between science, IT & high performance computing 6
  • 7. If you have not heard me speak ... Apologies in advance ‣ “Infamous” for speaking very fast and carrying a huge slide deck • ~70 slides for 25 minutes about average for me • Let me mention what happened after my Pharma HPC best practices talk yesterday ... By the time you see this slide I’ll be on my ~4th espresso 7
  • 8. Why I do this talk every year ... ‣ Bioteam works for everyone • Pharma, Biotech, EDU, Nonprofit, .Gov, etc. ‣ We get to see how groups of smart people approach similar problems ‣ We can speak honestly & objectively about what we see “in the real world” 8
  • 9. Standard Dag Disclaimer Listen to me at your own risk ‣ I’m not an expert, pundit, visionary or “thought leader” ‣ Any career success entirely due to shamelessly copying what actual smart people do ‣ I’m biased, burnt-out & cynical ‣ Filter my words accordingly 9
  • 10. So why are you here? And before 9am! 10
  • 11. It’s a risky time to be doing Bio-IT 11
  • 12. Big Picture / Meta Issue ‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed • Example: CCD sensor upgrade on that confocal microscopy rig just doubled storage requirements • Example: The 2D ultrasound imager is now a 3D imager • Example: Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes. Massive downstream increase in storage, compute & data movement needs ‣ For the above examples, do you think IT was informed in advance? 12
  • 13. The Central Problem Is ... Science progressing way faster than IT can refresh/change ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • Bench science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...) 13
  • 14. The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago we could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; real solutions required 14
  • 16. And a related problem ... ‣ It has never been easier to acquire vast amounts of data cheaply and easily ‣ Growth rate of data creation/ingest exceeds rate at which the storage industry is improving disk capacity ‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers • ... ideally without punching holes in your firewall or consuming all available internet bandwidth 16
  • 17. If you get it wrong ... ‣ Lost opportunity ‣ Missing capability ‣ Frustrated & very vocal scientific staff ‣ Problems in recruiting, retention, publication & product development 17
  • 18. Enough groundwork. Let’s Talk Trends* 18
  • 19. Topic: DevOps & Org Charts 19
  • 20. The social contract between scientist and IT is changing forever 20
  • 21. You can blame “the cloud” for this 21
  • 22. DevOps & Scriptable Everything ‣ On (real) clouds, EVERYTHING has an API ‣ If it’s got an API you can automate and orchestrate it ‣ “scriptable datacenters” are now a very real thing 22
  • 23. DevOps & Scriptable Everything ‣ Incredible innovation in the past few years ‣ Driven mainly by companies with massive internet ‘fleets’ to manage ‣ ... but the benefits trickle down to us little people 23
  • 24. DevOps will conquer the enterprise ‣ Over the past few years cloud automation/orchestration methods have been trickling down into our local infrastructures ‣ This will have significant impact on careers, job descriptions and org charts 24
  • 25. Scientist/SysAdmin/Programmer 2013: Continue to blur the lines between all these roles ‣ Radical change in how IT is provisioned, delivered, managed & supported (www.opscode.com) • Technology Driver: Virtualization & Cloud • Ops Driver: Configuration Mgmt, Systems Orchestration & Infrastructure Automation ‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant 25
  • 26. Scientist/SysAdmin/Programmer 2013: Continue to blur the lines between all these roles ‣ When everything has an API ... ‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely ‣ And by the way ... ‣ The APIs (‘knobs & buttons’) are accessible to all, not just the bearded practitioners sitting in that room next to the datacenter 26
  • 27. Scientist/SysAdmin/Programmer 2013: Continue to blur the lines between all these roles ‣ IT jobs, roles and responsibilities are going to change significantly ‣ SysAdmins must learn to program in order to harness automation tools ‣ Programmers & Scientists can now self-provision and control sophisticated IT resources 27
  • 28. Scientist/SysAdmin/Programmer 2013: Continue to blur the lines between all these roles ‣ My take on the future ... • SysAdmins (Windows & Linux) who can’t code will have career issues • Far more control is going into the hands of the research end user • IT support roles will radically change -- no longer owners or gatekeepers ‣ IT will “own” policies, procedures, reference patterns, identity mgmt, security & best practices ‣ Research will control the “what”, “when” and “how big” 28
  • 30. Facility 1: Enterprise vs Shadow IT ‣ Marked difference in the types of facilities we’ve been working in ‣ Discovery Research systems are firmly embedded in the enterprise datacenter ‣ ... moving away from “wild west” unchaperoned locations and mini-facilities 30
  • 31. Facility 2: Colo Suites for R&D ‣ Marked increase in use of commercial colocation facilities for R&D systems • And they’ve noticed! - Markley Group (One Summer) has a booth - Sabey is on this afternoon’s NYGenome panel ‣ Potential reasons: • Expensive to build high-density hosting at small scale • Easier metro networking to link remote users/sites • Direct connect to cloud provider(s) • High-speed research nets only a cross-connect away 31
  • 32. Facility 3: Some really old stuff ... ‣ Final facility observation ‣ Average age of infrastructure we work on seems to be increasing ‣ ... very few aggressive 2-year refresh cycles these days ‣ Potential reasons • Recession & consolidation still affecting or deferring major technology upgrades and changes • Cloud: local upgrades deferred pending strategic cloud decisions • Cloud: economic analysis showing stark truth that local setups need to be run efficiently and at high utilization in order to justify existence 32
  • 33. Facility 3: Virtualization ‣ Every HPC environment we’ve worked on since 2011 has included (or plans to include) a local virtualization environment • True for big systems: 2k cores / 2 petabyte disk • True for small systems: 96 core CompChem cluster ‣ Unlikely to change; too many advantages 33
  • 34. Facility 3: Virtualization ‣ HPC + Virtualization solves a lot of problems • Deals with valid biz/scientific need for researchers to run/own/manage their own servers ‘near’ HPC stack ‣ Solves a ton of research IT support issues • Or at least leaves us a clear boundary line ‣ Lets us obtain useful “cloud” features without choking on endless BS shoveled at us by “private cloud” vendors • Example: Server Catalogs + Self-service Provisioning 34
  • 36. Compute: ‣ Still feels like a solved problem in 2013 ‣ Compute power is a commodity • Inexpensive relative to other costs • Far less vendor differentiation than storage • Easy to acquire; easy to deploy 36
  • 37. Compute: Fat Nodes Fat nodes are wiping out small and midsized clusters ‣ This box has 64 CPU Cores • ... and up to 1TB of RAM ‣ Fantastic Genomics/Chemistry system • A 256GB RAM version only costs $13,000* ‣ BioIT Homework: • Go visit the Silicon Mechanics booth and find out the current cost of a box with 1TB RAM 37
  • 38. Possibly the most significant ’13 compute trend 38
  • 39. Compute: Local Disk is Back Defensive hedge against Big Data / HDFS ‣ We’ve started to see organizations move away from blade servers and 1U pizza box enclosures for HPC ‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available ‣ Why? Hadoop & Big Data ‣ This is a defensive hedge against future HDFS or similar requirements • Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play. ‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count 39
  • 41. Network: ‣ 10 Gigabit Ethernet still the standard • ... although not as pervasive as I predicted in prior trend talks ‣ Non-Cisco options attractive • BioIT homework: listen to the Arista talks and visit their booth. ‣ SDN still more hype than reality in our market • May not see it until next round of large private cloud rollouts or new facility construction (if even) 41
  • 42. Network: ‣ Infiniband for message passing in decline • Still see it for comp chem, modeling & structure work; Started building such a system last week • Still see it for parallel and clustered storage • Decline seems to match decreasing popularity of MPI for latest generation of informatics and ‘omics tools ‣ Hadoop / HDFS seems to favor throughput and bandwidth over latency 42
  • 44. Storage ‣ Still the biggest expense, biggest headache and scariest systems to design in modern life science informatics environments ‣ Most of my slides for last year’s trends talk focused on storage & data lifecycle issues • Check http://slideshare.net/chrisdag/ if you want to see what I’ve said in the past • Dag accuracy check: It was great yesterday to see DataDirect talking about the KVM hypervisor running on their storage shelves! I’m convinced more and more apps will run directly on storage in the future ‣ ... not doing that this year. The core problems and common approaches are largely unchanged and don’t need to be restated 44
  • 45. It’s 2013, we know what questions to ask of our storage 45
  • 46. NGS new data generation: 6-month window Data like this lets us make realistic capacity planning and purchase decisions 46
  • 47. Storage: 2013 ‣ Advice: Stay on top of the “compute nodes with many disks” trend. ‣ HDFS, if suddenly required by your scientists, can be painful to deploy in a standard scale-out NAS environment 47
  • 48. Storage: 2013 ‣ Object Storage is getting interesting 48
  • 49. Storage: 2013 Object Storage + Commodity Disk Pods ‣ Object storage is far more approachable • ... used to see it in proprietary solutions for specific niche needs • potentially on its way to the mainstream now ‣ Why? • Benefits are compelling across a wide variety of interesting use cases • Amazon S3 showed what a globe-spanning general purpose object store could do; this is starting to convince developers & ISVs to modify their software to support it • www.swiftstack.com and others are making local object stores easy, inexpensive and approachable on commodity gear • Most of your Tier1 storage and server vendors have a fully supported object store stack they can sell to you (or simply enable in a product you already have deployed in-house) 49
  • 50. Remember this disruptive technology example from last year? 50
  • 51. 100 Terabytes for $12,000 (more info: http://biote.am/8p ) 51
  • 52. Storage: 2013 ‣ There are MANY reasons why you should not build that $12K backblaze pod • ... done wrong you will potentially inconvenience researchers, lose critical scientific information and (probably) lose your job ‣ Inexpensive or open source object storage software makes the ultra-cheap storage pod concept viable 52
  • 53. Storage: 2013 ‣ A single unit like this is risky and should only be used for well known and scoped use cases. Risks generally outweigh the disruptive price advantage ‣ However ... ‣ What if you had 3+ of these units running an object store stack with automatic triple location replication, recovery and self-healing? • Then things get interesting • This is one of the ‘lab’ projects I hope to work on in ’13 53
  • 54. Storage: 2013 ‣ Caveat/Warning • The 2013 editions of “backblaze-like” enclosures mitigate many of the earlier availability, operational and reliability concerns • Still an aggressive play that carries risk in exchange for a disruptive price point ‣ There is a middle ground • Lots of action in the ZFS space with safer & more mainstream enclosures • BioIT Homework: Visit the Silicon Mechanics booth and check out what they are doing with Nexenta’s Open Storage stuff. 54
  • 56. Can you do a Bio-IT talk without using the ‘C’ word? 56
  • 57. Cloud: 2013 ‣ Our core advice remains the same ‣ What’s changed 57
  • 58. Cloud: 2013 Core Advice ‣ Research Organizations need a cloud strategy today • Those that don’t will be bypassed by frustrated users ‣ IaaS cloud services are only a departmental credit card away ... and some senior scientists are too big to be fired for violating IT policy 58
  • 59. Cloud Advice Design Patterns ‣ You actually need three tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics 59
  • 60. Cloud Advice Legacy HPC on the Cloud ‣ MIT StarCluster • http://web.mit.edu/star/cluster/ ‣ This is your baseline ‣ Extend as needed 60
  • 61. Cloud Advice “Cloudy” HPC ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Warning: Cloud vendor lock-in potential is strongest here 61
  • 62. Hadoop & “Big Data” What you need to know ‣ “Hadoop” and “Big Data” are now general terms ‣ You need to drill down to find out what people actually mean ‣ We are still in the period where senior leadership may demand “Hadoop” or “BigData” capability without any actual business or scientific need 62
  • 63. Hadoop & “Big Data” What you need to know ‣ In broad terms you can break “Big Data” down into two very basic use cases: 1. Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets. The google search term here is “map reduce” 2. Data Stores: Hadoop is driving the development of very sophisticated “no-SQL” “non-Relational” databases and data query engines. The google search terms include “nosql”, “couchdb”, “hive”, “pig” & “mongodb”, etc. ‣ Your job is to figure out which type applies for the groups requesting “Hadoop” or “BigData” capability 63
  • 64. Cloud: 2013 What has changed ... ‣ Let’s revisit some of my bile from prior years ‣ “... private clouds: still utter crap” ‣ “... some AWS competitors are delusional pretenders” ‣ “... AWS has a multi-year lead on the competition” 64
  • 65. Private Clouds in 2013: ‣ I’m no longer dismissing them as “utter crap” ‣ Usable & useful in certain situations ‣ BioTeam positive experiences with OpenStack ‣ Hype vs. Reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ Still important to remain cynical and perform proper due diligence
  • 66. Non-AWS IaaS in 2013 ‣ Three main drivers for BioTeam’s evolving IaaS practices and thinking for 2013: ‣ (1) Real world success with OpenStack & BT ‣ (2) Real world success with Google Compute ‣ (3) Real world multi-cloud DevOps ‣ Just to remain honest though: • AWS still has multi-year lead in product, service and features • ... and many novel capabilities • But some of the competition has some interesting benefits that AWS can’t match
  • 67. BioTeam, BT & OpenStack ‣ We’ve been working with BT for a while now on various projects ‣ BT Cloud using OpenStack under the hood with some really nice architecture and operational features ‣ BioTeam developed a Chef-based HPC clustering stack and other tools that are currently being used by BT customers • ... some of whom have spoken openly at this meeting
  • 68. BioTeam & Google Compute Engine ‣ We can’t even get into the preview program ‣ But one of our customers did ‣ ... and we’ve been able to do some successful and interesting stuff • Without changing operations or DevOps tools our client is capable of running both on AWS and Google Compute • For this client and a few other use cases we believe we can span both clouds or construct architectures that would enable fast and relatively friction-free transitions
  • 69. Chef, AWS, OpenStack & Google Wrapping up ... ‣ 2012 was the 1st year we did real work spanning multiple IaaS cloud platforms or at least replicating workloads on multiple platforms ‣ We’ve learned a lot - I think this may result in some interesting talks at next year’s Bio-IT meeting - By BioTeam and actual end-users ‣ What makes this all possible is the DevOps / Orchestration stuff mentioned at the beginning of this presentation.
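
As a rough illustration of the closing point (the same DevOps tooling driving more than one IaaS platform), here is a minimal Python sketch. Everything in it is hypothetical and is not taken from the talk: `CloudProvider`, `FakeCloud` and `provision_cluster` are made-up names, and the "clouds" are in-memory stand-ins rather than real AWS, OpenStack or Google Compute API calls. It only shows the pattern of writing the orchestration logic once against a small interface.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class CloudProvider(Protocol):
    """Hypothetical minimal interface that the orchestration code depends on."""

    name: str

    def launch_node(self, role: str) -> str:
        ...


@dataclass
class FakeCloud:
    """In-memory stand-in for a real IaaS API (an assumption for illustration)."""

    name: str
    nodes: List[str] = field(default_factory=list)

    def launch_node(self, role: str) -> str:
        # A real provider would call its own API here; we only record an ID.
        node_id = f"{self.name}-{role}-{len(self.nodes)}"
        self.nodes.append(node_id)
        return node_id


def provision_cluster(cloud: CloudProvider, workers: int) -> List[str]:
    """Launch one head node plus `workers` worker nodes on any provider."""
    ids = [cloud.launch_node("head")]
    ids += [cloud.launch_node("worker") for _ in range(workers)]
    return ids


if __name__ == "__main__":
    # The same provisioning code runs unchanged against both stand-in clouds.
    for cloud in (FakeCloud("aws"), FakeCloud("gce")):
        print(provision_cluster(cloud, workers=2))
```

In practice the provider-specific parts would live behind configuration management and orchestration tools such as the Chef-based stack mentioned above; the sketch only shows why keeping them behind one interface makes moving between clouds less painful.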