- The speaker observes that bench science and lab instrumentation are changing more rapidly than IT can refresh systems, creating design challenges. This includes new instruments generating vastly more data.
- There is a blurring of roles between scientists, sysadmins, and programmers as everything becomes more automated and "scriptable." Sysadmins must learn programming and researchers can now self-provision resources.
- Virtualization is widely used even in HPC to provide flexibility and address business needs. Very large "fat node" servers are replacing clusters of smaller nodes. Local disk is coming back as a hedge against big data requirements.
- Object storage is becoming more viable and approachable on commodity hardware for a wide variety of use cases.
5. I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
www.bioteam.net - Twitter: @chris_dag
6. BioTeam
Who, What, Why ...
‣ Independent consulting shop
‣ Staffed by scientists forced to
learn IT, SW & HPC to get our
own research done
‣ 10+ years bridging the “gap”
between science, IT & high
performance computing
7. If you have not heard me speak ...
Apologies in advance
‣ “Infamous” for speaking
very fast and carrying a
huge slide deck
• ~70 slides for 25 minutes is
about average for me
• Let me mention what
happened after my Pharma
HPC best practices talk
yesterday ...
By the time you see this slide
I’ll be on my ~4th espresso
8. Why I do this talk every year ...
‣ Bioteam works for
everyone
• Pharma, Biotech, EDU,
Nonprofit, .Gov, etc.
‣ We get to see how
groups of smart people
approach similar
problems
‣ We can speak honestly &
objectively about what
we see “in the real
world”
9. Standard Dag Disclaimer
Listen to me at your own risk
‣ I’m not an expert, pundit,
visionary or “thought leader”
‣ Any career success entirely due
to shamelessly copying what
actual smart people do
‣ I’m biased, burnt-out & cynical
‣ Filter my words accordingly
12. Big Picture / Meta Issue
‣ HUGE revolution in the rate at which
lab platforms are being redesigned,
improved & refreshed
• Example: CCD sensor upgrade on that
confocal microscopy rig just doubled
storage requirements
• Example: The 2D ultrasound imager is
now a 3D imager
• Example: Illumina HiSeq upgrade just
doubled the rate at which you can acquire
genomes. Massive downstream increase
in storage, compute & data movement
needs
‣ For the above examples, do you
think IT was informed in advance?
13. The Central Problem Is ...
Science progressing way faster than IT can refresh/change
‣ Instrumentation & protocols are changing FAR
FASTER than we can refresh our Research-IT &
Scientific Computing infrastructure
• Bench science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every
2-7 years
‣ We have to design systems TODAY that can
support unknown research requirements &
workflows over many years (gulp ...)
14. The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago we could toss
inexpensive storage and
servers at the problem;
even in a nearby closet or
under a lab bench if
necessary
‣ That does not work any
more; real solutions
required
16. And a related problem ...
‣ It has never been easier to
acquire vast amounts of data
cheaply and easily
‣ Growth rate of data creation/
ingest exceeds rate at which
the storage industry is
improving disk capacity
‣ Not just a storage lifecycle
problem. This data *moves*
and often needs to be shared
among multiple entities and
providers
• ... ideally without punching holes in
your firewall or consuming all
available internet bandwidth
17. If you get it wrong ...
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention,
publication & product development
22. DevOps & Scriptable Everything
‣ On (real) clouds,
EVERYTHING has an
API
‣ If it’s got an API you can
automate and
orchestrate it
‣ “scriptable datacenters”
are now a very real thing
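For a concrete taste of the "scriptable datacenter," here is a minimal sketch using the Python boto library to provision a server through nothing but an API call. The region, AMI ID and key pair name are placeholder assumptions, and AWS credentials are assumed to be in the environment.

    # Provision a compute node entirely through an API call.
    # Assumes AWS credentials in the environment
    # (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).
    import boto.ec2

    # Region is just an example; pick your own.
    conn = boto.ec2.connect_to_region("us-east-1")

    # AMI ID and key pair name below are hypothetical placeholders.
    reservation = conn.run_instances(
        "ami-12345678",
        instance_type="m1.large",
        key_name="my-keypair",
        min_count=1,
        max_count=1,
    )
    instance = reservation.instances[0]
    print("Launched %s (state: %s)" % (instance.id, instance.state))

The same call can be wrapped in a loop or an orchestration tool, which is the whole point: once it is an API call, it is automatable.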
23. DevOps & Scriptable Everything
‣ Incredible innovation in
the past few years
‣ Driven mainly by
companies with
massive internet
‘fleets’ to manage
‣ ... but the benefits
trickle down to us little
people
24. DevOps will conquer the enterprise
‣ Over the past few years
cloud automation/
orchestration methods
have been trickling
down into our local
infrastructures
‣ This will have
significant impact on
careers, job
descriptions and org
charts
25. Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles
www.opscode.com
‣ Radical change in how IT is
provisioned, delivered,
managed & supported
• Technology Driver:
Virtualization & Cloud
• Ops Driver:
Configuration Mgmt, Systems
Orchestration & Infrastructure
Automation
‣ SysAdmins & IT staff need
to re-skill and retrain to
stay relevant
26. Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles
‣ When everything has an
API ...
‣ ... anything can be
‘orchestrated’ or
‘automated’ remotely
‣ And by the way ...
‣ The APIs (‘knobs &
buttons’) are accessible to
all, not just the bearded
practitioners sitting in that
room next to the datacenter
27. Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles
‣ IT jobs, roles and
responsibilities are going
to change significantly
‣ SysAdmins must learn to
program in order to
harness automation tools
‣ Programmers &
Scientists can now self-
provision and control
sophisticated IT
resources
28. Scientist/SysAdmin/Programmer
2013: Continue to blur the lines between all these roles
‣ My take on the future ...
• SysAdmins (Windows & Linux) who
can’t code will have career issues
• Far more control is going into the
hands of the research end user
• IT support roles will radically change
-- no longer owners or gatekeepers
‣ IT will “own” policies,
procedures, reference patterns,
identity mgmt, security & best
practices
‣ Research will control the
“what”, “when” and “how big”
30. Facility 1: Enterprise vs Shadow IT
‣ Marked difference in the
types of facilities we’ve
been working in
‣ Discovery Research
systems are firmly
embedded in the
enterprise datacenter
‣ ... moving away from “wild
west” unchaperoned
locations and mini-
facilities
31. Facility 2: Colo Suites for R&D
‣ Marked increase in use of commercial colocation
facilities for R&D systems
• And they’ve noticed!
- Markley Group (One Summer) has a booth
- Sabey is on this afternoon’s NYGenome panel
‣ Potential reasons:
• Expensive to build high-density hosting at small scale
• Easier metro networking to link remote users/sites
• Direct connect to cloud provider(s)
• High-speed research nets only a cross-connect away
32. Facility 3: Some really old stuff ...
‣ Final facility observation
‣ Average age of infrastructure we work on seems to be
increasing
‣ ... very few aggressive 2-year refresh cycles these days
‣ Potential reasons
• Recession & consolidation still affecting or deferring major
technology upgrades and changes
• Cloud: local upgrades deferred pending strategic cloud decisions
• Cloud: economic analysis showing stark truth that local setups
need to be run efficiently and at high utilization in order to justify
existence
33. Facility 3: Virtualization
‣ Every HPC environment
we’ve worked on since
2011 has included (or
plans to include) a local
virtualization environment
• True for big systems: 2k
cores / 2 petabyte disk
• True for small systems: 96
core CompChem cluster
‣ Unlikely to change; too
many advantages
34. Facility 3: Virtualization
‣ HPC + Virtualization solves a lot of problems
• Deals with valid biz/scientific need for researchers to
run/own/manage their own servers ‘near’ HPC stack
‣ Solves a ton of research IT support issues
• Or at least leaves us a clear boundary line
‣ Lets us obtain useful “cloud” features without
choking on endless BS shoveled at us by
“private cloud” vendors
• Example: Server Catalogs + Self-service Provisioning
36. Compute:
‣ Still feels like a solved
problem in 2013
‣ Compute power is a
commodity
• Inexpensive relative to other
costs
• Far less vendor differentiation
than storage
• Easy to acquire; easy to
deploy
37. Compute: Fat Nodes
Fat nodes are wiping out small and midsized clusters
‣ This box has 64 CPU Cores
• ... and up to 1TB of RAM
‣ Fantastic Genomics/
Chemistry system
• A 256GB RAM version only
costs $13,000*
‣ BioIT Homework:
• Go visit the Silicon Mechanics
booth and find out the current
cost of a box with 1TB RAM
39. Compute: Local Disk is Back
Defensive hedge against Big Data / HDFS
‣ We’ve started to see organizations move
away from blade servers and 1U pizza box
enclosures for HPC
‣ The “new normal” may be 4U enclosures
with massive numbers of local disk bays -
not necessarily occupied, just available
‣ Why? Hadoop & Big Data
‣ This is a defensive hedge against future
HDFS or similar requirements
• Remember the ‘meta’ problem - science is
changing far faster than we can refresh IT. This
is a defensive future-proofing play.
‣ Hardcore Hadoop rigs sometimes operate
at 1:1 ratio between core count and disk
count
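If you do pre-position spindle bays like this, the eventual payoff is that HDFS can be pointed straight at the local disks. A sketch of the relevant hdfs-site.xml stanza (property name per Hadoop 2.x; the mount points are hypothetical):

    <!-- hdfs-site.xml: give each DataNode its local spindles.
         Mount points below are hypothetical examples. -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/disk01,/data/disk02,/data/disk03,/data/disk04</value>
    </property>

The DataNode round-robins new blocks across the listed directories, which is why core-to-spindle ratio matters more here than RAID-style aggregation.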
41. Network:
‣ 10 Gigabit Ethernet still the
standard
• ... although not as pervasive as I
predicted in prior trend talks
‣ Non-Cisco options attractive
• BioIT homework: listen to the Arista
talks and visit their booth.
‣ SDN still more hype than reality
in our market
• May not see it until next round of
large private cloud rollouts or new
facility construction (if even)
42. Network:
‣ InfiniBand for message passing
in decline
• Still see it for comp chem, modeling &
structure work; started building such
a system last week
• Still see it for parallel and clustered
storage
• Decline seems to match decreasing
popularity of MPI for latest generation
of informatics and ‘omics tools
‣ Hadoop / HDFS seems to favor
throughput and bandwidth over
latency
44. Storage
‣ Still the biggest expense, biggest headache and scariest
systems to design in modern life science informatics
environments
‣ Most of my slides for last year’s trends talk focused on
storage & data lifecycle issues
• Check http://slideshare.net/chrisdag/ if you want to see what I’ve said
in the past
• Dag accuracy check: It was great yesterday to see DataDirect talking
about the KVM hypervisor running on their storage shelves! I’m
convinced more and more apps will run directly on storage in the future
‣ ... not doing that this year. The core problems and common
approaches are largely unchanged and don’t need to be
restated
45. It’s 2013, we know what questions to ask of our storage
46. NGS new data generation: 6-month window
Data like this lets us make realistic capacity planning and purchase decisions
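(The slide's chart isn't reproduced here, but the arithmetic it enables is simple. A sketch with entirely hypothetical numbers; substitute the per-run output and cadence observed in your own 6-month window.)

    # Back-of-envelope capacity planning from measured instrument output.
    # Every number below is a hypothetical placeholder.
    runs_per_month = 6        # sequencer runs per month (observed)
    tb_per_run = 0.6          # data retained per run, in TB (observed)
    analysis_overhead = 1.5   # downstream/derived data multiplier
    upgrade_factor = 2.0      # expected jump after next instrument upgrade

    monthly_tb = runs_per_month * tb_per_run * analysis_overhead
    yearly_tb = monthly_tb * 12
    print("Current ingest: %.1f TB/month (%.1f TB/year)" % (monthly_tb, yearly_tb))
    print("Post-upgrade: %.1f TB/year" % (yearly_tb * upgrade_factor))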
47. Storage: 2013
‣ Advice: Stay on top of the
“compute nodes with
many disks” trends.
‣ If HDFS is suddenly required
by your scientists, it can be
painful to deploy in a
standard scale-out NAS
environment
49. Storage: 2013
Object Storage + Commodity Disk Pods
‣ Object storage is far more approachable
• ... used to see it in proprietary solutions for specific niche needs
• potentially on its way to the mainstream now
‣ Why?
• Benefits are compelling across a wide variety of interesting use cases
• Amazon S3 showed what a globe-spanning general purpose object
store could do; this is starting to convince developers & ISVs to modify
their software to support it
• www.swiftstack.com and others are making local object stores easy,
inexpensive and approachable on commodity gear
• Most of your Tier1 storage and server vendors have a fully supported
object store stack they can sell to you (or simply enable in a product
you already have deployed in-house)
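Part of why developers adopt the model so readily: the whole interface boils down to put/get/list/delete on buckets and keys. A minimal sketch with Python boto against S3 (bucket and object names are placeholders; many Swift deployments can expose an S3-compatible API via optional middleware, but verify that before assuming endpoint portability):

    # Object storage in a nutshell: no filesystem, just buckets and keys.
    # Assumes AWS credentials in the environment; names are placeholders.
    import boto

    conn = boto.connect_s3()
    bucket = conn.create_bucket("example-lab-archive")

    # PUT: store a result file as an object.
    key = bucket.new_key("run42/variants.vcf")
    key.set_contents_from_filename("variants.vcf")

    # GET: pull it back down anywhere credentials exist.
    key.get_contents_to_filename("variants-copy.vcf")

    # LIST: enumerate objects under a prefix.
    for k in bucket.list(prefix="run42/"):
        print(k.name, k.size)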
52. Storage: 2013
‣ There are MANY reasons why you should
not build that $12K Backblaze pod
• ... done wrong you will potentially inconvenience
researchers, lose critical scientific information and
(probably) lose your job
‣ Inexpensive or open source object storage
software makes the ultra-cheap storage
pod concept viable
53. Storage: 2013
‣ A single unit like this is risky and should only
be used for well known and scoped use cases.
Risks generally outweigh the disruptive price
advantage
‣ However ...
‣ What if you had 3+ of these units running an
object store stack with automatic triple
location replication, recovery and self-healing?
• Then things get interesting
• This is one of the ‘lab’ projects I hope to work on in ’13
54. Storage: 2013
‣ Caveat/Warning
• The 2013 editions of “Backblaze-like” enclosures mitigate
many of the earlier availability, operational and reliability
concerns
• Still an aggressive play that carries risk in exchange for a
disruptive price point
‣ There is a middle ground
• Lots of action in the ZFS space with safer & more mainstream
enclosures
• BioIT Homework: Visit the Silicon Mechanics booth and
check out what they are doing with Nexenta’s Open Storage
stuff.
58. Cloud: 2013
Core Advice
‣ Research Organizations need a cloud
strategy today
• Those that don’t will be bypassed by frustrated
users
‣ IaaS cloud services are only a departmental
credit card away ... and some senior
scientists are too big to be fired for violating
IT policy
59. Cloud Advice
Design Patterns
‣ You actually need three tested cloud design
patterns:
‣ (1) To handle ‘legacy’ scientific apps & workflows
‣ (2) The special stuff that is worth re-architecting
‣ (3) Hadoop & big data analytics
60. Cloud Advice
Legacy HPC on the Cloud
‣ MIT StarCluster
• http://web.mit.edu/star/cluster/
‣ This is your baseline
‣ Extend as needed
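For orientation, a skeletal ~/.starcluster/config along the lines of StarCluster's documented template; every value below (credentials, key, AMI, sizes) is a placeholder:

    [global]
    DEFAULT_TEMPLATE = smallcluster

    [aws info]
    AWS_ACCESS_KEY_ID = your_access_key
    AWS_SECRET_ACCESS_KEY = your_secret_key

    [key mykey]
    KEY_LOCATION = ~/.ssh/mykey.rsa

    [cluster smallcluster]
    KEYNAME = mykey
    CLUSTER_SIZE = 4
    NODE_IMAGE_ID = ami-12345678
    NODE_INSTANCE_TYPE = m1.large

From there, `starcluster start smallcluster` boots a ready-to-use cluster and `starcluster terminate smallcluster` tears it down.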
61. Cloud Advice
“Cloudy” HPC
‣ Some of our research workflows are important
enough to be rewritten for “the cloud” and the
advantages that a truly elastic & API-driven
infrastructure can deliver
‣ This is where you have the most freedom
‣ Many published best practices you can borrow
‣ Warning: Cloud vendor lock-in potential is
strongest here
62. Hadoop & “Big Data”
What you need to know
‣ “Hadoop” and “Big Data” are now general
terms
‣ You need to drill down to find out what people
actually mean
‣ We are still in the period where senior
leadership may demand “Hadoop” or “BigData”
capability without any actual business or
scientific need
63. Hadoop & “Big Data”
What you need to know
‣ In broad terms you can break “Big Data” down into two
very basic use cases:
1. Compute: Hadoop can be used as a very powerful
platform for the analysis of very large data sets. The
Google search term here is “map reduce”
2. Data Stores: Hadoop is driving the development of very
sophisticated “NoSQL” / non-relational databases and
data query engines. The Google search terms include
“nosql”, “couchdb”, “hive”, “pig”, “mongodb”, etc.
‣ Your job is to figure out which type applies for the
groups requesting “Hadoop” or “BigData” capability
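For the “Compute” case, the canonical illustration is a Hadoop Streaming job: the map and reduce steps are plain scripts that read stdin and write stdout. A minimal word-count sketch in Python (file names and the streaming jar location vary by distribution):

    # mapper.py - emit "word<TAB>1" for every word on stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word, 1))

    # reducer.py - sum counts per word. Hadoop sorts by key before the
    # reduce phase, so all lines for a given word arrive consecutively.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Submitted with roughly `hadoop jar hadoop-streaming.jar -file mapper.py -mapper mapper.py -file reducer.py -reducer reducer.py -input /data/in -output /data/out`.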
64. Cloud: 2013
What has changed ..
‣ Let’s revisit some of my bile from prior years
‣ “... private clouds: still utter crap”
‣ “... some AWS competitors are delusional
pretenders”
‣ “... AWS has a multi-year lead on the
competition”
65. Private Clouds in 2013:
‣ I’m no longer dismissing them as “utter crap”
‣ Usable & useful in certain situations
‣ BioTeam positive experiences with OpenStack
‣ Hype vs. Reality ratio still wacky
‣ Sensible only for certain shops
• Have you seen what you have to do
to your networks & gear?
‣ Still important to remain cynical and perform proper due diligence
66. Non-AWS IaaS in 2013
‣ Three main drivers for BioTeam’s evolving IaaS practices and thinking
for 2013:
‣ (1) Real world success with OpenStack & BT
‣ (2) Real world success with Google Compute
‣ (3) Real world multi-cloud DevOps
‣ Just to remain honest though:
• AWS still has a multi-year lead in product, service and features
• ... and many novel capabilities
• But some of the competition has some interesting benefits that AWS can’t match
67. BioTeam, BT & OpenStack
‣ We’ve been working with BT for a while now on
various projects
‣ BT Cloud uses OpenStack under the hood with some
really nice architecture and operational features
‣ BioTeam developed a Chef-based HPC clustering
stack and other tools that are currently being used by
BT customers
• ... some of whom have spoken openly at this meeting
68. BioTeam & Google Compute Engine
‣ We can’t even get into the preview program
‣ But one of our customers did
‣ ... and we’ve been able to do some successful and
interesting stuff
• Without changing operations or DevOps tools our client is capable of
running both on AWS and Google Compute
• For this client and a few other use cases we believe we can span both
clouds or construct architectures that would enable fast and relatively
friction-free transitions
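Our spanning work was done with Chef-based DevOps tooling, not the library below, but as an illustration of the multi-cloud idea: Apache libcloud wraps EC2 and Google Compute Engine behind one Python API, so the same provisioning logic can target either. Credentials, project and region values here are all placeholders.

    # One code path, two clouds via Apache libcloud.
    # All credentials, projects and regions below are placeholders.
    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    def connect(cloud):
        if cloud == "aws":
            cls = get_driver(Provider.EC2)
            return cls("ACCESS_KEY", "SECRET_KEY", region="us-east-1")
        cls = get_driver(Provider.GCE)
        return cls("svc-account@example-project.iam.gserviceaccount.com",
                   "/path/to/key.pem", project="example-project")

    # Everything past the connection is cloud-agnostic.
    for cloud in ("aws", "gce"):
        driver = connect(cloud)
        print(cloud, [node.name for node in driver.list_nodes()])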
69. Chef, AWS, OpenStack & Google
Wrapping up ...
‣ 2012 was the 1st year we did real work spanning multiple
IaaS cloud platforms or at least replicating workloads on
multiple platforms
‣ We’ve learned a lot - I think this may result in some
interesting talks at next year’s Bio-IT meeting
- By BioTeam and actual end-users
‣ What makes this all possible is the DevOps / Orchestration
stuff mentioned at the beginning of this presentation.