Talk slides from the annual "trends from the trenches" address at BioITWorld Expo. 2014 Edition.
### Email chris@bioteam.net if you'd like a PDF copy of this deck ###
2014 BioIT World - Trends from the trenches - Annual presentation
1. 1
Trends from the trenches: 2014
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14
2. 2
I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
3. Apologies in advance
3
If you have not heard me speak ...
‣ ‘Infamous’ for speaking
very fast and carrying a
huge slide deck
‣ In 2014 CHI finally gave
up and just gave me a
60min talk slot
‣ Aiming to end with
enough time for
questions & discussions
By the time you see this slide
I’ll be on my ~4th espresso
4. Who, What, Why ...
4
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to
learn IT, SW & HPC to get our
own research done
‣ 12+ years bridging the “gap”
between science, IT & high
performance computing
‣ Our wide-ranging work is what
gets us invited to speak at
events like this ...
5. 5
Why I do this talk every year ...
‣ Bioteam works for
everyone
• Pharma, Biotech, EDU,
Nonprofit, .Gov, etc.
‣ We get to see how
groups of smart people
approach similar
problems
‣ We can speak honestly &
objectively about what
we see “in the real
world”
6. Listen to me at your own risk
6
Standard Disclaimer
‣ I’m not an expert, pundit,
visionary or “thought leader”
‣ There are ~2000 smart people
at this event; I don’t presume to
speak for us as a whole
‣ All career success entirely due
to shamelessly copying what
actual smart people do
‣ I’m biased, burnt-out & cynical
‣ Filter my words accordingly
8. aka ‘spreading the blame ...’
8
What’s new 1: Acknowledgements
‣ This talk used to be
made in a vacuum
each year
• ... often mere minutes
before the scheduled talk
time
‣ Not this year
• Heavily influenced by
peer group of smarter
people who get chatty
when given beer
‣ Non-comprehensive
blame gang:
• Ari Berman
• Aaron Gardner
• Adam Kraut
• Chris Botka (Harvard)
• Chris Dwan (Broad)
• James Cuff (Harvard)
• ... many more ...
9. What has not changed in recent talks
Not new 2: Recycled Content
‣ The core Bio-IT ‘meta’
issue remains unchanged
‣ Minor updates to report
for cloud landscape
‣ Compute landscape
largely unchanged
• ... a few updates to share in
this space but nothing earth
shattering
9
11. 11
The #1 ‘meta issue’ is unchanged in 2014
12. 12
It’s a risky time to be doing Bio-IT
13. 13
Meta: Science evolving faster than IT
can refresh infrastructure & practices
14. This is what keeps Bio-IT folks up at night
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR
FASTER than we can refresh our Research-IT &
Scientific Computing infrastructure
• Bench science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every
2-7 years
‣ Our job is to design systems TODAY that can
support unknown research requirements &
workflows over multi-year spans (gulp ...)
14
15. The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago we could toss
inexpensive storage and
servers at the problem;
even in a nearby closet or
under a lab bench if
necessary
‣ That does not work any
more; real solutions
required
15
16. 16
This is our “new normal” for informatics
17. 17
The Central Problem Is ...
‣ Lab technology is being
refreshed, upgraded and
replaced at an
astonishing rate
• Bigger, faster, parallel
• Requiring increasingly
sophisticated IT support
• Cheap and easily obtainable
18. 18
The Central Problem Is ...
‣ ... and IT still being
caught by surprise in
2014
• Procurement practices and
cheaper instrument prices
result in situations where IT is
bypassed or not consulted in
advance
19. True Story - 48 Hours Ago
19
20. A conversation with a client
Just 48 hours ago ...
‣ Scientists tell IT that they
are getting a new PacBio
sequencing platform
• Gave IT a 5-node cluster
quote that PacBio provided
as blueprint for SMRT Portal
• Wanted confirmation that
everything was cool with IT
support
20
21. A conversation with a client
Just 48 hours ago ...
‣ Partial “Minor” Issue List:
• Scientists had no clue about power
requirements. A pair of 60amp 220v
power outlets = multi-month facility
project
• ... assumed IT would be cool
accepting and supporting a one-off
HPC system sized for 1 instrument &
1 workgroup
• ... also appeared to believe that
storage was infinite and free. At
least that is what their budget
assumed.
21
23. We can’t blame the science/lab side for everything
One more thing ...
‣ Can’t blame the lab-side for all our woes
‣ IT innovation is causing headaches in research
and program management
‣ Grant funding agencies, regulatory rules and
internal risk/program management practices
not updated to reflect current and emerging IT
capabilities, architectures & practices
• Rules & policies often simply do not cover what we are
capable of doing right now
23
25. This also hurts ...
‣ It has never been easier to
acquire vast amounts of data
cheaply and easily
‣ Growth rate of data creation/
ingest exceeds rate at which
the storage industry is
improving disk capacity
‣ Not just a storage lifecycle
problem. This data *moves*
and often needs to be shared
among multiple entities and
providers
• ... ideally without punching holes in
your firewall or consuming all
available internet bandwidth
25
26. The future is not looking pretty for the ill prepared
26
27. High Costs For Getting It Wrong
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention,
publication & product development
27
32. 32
DevOps & Scriptable Everything
‣ On (real) clouds,
EVERYTHING has an
API
‣ If it’s got an API you can
automate and
orchestrate it
‣ “Scriptable datacenters” are now a very real thing (a minimal sketch follows below)
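Since this slide leans on the “if it has an API you can automate it” idea, here is a minimal sketch of what that looks like in practice using the boto library that was current at the time. The region is real; the AMI ID, key pair and security group names are placeholders, and credentials are assumed to come from the environment or ~/.boto.

```python
# Minimal "everything has an API" sketch: provision compute programmatically
# instead of clicking in a console. AMI ID, key pair and security group are
# placeholders; AWS credentials are assumed to be configured already.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

reservation = conn.run_instances(
    "ami-00000000",                        # placeholder AMI
    min_count=1,
    max_count=4,                           # scale out by changing a number
    instance_type="m3.xlarge",
    key_name="my-keypair",                 # placeholder key pair
    security_groups=["analysis-workers"],  # placeholder security group
)

for instance in reservation.instances:
    print("Launched %s" % instance.id)
```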
33. 33
DevOps & Scriptable Everything
‣ Incredible innovation in
the past few years
‣ Driven mainly by
companies with
massive internet
‘fleets’ to manage
‣ ... but the benefits
trickle down to us
mere mortals
34. 34
DevOps will conquer the enterprise
‣ Over the past few years
cloud automation/
orchestration methods
have been trickling
down into our local
infrastructures
‣ This will have
significant impact on
careers, job
descriptions and org
charts
35. 2014: Continue to blur the lines between all these roles
35
Scientist/SysAdmin/Programmer
‣ Radical change in how IT is
provisioned, delivered,
managed & supported
• Technology Driver:
Virtualization & Cloud
• Ops Driver:
Configuration Mgmt, Systems
Orchestration & Infrastructure
Automation
‣ SysAdmins & IT staff need to
re-skill and retrain to stay
relevant
www.opscode.com
36. 2014: Continue to blur the lines between all these roles
36
Scientist/SysAdmin/Programmer
‣ When everything has an
API ...
‣ ... anything can be
‘orchestrated’ or
‘automated’ remotely
‣ And by the way ...
‣ The APIs (‘knobs &
buttons’) are accessible to
all, not just the expert
practitioners sitting in that
room next to the
datacenter
37. 2014: Continue to blur the lines between all these roles
37
Scientist/SysAdmin/Programmer
‣ IT jobs, roles and
responsibilities are
changing
‣ SysAdmins must learn to
program in order to
harness automation tools
‣ Programmers &
Scientists can now self-
provision and control
sophisticated IT
resources
38. 2014: Continue to blur the lines between all these roles
38
Scientist/SysAdmin/Programmer
‣ My take on the future ...
• SysAdmins (Windows & Linux) who
can’t code will have career issues
• Far more control is going into the
hands of the research end user
• IT support roles will radically change
-- no longer owners or gatekeepers
‣ IT will “own” policies,
procedures, reference patterns,
identity mgmt, security & best
practices
‣ Research will control the
“what”, “when” and “how big”
39. 2014 Summary
Trend: DevOps & Automation
‣ Almost every HPC project (all sizes) BioTeam worked
on in 2014 included
• A bare-metal OS provisioning service (Cobbler, etc.)
• A ‘next-gen’ configuration management service (Chef, Puppet,
Saltstack, etc.)
‣ Gut feeling: This is going to be very useful for
regulated environments
• Not BS or empty hype: IT infrastructure and server/OS/service
configuration encoded as text files
• Easy to version control, audit, revert, rebuild, verify and fold into existing change management & documentation systems (see the toy drift-check sketch below)
39
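As a deliberately toy illustration of why “configuration encoded as text files” matters for auditability: the sketch below is not Chef, Puppet or Salt, just a stand-in that loads a version-controlled desired-state file and reports drift on a node. The desired.json format and the RPM-based package check are assumptions made for this example.

```python
# Toy "configuration as text" sketch: compare a version-controlled desired-state
# file against what is actually installed on this node and report drift.
# The desired.json format and the rpm-based check are assumptions for this example.
import json
import os
import subprocess

DEVNULL = open(os.devnull, "w")

def installed(package):
    # RHEL/CentOS assumption: ask the RPM database whether the package exists
    return subprocess.call(["rpm", "-q", package], stdout=DEVNULL, stderr=DEVNULL) == 0

# e.g. desired.json: {"packages": ["openmpi", "environment-modules", "gridengine-execd"]}
with open("desired.json") as fh:
    desired = json.load(fh)

drift = [pkg for pkg in desired["packages"] if not installed(pkg)]
if drift:
    print("Drift from desired state, missing: %s" % ", ".join(drift))
else:
    print("Node matches the configuration recorded in version control.")
```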
41. Compute related design patterns largely static
41
Core Compute
‣ Linux compute clusters
are still the baseline
compute platform
‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers (see the DRMAA sketch below)
‣ Compute is not hard. It’s a
commodity that is easy to
acquire & deploy in 2014
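A hedged sketch of the “instruments submit jobs to the scheduler” point, using the drmaa-python bindings against a DRMAA-capable scheduler such as Grid Engine. The pipeline script and input path are placeholders, and it assumes the site’s DRMAA library is installed and configured.

```python
# Sketch: submit a job to a DRMAA-capable scheduler (e.g. Grid Engine) from
# Python, the same mechanism an instrument-side pipeline can use.
# Requires drmaa-python plus a scheduler DRMAA library; paths are placeholders.
import drmaa

session = drmaa.Session()
session.initialize()

template = session.createJobTemplate()
template.remoteCommand = "/opt/pipeline/run_analysis.sh"  # placeholder pipeline script
template.args = ["/data/run_0423"]                        # placeholder input directory
template.joinFiles = True                                 # merge stdout and stderr

job_id = session.runJob(template)
print("Submitted job %s" % job_id)

info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
print("Job %s finished with exit status %s" % (info.jobId, info.exitStatus))

session.deleteJobTemplate(template)
session.exit()
```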
42. Defensive hedge against Big Data / HDFS
42
Compute: Local Disk Matters
‣ This slide is from 2013; trend is
continuing
‣ The “new normal” may be 4U enclosures with room for many local disk spindles - not populated on day one, just available
‣ Why? Hadoop & Big Data
‣ This is a defensive hedge against future
HDFS or similar requirements
• Remember the ‘meta’ problem - science is
changing far faster than we can refresh IT. This
is a defensive future-proofing play.
‣ Hardcore Hadoop rigs sometimes
operate at 1:1 ratio between core count
and disk count
43. Faster networks are driving compute config changes
43
Compute: NICs and Disks
‣ One pain point for me in 2013-2014:
• Network links to my nodes are getting
faster
• It’s embarrassing my disks are slower
than the network feeding them
• Need to be careful about selecting and
configuring high speed NICs
- Example: that dual-port 10Gig card may
not actually be able to drive both ports if
the card was engineered for an
active:passive link failover scenario
• Also need to re-visit local disk configurations (a quick sanity check is sketched below)
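For the “my disks are slower than the network feeding them” worry, a crude sanity check: time a large sequential write and compare it with 10GbE line rate (~1.25 GB/s). Real benchmarking should use tools like fio and iperf; the scratch path and file size here are placeholders.

```python
# Crude check: can local disk absorb what a 10GbE NIC can deliver (~1.25 GB/s)?
# Not a substitute for fio/iperf; path and test size are placeholders.
import os
import time

TEST_FILE = "/scratch/throughput_test.bin"  # placeholder scratch path
BLOCK = b"\0" * (4 * 1024 * 1024)           # 4 MiB writes
TOTAL_BYTES = 8 * 1024 ** 3                 # 8 GiB test file

start = time.time()
with open(TEST_FILE, "wb") as fh:
    written = 0
    while written < TOTAL_BYTES:
        fh.write(BLOCK)
        written += len(BLOCK)
    fh.flush()
    os.fsync(fh.fileno())                   # make sure data actually reached disk
elapsed = time.time() - start
os.remove(TEST_FILE)

print("Sequential write: %.2f GB/s (10GbE line rate is ~1.25 GB/s)"
      % ((TOTAL_BYTES / 1e9) / elapsed))
```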
44. New and refreshed HPC systems running many node types
44
Compute: Huge trend in ‘diversity’
‣ Accelerated trend since at least 2012 ...
• HPC compute resources are no longer homogeneous; many types and flavors now deployed in single HPC stacks
‣ Newer clusters mix and match node types to fit the known use cases:
• GPU nodes for compute
• GPU nodes for visualization
• Large memory nodes (512GB +)
• Very Large memory nodes (1TB +)
• ‘Fat’ nodes with many CPU cores
• ‘Thin’ nodes with super-fast CPUs
• Analytic nodes with SSD, FusionIO, flash or large local
disk for ‘big data’ tasks
45. GPUs, Coprocessors & FPGAs
45
Compute: Hardware Acceleration
‣ Specialized hardware
acceleration has its place
but will not take over the
world
• “... the activation energy required
for a scientist to use this stuff is
generally quite high ...”
‣ GPU, Phi and FPGA best
used in large scale pipelines
or as specific solution to a
singular pain point
46. Compute: Big Data & Analytics
‣ BioTeam is starting to build
“Big Data” labs and
environments for clients
‣ The most interesting trend:
• We are not designing for specific
analytic use cases; in most projects
we are adding in basic “capabilities”
with the expectation that the apps
and users will come later
• ... defensive IT hedge against
rapidly changing science
requirements, remember?
46
47. Compute: Big Data & Analytics
‣ This translates to infrastructure designed
to support certain capabilities rather than
specific software or applications.
‣ Example:
• Beefy HDFS friendly servers
• 100% bare metal provisioning and dynamic
system reconfiguration
• Systems for ingest
• Very large RAM systems
• Big PCIx bus systems
• Memory-resident database systems
• Mix of very fast and capacity optimized storage
• Very fast core, top-of-rack and server networking
47
48. Also known as hybrid clouds
Emerging Trend: Hybrid HPC
‣ No longer “utter crap” or “cynical
vendor-supported reference case”
• small local footprint
• large, dynamic, scalable, orchestrated
public cloud component
‣ DevOps is key to making this work
‣ High-speed network to public cloud
required
‣ Software interface layer acting as the
mediator between local and public
resources
‣ Good for tight budgets, has to be
done right to work
‣ Still best approached very carefully
48
49. BioIT World Homework
‣ We’ve got interesting hardware vendors on the
show floor this week; check them out
• Silicon Mechanics, Thinkmate, Microway:
cool commodity
• Intel, IBM, Dell, SGI: Large & enterprise
• Timelogic: hardware acceleration
• ...
49
52. 52
Network: Speed @ Core and Edge
‣ Huge potential pain point
‣ May surpass storage as our #1
infrastructure headache
‣ Petascale data is useless if you
can’t move it or access it fast
enough
‣ Don’t be smug about 10 Gigabit
- folks need to start thinking
*now* about 40 and even 100
Gigabit Ethernet
‣ You may need 10Gig to some
desktops for data ingest/export
53. 53
Network: Speed @ Core and Edge
‣ Remember ~2004 when
research storage
requirements started to dwarf
what the enterprise was
using?
‣ Same thing is happening now
for networking
‣ Research core, edge and top-
of-rack networking speeds
may exceed what the rest of
the organization has
standardized on
54. Massive data movement needs are driving innovation pain
This is going to be painful
‣ Enterprise networking folks
are even more aloof than
storage admins we battled in
’04
‣ Often used to driving
requirements and methods;
unhappy when science starts
to drive them out of their
comfort zones
‣ Research needs to start
pushing harder and faster for
network speeds above 10GbE
• This will take a long time so start
now!
54
55. Not sure how this will play out
‣ It will be interesting to see what large-scale data
movement does to our local infrastructure and
desktop experience
‣ Especially with other trends like BYOD
‣ My $.02
• Speeds to our desktops are going to get very fast, or
• We give up on growing massive bandwidth to the client
and embrace a full VDI model where the users just
“remote desktop” into a well-networked scientific
informatics environment
55
56. BioIT World Homework
‣ Visit the Internet2 booth to chat high speed
networking
• Ask about their free or low-cost training events and
technical workshops; start thinking about how you can
get your internal networking teams/leadership to attend
• Ask them about the new trend of private/corporate links
into I2 and other fast research networks
‣ Arista is here. Talking and exhibiting. They are
not Cisco. Listen, visit & talk to them.
56
58. It’s real and becoming necessary
Network: ‘ScienceDMZ’
‣ BioTeam building them in 2014 and beyond
‣ Central premise:
• Legacy firewall, network and security methods
architected for “many small data flows” use cases
• Not built to handle smaller #s of massive
data flows
• Also very hard to deploy ‘traditional’ security gear
on 10Gigabit and faster networks
‣ More details, background & documents at
http://fasterdata.es.net/science-dmz/
58
[Diagram: DTN traffic with wire-speed bursts vs. background traffic and competing bursts, carried over 10GE links]
59. Network: ‘ScienceDMZ’
‣ Start thinking/discussing this sooner rather
than later
‣ Just like “the cloud” this may fundamentally
change internal operations and technology
‣ Will also require conscious buy-in and
support from senior network, security and
risk management professionals
• ... these talks take time. Best to plan ahead
59
60. Network: ‘ScienceDMZ’
‣ A Science DMZ has 3 required components:
1. Very fast “low-friction” network links and paths with
security policy and enforcement specific to scientific
workflows
2. Dedicated, high performance data transfer nodes
(“DTNs”) highly optimized for high speed data xfer
3. Dedicated network performance/measurement nodes
60
61. Network: ‘ScienceDMZ’
‣ Implementation specifics are complex; the
basic concept is not:
1. The research need to move scientific data at high speeds
is already being negatively affected by networks not
designed for this requirement
2. Likely to force fundamental changes in core enterprise
architectures on a similar disruptive scale as what
genome data storage forced in ~2004
3. Firewalls/IDS and security in particular will be affected
61
62. 62
Simple Science DMZ:
Image source: “The Science DMZ: Introduction & Architecture” -- esnet
63. Network: ‘ScienceDMZ’
‣ My gut feeling:
1. The fanciest and most complex Science DMZ architectures in the literature right
now are not suitable for our world
• Expensive specialized equipment; Expensive specialist staff expertise required
• Often still experimental, not something enterprise IT would want to drop into a
production environment
2. Science DMZ concepts are sound and simple implementations are possible today
3. Start small:
• Incorporate these sorts of concepts/ideas into long term planning ASAP
• Start adding network performance monitoring nodes to research networks, DMZs and external circuit connections now; this entire concept falls over without actionable flow and performance data (a minimal monitoring sketch follows after this list)
• Start work on policies and procedures for manual bypass of firewall/IDS rules when
known sender/receivers are freighting high speed data; automation comes later!
63
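On the “this falls over without actionable performance data” point: perfSONAR is the canonical tooling, but as a minimal stand-in the sketch below drives iperf3 against a remote data transfer node and logs the measured throughput. The host name is a placeholder and the JSON field names reflect my understanding of iperf3’s -J output.

```python
# Poor man's Science DMZ performance probe: run iperf3 against a remote DTN
# and log throughput. perfSONAR is the real answer; hostname is a placeholder
# and the JSON fields reflect iperf3's -J output as I understand it.
import json
import subprocess
import time

REMOTE_DTN = "dtn01.example.org"  # placeholder data transfer node

raw = subprocess.check_output(["iperf3", "-c", REMOTE_DTN, "-J", "-t", "10"])
result = json.loads(raw)

gbps = result["end"]["sum_received"]["bits_per_second"] / 1e9
print("%s: %.2f Gbit/s to %s" % (time.ctime(), gbps, REMOTE_DTN))
```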
64. BioIT World Homework
‣ Bookmark http://fasterdata.es.net and check
out the published materials and advice
‣ Monitor http://www.oinworkshop.com/ to see
when a workshop/event may be coming near
you (send your networking people ...)
‣ Both ESNet and Internet2 run training and
technical workshops that deliver far more value
for price than the usual training junkets
64
65. Check out this talk
BioIT World Homework
‣ Track 1 - 3:10pm today:
• Christian Todorov talks “Accelerating Biomedical
Research Discovery: The 100G Internet2 Network – Built
and Engineered for the Most Demanding Big Data
Science Collaborations”
65
66. Not very significant trend in 2014:
Software Defined Networking (“SDN”)
66
67. More hype than useful reality at the moment
67
Network: SDN Hype vs. Reality
‣ Software Defined Networking (“SDN”) is
the new buzzword
‣ It WILL become pervasive and will
change how we build and architect things
‣ But ...
‣ Not hugely practical at the moment for
most environments
• We need far more than APIs that control port
forwarding behavior on switches
• More time needed for all of the related
technologies and methods to coalesce into
something broadly useful and usable
68. More hype than useful reality at the moment
68
Network: SDN
‣ My gut feeling:
• It is the future but right now we are still in the
“mostly empty hype” phase if you wanna be
cynical about it; best to wait and watch
• Production enterprise use: OpenFlow
and similar stuff does not provide value
relative to implementation effort right now
• Best bang for the buck in ’14 will be getting
‘SDN’ features as part of some other
supported stack
- OpenStack, VMWare, Cloud, etc.
70. 70
Storage
‣ Still the biggest expense, biggest headache and
scariest systems to design in modern life science
informatics environments
‣ Many of the pain points we’ve talked about for years
are still in place:
• Explosive growth forcing us to trade performance for capacity
• Lots of monolithic single tiers of storage
• Critical need to actively manage data through its full life cycle
(just storing data is not enough ...)
• Need for post-POSIX solutions such as iRODS and other
metadata-aware data repositories
71. 71
Storage Trends
‣ The large but monolithic storage platforms we’ve
built up over the years are no longer sufficient
• Do you know how many people are running a single large
scale-out NAS or parallel filesystem? Most of us!
‣ Tiered storage is now an absolute requirement
• At a minimum we need an active storage tier plus
something far cheaper/deeper for cold files
‣ Expect the tiers to involve multiple vendors,
products and technologies
• The Tier1 storage vendors tend to have higher-end pricing
for their “all in one” tiered data management solutions
72. 72
Storage - The Old Way
‣ Single tier of scale-out NAS or parallel FS
‣ Why?
• Suitable for broadest set of use cases
• Easy to procure/integrate
• Lowest administrative & operational burden
‣ Example:
• 400TB - 1PB of ‘something’ stores ‘everything’
73. 73
Storage - The New Way
‣ Multiple tiers; potentially from multiple vendors
‣ Why?
• Way more cost efficient (size the tier to the need)
• Single tier no longer capable of supporting all use cases and
workflow patterns
• Single tiers waste incredible money at large scale
‣ Example:
• 10-40 TB SSD/Flash for ingest & IOPS-sensitive workloads
• 50-400 TB tier (SATA/SAS/SSD mix) for active processing
• Multi-petabyte tier (Cloud, Object, SATA) for cost and operationally
efficient long term (yet reachable) storage of scientific data at rest
74. Sticking 100% with Tier 1 vendors gets expensive
74
Storage: Disruptive stuff ahead
‣ BioTeam has built 1Petabyte ZFS-based storage pools from
commodity whitebox kit for about ~$100,000 in direct hardware
costs (engineering effort & admin not included in this price ...)
‣ There are many storage vendors in the middle tier who can
provide storage systems that are less ‘risky’ than DIY
homebuilt setups yet far less expensive than the traditional
Tier 1 enterprise storage options
• Several of these vendors are here at the show!
‣ Companies like Avere Systems are producing boxes that unify
disparate storage tiers and link them to cloud and object
stores
• This is a route to unifying “tier 1” storage with the “cheap & deep” storage
75. Infinidat aka http://izbox.com
The new thumper.
‣ 1 petabyte usable NAS
shipped as a single
integrated rack
• List price: $500 per usable
terabyte
‣ More expensive than DIY
ZFS on commodity
chassis but less
expensive than current
mainstream products
‣ Lots of interesting use
cases for ‘cheap & deep’
75
76. Avere Systems
Wait, I can DO that?
‣ These folks caught my eye in late 2013 for
one very specific use case
‣ Since then I keep them in mind for 4-5
common problems I regularly face
‣ It can:
• Add performance layer on top of storage bought
to be “cheap & deep”
• Virtualize many NAS islands into a single
namespace
• Replicate & move data between tiers and sites
• Act as CIFS/NFS gateway to on-premise or
offsite object stores ***
• Treat Amazon S3 and Glacier as simply another
storage tier fully integrated into your environment
76
77. Object Storage
‣ Object storage is the future for scientific data at rest
• Total no brainer; makes more sense than the “files and
folders” paradigm, especially for automated analysis
• Plus Amazon does it for super cheap
‣ But ... There will be a long transition period due to all
of our legacy codes and workflows
• This is where gateway devices can play
‣ It can:
• Provide a much better workflow design pattern than
assuming “files and folders” data storage
• Save millions of dollars via efficiencies of erasure coding
• Provide a much more robust and resilient peta-scale storage
framework
• Hide behind a metadata-aware layer such as iRODS to provide very interesting capabilities (see the small S3 metadata sketch below)
77
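To make the “objects plus metadata instead of files and folders” idea concrete, the sketch below pushes a finished result file to Amazon S3 with searchable metadata attached, using the boto library of the era. Bucket, object key and metadata values are placeholders, and credentials are assumed to be configured in the environment.

```python
# Sketch: scientific data at rest as objects with metadata, not files in folders.
# Bucket name, object key and metadata values are placeholders; AWS credentials
# are assumed to be configured in the environment.
import boto
from boto.s3.key import Key

conn = boto.connect_s3()
bucket = conn.get_bucket("my-archive-bucket")        # placeholder bucket

obj = Key(bucket)
obj.key = "runs/2014/run_0423/variants.vcf.gz"       # placeholder object name
obj.set_metadata("project", "exome-pilot")           # metadata travels with the object
obj.set_metadata("instrument", "hiseq-01")
obj.set_contents_from_filename("/data/run_0423/variants.vcf.gz")

print("Stored %s" % obj.key)
```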
78. Object Storage
‣ Erasure coding distributed
object stores are very interesting
at peta-scale ...
‣ Think about how you would
handle & replicate 20 petabytes
of data the “traditional way”
• Purchase 2x or 3x storage capacity to
handle replication overhead
• Ignore the nightmare scenario of
having to restore from one of the
distributed replicas
78
79. Object Storage
‣ Efficiencies of erasure coding allow
for LESS raw disk to be distributed
across MORE geographic sites
‣ End result is a “single” usable
system that is tolerant to the failure
of an entire datacenter/site
‣ For the 20 petabyte problem, instead of purchasing 2x disk you buy ~1.8x and use the capex savings to add an extra colo facility or increase WAN link speed (the arithmetic is sketched below)
79
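The arithmetic behind the “buy ~1.8x instead of 2x” claim, as a small worked example. The 20 PB usable figure comes from the slide; the specific code parameters (10 data + 8 coded fragments spread across 3 sites) are just one plausible configuration that yields 1.8x raw overhead while surviving the loss of an entire site.

```python
# Worked example of the raw-capacity arithmetic on the previous slide.
# k=10 data + m=8 coded fragments across 3 sites is one plausible layout
# giving ~1.8x overhead; it is not any specific vendor's scheme.
USABLE_PB = 20.0

# Traditional approach: whole replicas
for replicas in (2, 3):
    print("%dx replication: %.0f PB raw" % (replicas, USABLE_PB * replicas))

# Erasure coding: any k of the k+m fragments reconstruct the data
k, m, sites = 10, 8, 3
overhead = (k + m) / float(k)                      # 1.8x
per_site = (k + m) // sites                        # 6 fragments per site
survives_site_loss = (k + m) - per_site >= k       # lose a site, still have >= k
print("%d+%d erasure code: %.1fx overhead, %.0f PB raw, survives full site loss: %s"
      % (k, m, overhead, USABLE_PB * overhead, survives_site_loss))
```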
80. Exercise
BioIT World Homework
‣ Pick a storage size that makes sense for you (100TB or
1PB suggested)
‣ Visit the various storage vendors on the show floor and
price out what 100TB or 1PB would cost
‣ You will see an awesome diversity of products,
performance, features and capabilities at various price
points
• DO NOT fixate on price alone. This is a mistake.
‣ This is REALLY worth doing - there is incredible
diversity in the mix of price/features/performance/
capability out there
80
81. Check out these booths
BioIT World Homework
‣ Object storage:
• Amplidata & CleverSafe
‣ Glue/Gateway/Acceleration:
• Avere Systems
‣ Enterprise:
• EMC Isilon, IBM, Dell, SGI, Hitachi, Panasas
‣ Mid-tier/Commodity:
• Silicon Mechanics, Thinkmate, RAID Inc., Xyratex
81
82. Check out these talks
BioIT World Homework
‣ Track 5 - noon today:
• Aaron Gardner talks “Taming big scientific data growth with
converged infrastructure”
‣ Track 1 - 2:55pm today:
• Jacob Farmer talks “Bridging the Worlds of Files, Objects,
NAS, and Cloud: A Blazing Fast Crash Course in Object
Storage”
‣ Track 1 - 4:30pm today:
• Dirk Petersen talks “Deploying Very Low Cost Cloud Storage Technology in a Traditional Research HPC Environment”
82
83. 83
Can you do a Bio-IT talk without using the ‘C’ word?
84. 84
Cloud: 2014
‣ Core advice remains the same
‣ A few new permutations ...
85. Core Advice
85
Cloud: 2014
‣ Research organizations need a cloud strategy today (really, yesterday)
• Those that don’t will be bypassed by frustrated
users or sneaky “cloud aware” devices
‣ IaaS cloud services are only a departmental
credit card away ... some senior scientists
are too big to be fired for violating IT policy
‣ Instrument vendors are forcing the issue
‣ Storage vendors are forcing the issue
86. Design Patterns
86
Cloud Advice
‣ We actually need several tested cloud
design patterns:
‣ (1) To handle ‘legacy’ scientific apps & workflows
‣ (2) The special stuff that is worth re-architecting
‣ (3) Hadoop & big data analytics
‣ ... and maybe (4) Regulated/sensitive efforts...
‣ ...and maybe (5) a way to evaluate Commercial
solutions
87. Legacy HPC on the Cloud
87
Cloud Advice
‣ MIT StarCluster
• http://star.mit.edu/cluster/
• This is your baseline
• Extend as needed
‣ Also check out Univa
• Commercially supported Grid Engine
stack with compelling roadmap and
native cloud capabilities
88. “Cloudy” HPC
88
Cloud Advice
‣ Some of our research workflows are important
enough to be rewritten for “the cloud” and the
advantages that a truly elastic & API-driven
infrastructure can deliver
‣ This is where you have the most freedom
‣ Many published best practices you can borrow
‣ Warning: Cloud vendor lock-in potential is
strongest here
89. What has changed ..
Cloud: 2014
‣ Let’s revisit some of my bile from prior years
‣ “... private clouds: still utter crap”
‣ “... some AWS competitors are delusional
pretenders”
‣ “... AWS has a multi-year lead on the
competition”
89
90. Private Clouds in 2014:
‣ I’m no longer dismissing them as “utter crap”
• However it is a lot of work and money to build a system that only has 5% of the
features that AWS can deliver today (for a cheaper price). Need to be careful
about the use case, justification and operational/development burden.
‣ Usable & useful in certain situations
‣ BioTeam positive experiences with OpenStack
‣ Starting to see OpenStack pilots among our clients
‣ Hype vs. Reality ratio still wacky
‣ Sensible only for certain shops
• Have you seen what you have to do
to your networks & gear?
‣ Still important to remain cynical and perform proper due diligence
91. Not all AWS competitors are delusional
‣ Google Compute is viable in 2014 for scientific workflows
• Compute/Memory: Late start into IaaS means CPUs and memory are current generation; we have ‘war stories’ from AWS users who probe /proc/cpuinfo on EC2 servers so they can instantly kill any instance running on older chipsets (a hypothetical version of that check is sketched below)
• Price: Competitive on price although the shooting war between IaaS providers means it is hard to
pin down the current “winner”; The “sustained use” pricing is easier to navigate than AWS Reserved
Instances. Overall AWS pricing algorithms for various services seem more complicated than Google
equivalents.
• Network performance: Fantastic networking and excellent performance/latency figures between
regions and zones. VPC type features are baked into the default resource set
• Ops: Priced in 1min increments; no more need to hunt and kill servers at 55 min past the hour.
Google has a concept of “Projects” with assigned collaborators and quotas. Quite different from the
AWS account structure and IAM-based access control model. Project-based paradigm easier to
think about for scientific use case.
• IaaS Building Blocks: Still far fewer features than AWS but the core building blocks that we need
for science and engineering workflows are present.
‣ My $.02
• AWS is still the clear leader but Google Compute is now a viable option and it is worth ‘kicking the
tires’ in 2014 and beyond ... to me AWS has had no serious competition until now
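The /proc/cpuinfo “war story” above is easy to picture; here is a hypothetical version of that instance-side check. The list of acceptable CPU model strings is invented purely for illustration.

```python
# Hypothetical version of the "probe /proc/cpuinfo and relaunch" trick some AWS
# users reportedly use to avoid older chipsets. The acceptable model strings
# below are invented for illustration only.
ACCEPTABLE = ("E5-2670", "E5-2680")   # placeholder "new enough" CPU models

def cpu_model():
    with open("/proc/cpuinfo") as fh:
        for line in fh:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

model = cpu_model()
if any(tag in model for tag in ACCEPTABLE):
    print("Acceptable CPU: %s" % model)
else:
    # In the war story, this is where the instance terminates itself so the
    # next launch can hopefully land on newer hardware.
    print("Undesirable CPU (%s): terminate and relaunch." % model)
```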
92. Cloud Science Facilitators
‣ Cycle Computing is legit
• They’ve proven themselves
on some of the largest IaaS HPC
grids ever built
• Experience with hybrid
systems (cloud & premise)
‣ Smart people. Nice
people.
‣ They have a booth, stop
by and chat them up ...
94. This has been a slow moving trend for years now ...
94
POSIX Alternatives Coming
‣ The scope of organizations faced with
the limitations of POSIX filesystems will
continue to expand
‣ We desperately need some sort of
“metadata aware” data management
solution in life science
‣ Nobody has an easy solution yet;
several bespoke installations but no
clear mass-market options
‣ iRODS front-ending “cheap & deep” storage tiers or object stores appears to be gaining significant interest out in our community (a minimal sketch follows below)
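A minimal sketch of what “iRODS front-ending cheap & deep storage” looks like from a workflow’s point of view: register a file and attach queryable metadata using the standard icommands. It assumes the icommands are installed and `iinit` has already been run; the paths and attribute names are placeholders.

```python
# Sketch: register a result file into iRODS and attach attribute/value metadata
# so it can later be found by query rather than by directory convention.
# Assumes the icommands are on PATH and iinit has been run; paths are placeholders.
import subprocess

local_file = "/data/run_0423/variants.vcf.gz"               # placeholder local path
logical_path = "/tempZone/home/rods/runs/variants.vcf.gz"   # placeholder iRODS path

subprocess.check_call(["iput", "-f", local_file, logical_path])

for attribute, value in [("project", "exome-pilot"), ("instrument", "hiseq-01")]:
    subprocess.check_call(["imeta", "add", "-d", logical_path, attribute, value])
```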
95. Application Containers are getting interesting
95
Watch out for: Containerization
‣ Application containerization via methods like
http://docker.io gaining significant attention
• Docker support now in native RHEL kernel
• AWS Elastic Beanstalk recently added Docker
support
‣ If broadly adopted, these techniques will
stretch research IT infrastructures in
interesting directions
• This is far more interesting to me than moving virtual machines around a network or into the cloud (a minimal ‘docker run’ sketch follows below)
‣ ... with a related impact on storage location,
features & capability
‣ Major new news and progress expected in
2014
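A minimal ‘docker run’ sketch of why containerized applications interest research IT: the whole tool chain ships as an image and runs wherever the engine runs. This simply wraps the docker CLI from Python; the image name, tool command and data path are all hypothetical.

```python
# Sketch: run a containerized analysis tool against a local data directory by
# wrapping the docker CLI. Image name, command and data path are hypothetical.
import subprocess

IMAGE = "biotools/aligner:1.0"   # hypothetical image
DATA_DIR = "/data/run_0423"      # placeholder host directory with input data

subprocess.check_call([
    "docker", "run", "--rm",            # discard the container when it exits
    "-v", "%s:/work" % DATA_DIR,        # bind-mount the data into the container
    IMAGE,
    "align", "--input", "/work/reads.fq", "--output", "/work/aligned.bam",
])
```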
96. 96
Keep an eye on: Storage
‣ Data generation out-pacing
technology
‣ Really interesting disruptive
stuff on the market now
‣ Cheap/easy laboratory
assays taking over
• Researchers largely don’t know
what to do with it all
• Holding on to the data until
someone figures it out
• This will cause some interesting
headaches for IT
• Huge need for real “Big Data”
applications to be developed
97. 97
Keep an eye on: Networking
‣ Unless there’s an investment in ultra-high-speed networking, we need to rethink how and where analysis happens
‣ Data commons are becoming
a precedent
• Need to minimize the movement of
data
• Include compute power and
analysis platform with data
commons
‣ Move the analysis to the data,
don’t move the data
• Requires sharing/Large core
institutional resources
98. 98
Long term trends ...
‣ Compute continues to become easier
‣ Data movement and ingest (physical & network)
gets harder
‣ Cost of storage will be dwarfed by “cost of
managing stored data”
‣ We can see end-of-life for our current IT
architecture and design patterns; new patterns
will start to appear over next 2-5 years
100. Embrace The Innovation
100
Ending Advice: 1 of 5
‣ Understand the ‘interesting times’ we are in
• Science is changing faster than we can refresh IT
• This is not going to change any time soon
‣ Advice:
• Spend as much time thinking about future flexibility as
you spend on actual current needs & requirements
• Design for agility & responsiveness
101. Capacity
101
Ending Advice: 2 of 5
‣ Many of us will need ‘petabyte capable’ storage
‣ However:
• Only some of us will ever have 1PB+ under management
• The hard part is knowing who that will be
102. Tiers are in your future
102
Ending Advice: 3 of 5
‣ Tiers are now a requirement, at least long-term
• At a minimum we need an ‘active’ tier for processing &
ingest
• ... and some sort of inexpensive cold/nearline/archive
option as well
‣ Advice:
• It’s OK to buy a single block/tier of disk
• ... but always have a strategy for diversification
103. 103
Ending Advice: 4 of 5
‣ Above a certain scale, inefficient data management
& simple storage practices are hugely wasteful
‣ Advice:
• A new “data manager” or curator hire may be cheaper and far more beneficial to your organization than continuing to throw CapEx dollars at keeping a badly run storage platform under its capacity limit
• Many opportunities to get clever & recapture efficiency &
capability: tiers, replication, cloud, dedupe, CRAM
compression, iRODS
• BROADEN YOUR PERSPECTIVE
104. 104
Ending Advice: 5 of 5
‣ You need a cloud strategy. Yesterday.
- Users, instrument makers & IT vendors are forcing the issue
- Economic trends indicate cloud storage is inescapable
- 90% of cloud is “easy”. Remaining 10% takes time & effort
‣ Advice:
• The technical aspects of using “the cloud” are trivial
• The political, policy and risk management aspects are
difficult and time consuming; start these ASAP