Trends from the trenches: 2014
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14
Wednesday, April 30, 14
I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
Apologies in advance
If you have not heard me speak ...
‣ ‘Infamous’ for speaking
very fast and carrying a
huge slide deck
‣ In 2014 CHI finally gave
up and just gave me a
60min talk slot
‣ Aiming to end with
enough time for
questions & discussions
By the time you see this slide
I’ll be on my ~4th espresso
Who, What, Why ...
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to
learn IT, SW & HPC to get our
own research done
‣ 12+ years bridging the “gap”
between science, IT & high
performance computing
‣ Our wide-ranging work is what
gets us invited to speak at
events like this ...
Why I do this talk every year ...
‣ Bioteam works for
everyone
• Pharma, Biotech, EDU,
Nonprofit, .Gov, etc.
‣ We get to see how
groups of smart people
approach similar
problems
‣ We can speak honestly &
objectively about what
we see “in the real
world”
Listen to me at your own risk
Standard Disclaimer
‣ I’m not an expert, pundit,
visionary or “thought leader”
‣ There are ~2000 smart people
at this event; I don’t presume to
speak for us as a whole
‣ All career success entirely due
to shamelessly copying what
actual smart people do
‣ I’m biased, burnt-out & cynical
‣ Filter my words accordingly
What’s new?
I’ve seen your slides before. <yawn>
aka ‘spreading the blame ...’
What’s new 1: Acknowledgements
‣ This talk used to be
made in a vacuum
each year
• ... often mere minutes
before the scheduled talk
time
‣ Not this year
• Heavily influenced by
peer group of smarter
people who get chatty
when given beer
‣ Non-comprehensive
blame gang:
• Ari Berman
• Aaron Gardner
• Adam Kraut
• Chris Botka (Harvard)
• Chris Dwan (Broad)
• James Cuff (Harvard)
• ... many more ...
What has not changed in recent talks
Not new 2: Recycled Content
‣ The core Bio-IT ‘meta’
issue remains unchanged
‣ Minor updates to report
for cloud landscape
‣ Compute landscape
largely unchanged
• ... a few updates to share in
this space but nothing earth
shattering
Why are we all here?
The #1 ‘meta issue’ is unchanged in 2014
It’s a risky time to be doing Bio-IT
Meta: Science evolving faster than IT
can refresh infrastructure & practices
This is what keeps Bio-IT folks up at night
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR
FASTER than we can refresh our Research-IT &
Scientific Computing infrastructure
• Bench science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every
2-7 years
‣ Our job is to design systems TODAY that can
support unknown research requirements &
workflows over multi-year spans (gulp ...)
The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago we could toss
inexpensive storage and
servers at the problem;
even in a nearby closet or
under a lab bench if
necessary
‣ That does not work any
more; real solutions
required
This is our “new normal” for informatics
The Central Problem Is ...
‣ Lab technology is being
refreshed, upgraded and
replaced at an
astonishing rate
• Bigger, faster, parallel
• Requiring increasingly
sophisticated IT support
• Cheap and easily obtainable
The Central Problem Is ...
‣ ... and IT still being
caught by surprise in
2014
• Procurement practices and
cheaper instrument prices
result in situations where IT is
bypassed or not consulted in
advance
True Story - 48 Hours Ago
A conversation with a client
Just 48 hours ago ...
‣ Scientists tell IT that they
are getting a new PacBio
sequencing platform
• Gave IT a 5-node cluster
quote that PacBio provided
as blueprint for SMRT Portal
• Wanted confirmation that
everything was cool with IT
support
A conversation with a client
Just 48 hours ago ...
‣ Partial “Minor” Issue List:
• Scientists had no clue about power
requirements. A pair of 60amp 220v
power outlets = multi-month facility
project
• ... assumed IT would be cool
accepting and supporting a one-off
HPC system sized for 1 instrument &
1 workgroup
• ... also appeared to believe that
storage was infinite and free. At
least that is what their budget
assumed.
One more thing ...
We can’t blame the science/lab side for everything
One more thing ...
‣ Can’t blame the lab-side for all our woes
‣ IT innovation is causing headaches in research
and program management
‣ Grant funding agencies, regulatory rules and
internal risk/program management practices
not updated to reflect current and emerging IT
capabilities, architectures & practices
• Rules & policies often simply do not cover what we are
capable of doing right now
A related problem ...
This also hurts ...
‣ It has never been easier to
acquire vast amounts of data
cheaply and easily
‣ Growth rate of data creation/
ingest exceeds rate at which
the storage industry is
improving disk capacity
‣ Not just a storage lifecycle
problem. This data *moves*
and often needs to be shared
among multiple entities and
providers
• ... ideally without punching holes in
your firewall or consuming all
available internet bandwidth
The future is not looking pretty for the ill prepared
High Costs For Getting It Wrong
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention,
publication & product development
Enough groundwork. Let's Talk Trends
Trends: DevOps & Org Charts
The social contract between
scientist and IT is changing forever
You can blame “the cloud” for this
DevOps & Scriptable Everything
‣ On (real) clouds,
EVERYTHING has an
API
‣ If it’s got an API you can
automate and
orchestrate it
‣ “scriptable datacenters”
are now a very real thing
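When everything has an API, "build me a cluster" collapses into a short script. A toy sketch of the pattern, using a stand-in `CloudAPI` class in place of a real IaaS SDK (boto, libcloud, etc.); the instance types and image names are invented for illustration:

```python
class CloudAPI:
    """Stand-in for a real IaaS SDK client; launch() just records node IDs."""
    def __init__(self):
        self.instances = []

    def launch(self, instance_type, image):
        node_id = "node-%03d" % len(self.instances)
        self.instances.append(node_id)
        return node_id

def build_cluster(api, n_workers):
    """A scriptable datacenter in miniature: one head node plus N workers."""
    head = api.launch("m3.xlarge", "hpc-head-image")
    workers = [api.launch("c3.8xlarge", "hpc-worker-image")
               for _ in range(n_workers)]
    return head, workers

api = CloudAPI()
head, workers = build_cluster(api, n_workers=4)
print(head, len(workers))
```

Swap the stand-in for a real client and the same dozen lines become an actual elastic cluster; that is the whole "scriptable datacenter" idea.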
DevOps & Scriptable Everything
‣ Incredible innovation in
the past few years
‣ Driven mainly by
companies with
massive internet
‘fleets’ to manage
‣ ... but the benefits
trickle down to us
mere mortals
DevOps will conquer the enterprise
‣ Over the past few years
cloud automation/
orchestration methods
have been trickling
down into our local
infrastructures
‣ This will have
significant impact on
careers, job
descriptions and org
charts
2014: Continue to blur the lines between all these roles
Scientist/SysAdmin/Programmer
‣ Radical change in how IT is
provisioned, delivered,
managed & supported
• Technology Driver:
Virtualization & Cloud
• Ops Driver:
Configuration Mgmt, Systems
Orchestration & Infrastructure
Automation
‣ SysAdmins & IT staff need to
re-skill and retrain to stay
relevant
www.opscode.com
2014: Continue to blur the lines between all these roles
Scientist/SysAdmin/Programmer
‣ When everything has an
API ...
‣ ... anything can be
‘orchestrated’ or
‘automated’ remotely
‣ And by the way ...
‣ The APIs (‘knobs &
buttons’) are accessible to
all, not just the expert
practitioners sitting in that
room next to the
datacenter
2014: Continue to blur the lines between all these roles
Scientist/SysAdmin/Programmer
‣ IT jobs, roles and
responsibilities are
changing
‣ SysAdmins must learn to
program in order to
harness automation tools
‣ Programmers &
Scientists can now self-
provision and control
sophisticated IT
resources
2014: Continue to blur the lines between all these roles
Scientist/SysAdmin/Programmer
‣ My take on the future ...
• SysAdmins (Windows & Linux) who
can’t code will have career issues
• Far more control is going into the
hands of the research end user
• IT support roles will radically change
-- no longer owners or gatekeepers
‣ IT will “own” policies,
procedures, reference patterns,
identity mgmt, security & best
practices
‣ Research will control the
“what”, “when” and “how big”
2014 Summary
Trend: DevOps & Automation
‣ Almost every HPC project (all sizes) BioTeam worked
on in 2014 included
• A bare-metal OS provisioning service (Cobbler, etc.)
• A ‘next-gen’ configuration management service (Chef, Puppet,
Saltstack, etc.)
‣ Gut feeling: This is going to be very useful for
regulated environments
• Not BS or empty hype: IT infrastructure and server/OS/service
configuration encoded as text files
• Easy to version control, audit, revert, rebuild, verify and fold into
existing change management & documentation systems
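The "encoded as text files" point deserves one concrete illustration: once server and service state lives in plain text, ordinary diff tooling produces the audit and change record for free. A minimal sketch (the configuration format and service names here are invented for the example):

```python
import difflib

# Two versions of a node's configuration, stored as plain text
# (as a Chef/Puppet/Salt repository would hold them).
config_v1 = """\
package: openmpi-1.6
service: gridengine-execd enabled
mount: /scratch type=nfs server=nas01
"""

config_v2 = """\
package: openmpi-1.8
service: gridengine-execd enabled
mount: /scratch type=nfs server=nas02
"""

def audit_diff(old, new):
    """Return only the changed lines: an instant change-management record."""
    return [line for line in difflib.unified_diff(
                old.splitlines(), new.splitlines(), lineterm="")
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

changes = audit_diff(config_v1, config_v2)
print(changes)
```

Run the same diff through version control and you get exactly the revert/rebuild/verify properties the slide describes, which is why this plays so well in regulated environments.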
Trends: Compute
Compute related design patterns largely static
Core Compute
‣ Linux compute clusters
are still the baseline
compute platform
‣ Even our lab instruments
know how to submit jobs
to common HPC cluster
schedulers
‣ Compute is not hard. It’s a
commodity that is easy to
acquire & deploy in 2014
Defensive hedge against Big Data / HDFS
Compute: Local Disk Matters
‣ This slide is from 2013; trend is
continuing
‣ The “new normal” may be 4U enclosures
with massive local disk spindles - not
occupied, just available
‣ Why? Hadoop & Big Data
‣ This is a defensive hedge against future
HDFS or similar requirements
• Remember the ‘meta’ problem - science is
changing far faster than we can refresh IT. This
is a defensive future-proofing play.
‣ Hardcore Hadoop rigs sometimes
operate at 1:1 ratio between core count
and disk count
Faster networks are driving compute config changes
Compute: NICs and Disks
‣ One pain point for me in 2013-2014:
• Network links to my nodes are getting
faster
• It’s embarrassing my disks are slower
than the network feeding them
• Need to be careful about selecting and
configuring high speed NICs
- Example: that dual-port 10Gig card may
not actually be able to drive both ports if
the card was engineered for an
active:passive link failover scenario
• Also need to re-visit local disk
configurations
New and refreshed HPC systems running many node types
Compute: Huge trend in ‘diversity’
‣ Accelerated trend since at least 2012 ...
• HPC compute resources are no longer homogeneous; many
types and flavors are now deployed in single HPC stacks
‣ Newer clusters mix and match node types to fit the
known use cases:
• GPU nodes for compute
• GPU nodes for visualization
• Large memory nodes (512GB +)
• Very Large memory nodes (1TB +)
• ‘Fat’ nodes with many CPU cores
• ‘Thin’ nodes with super-fast CPUs
• Analytic nodes with SSD, FusionIO, flash or large local
disk for ‘big data’ tasks
GPUs, Coprocessors & FPGAs
Compute: Hardware Acceleration
‣ Specialized hardware
acceleration has its place
but will not take over the
world
• “... the activation energy required
for a scientist to use this stuff is
generally quite high ...”
‣ GPU, Phi and FPGA best
used in large scale pipelines
or as specific solution to a
singular pain point
Compute: Big Data & Analytics
‣ BioTeam is starting to build
“Big Data” labs and
environments for clients
‣ The most interesting trend:
• We are not designing for specific
analytic use cases; in most projects
we are adding basic “capabilities”
with the expectation that the apps
and users will come later
• ... defensive IT hedge against
rapidly changing science
requirements, remember?
Compute: Big Data & Analytics
‣ This translates to infrastructure designed
to support certain capabilities rather than
specific software or applications.
‣ Example:
• Beefy HDFS friendly servers
• 100% bare metal provisioning and dynamic
system reconfiguration
• Systems for ingest
• Very large RAM systems
• Big PCIx bus systems
• Memory-resident database systems
• Mix of very fast and capacity optimized storage
• Very fast core, top-of-rack and server networking
Also known as hybrid clouds
Emerging Trend: Hybrid HPC
‣ No longer “utter crap” or “cynical
vendor-supported reference case”
• small local footprint
• large, dynamic, scalable, orchestrated
public cloud component
‣ DevOps is key to making this work
‣ High-speed network to public cloud
required
‣ Software interface layer acting as the
mediator between local and public
resources
‣ Good for tight budgets, has to be
done right to work
‣ Still best approached very carefully
BioIT World Homework
‣ We’ve got interesting hardware vendors on the
show floor this week; check them out
• Silicon Mechanics, Thinkmate, Microway:
cool commodity
• Intel, IBM, Dell, SGI: Large & enterprise
• Timelogic: hardware acceleration
• ...
Trends: Network
Big trouble ahead ...
Network: Speed @ Core and Edge
‣ Huge potential pain point
‣ May surpass storage as our #1
infrastructure headache
‣ Petascale data is useless if you
can’t move it or access it fast
enough
‣ Don’t be smug about 10 Gigabit
- folks need to start thinking
*now* about 40 and even 100
Gigabit Ethernet
‣ You may need 10Gig to some
desktops for data ingest/export
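Rough arithmetic behind the "can't move it fast enough" point. The 80% of line rate assumed here is an optimistic figure for a well-tuned transfer; untuned TCP over a WAN often does far worse:

```python
def transfer_days(terabytes, gigabits_per_sec, efficiency=0.8):
    """Days needed to move a dataset at a given link speed.

    efficiency is an assumed fraction of line rate actually achieved
    (protocol overhead, competing flows, TCP tuning).
    """
    bits = terabytes * 8e12                       # 1 TB = 8e12 bits (decimal)
    seconds = bits / (gigabits_per_sec * 1e9 * efficiency)
    return seconds / 86400.0

for gbps in (1, 10, 40, 100):
    print("%3d GbE: %6.1f days per petabyte" % (gbps, transfer_days(1000, gbps)))
```

At 10GbE a petabyte is roughly an 11-12 day transfer even under good conditions, which is exactly why 40 and 100 Gigabit planning needs to start now.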
Network: Speed @ Core and Edge
‣ Remember ~2004 when
research storage
requirements started to dwarf
what the enterprise was
using?
‣ Same thing is happening now
for networking
‣ Research core, edge and top-
of-rack networking speeds
may exceed what the rest of
the organization has
standardized on
Massive data movement needs are driving innovation pain
This is going to be painful
‣ Enterprise networking folks
are even more aloof than
storage admins we battled in
’04
‣ Often used to driving
requirements and methods;
unhappy when science starts
to drive them out of their
comfort zones
‣ Research needs to start
pushing harder and faster for
network speeds above 10GbE
• This will take a long time so start
now!
Not sure how this will play out
‣ It will be interesting to see what large-scale data
movement does to our local infrastructure and
desktop experience
‣ Especially with other trends like BYOD
‣ My $.02
• Speeds to our desktops are going to get very fast, or
• We give up on growing massive bandwidth to the client
and embrace a full VDI model where the users just
“remote desktop” into a well-networked scientific
informatics environment
BioIT World Homework
‣ Visit the Internet2 booth to chat high speed
networking
• Ask about their free or low-cost training events and
technical workshops; start thinking about how you can
get your internal networking teams/leadership to attend
• Ask them about the new trend of private/corporate links
into I2 and other fast research networks
‣ Arista is here. Talking and exhibiting. They are
not Cisco. Listen, visit & talk to them.
Significant new trend in networking
Science DMZs
It’s real and becoming necessary
Network: ‘ScienceDMZ’
‣ BioTeam building them in 2014 and beyond
‣ Central premise:
• Legacy firewall, network and security methods
architected for “many small data flows” use cases
• Not built to handle smaller #s of massive
data flows
• Also very hard to deploy ‘traditional’ security gear
on 10Gigabit and faster networks
‣ More details, background & documents at
http://fasterdata.es.net/science-dmz/
[Diagram omitted: Science DMZ traffic separation, showing background traffic or competing bursts vs. DTN traffic with wire-speed bursts across 10GE links]
Network: ‘ScienceDMZ’
‣ Start thinking/discussing this sooner rather
than later
‣ Just like “the cloud” this may fundamentally
change internal operations and technology
‣ Will also require conscious buy-in and
support from senior network, security and
risk management professionals
• ... these talks take time. Best to plan ahead
Network: ‘ScienceDMZ’
‣ A Science DMZ has 3 required components:
1. Very fast “low-friction” network links and paths with
security policy and enforcement specific to scientific
workflows
2. Dedicated, high performance data transfer nodes
(“DTNs”) highly optimized for high speed data xfer
3. Dedicated network performance/measurement nodes
Network: ‘ScienceDMZ’
‣ Implementation specifics are complex; the
basic concept is not:
1. The research need to move scientific data at high speeds
is already being negatively affected by networks not
designed for this requirement
2. Likely to force fundamental changes in core enterprise
architectures on a similar disruptive scale as what
genome data storage forced in ~2004
3. Firewalls/IDS and security in particular will be affected
Simple Science DMZ:
Image source: “The Science DMZ: Introduction & Architecture” -- esnet
Network: ‘ScienceDMZ’
‣ My gut feeling:
1. The fanciest and most complex Science DMZ architectures in the literature right
now are not suitable for our world
• Expensive specialized equipment; Expensive specialist staff expertise required
• Often still experimental, not something enterprise IT would want to drop into a
production environment
2. Science DMZ concepts are sound and simple implementations are possible today
3. Start small:
• Incorporate these sorts of concepts/ideas into long term planning ASAP
• Start adding network performance monitoring nodes to research networks, DMZs and
external circuit connections now; this entire concept falls over without actionable flow
and performance data
• Start work on policies and procedures for manual bypass of firewall/IDS rules when
known sender/receivers are freighting high speed data; automation comes later!
BioIT World Homework
‣ Bookmark http://fasterdata.es.net and check
out the published materials and advice
‣ Monitor http://www.oinworkshop.com/ to see
when a workshop/event may be coming near
you (send your networking people ...)
‣ Both ESNet and Internet2 run training and
technical workshops that deliver far more value
for price than the usual training junkets
Check out this talk
BioIT World Homework
‣ Track 1 - 3:10pm today:
• Christian Todorov talks “Accelerating Biomedical
Research Discovery: The 100G Internet2 Network – Built
and Engineered for the Most Demanding Big Data
Science Collaborations”
Not very significant trend in 2014:
Software Defined Networking (“SDN”)
More hype than useful reality at the moment
Network: SDN Hype vs. Reality
‣ Software Defined Networking (“SDN”) is
the new buzzword
‣ It WILL become pervasive and will
change how we build and architect things
‣ But ...
‣ Not hugely practical at the moment for
most environments
• We need far more than APIs that control port
forwarding behavior on switches
• More time needed for all of the related
technologies and methods to coalesce into
something broadly useful and usable
More hype than useful reality at the moment
Network: SDN
‣ My gut feeling:
• It is the future but right now we are still in the
“mostly empty hype” phase if you wanna be
cynical about it; best to wait and watch
• Production enterprise use: OpenFlow
and similar stuff does not provide value
relative to implementation effort right now
• Best bang for the buck in ’14 will be getting
‘SDN’ features as part of some other
supported stack
- OpenStack, VMWare, Cloud, etc.
Trends: Storage
Storage
‣ Still the biggest expense, biggest headache and
scariest systems to design in modern life science
informatics environments
‣ Many of the pain points we’ve talked about for years
are still in place:
• Explosive growth forcing tradeoffs in capacity over performance
• Lots of monolithic single tiers of storage
• Critical need to actively manage data through its full life cycle
(just storing data is not enough ...)
• Need for post-POSIX solutions such as iRODS and other
metadata-aware data repositories
Storage Trends
‣ The large but monolithic storage platforms we’ve
built up over the years are no longer sufficient
• Do you know how many people are running a single large
scale-out NAS or parallel filesystem? Most of us!
‣ Tiered storage is now an absolute requirement
• At a minimum we need an active storage tier plus
something far cheaper/deeper for cold files
‣ Expect the tiers to involve multiple vendors,
products and technologies
• The Tier1 storage vendors tend to have higher-end pricing
for their “all in one” tiered data management solutions
Storage - The Old Way
‣ Single tier of scale-out NAS or parallel FS
‣ Why?
• Suitable for broadest set of use cases
• Easy to procure/integrate
• Lowest administrative & operational burden
‣ Example:
• 400TB - 1PB of ‘something’ stores ‘everything’
Storage - The New Way
‣ Multiple tiers; potentially from multiple vendors
‣ Why?
• Way more cost efficient (size the tier to the need)
• Single tier no longer capable of supporting all use cases and
workflow patterns
• Single tiers waste incredible money at large scale
‣ Example:
• 10-40 TB SSD/Flash for ingest & IOPS-sensitive workloads
• 50-400 TB tier (SATA/SAS/SSD mix) for active processing
• Multi-petabyte tier (Cloud, Object, SATA) for cost and operationally
efficient long term (yet reachable) storage of scientific data at rest
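The tier list above is really a set of placement policies. A toy sketch of such a data-lifecycle rule in code (tier names and age thresholds are illustrative assumptions, not a recommendation):

```python
# Hypothetical tiers loosely matching the sizing example above:
# each entry is (tier name, description, placement policy).
TIERS = [
    ("flash",  "SSD/Flash ingest & IOPS-sensitive", lambda f: f["age_days"] < 1),
    ("active", "SATA/SAS/SSD active processing",    lambda f: f["age_days"] < 90),
    ("cold",   "cloud/object, data at rest",        lambda f: True),
]

def place(file_record):
    """Return the first tier whose policy matches the file."""
    for name, _desc, policy in TIERS:
        if policy(file_record):
            return name

print(place({"age_days": 0.2}))   # fresh instrument output
print(place({"age_days": 30}))    # under active analysis
print(place({"age_days": 400}))   # finished project
```

Real tiering products evaluate richer signals (access time, owner, project, file size), but the shape of the decision is the same: size each tier to the need instead of buying one monolith.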
Sticking 100% with Tier 1 vendors gets expensive
Storage: Disruptive stuff ahead
‣ BioTeam has built 1Petabyte ZFS-based storage pools from
commodity whitebox kit for about ~$100,000 in direct hardware
costs (engineering effort & admin not included in this price ...)
‣ There are many storage vendors in the middle tier who can
provide storage systems that are less ‘risky’ than DIY
homebuilt setups yet far less expensive than the traditional
Tier 1 enterprise storage options
• Several of these vendors are here at the show!
‣ Companies like Avere Systems are producing boxes that unify
disparate storage tiers and link them to cloud and object
stores
• This is a route to unifying “tier 1” storage with the “cheap & deep” storage
Infinidat aka http://izbox.com
The new thumper.
‣ 1 petabyte usable NAS
shipped as a single
integrated rack
• List price: $500 per usable
terabyte
‣ More expensive than DIY
ZFS on commodity
chassis but less
expensive than current
mainstream products
‣ Lots of interesting use
cases for ‘cheap & deep’
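The arithmetic behind that comparison, remembering that the DIY figure is hardware-only while the integrated rack includes engineering:

```python
def dollars_per_usable_tb(total_cost, usable_tb):
    """Normalize a storage purchase to $/usable TB for comparison."""
    return total_cost / float(usable_tb)

diy_zfs = dollars_per_usable_tb(100000, 1000)   # ~$100/TB, hardware only
infinidat = 500                                 # quoted list price, $/usable TB

print(diy_zfs, infinidat, infinidat / diy_zfs)
```

A 5x list-price gap over whitebox ZFS but still well under mainstream enterprise pricing is exactly the "cheap & deep" middle ground the slide is pointing at.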
Avere Systems
Wait, I can DO that?
‣ These folks caught my eye in late 2013 for
one very specific use case
‣ Since then I keep them in mind for 4-5
common problems I regularly face
‣ It can:
• Add performance layer on top of storage bought
to be “cheap & deep”
• Virtualize many NAS islands into a single
namespace
• Replicate & move data between tiers and sites
• Act as CIFS/NFS gateway to on-premise or
offsite object stores ***
• Treat Amazon S3 and Glacier as simply another
storage tier fully integrated into your environment
Object Storage
‣ Object storage is the future for scientific data at rest
• Total no brainer; makes more sense than the “files and
folders” paradigm, especially for automated analysis
• Plus Amazon does it for super cheap
‣ But ... There will be a long transition period due to all
of our legacy codes and workflows
• This is where gateway devices can play
‣ It can:
• Provide a much better workflow design pattern than
assuming “files and folders” data storage
• Save millions of dollars via efficiencies of erasure coding
• Provide a much more robust and resilient peta-scale storage
framework
• Hide behind a metadata-aware layer such as IRODS to
provide very interesting capabilities
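For readers used to the files-and-folders paradigm, the object model is easy to sketch: a flat key namespace where every object carries its data plus arbitrary metadata. A toy in-memory illustration (real stores such as S3 or Swift put distribution and erasure coding behind this same interface; the keys and metadata fields here are invented):

```python
import hashlib

class ObjectStore:
    """Toy object store: flat keys, data + metadata, no directory tree."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Content hash doubles as an integrity check, as in real stores.
        metadata["etag"] = hashlib.md5(data).hexdigest()
        self._objects[key] = (data, metadata)
        return metadata["etag"]

    def get(self, key):
        return self._objects[key]

store = ObjectStore()
store.put("run42/sample7.bam", b"...", instrument="PacBio", project="demo")
data, meta = store.get("run42/sample7.bam")
print(meta["project"])
```

Because metadata travels with the object, a layer like iRODS can query and act on it directly, which is the "metadata-aware" capability mentioned above.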
Object Storage
‣ Erasure coding distributed
object stores are very interesting
at peta-scale ...
‣ Think about how you would
handle & replicate 20 petabytes
of data the “traditional way”
• Purchase 2x or 3x storage capacity to
handle replication overhead
• Ignore the nightmare scenario of
having to restore from one of the
distributed replicas
Object Storage
‣ Efficiencies of erasure coding allow
for LESS raw disk to be distributed
across MORE geographic sites
‣ End result is a “single” usable
system that tolerant to the failure
of an entire datacenter/site
‣ For the 20 petabyte problem
instead of purchasing 2x disk you
buy ~1.8x and use the capex
savings to add an extra colo facility
or increase WAN link speed
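The savings claim is simple geometry: a k+m erasure code splits each object into k data fragments plus m parity fragments, survives the loss of any m of them, and consumes (k+m)/k raw bytes per usable byte. Checking the 20-petabyte example (the 10+8 layout is an illustrative assumption):

```python
def raw_petabytes(usable_pb, k, m):
    """Raw capacity needed for a k+m erasure-coded layout (k data, m parity)."""
    return usable_pb * (k + m) / float(k)

usable = 20                               # the 20 PB example above
erasure = raw_petabytes(usable, 10, 8)    # 1.8x overhead, tolerates 8 lost fragments
replicated = usable * 2                   # plain 2x replication for comparison

print(erasure, replicated)
```

The 4 PB of raw disk saved is the capex the slide suggests redirecting into an extra colo facility or faster WAN links.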
Exercise
BioIT World Homework
‣ Pick a storage size that makes sense for you (100TB or
1PB suggested)
‣ Visit the various storage vendors on the show floor and
price out what 100TB or 1PB would cost
‣ You will see an awesome diversity of products,
performance, features and capabilities at various price
points
• DO NOT fixate on price alone. This is a mistake.
‣ This is REALLY worth doing - there is incredible
diversity in the mix of price/features/performance/
capability out there
Check out these booths
BioIT World Homework
‣ Object storage:
• Amplidata & CleverSafe
‣ Glue/Gateway/Acceleration:
• Avere Systems
‣ Enterprise:
• EMC Isilon, IBM, Dell, SGI, Hitachi, Panasas
‣ Mid-tier/Commodity:
• Silicon Mechanics, Thinkmate, RAID Inc., Xyratex
Check out these talks
BioIT World Homework
‣ Track 5 - noon today:
• Aaron Gardner talks “Taming big scientific data growth with
converged infrastructure”
‣ Track 1 - 2:55pm today:
• Jacob Farmer talks “Bridging the Worlds of Files, Objects,
NAS, and Cloud: A Blazing Fast Crash Course in Object
Storage”
‣ Track 1 - 4:30pm today:
• Dirk Petersen talks “Deploying Very Low Cost Cloud Storage
Technology in a Traditional Research HPC Environment”
Can you do a Bio-IT talk without using the ‘C’ word?
Cloud: 2014
‣ Core advice remains the same
‣ A few new permutations ...
Core Advice
Cloud: 2014
‣ Research organizations need a cloud
strategy today (really: yesterday)
• Those that don’t will be bypassed by frustrated
users or sneaky “cloud aware” devices
‣ IaaS cloud services are only a departmental
credit card away ... some senior scientists
are too big to be fired for violating IT policy
‣ Instrument vendors are forcing the issue
‣ Storage vendors are forcing the issue
Design Patterns
Cloud Advice
‣ We actually need several tested cloud
design patterns:
‣ (1) To handle ‘legacy’ scientific apps & workflows
‣ (2) The special stuff that is worth re-architecting
‣ (3) Hadoop & big data analytics
‣ ... and maybe (4) Regulated/sensitive efforts...
‣ ...and maybe (5) a way to evaluate Commercial
solutions
Legacy HPC on the Cloud
Cloud Advice
‣ MIT StarCluster
• http://star.mit.edu/cluster/
• This is your baseline
• Extend as needed
‣ Also check out Univa
• Commercially supported Grid Engine
stack with compelling roadmap and
native cloud capabilities
“Cloudy” HPC
Cloud Advice
‣ Some of our research workflows are important
enough to be rewritten for “the cloud” and the
advantages that a truly elastic & API-driven
infrastructure can deliver
‣ This is where you have the most freedom
‣ Many published best practices you can borrow
‣ Warning: Cloud vendor lock-in potential is
strongest here
What has changed ..
Cloud: 2014
‣ Let's revisit some of my bile from prior years
‣ “... private clouds: still utter crap”
‣ “... some AWS competitors are delusional
pretenders”
‣ “... AWS has a multi-year lead on the
competition”
Private Clouds in 2014:
‣ I’m no longer dismissing them as “utter crap”
• However it is a lot of work and money to build a system that only has 5% of the
features that AWS can deliver today (for a cheaper price). Need to be careful
about the use case, justification and operational/development burden.
‣ Usable & useful in certain situations
‣ BioTeam positive experiences with OpenStack
‣ Starting to see OpenStack pilots among our clients
‣ Hype vs. Reality ratio still wacky
‣ Sensible only for certain shops
• Have you seen what you have to do
to your networks & gear?
‣ Still important to remain cynical and perform proper due diligence
Not all AWS competitors are delusional
‣ Google Compute is viable in 2014 for scientific workflows
• Compute/Memory: Late start into IaaS means CPUs and memory are current generation; we have
‘war stories’ from AWS users who probe /proc/cpuinfo on EC2 servers so they can instantly kill any
instance running on older chipsets
• Price: Competitive on price although the shooting war between IaaS providers means it is hard to
pin down the current “winner”; The “sustained use” pricing is easier to navigate than AWS Reserved
Instances. Overall AWS pricing algorithms for various services seem more complicated than Google
equivalents.
• Network performance: Fantastic networking and excellent performance/latency figures between
regions and zones. VPC type features are baked into the default resource set
• Ops: Priced in 1min increments; no more need to hunt and kill servers at 55 min past the hour.
Google has a concept of “Projects” with assigned collaborators and quotas. Quite different from the
AWS account structure and IAM-based access control model. Project-based paradigm easier to
think about for scientific use case.
• IaaS Building Blocks: Still far fewer features than AWS but the core building blocks that we need
for science and engineering workflows are present.
‣ My $.02
• AWS is still the clear leader but Google Compute is now a viable option and it is worth ‘kicking the
tires’ in 2014 and beyond ... to me AWS has had no serious competition until now
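The per-minute billing point above is easy to quantify. A quick sketch with a hypothetical $0.60/hour instance rate (rates and job shape are invented for illustration):

```python
import math

def job_cost(runtime_minutes, hourly_rate, increment_minutes):
    """Bill one instance, rounding runtime up to the provider's increment."""
    billed = math.ceil(runtime_minutes / float(increment_minutes)) * increment_minutes
    return billed / 60.0 * hourly_rate

# A 100-node burst that finishes in 65 minutes.
per_hour = 100 * job_cost(65, 0.60, 60)    # hourly increments: pay for 2 full hours
per_minute = 100 * job_cost(65, 0.60, 1)   # per-minute increments

print(per_hour, per_minute)
```

Per-minute increments also remove the old operational habit of hunting down idle instances just before the top of the hour.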
Cloud Science Facilitators
‣ Cycle Computing is legit
• They’ve proven themselves
on some of largest IaaS HPC
grids ever built
• Experience with hybrid
systems (cloud & premise)
‣ Smart people. Nice
people.
‣ They have a booth, stop
by and chat them up ...
The road ahead ...
This has been a slow moving trend for years now ...
POSIX Alternatives Coming
‣ The scope of organizations faced with
the limitations of POSIX filesystem will
continue to expand
‣ We desperately need some sort of
“metadata aware” data management
solution in life science
‣ Nobody has an easy solution yet;
several bespoke installations but no
clear mass-market options
‣ IRODS front-ending “cheap & deep”
storage tiers or object stores appears
to be gaining significant interest out in
our community
Application Containers are getting interesting
Watch out for: Containerization
‣ Application containerization via methods like
http://docker.io gaining significant attention
• Docker support now in native RHEL kernel
• AWS Elastic Beanstalk recently added Docker
support
‣ If broadly adopted, these techniques will
stretch research IT infrastructures in
interesting directions
• This is far more interesting to me than moving virtual
machines around a network or into the cloud
‣ ... with a related impact on storage location,
features & capability
‣ Major news and progress expected in
2014
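What “application containerization” means in day-to-day research IT is roughly this: package a tool as an image and launch it with its data mounted in, instead of shipping a whole virtual machine. A hedged sketch that only *constructs* the docker CLI invocation (a dry run; the image name and paths are illustrative, not real tools):

```python
# Sketch: build a `docker run` invocation that mounts a data directory
# read-only and removes the container when the job exits. We do not
# execute it here; this just shows the shape of a containerized job.
def docker_run_cmd(image, data_dir, command):
    return [
        "docker", "run", "--rm",
        "-v", f"{data_dir}:/data:ro",   # analysis reads from /data
        image,
    ] + command

cmd = docker_run_cmd("biotools/aligner:1.0", "/scratch/run42",
                     ["align", "--input", "/data/sample.fastq"])
print(" ".join(cmd))
```

Note the storage implication the slide hints at: the container is disposable, so all state lives in the mounted volume, which pushes the interesting design work onto storage location and capability.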
96
Keep an eye on: Storage
‣ Data generation outpacing
technology
‣ Really interesting disruptive
stuff on the market now
‣ Cheap/easy laboratory
assays taking over
• Researchers largely don’t know
what to do with it all
• Holding on to the data until
someone figures it out
• This will cause some interesting
headaches for IT
• Huge need for real “Big Data”
applications to be developed
97
Keep an eye on: Networking
‣ Unless there’s an investment
in ultra-high-speed
networking, we need to rethink
how analysis is done
‣ Data commons are setting
a precedent
• Need to minimize the movement of
data
• Include compute power and
analysis platform with data
commons
‣ Move the analysis to the data,
don’t move the data
• Requires shared, large-scale
core institutional resources
98
Long term trends ...
‣ Compute continues to become easier
‣ Data movement and ingest (physical & network)
gets harder
‣ Cost of storage will be dwarfed by “cost of
managing stored data”
‣ We can see end-of-life for our current IT
architecture and design patterns; new patterns
will start to appear over next 2-5 years
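The “data movement gets harder” claim is easy to verify with back-of-envelope arithmetic. A small sketch (the 50% sustained-utilization figure is an assumption, not a measurement):

```python
# Back-of-envelope: time to move a dataset over a network link at a
# given sustained utilization. Illustrates why petabyte-scale data
# movement dwarfs compute as the hard problem.
def transfer_days(dataset_tb, link_gbps, efficiency=0.5):
    bits = dataset_tb * 1e12 * 8          # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400

# Moving 1 PB over 10 GbE at 50% sustained throughput:
print(round(transfer_days(1000, 10), 1))  # roughly 18.5 days
```

At those timescales, physical media shipment and “move the analysis to the data” stop being workarounds and start being the architecture.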
99
Wrap-up: Final Advice & Tips
Embrace The Innovation
100
Ending Advice: 1 of 5
‣ Understand the ‘interesting times’ we are in
• Science is changing faster than we can refresh IT
• This is not going to change any time soon
‣ Advice:
• Spend as much time thinking about future flexibility as
you spend on actual current needs & requirements
• Design for agility & responsiveness
Capacity
101
Ending Advice: 2 of 5
‣ Many of us will need ‘petabyte capable’ storage
‣ However:
• Only some of us will ever have 1PB+ under management
• The hard part is knowing who that will be
Tiers are in your future
102
Ending Advice: 3 of 5
‣ Tiers are now a requirement, at least long-term
• At a minimum we need an ‘active’ tier for processing &
ingest
• ... and some sort of inexpensive cold/nearline/archive
option as well
‣ Advice:
• It’s OK to buy a single block/tier of disk
• ... but always have a strategy for diversification
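A minimal illustration of what an active/cold tiering policy looks like in practice: sweep the active tier and relocate files untouched for longer than a cutoff. Real deployments use HSM software or a policy engine; the directory layout and 90-day threshold here are illustrative assumptions.

```python
import os, shutil, time

# Sketch of a minimal tiering policy: move files in the "active" tier
# that have not been modified for `age_days` into a cold/nearline tier.
def sweep_to_cold(active_dir, cold_dir, age_days=90):
    cutoff = time.time() - age_days * 86400
    moved = []
    os.makedirs(cold_dir, exist_ok=True)
    for name in os.listdir(active_dir):
        src = os.path.join(active_dir, name)
        if os.path.isfile(src) and os.path.getmtime(src) < cutoff:
            shutil.move(src, os.path.join(cold_dir, name))
            moved.append(name)
    return moved
```

Even a crude sweep like this makes the diversification strategy concrete: the “cold” target can start as cheap disk and later become an object store or cloud bucket without changing the policy.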
103
Ending Advice: 4 of 5
‣ Above a certain scale, inefficient data management
& simple storage practices are hugely wasteful
‣ Advice:
• The cost of a new hire “data manager” or curator role may
be cheaper and far more beneficial to your organization than
continuing to throw CapEx dollars at keeping a badly run
storage platform under its capacity limit
• Many opportunities to get clever & recapture efficiency &
capability: tiers, replication, cloud, dedupe, CRAM
compression, iRODS
• BROADEN YOUR PERSPECTIVE
104
Ending Advice: 5 of 5
‣ You need a cloud strategy. Yesterday.
• Users, instrument makers & IT vendors are forcing the issue
• Economic trends indicate cloud storage is inescapable
• 90% of cloud is “easy”. Remaining 10% takes time & effort
‣ Advice:
• The technical aspects of using “the cloud” are trivial
• The political, policy and risk management aspects are
difficult and time consuming; start these ASAP
105
end; Thanks!
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14

"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

2014 BioIT World - Trends from the trenches - Annual presentation

  • 1. 1 Trends from the trenches: 2014 slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14 Wednesday, April 30, 14
  • 2. 2 I’m Chris. I’m an infrastructure geek. I work for the BioTeam. Wednesday, April 30, 14
  • 3. Apologies in advance 3 If you have not heard me speak ... ‣ ‘Infamous’ for speaking very fast and carrying a huge slide deck ‣ In 2014 CHI finally gave up and just gave me a 60min talk slot ‣ Aiming to end with enough time for questions & discussions By the time you see this slide I’ll be on my ~4th espresso Wednesday, April 30, 14
  • 4. Who, What, Why ... 4 BioTeam ‣ Independent consulting shop ‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done ‣ 12+ years bridging the “gap” between science, IT & high performance computing ‣ Our wide-ranging work is what gets us invited to speak at events like this ... Wednesday, April 30, 14
  • 5. 5 Why I do this talk every year ... ‣ Bioteam works for everyone • Pharma, Biotech, EDU, Nonprofit, .Gov, etc. ‣ We get to see how groups of smart people approach similar problems ‣ We can speak honestly & objectively about what we see “in the real world” Wednesday, April 30, 14
  • 6. Listen to me at your own risk 6 Standard Disclaimer ‣ I’m not an expert, pundit, visionary or “thought leader” ‣ There are ~2000 smart people at this event; I don’t presume to speak for us as a whole ‣ All career success entirely due to shamelessly copying what actual smart people do ‣ I’m biased, burnt-out & cynical ‣ Filter my words accordingly Wednesday, April 30, 14
  • 7. 7 What’s new? What’s new? I’ve seen your slides before. <yawn> Wednesday, April 30, 14
  • 8. aka ‘spreading the blame ...’ 8 What’s new 1: Acknowledgements ‣ This talk used to be made in a vacuum each year • ... often mere minutes before the scheduled talk time ‣ Not this year • Heavily influenced by peer group of smarter people who get chatty when given beer ‣ Non-comprehensive blame gang: • Ari Berman • Aaron Gardner • Adam Kraut • Chris Botka (Harvard) • Chris Dwan (Broad) • James Cuff (Harvard) • ... many more ... Wednesday, April 30, 14
  • 9. What has not changed in recent talks Not new 2: Recycled Content ‣ The core Bio-IT ‘meta’ issue remains unchanged ‣ Minor updates to report for cloud landscape ‣ Compute landscape largely unchanged • ... a few updates to share in this space but nothing earth shattering 9 Wednesday, April 30, 14
  • 10. 10 Why are we all here? Wednesday, April 30, 14
  • 11. 11 The #1 ‘meta issue’ is unchanged in 2014 Wednesday, April 30, 14
  • 12. 12 It’s a risky time to be doing Bio-IT Wednesday, April 30, 14
  • 13. 13 Meta: Science evolving faster than IT can refresh infrastructure & practices Wednesday, April 30, 14
  • 14. This is what keeps Bio-IT folks up at night The Central Problem Is ... ‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure • Bench science is changing month-to-month ... • ... while our IT infrastructure only gets refreshed every 2-7 years ‣ Our job is to design systems TODAY that can support unknown research requirements & workflows over multi-year spans (gulp ...) 14 Wednesday, April 30, 14
  • 15. The Central Problem Is ... ‣ The easy period is over ‣ 5 years ago we could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary ‣ That does not work any more; real solutions required 15 Wednesday, April 30, 14
  • 16. 16 This is our “new normal” for informatics Wednesday, April 30, 14
  • 17. 17 The Central Problem Is ... ‣ Lab technology is being refreshed, upgraded and replaced at an astonishing rate • Bigger, faster, parallel • Requiring increasingly sophisticated IT support • Cheap and easily obtainable Wednesday, April 30, 14
  • 18. 18 The Central Problem Is ... ‣ ... and IT still being caught by surprise in 2014 • Procurement practices and cheaper instrument prices result in situations where IT is bypassed or not consulted in advance Wednesday, April 30, 14
  • 19. True Story - 48 Hours Ago 19 Wednesday, April 30, 14
  • 20. A conversation with a client Just 48 hours ago ... ‣ Scientists tell IT that they are getting a new PacBio sequencing platform • Gave IT a 5-node cluster quote that PacBio provided as blueprint for SMRT Portal • Wanted confirmation that everything was cool with IT support 20 Wednesday, April 30, 14
  • 21. A conversation with a client Just 48 hours ago ... ‣ Partial “Minor” Issue List: • Scientists had no clue about power requirements. A pair of 60amp 220v power outlets = multi-month facility project • ... assumed IT would be cool accepting and supporting a one-off HPC system sized for 1 instrument & 1 workgroup • ... also appeared to believe that storage was infinite and free. At least that is what their budget assumed. 21 Wednesday, April 30, 14
  • 22. One more thing ... 22 Wednesday, April 30, 14
  • 23. We can’t blame the science/lab side for everything One more thing ... ‣ Can’t blame the lab-side for all our woes ‣ IT innovation is causing headaches in research and program management ‣ Grant funding agencies, regulatory rules and internal risk/program management practices not updated to reflect current and emerging IT capabilities, architectures & practices • Rules & policies often simply do not cover what we are capable of doing right now 23 Wednesday, April 30, 14
  • 24. 24 A related problem ... Wednesday, April 30, 14
  • 25. This also hurts ... ‣ It has never been easier to acquire vast amounts of data cheaply and easily ‣ Growth rate of data creation/ingest exceeds rate at which the storage industry is improving disk capacity ‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers • ... ideally without punching holes in your firewall or consuming all available internet bandwidth 25 Wednesday, April 30, 14
  • 26. The future is not looking pretty for the ill prepared 26 Wednesday, April 30, 14
  • 27. High Costs For Getting It Wrong ‣ Lost opportunity ‣ Missing capability ‣ Frustrated & very vocal scientific staff ‣ Problems in recruiting, retention, publication & product development 27 Wednesday, April 30, 14
  • 28. 28 Enough groundwork. Let’s Talk Trends Wednesday, April 30, 14
  • 29. 29 Trends: DevOps & Org Charts Wednesday, April 30, 14
  • 30. 30 The social contract between scientist and IT is changing forever Wednesday, April 30, 14
  • 31. 31 You can blame “the cloud” for this Wednesday, April 30, 14
  • 32. 32 DevOps & Scriptable Everything ‣ On (real) clouds, EVERYTHING has an API ‣ If it’s got an API you can automate and orchestrate it ‣ “scriptable datacenters” are now a very real thing Wednesday, April 30, 14
  • 33. 33 DevOps & Scriptable Everything ‣ Incredible innovation in the past few years ‣ Driven mainly by companies with massive internet ‘fleets’ to manage ‣ ... but the benefits trickle down to us mere mortals Wednesday, April 30, 14
  • 34. 34 DevOps will conquer the enterprise ‣ Over the past few years cloud automation/orchestration methods have been trickling down into our local infrastructures ‣ This will have significant impact on careers, job descriptions and org charts Wednesday, April 30, 14
  • 35. 2014: Continue to blur the lines between all these roles 35 Scientist/SysAdmin/Programmer ‣ Radical change in how IT is provisioned, delivered, managed & supported • Technology Driver: Virtualization & Cloud • Ops Driver: Configuration Mgmt, Systems Orchestration & Infrastructure Automation ‣ SysAdmins & IT staff need to re-skill and retrain to stay relevant www.opscode.com Wednesday, April 30, 14
  • 36. 2014: Continue to blur the lines between all these roles 36 Scientist/SysAdmin/Programmer ‣ When everything has an API ... ‣ ... anything can be ‘orchestrated’ or ‘automated’ remotely ‣ And by the way ... ‣ The APIs (‘knobs & buttons’) are accessible to all, not just the expert practitioners sitting in that room next to the datacenter Wednesday, April 30, 14
  • 37. 2014: Continue to blur the lines between all these roles 37 Scientist/SysAdmin/Programmer ‣ IT jobs, roles and responsibilities are changing ‣ SysAdmins must learn to program in order to harness automation tools ‣ Programmers & Scientists can now self-provision and control sophisticated IT resources Wednesday, April 30, 14
  • 38. 2014: Continue to blur the lines between all these roles 38 Scientist/SysAdmin/Programmer ‣ My take on the future ... • SysAdmins (Windows & Linux) who can’t code will have career issues • Far more control is going into the hands of the research end user • IT support roles will radically change -- no longer owners or gatekeepers ‣ IT will “own” policies, procedures, reference patterns, identity mgmt, security & best practices ‣ Research will control the “what”, “when” and “how big” Wednesday, April 30, 14
  • 39. 2014 Summary Trend: DevOps & Automation ‣ Almost every HPC project (all sizes) BioTeam worked on in 2014 included • A bare-metal OS provisioning service (Cobbler, etc.) • A ‘next-gen’ configuration management service (Chef, Puppet, Saltstack, etc.) ‣ Gut feeling: This is going to be very useful for regulated environments • Not BS or empty hype: IT infrastructure and server/OS/service configuration encoded as text files • Easy to version control, audit, revert, rebuild, verify and fold into existing change management & documentation systems 39 Wednesday, April 30, 14
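The "configuration encoded as text files" point is easy to show in miniature. Below is a rough Python sketch (deliberately not any real Chef/Puppet/Salt DSL; every package and service name is made up) of the idempotent desired-state convergence idea these tools are built on:

```python
# Desired state lives in plain, version-controllable data; a converge
# step computes only the changes needed to reach it. All names below
# are hypothetical illustrations, not real recipes.

desired_state = {
    "packages": {"openmpi", "gridengine-client", "nfs-common"},
    "services": {"sge_execd": "running", "ntpd": "running"},
}

def converge(actual, desired):
    """Return the actions needed to move `actual` to `desired`.

    Running converge again after applying the actions yields an empty
    list -- the idempotency that makes audits and reverts practical.
    """
    actions = []
    for pkg in desired["packages"] - actual["packages"]:
        actions.append(("install", pkg))
    for svc, state in desired["services"].items():
        if actual["services"].get(svc) != state:
            actions.append(("set_service", svc, state))
    return actions

def apply_actions(actual, actions):
    """Mutate `actual` as a real tool would mutate the host."""
    for action in actions:
        if action[0] == "install":
            actual["packages"].add(action[1])
        elif action[0] == "set_service":
            actual["services"][action[1]] = action[2]
    return actual
```

Because a second converge() after apply_actions() finds nothing to do, the desired-state declaration itself becomes the auditable artifact: diff it, version it, revert it, fold it into change management.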
  • 41. Compute related design patterns largely static 41 Core Compute ‣ Linux compute clusters are still the baseline compute platform ‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers ‣ Compute is not hard. It’s a commodity that is easy to acquire & deploy in 2014 Wednesday, April 30, 14
  • 42. Defensive hedge against Big Data / HDFS 42 Compute: Local Disk Matters ‣ This slide is from 2013; trend is continuing ‣ The “new normal” may be 4U enclosures with massive local disk spindles - not occupied, just available ‣ Why? Hadoop & Big Data ‣ This is a defensive hedge against future HDFS or similar requirements • Remember the ‘meta’ problem - science is changing far faster than we can refresh IT. This is a defensive future-proofing play. ‣ Hardcore Hadoop rigs sometimes operate at 1:1 ratio between core count and disk count Wednesday, April 30, 14
  • 43. Faster networks are driving compute config changes 43 Compute: NICs and Disks ‣ One pain point for me in 2013-2014: • Network links to my nodes are getting faster • It’s embarrassing my disks are slower than the network feeding them • Need to be careful about selecting and configuring high speed NICs - Example: that dual-port 10Gig card may not actually be able to drive both ports if the card was engineered for an active:passive link failover scenario • Also need to re-visit local disk configurations Wednesday, April 30, 14
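Quick arithmetic shows why the disks fall behind the NIC. A minimal sketch, assuming ~150 MB/s of streaming throughput per 2014-era SATA spindle (a ballpark assumption, not a benchmark):

```python
import math

def spindles_to_saturate(link_gbit, disk_mb_per_s=150):
    """Streaming disks needed to keep a network link busy.

    Uses decimal units and ignores protocol overhead, so treat the
    result as a lower bound on the spindle (or SSD) count.
    """
    link_mb_per_s = link_gbit * 1000 / 8
    return math.ceil(link_mb_per_s / disk_mb_per_s)
```

A single 10GbE port (~1250 MB/s) wants roughly nine such spindles behind it; a dual-port card that can genuinely drive both ports wants twice that, which is why the local disk configuration has to be revisited alongside the NIC choice.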
  • 44. New and refreshed HPC systems running many node types 44 Compute: Huge trend in ‘diversity’ ‣ Accelerated trend since at least 2012 ... • HPC compute resources no longer homogeneous; many types and flavors now deployed in single HPC stacks ‣ Newer clusters mix and match node types to suit the known use cases: • GPU nodes for compute • GPU nodes for visualization • Large memory nodes (512GB +) • Very Large memory nodes (1TB +) • ‘Fat’ nodes with many CPU cores • ‘Thin’ nodes with super-fast CPUs • Analytic nodes with SSD, FusionIO, flash or large local disk for ‘big data’ tasks Wednesday, April 30, 14
  • 45. GPUs, Coprocessors & FPGAs 45 Compute: Hardware Acceleration ‣ Specialized hardware acceleration has its place but will not take over the world • “... the activation energy required for a scientist to use this stuff is generally quite high ...” ‣ GPU, Phi and FPGA best used in large scale pipelines or as a specific solution to a singular pain point Wednesday, April 30, 14
  • 46. Compute: Big Data & Analytics ‣ BioTeam is starting to build “Big Data” labs and environments for clients ‣ The most interesting trend: • We are not designing for specific analytic use cases; in most projects we are adding in basic “capabilities” with the expectation that the apps and users will come later • ... defensive IT hedge against rapidly changing science requirements, remember? 46 Wednesday, April 30, 14
  • 47. Compute: Big Data & Analytics ‣ This translates to infrastructure designed to support certain capabilities rather than specific software or applications. ‣ Example: • Beefy HDFS-friendly servers • 100% bare metal provisioning and dynamic system reconfiguration • Systems for ingest • Very large RAM systems • Big PCIx bus systems • Memory-resident database systems • Mix of very fast and capacity optimized storage • Very fast core, top-of-rack and server networking 47 Wednesday, April 30, 14
  • 48. Also known as hybrid clouds Emerging Trend: Hybrid HPC ‣ No longer “utter crap” or “cynical vendor-supported reference case” • small local footprint • large, dynamic, scalable, orchestrated public cloud component ‣ DevOps is key to making this work ‣ High-speed network to public cloud required ‣ Software interface layer acting as the mediator between local and public resources ‣ Good for tight budgets, has to be done right to work ‣ Still best approached very carefully 48 Wednesday, April 30, 14
  • 49. BioIT World Homework ‣ We’ve got interesting hardware vendors on the show floor this week; check them out • Silicon Mechanics, Thinkmate, Microway: cool commodity • Intel, IBM, Dell, SGI: Large & enterprise • Timelogic: hardware acceleration • ... 49 Wednesday, April 30, 14
  • 51. 51 Big trouble ahead ... Wednesday, April 30, 14
  • 52. 52 Network: Speed @ Core and Edge ‣ Huge potential pain point ‣ May surpass storage as our #1 infrastructure headache ‣ Petascale data is useless if you can’t move it or access it fast enough ‣ Don’t be smug about 10 Gigabit - folks need to start thinking *now* about 40 and even 100 Gigabit Ethernet ‣ You may need 10Gig to some desktops for data ingest/export Wednesday, April 30, 14
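To put numbers on "petascale data is useless if you can’t move it fast enough," a quick sketch of transfer times, assuming a link sustained at 70% of line rate (an optimistic, hand-picked figure; real wide-area transfers vary wildly):

```python
def transfer_days(data_tb, link_gbit, efficiency=0.7):
    """Days to move data_tb terabytes over a link sustained at
    `efficiency` of line rate (decimal units throughout)."""
    bits = data_tb * 1e12 * 8
    return bits / (link_gbit * 1e9 * efficiency) / 86400
```

Roughly two weeks to move a petabyte over a well-tuned 10GbE path, and the same job finishes ten times faster at 100 Gigabit -- which is the whole argument for starting 40/100GbE planning now.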
  • 53. 53 Network: Speed @ Core and Edge ‣ Remember ~2004 when research storage requirements started to dwarf what the enterprise was using? ‣ Same thing is happening now for networking ‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on Wednesday, April 30, 14
  • 54. Massive data movement needs are driving innovation pain This is going to be painful ‣ Enterprise networking folks are even more aloof than storage admins we battled in ’04 ‣ Often used to driving requirements and methods; unhappy when science starts to drive them out of their comfort zones ‣ Research needs to start pushing harder and faster for network speeds above 10GbE • This will take a long time so start now! 54 Wednesday, April 30, 14
  • 55. Not sure how this will play out ‣ It will be interesting to see what large-scale data movement does to our local infrastructure and desktop experience ‣ Especially with other trends like BYOD ‣ My $.02 • Speeds to our desktops are going get very fast, or • We give up on growing massive bandwidth to the client and embrace a full VDI model where the users just “remote desktop” into a well-networked scientific informatics environment 55 Wednesday, April 30, 14
  • 56. BioIT World Homework ‣ Visit the Internet2 booth to chat high speed networking • Ask about their free or low-cost training events and technical workshops; start thinking about how you can get your internal networking teams/leadership to attend • Ask them about the new trend of private/corporate links into I2 and other fast research networks ‣ Arista is here. Talking and exhibiting. They are not Cisco. Listen, visit & talk to them. 56 Wednesday, April 30, 14
  • 57. Significant new trend in networking Science DMZs 57 Wednesday, April 30, 14
  • 58. It’s real and becoming necessary Network: ‘ScienceDMZ’ ‣ BioTeam building them in 2014 and beyond ‣ Central premise: • Legacy firewall, network and security methods architected for “many small data flows” use cases • Not built to handle smaller #s of massive data flows • Also very hard to deploy ‘traditional’ security gear on 10Gigabit and faster networks ‣ More details, background & documents at http://fasterdata.es.net/science-dmz/ 58 [diagram: background traffic or competing bursts vs. DTN traffic with wire-speed bursts over 10GE links] Wednesday, April 30, 14
  • 59. Network: ‘ScienceDMZ’ ‣ Start thinking/discussing this sooner rather than later ‣ Just like “the cloud” this may fundamentally change internal operations and technology ‣ Will also require conscious buy-in and support from senior network, security and risk management professionals • ... these talks take time. Best to plan ahead 59 Wednesday, April 30, 14
  • 60. Network: ‘ScienceDMZ’ ‣ A Science DMZ has 3 required components: 1. Very fast “low-friction” network links and paths with security policy and enforcement specific to scientific workflows 2. Dedicated, high performance data transfer nodes (“DTNs”) highly optimized for high speed data xfer 3. Dedicated network performance/measurement nodes 60 Wednesday, April 30, 14
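One concrete reason DTNs (component 2 above) need dedicated tuning is the TCP bandwidth-delay product: the socket buffer required to keep a long, fast path full. A small sketch (the RTT figure used below is illustrative):

```python
def bdp_megabytes(link_gbit, rtt_ms):
    """TCP bandwidth-delay product in MB: bytes/sec times round-trip
    time. This much unacknowledged data must be in flight to fill the
    pipe, so send/receive buffers must be at least this large."""
    bytes_per_s = link_gbit * 1e9 / 8
    return bytes_per_s * (rtt_ms / 1000) / 1e6
```

A 10 Gbit/s path with a 70 ms cross-country RTT needs ~87 MB of buffer -- far beyond stock OS defaults, which is why an untuned general-purpose server makes a poor DTN.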
  • 61. Network: ‘ScienceDMZ’ ‣ Implementation specifics are complex; the basic concept is not: 1. Research need to move scientific data at high speeds is already being negatively affected by networks not designed for this requirement 2. Likely to force fundamental changes in core enterprise architectures on a similar disruptive scale as what genome data storage forced in ~2004 3. Firewalls/IDS and security in particular will be affected 61 Wednesday, April 30, 14
  • 62. 62 Simple Science DMZ: Image source: “The Science DMZ: Introduction & Architecture” -- esnet Wednesday, April 30, 14
  • 63. Network: ‘ScienceDMZ’ ‣ My gut feeling: 1. The fanciest and most complex Science DMZ architectures in the literature right now are not suitable for our world • Expensive specialized equipment; Expensive specialist staff expertise required • Often still experimental, not something enterprise IT would want to drop into a production environment 2. Science DMZ concepts are sound and simple implementations are possible today 3. Start small: • Incorporate these sorts of concepts/ideas into long term planning ASAP • Start adding network performance monitoring nodes to research networks, DMZs and external circuit connections now; this entire concept falls over without actionable flow and performance data • Start work on policies and procedures for manual bypass of firewall/IDS rules when known sender/receivers are freighting high speed data; automation comes later! 63 Wednesday, April 30, 14
  • 64. BioIT World Homework ‣ Bookmark http://fasterdata.es.net and check out the published materials and advice ‣ Monitor http://www.oinworkshop.com/ to see when a workshop/event may be coming near you (send your networking people ...) ‣ Both ESNet and Internet2 run training and technical workshops that deliver far more value for price than the usual training junkets 64 Wednesday, April 30, 14
  • 65. Check out this talk BioIT World Homework ‣ Track 1 - 3:10pm today: • Christian Todorov talks “Accelerating Biomedical Research Discovery: The 100G Internet2 Network – Built and Engineered for the Most Demanding Big Data Science Collaborations” 65 Wednesday, April 30, 14
  • 66. Not very significant trend in 2014: Software Defined Networking (“SDN”) 66 Wednesday, April 30, 14
  • 67. More hype than useful reality at the moment 67 Network: SDN Hype vs. Reality ‣ Software Defined Networking (“SDN”) is the new buzzword ‣ It WILL become pervasive and will change how we build and architect things ‣ But ... ‣ Not hugely practical at the moment for most environments • We need far more than APIs that control port forwarding behavior on switches • More time needed for all of the related technologies and methods to coalesce into something broadly useful and usable Wednesday, April 30, 14
  • 68. More hype than useful reality at the moment 68 Network: SDN ‣ My gut feeling: • It is the future but right now we are still in the “mostly empty hype” phase if you wanna be cynical about it; best to wait and watch • Production enterprise use: OpenFlow and similar stuff does not provide value relative to implementation effort right now • Best bang for the buck in ’14 will be getting ‘SDN’ features as part of some other supported stack - OpenStack, VMWare, Cloud, etc. Wednesday, April 30, 14
  • 70. 70 Storage ‣ Still the biggest expense, biggest headache and scariest systems to design in modern life science informatics environments ‣ Many of the pain points we’ve talked about for years are still in place: • Explosive growth forcing tradeoffs in capacity over performance • Lots of monolithic single tiers of storage • Critical need to actively manage data through its full life cycle (just storing data is not enough ...) • Need for post-POSIX solutions such as iRODS and other metadata-aware data repositories Wednesday, April 30, 14
  • 71. 71 Storage Trends ‣ The large but monolithic storage platforms we’ve built up over the years are no longer sufficient • Do you know how many people are running a single large scale-out NAS or parallel filesystem? Most of us! ‣ Tiered storage is now an absolute requirement • At a minimum we need an active storage tier plus something far cheaper/deeper for cold files ‣ Expect the tiers to involve multiple vendors, products and technologies • The Tier1 storage vendors tend to have higher-end pricing for their “all in one” tiered data management solutions Wednesday, April 30, 14
  • 72. 72 Storage - The Old Way ‣ Single tier of scale-out NAS or parallel FS ‣ Why? • Suitable for broadest set of use cases • Easy to procure/integrate • Lowest administrative & operational burden ‣ Example: • 400TB - 1PB of ‘something’ stores ‘everything’ Wednesday, April 30, 14
  • 73. 73 Storage - The New Way ‣ Multiple tiers; potentially from multiple vendors ‣ Why? • Way more cost efficient (size the tier to the need) • Single tier no longer capable of supporting all use cases and workflow patterns • Single tiers waste incredible money at large scale ‣ Example: • 10-40 TB SSD/Flash for ingest & IOPS-sensitive workloads • 50-400 TB tier (SATA/SAS/SSD mix) for active processing • Multi-petabyte tier (Cloud, Object, SATA) for cost and operationally efficient long term (yet reachable) storage of scientific data at rest Wednesday, April 30, 14
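A toy cost model makes the single-tier waste visible. The per-usable-terabyte prices below are invented for illustration (not quotes from any vendor), but the shape of the result holds across realistic price spreads:

```python
# Hypothetical $/usable-TB for each tier; illustrative numbers only.
PRICE_PER_TB = {"flash": 5000, "active": 1000, "cheap_deep": 150}

def capex(layout_tb):
    """Total hardware cost of a storage layout: {tier: terabytes}."""
    return sum(PRICE_PER_TB[tier] * tb for tier, tb in layout_tb.items())

# 2 PB entirely on the active tier, vs. the same capacity split into
# ingest/active/cold tiers sized roughly like the example above.
single_tier = capex({"active": 2000})
tiered = capex({"flash": 40, "active": 400, "cheap_deep": 1560})
```

Under these assumed prices the tiered layout costs well under half of the monolithic one -- while adding a flash ingest tier the single-tier design never had.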
  • 74. Sticking 100% with Tier 1 vendors gets expensive 74 Storage: Disruptive stuff ahead ‣ BioTeam has built 1-petabyte ZFS-based storage pools from commodity whitebox kit for ~$100,000 in direct hardware costs (engineering effort & admin not included in this price ...) ‣ There are many storage vendors in the middle tier who can provide storage systems that are less ‘risky’ than DIY homebuilt setups yet far less expensive than the traditional Tier 1 enterprise storage options • Several of these vendors are here at the show! ‣ Companies like Avere Systems are producing boxes that unify disparate storage tiers and link them to cloud and object stores • This is a route to unifying “tier 1” storage with the “cheap & deep” storage Wednesday, April 30, 14
  • 75. Infinidat aka http://izbox.com The new thumper. ‣ 1 petabyte usable NAS shipped as a single integrated rack • List price: $500 per usable terabyte ‣ More expensive than DIY ZFS on commodity chassis but less expensive than current mainstream products ‣ Lots of interesting use cases for ‘cheap & deep’ 75 Wednesday, April 30, 14
  • 76. Avere Systems Wait, I can DO that? ‣ These folks caught my eye in late 2013 for one very specific use case ‣ Since then I keep them in mind for 4-5 common problems I regularly face ‣ It can: • Add performance layer on top of storage bought to be “cheap & deep” • Virtualize many NAS islands into a single namespace • Replicate & move data between tiers and sites • Act as CIFS/NFS gateway to on-premise or offsite object stores *** • Treat Amazon S3 and Glacier as simply another storage tier fully integrated into your environment 76 Wednesday, April 30, 14
  • 77. Object Storage ‣ Object storage is the future for scientific data at rest • Total no brainer; makes more sense than the “files and folders” paradigm, especially for automated analysis • Plus Amazon does it for super cheap ‣ But ... There will be a long transition period due to all of our legacy codes and workflows • This is where gateway devices can play ‣ It can: • Provide a much better workflow design pattern than assuming “files and folders” data storage • Save millions of dollars via efficiencies of erasure coding • Provide a much more robust and resilient peta-scale storage framework • Hide behind a metadata-aware layer such as IRODS to provide very interesting capabilities 77 Wednesday, April 30, 14
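A minimal sketch of the paradigm shift: objects in a flat namespace, located by metadata query rather than by directory path -- roughly the capability a metadata-aware layer like iRODS adds on top of a store. The keys and metadata fields below are hypothetical:

```python
class ObjectStore:
    """Toy flat-namespace object store with a metadata index.

    Data is addressed by key and *found* by metadata query, not by
    walking a folder hierarchy -- the property that suits automated
    analysis pipelines better than files-and-folders conventions.
    """
    def __init__(self):
        self._objects = {}  # key -> (payload bytes, metadata dict)

    def put(self, key, data, **metadata):
        self._objects[key] = (data, metadata)

    def get(self, key):
        return self._objects[key][0]

    def query(self, **criteria):
        """Return keys whose metadata matches every criterion."""
        return [k for k, (_, md) in self._objects.items()
                if all(md.get(f) == v for f, v in criteria.items())]
```

An automated pipeline can then ask the store "everything from instrument X for sample Y" instead of guessing at folder-naming conventions.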
  • 78. Object Storage ‣ Erasure coding distributed object stores are very interesting at peta-scale ... ‣ Think about how you would handle & replicate 20 petabytes of data the “traditional way” • Purchase 2x or 3x storage capacity to handle replication overhead • Ignore the nightmare scenario of having to restore from one of the distributed replicas 78 Wednesday, April 30, 14
  • 79. Object Storage ‣ Efficiencies of erasure coding allow for LESS raw disk to be distributed across MORE geographic sites ‣ End result is a “single” usable system that is tolerant of the failure of an entire datacenter/site ‣ For the 20 petabyte problem instead of purchasing 2x disk you buy ~1.8x and use the capex savings to add an extra colo facility or increase WAN link speed 79 Wednesday, April 30, 14
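The ~1.8x figure falls straight out of the erasure-code arithmetic. A sketch using a 10+8 layout (an assumed code, chosen because it reproduces the 1.8x overhead; real deployments pick k and m to fit their site count and failure model):

```python
def raw_capacity(usable_pb, k, m):
    """Raw disk needed for `usable_pb` under a k-data + m-parity
    erasure code; any m of the k+m shards can be lost."""
    return usable_pb * (k + m) / k

erasure = raw_capacity(20, 10, 8)   # 10+8 code for 20 PB usable
mirrored = 20 * 2                   # full 2-site replication
```

36 PB raw instead of 40 -- and the coded system survives the loss of any 8 shards (an entire site, if shards are spread across sites) with no whole-replica restore.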
  • 80. Exercise BioIT World Homework ‣ Pick a storage size that makes sense for you (100TB or 1PB suggested) ‣ Visit the various storage vendors on the show floor and price out what 100TB or 1PB would cost ‣ You will see an awesome diversity of products, performance, features and capabilities at various price points • DO NOT fixate on price alone. This is a mistake. ‣ This is REALLY worth doing - there is incredible diversity in the mix of price/features/performance/capability out there 80 Wednesday, April 30, 14
  • 81. Check out these booths BioIT World Homework ‣ Object storage: • Amplidata & CleverSafe ‣ Glue/Gateway/Acceleration: • Avere Systems ‣ Enterprise: • EMC Isilon, IBM, Dell, SGI, Hitachi, Panasas ‣ Mid-tier/Commodity: • Silicon Mechanics, Thinkmate, RAID Inc., Xyratex 81 Wednesday, April 30, 14
  • 82. Check out these talks BioIT World Homework ‣ Track 5 - noon today: • Aaron Gardner talks “Taming big scientific data growth with converged infrastructure” ‣ Track 1 - 2:55pm today: • Jacob Farmer talks “Bridging the Worlds of Files, Objects, NAS, and Cloud: A Blazing Fast Crash Course in Object Storage” ‣ Track 1 - 4:30pm today: • Dirk Petersen talks “Deploying Very Low Cost Cloud Storage Technology in a Traditional Research HPC Environment” 82 Wednesday, April 30, 14
  • 83. 83 Can you do a Bio-IT talk without using the ‘C’ word? Wednesday, April 30, 14
  • 84. 84 Cloud: 2014 ‣ Core advice remains the same ‣ A few new permutations ... Wednesday, April 30, 14
  • 85. Core Advice 85 Cloud: 2014 ‣ Research Organizations need a cloud strategy yesterday, not today • Those that don’t will be bypassed by frustrated users or sneaky “cloud aware” devices ‣ IaaS cloud services are only a departmental credit card away ... some senior scientists are too big to be fired for violating IT policy ‣ Instrument vendors are forcing the issue ‣ Storage vendors are forcing the issue Wednesday, April 30, 14
  • 86. Design Patterns 86 Cloud Advice ‣ We actually need several tested cloud design patterns: ‣ (1) To handle ‘legacy’ scientific apps & workflows ‣ (2) The special stuff that is worth re-architecting ‣ (3) Hadoop & big data analytics ‣ ... and maybe (4) Regulated/sensitive efforts... ‣ ...and maybe (5) a way to evaluate Commercial solutions Wednesday, April 30, 14
  • 87. Legacy HPC on the Cloud 87 Cloud Advice ‣ MIT StarCluster • http://star.mit.edu/cluster/ • This is your baseline • Extend as needed ‣ Also check out Univa • Commercially supported Grid Engine stack with compelling roadmap and native cloud capabilities Wednesday, April 30, 14
  • 88. “Cloudy” HPC 88 Cloud Advice ‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver ‣ This is where you have the most freedom ‣ Many published best practices you can borrow ‣ Warning: Cloud vendor lock-in potential is strongest here Wednesday, April 30, 14
  • 89. What has changed .. Cloud: 2014 ‣ Lets revisit some of my bile from prior years ‣ “... private clouds: still utter crap” ‣ “... some AWS competitors are delusional pretenders” ‣ “... AWS has a multi-year lead on the competition” 89 Wednesday, April 30, 14
  • 90. Private Clouds in 2014: ‣ I’m no longer dismissing them as “utter crap” • However it is a lot of work and money to build a system that only has 5% of the features that AWS can deliver today (for a cheaper price). Need to be careful about the use case, justification and operational/development burden. ‣ Usable & useful in certain situations ‣ BioTeam positive experiences with OpenStack ‣ Starting to see OpenStack pilots among our clients ‣ Hype vs. Reality ratio still wacky ‣ Sensible only for certain shops • Have you seen what you have to do to your networks & gear? ‣ Still important to remain cynical and perform proper due diligence Wednesday, April 30, 14
Not all AWS competitors are delusional
91
‣ Google Compute is viable in 2014 for scientific workflows
• Compute/Memory: A late start into IaaS means CPUs and memory are current generation; we have ‘war stories’ from AWS users who probe /proc/cpuinfo on EC2 servers so they can instantly kill any instance running on older chipsets
• Price: Competitive on price, although the shooting war between IaaS providers means it is hard to pin down the current “winner”; the “sustained use” pricing is easier to navigate than AWS Reserved Instances. Overall, AWS pricing algorithms for various services seem more complicated than the Google equivalents.
• Network performance: Fantastic networking and excellent performance/latency figures between regions and zones. VPC-type features are baked into the default resource set.
• Ops: Priced in 1-minute increments; no more need to hunt and kill servers at 55 minutes past the hour. Google has a concept of “Projects” with assigned collaborators and quotas. Quite different from the AWS account structure and IAM-based access control model. The project-based paradigm is easier to think about for the scientific use case.
• IaaS Building Blocks: Still far fewer features than AWS, but the core building blocks we need for science and engineering workflows are present.
‣ My $.02
• AWS is still the clear leader, but Google Compute is now a viable option and worth ‘kicking the tires’ on in 2014 and beyond ... to me AWS has had no serious competition until now
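The billing-increment point is easy to show with arithmetic. The sketch below models 2014-era behavior as I understand it (AWS billed in whole-hour increments; GCE billed per minute with a 10-minute minimum); the $0.60/hour rate is a made-up number for illustration:

```python
import math

def hourly_rounded_cost(runtime_minutes, rate_per_hour):
    """Whole-hour billing (AWS-style, circa 2014): partial hours round up."""
    return math.ceil(runtime_minutes / 60) * rate_per_hour

def minute_rounded_cost(runtime_minutes, rate_per_hour, minimum_minutes=10):
    """Per-minute billing with a minimum (GCE-style, circa 2014)."""
    billed = max(runtime_minutes, minimum_minutes)
    return billed * rate_per_hour / 60

# A 65-minute job at a hypothetical $0.60/hour rate:
print(hourly_rounded_cost(65, 0.60))  # 1.2  -- two full hours billed
print(minute_rounded_cost(65, 0.60))  # 0.65 -- 65 minutes billed
```

Per-minute billing is why "hunt and kill servers at 55 minutes past the hour" stops being an ops ritual: finishing a job 5 minutes into an hour no longer costs a full extra hour.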
Cloud Science Facilitators
92
‣ Cycle Computing is legit
• They’ve proven themselves on some of the largest IaaS HPC grids ever built
• Experience with hybrid systems (cloud & on-premise)
‣ Smart people. Nice people.
‣ They have a booth; stop by and chat them up ...
93
The road ahead ...
This has been a slow-moving trend for years now ...
94
POSIX Alternatives Coming
‣ The scope of organizations faced with the limitations of POSIX filesystems will continue to expand
‣ We desperately need some sort of “metadata aware” data management solution in life science
‣ Nobody has an easy solution yet; several bespoke installations but no clear mass-market options
‣ iRODS front-ending “cheap & deep” storage tiers or object stores appears to be gaining significant interest out in our community
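A toy illustration of what "metadata aware" means in practice: the catalog tracks arbitrary key/value metadata about each data object and where it physically lives, so queries work across tiers. This SQLite sketch is mine, not iRODS (iRODS adds a rules engine, federation, and much more); all table and field names are invented:

```python
import sqlite3

# Toy metadata catalog: attach key/value metadata to objects that may
# live on any tier (POSIX path, object store URL, tape, ...).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE objects  (id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE metadata (object_id INTEGER, key TEXT, value TEXT);
""")

def register(location, **meta):
    """Record an object's location plus arbitrary descriptive metadata."""
    oid = db.execute("INSERT INTO objects (location) VALUES (?)",
                     (location,)).lastrowid
    db.executemany("INSERT INTO metadata VALUES (?, ?, ?)",
                   [(oid, k, v) for k, v in meta.items()])
    return oid

def find(key, value):
    """Answer 'where is all the data about X?' regardless of storage tier."""
    rows = db.execute("""SELECT o.location FROM objects o
                         JOIN metadata m ON m.object_id = o.id
                         WHERE m.key = ? AND m.value = ?
                         ORDER BY o.id""", (key, value))
    return [r[0] for r in rows]

register("/archive/run42/sample1.bam", instrument="HiSeq", project="tumor")
register("s3://lab-cold/run43/sample2.bam", instrument="MiSeq", project="tumor")

print(find("project", "tumor"))      # both locations, hot tier and cold
print(find("instrument", "MiSeq"))   # just the object-store copy
```

The point of the exercise: a plain POSIX filesystem can only answer path-based questions, while a metadata layer in front of "cheap & deep" storage can answer science-shaped ones.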
Application Containers are getting interesting
95
Watch out for: Containerization
‣ Application containerization via methods like http://docker.io gaining significant attention
• Docker support now in the native RHEL kernel
• AWS Elastic Beanstalk recently added Docker support
‣ If broadly adopted, these techniques will stretch research IT infrastructures in interesting directions
• This is far more interesting to me than moving virtual machines around a network or into the cloud
‣ ... with a related impact on storage location, features & capability
‣ Major news and progress expected in 2014
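For readers who have not met Docker yet, a container image is defined in a short text file. The fragment below is purely illustrative: the base image, tool name, and URL are placeholders, not a real pipeline component.

```dockerfile
# Hypothetical example: package a bioinformatics tool plus its exact
# dependencies as one portable image that runs the same way on a
# laptop, a cluster node, or a cloud instance.
FROM centos:6

# Build dependencies for the (placeholder) aligner
RUN yum install -y gcc make zlib-devel tar

# Fetch and build the tool inside the image; version is illustrative
ADD http://example.org/myaligner-1.0.tar.gz /opt/
RUN cd /opt && tar xzf myaligner-1.0.tar.gz && \
    cd myaligner-1.0 && make && make install

ENTRYPOINT ["myaligner"]
```

`docker build -t myaligner .` produces the image and `docker run myaligner ...` executes it anywhere a Docker daemon runs, which is why containers stretch infrastructure in different directions than VM migration: the unit being moved is the application and its dependencies, not a whole operating system.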
96
Keep an eye on: Storage
‣ Data generation out-pacing technology
‣ Really interesting disruptive stuff on the market now
‣ Cheap/easy laboratory assays taking over
• Researchers largely don’t know what to do with it all
• Holding on to the data until someone figures it out
• This will cause some interesting headaches for IT
• Huge need for real “Big Data” applications to be developed
97
Keep an eye on: Networking
‣ Unless there’s an investment in ultra-high-speed networking, we need to change how we think about analysis
‣ Data commons are becoming a precedent
• Need to minimize the movement of data
• Include compute power and an analysis platform with the data commons
‣ Move the analysis to the data, don’t move the data
• Requires sharing / large core institutional resources
98
Long term trends ...
‣ Compute continues to become easier
‣ Data movement and ingest (physical & network) gets harder
‣ Cost of storage will be dwarfed by the “cost of managing stored data”
‣ We can see the end-of-life for our current IT architecture and design patterns; new patterns will start to appear over the next 2-5 years
99
Wrap-up: Final Advice & Tips
Embrace The Innovation
100
Ending Advice: 1 of 5
‣ Understand the ‘interesting times’ we are in
• Science is changing faster than we can refresh IT
• This is not going to change any time soon
‣ Advice:
• Spend as much time thinking about future flexibility as you spend on actual current needs & requirements
• Design for agility & responsiveness
Capacity
101
Ending Advice: 2 of 5
‣ Many of us will need ‘petabyte capable’ storage
‣ However:
• Only some of us will ever have 1PB+ under management
• The hard part is knowing who that will be
Tiers are in your future
102
Ending Advice: 3 of 5
‣ Tiers are now a requirement, at least long-term
• At a minimum we need an ‘active’ tier for processing & ingest
• ... and some sort of inexpensive cold/nearline/archive option as well
‣ Advice:
• It’s OK to buy a single block/tier of disk
• ... but always have a strategy for diversification
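A "strategy for diversification" usually boils down to a placement policy. The sketch below is a minimal example of the idea; the age thresholds and tier names are invented for illustration, not a recommendation:

```python
import datetime as dt

# Hypothetical policy: data stays on fast 'active' storage while it is
# being processed, then demotes by last-access age. Thresholds are
# placeholders every site would tune for itself.
def pick_tier(last_access: dt.date, today: dt.date) -> str:
    age_days = (today - last_access).days
    if age_days <= 30:
        return "active"    # fast scratch for processing & ingest
    if age_days <= 365:
        return "nearline"  # inexpensive "cheap & deep" disk
    return "archive"       # cold object store or tape

today = dt.date(2014, 4, 30)
print(pick_tier(dt.date(2014, 4, 20), today))  # active
print(pick_tier(dt.date(2013, 12, 1), today))  # nearline
print(pick_tier(dt.date(2012, 1, 1), today))   # archive
```

Even if you buy a single block of disk today, writing down a rule like this forces the conversation about where demoted data will eventually go, which is the strategy the slide is asking for.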
103
Ending Advice: 4 of 5
‣ Above a certain scale, inefficient data management & simple storage practices are hugely wasteful
‣ Advice:
• The cost of a new-hire “data manager” or curator role may be cheaper and far more beneficial to your organization than continuing to throw CapEx dollars at keeping a badly run storage platform under its capacity limit
• Many opportunities to get clever & recapture efficiency & capability: tiers, replication, cloud, dedupe, CRAM compression, iRODS
• BROADEN YOUR PERSPECTIVE
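The "hire a curator vs. buy more disk" trade-off is a back-of-envelope calculation worth running with your own numbers. Every figure below is a made-up assumption for illustration, not data from the talk:

```python
# Hypothetical inputs -- substitute your own site's numbers.
raw_tb = 1000               # 1 PB currently under management
cost_per_tb_year = 300.0    # fully loaded $/TB/year (assumed)
annual_growth_tb = 400      # new data expected next year (assumed)
reclaim_fraction = 0.35     # duplicate/stale data a curator could
                            # reclaim via dedupe, tiers, CRAM, iRODS
curator_salary = 120_000.0  # fully loaded cost of the new hire

next_year_tb = raw_tb + annual_growth_tb
status_quo = next_year_tb * cost_per_tb_year
curated = (next_year_tb * (1 - reclaim_fraction) * cost_per_tb_year
           + curator_salary)

print(f"status quo:        ${status_quo:,.0f}")
print(f"with data manager: ${curated:,.0f}")
print(f"savings:           ${status_quo - curated:,.0f}")
```

Under these (invented) assumptions the hire pays for itself, and the gap widens every year as growth compounds; the real point is that above a certain scale the calculation is worth doing at all, instead of reflexively buying more disk.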
104
Ending Advice: 5 of 5
‣ You need a cloud strategy. Yesterday.
- Users, instrument makers & IT vendors are forcing the issue
- Economic trends indicate cloud storage is inescapable
- 90% of cloud is “easy”. The remaining 10% takes time & effort
‣ Advice:
• The technical aspects of using “the cloud” are trivial
• The political, policy and risk management aspects are difficult and time-consuming; start these ASAP
105
end;
Thanks!
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag #BioIT14