BioTeam Trends from the Trenches - NIH, April 2014
NIH Workshop on Advanced Networking for Data-Intensive Biomedical Research presentation - April 9, 2014

Slide 1: Life Science HPC & Informatics: Trends from the Trenches
April 2014

Slide 2: BioTeam
Who, What, Why ...
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ 12+ years bridging the "gap" between science, IT & high-performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...

Slide 3: BioTeam & NIH
Active at NIH since 2008
‣ Our primary goal: make science easier for researchers at NIH via scientific computing
‣ Recently involved in many projects:
  • NIH-Wide HPC Assessment
  • NIAID HPC Assessment
  • NIMH Bioinformatics Assessment
  • NCATS IT/Informatics Assessment
  • NIH Network Modernization Project

Slide 4: Topic 1: Scariest thing first ...
The biggest meta-issue facing life science informatics

Slide 5: It's a risky time to be doing Bio-IT

Slide 6: Big Picture / Meta Issue
‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed
  • Example: a CCD sensor upgrade on that confocal microscopy rig just doubled storage requirements
  • Example: the 2D ultrasound imager is now a 3D imager
  • Example: an Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes, with a massive downstream increase in storage, compute & data movement needs
‣ For the above examples, do you think IT was informed in advance?

Slide 7: The Central Problem Is ...
Science is progressing way faster than IT can refresh/change
‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure
  • Bench science is changing month-to-month ...
  • ... while our IT infrastructure only gets refreshed every 2-7 years
‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)

Slide 8: The Central Problem Is ...
‣ The easy period is over
‣ Five years ago we could toss inexpensive storage and servers at the problem, even in a nearby closet or under a lab bench if necessary
‣ That does not work any more; real solutions are required

Slide 9: The new normal for informatics

Slide 10: And a related problem ...
‣ It has never been easier or cheaper to acquire vast amounts of data
‣ The growth rate of data creation/ingest exceeds the rate at which the storage industry is improving disk capacity
‣ Not just a storage lifecycle problem: this data *moves* and often needs to be shared among multiple entities and providers
  • ... ideally without punching holes in your firewall or consuming all available internet bandwidth

Slide 11: If we get it wrong ...
‣ Lost opportunity
‣ Missing capability
‣ Frustrated & very vocal scientific staff
‣ Slowed pace of scientific discovery
‣ Problems in recruiting, retention, publication & product development

Slide 12: Basic Bio/IT Landscape

Slide 13: Core Compute
Compute-related design patterns are largely static
‣ Linux compute clusters are still the baseline compute platform
‣ Even our lab instruments know how to submit jobs to common HPC cluster schedulers (see the sketch below)
‣ Compute is not hard. It's a commodity that is easy to acquire & deploy in 2014
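A minimal sketch of the instrument-to-scheduler hand-off mentioned above, assuming a Slurm-managed cluster reachable from the instrument workstation; the script name, resource numbers and run directory are placeholders, and an SGE/UGE site would call qsub instead.

```python
# Sketch: an instrument-side hook that submits a post-run analysis job to a
# Slurm cluster once a run directory is complete. Assumes the Slurm client
# tools are installed; "align_run.sh" and the resource numbers are placeholders.
import subprocess

def submit_analysis(run_dir):
    cmd = [
        "sbatch",
        "--job-name=post-run-analysis",
        "--cpus-per-task=8",
        "--mem=32G",
        "align_run.sh", run_dir,          # hypothetical downstream pipeline
    ]
    out = subprocess.check_output(cmd).decode()
    print(out.strip())                    # e.g. "Submitted batch job 12345"

if __name__ == "__main__":
    submit_analysis("/data/instrument/run_2014_04_09")
```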
Slide 14: File & Data Types
We have them all
‣ Massive text files
‣ Massive binary files
‣ Flatfile 'databases'
‣ Spreadsheets everywhere
‣ Directories with 6 million files
‣ Large files: 600 GB+
‣ Small files: 30 KB or smaller

Slide 15: Application characteristics
‣ Mostly SMP/threaded apps whose performance is bound by I/O and/or RAM
‣ Hundreds of apps, codes & toolkits
‣ 1-2 TB RAM "high memory" nodes becoming essential
‣ Lots of Perl/Python/R
‣ MPI is rare
  • Well-written MPI is even rarer
‣ Few MPI apps actually benefit from expensive low-latency interconnects*
  • *Chemistry, modeling and structure work is the exception

Slide 16: Storage & Data Management
‣ LifeSci core requirement:
  • Shared, simultaneous read/write access across many instruments, desktops & HPC silos
‣ NAS = the "easiest" option
  • Scale-out NAS products are the mainstream standard
‣ Parallel & distributed storage for edge cases and large organizations with known performance needs
  • Becoming much more common: GPFS has taken hold in LifeSci

Slide 17: Storage & Data Management
‣ Storage & data management is the #1 infrastructure headache in life science environments
‣ Most labs need "peta-capable" storage due to an unpredictable future
  • Only a small % will actually hit 1 PB
  • Often forced to trade away performance in order to obtain capacity
‣ Object stores, ZFS and commodity "NexentaStor-style" methods are making significant inroads

Slide 18: Data Movement & Data Sharing
‣ Peta-scale data movement needs
  • Within an organization
  • To/from collaborators
  • To/from suppliers
  • To/from public data repositories
‣ Peta-scale data sharing needs
  • Collaborators and partners may be all over the world

Slide 19: Networking
‣ Major 2014 focus
‣ May surpass storage as our #1 infrastructure headache
‣ Why?
  • Petascale storage is meaningless if you can't access/move it
  • 10-Gig, 40-Gig and 100-Gig networking will force significant changes elsewhere in the 'bio-IT' infrastructure

Slide 20: We Have Both Ingest Problems
Physical & network
‣ Significant physical ingest occurring in life science
  • Standard media: naked SATA drives shipped via FedEx
‣ Cliché example:
  • 30 genomes outsourced means 30 drives will soon be sitting in your mail pile
‣ Organizations often use similar methods to freight data between buildings and among geographic sites

Slide 21: Physical Ingest Is Just Plain Nasty
‣ Most common high-speed network: FedEx
‣ Easy to talk about in theory
‣ Seems "easy" to scientists and even IT at first glance
‣ Really nasty in practice (see the back-of-the-envelope numbers below)
  • Incredibly time-consuming
  • Significant operational burden
  • Easy to do badly / lose data
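To put "FedEx as the high-speed network" in perspective, a back-of-the-envelope comparison against a wide-area link; the drive size, drive count and transit time below are illustrative assumptions, not figures from the talk.

```python
# Effective bandwidth of shipping drives vs. moving the same data over the wire.
# All inputs are assumptions chosen only to show the order of magnitude.
drives, drive_tb, transit_days = 30, 4, 2      # 30 outsourced genomes on 4 TB drives
payload_bits = drives * drive_tb * 1e12 * 8    # total bits in the shipment

fedex_gbps = payload_bits / (transit_days * 86400) / 1e9
print("FedEx box: ~%.1f Gbit/s sustained" % fedex_gbps)

for link_gbps in (1, 10):                      # typical campus links in 2014
    hours = payload_bits / (link_gbps * 1e9) / 3600
    print("Same data over %2d Gbit/s: ~%.0f hours" % (link_gbps, hours))
```

The box "wins" on raw throughput, which is exactly why the practice persists; the operational pain listed above is what the throughput number hides.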
Slide 22: Huge Need For Network Ingest
And a huge need for fast(er) research networks!
1. Public data repositories have petabytes of useful data
2. Collaborators still need to swap data in serious ways
3. Amazon is becoming an important repository of public and private data
4. Many vendors now "deliver" to the cloud

Slide 23: It all boils down to this ...

Slide 24: Life Science In One Slide
‣ Huge compute needs, but not intractable and generally solved via Linux HPC farms; most of our workloads are serial/batch in nature
‣ A ludicrous rate of innovation in the lab drives a similar rate of change for our software and tool environment
‣ With science changing faster than IT, the emphasis is on agility and flexibility; we'll trade performance for some measure of future-proofing
‣ Buried in data, and it's getting worse: individual scientists can generate petascale data streams
‣ We have all of the information lifecycle problems: storing, curating, managing, sharing, ingesting and moving

Slide 25: Trends: DevOps & Org Charts

Slide 26: The social contract between scientist and IT is changing forever

Slide 27: You can blame "the cloud" for this

Slide 28: DevOps & Scriptable Everything
‣ On (real) clouds, EVERYTHING has an API
‣ If it's got an API, you can automate and orchestrate it (see the sketch below)
‣ "Scriptable infrastructure" is now a reality
‣ Driving capabilities that we will need in 2014 and beyond
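A minimal illustration of "scriptable infrastructure" against the EC2 API, using the boto library that was current at the time; the region, AMI ID, key pair and instance type are placeholders and would differ per site.

```python
# Sketch: the same action a console click performs, expressed as code so it
# can be looped, templated and orchestrated. Assumes the boto library and
# AWS credentials in the environment; the IDs below are placeholders.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
reservation = conn.run_instances(
    "ami-00000000",                  # placeholder machine image
    instance_type="c3.xlarge",
    key_name="my-keypair",           # placeholder key pair
    min_count=1,
    max_count=1,
)
print("Launched", reservation.instances[0].id)
```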
Slide 29: DevOps & Scriptable Everything
‣ Incredible innovation in the past few years
‣ Driven mainly by companies with massive internet 'fleets' to manage
‣ ... but the benefits trickle down to us little people

Slide 30: DevOps will enable hybrid HPC
... and conquer the enterprise
‣ Cloud automation/orchestration methods have been trickling down into our local infrastructures
‣ Driving significant impact on careers, job descriptions and org charts
‣ These methods are necessary for emerging hybrid cloud models for HPC/sharing

Slide 31: Scientist/SysAdmin/Programmer
2014: Continue to blur the lines between all these roles
‣ IT jobs, roles and responsibilities are going to change significantly
‣ SysAdmins must learn to program in order to harness automation tools
‣ Programmers & scientists can now self-provision and control sophisticated IT resources

Slide 32: Scientist/SysAdmin/Programmer
2014: Continue to blur the lines between all these roles
‣ My take on the future ...
  • SysAdmins (Windows & Linux) who can't code will have career issues
  • Far more control is going into the hands of the research end user
  • IT support roles will radically change -- no longer owners or gatekeepers
‣ IT will "own" policies, procedures, reference patterns, identity management, security & best practices
‣ Research will control the "what", "when" and "how big"

Slide 33: IT Orgs are Changing as well ...
Research is needing more and more compute
‣ 25% of researchers will need HPC this year
‣ 75% will need high-volume storage
‣ IT evolved from administrative need
  • Science started grabbing resources
  • IT either adapted or was replaced

Slide 34: IT Orgs are Changing as well ...
Research is needing more and more compute
‣ Three types of adaptations:
  • IT evolved to include research IT support
  • IT split into research IT and corporate IT
  • IT became primarily a research org, run by a CSIO
‣ Orgs with scientific missions need adaptive IT with a stake in research projects; restrictions kill science

Slide 35: Trends: Compute

Slide 36: Compute
‣ Kind of boring. A solved problem in 2014
‣ Compute power is a commodity
  • Inexpensive relative to other costs
  • Far less vendor differentiation than storage
  • Easy to acquire; easy to deploy

Slide 37: Compute: Local Disk is Back
A defensive hedge against Big Data / HDFS
‣ We've started to see organizations move away from blade servers and 1U pizza-box enclosures for HPC
‣ The "new normal" may be 4U enclosures with massive local disk bays - not necessarily populated, just available
‣ Why? Hadoop & Big Data
‣ This is a defensive hedge against future HDFS or similar requirements
  • Remember the 'meta' problem: science is changing far faster than we can refresh IT. This is a defensive future-proofing play.
‣ Hardcore Hadoop rigs sometimes operate at a 1:1 ratio between core count and disk count

Slide 38: Compute: Huge trend in 'diversity'
New and refreshed HPC systems running many node types
‣ An accelerating trend since at least 2012 ...
  • HPC compute resources are no longer homogeneous; many types and flavors are now deployed in single HPC stacks
‣ Newer clusters mix and match node types to fit the known use cases (see the scheduler sketch below):
  • GPU nodes for compute
  • GPU nodes for visualization
  • Large-memory nodes (512 GB+)
  • Very-large-memory nodes (1 TB+)
  • 'Fat' nodes with many CPU cores
  • 'Thin' nodes with super-fast CPUs
  • Analytic nodes with SSD, FusionIO, flash or large local disk for 'big data' tasks
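In day-to-day use, the node diversity above shows up as scheduler resource requests. The sketch below assumes Slurm with a "gpu" GRES and a "highmem" partition configured; the script names, partition names and sizes are site-specific placeholders.

```python
# Sketch: routing two different workloads to two different node flavors on a
# heterogeneous cluster. Partition names, GRES and scripts are placeholders.
import subprocess

def submit(script, *extra_args):
    subprocess.check_call(["sbatch"] + list(extra_args) + [script])

# A CUDA-accelerated aligner goes to a GPU node
submit("gpu_align.sh", "--partition=gpu", "--gres=gpu:1")

# A de novo assembly that wants lots of RAM goes to a 1 TB "very large memory" node
submit("assembly.sh", "--partition=highmem", "--mem=900G")
```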
Slide 39: Compute: Hardware Acceleration
GPUs, coprocessors & FPGAs
‣ Specialized hardware acceleration has its place but will not take over the world
  • "... the activation energy required for a scientist to use this stuff is generally quite high ..."
‣ GPU, Phi and FPGA are best used in large-scale pipelines or as a specific solution to a singular pain point

Slide 40: Emerging Trend: Hybrid HPC
Also known as hybrid clouds
‣ A relatively new idea
  • Small local footprint
  • Large, dynamic, scalable, orchestrated public cloud component
‣ DevOps is key to making this work
‣ A high-speed network to the public cloud is required
‣ A software interface layer acts as the mediator between local and public resources (see the burst-policy sketch below)
‣ Good for tight budgets, but it has to be done right to work
‣ Not many working examples yet
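One way to picture the mediating "software interface layer" is a small burst policy that watches the local queue and decides how many cloud nodes to rent; this is a hedged sketch of the idea, not any particular product, and the thresholds are invented for illustration.

```python
# Sketch of a hybrid-HPC burst policy: the local cluster stays the default,
# and cloud nodes are added only when the backlog justifies the cost.
# A real implementation would feed this from the scheduler and a cloud API.
BACKLOG_PER_NODE = 50      # pending jobs that justify one more cloud node (assumption)
MAX_CLOUD_NODES = 20       # budget cap (assumption)

def reconcile(queued_jobs, cloud_nodes_running):
    """Return how many cloud nodes to add (positive) or retire (negative)."""
    wanted = min(queued_jobs // BACKLOG_PER_NODE, MAX_CLOUD_NODES)
    return wanted - cloud_nodes_running

# Example: 400 queued jobs with 3 cloud nodes already up -> grow by 5
print(reconcile(400, 3))
```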
Slide 41: Trends: Network

Slide 42: Network: Speed @ Core and Edge
‣ Huge potential pain point
‣ May surpass storage as our #1 infrastructure headache
‣ Petascale data is useless if you can't move it or access it fast enough
‣ Don't be smug about 10 Gigabit - folks need to start thinking *now* about 40 and even 100 Gigabit Ethernet (see the transfer-time math below)
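Some quick arithmetic behind that warning: how long a petabyte takes to move at the link speeds named above, assuming an optimistic 80% of line rate.

```python
# Time to move 1 PB at the Ethernet speeds discussed above,
# assuming 80% sustained utilization of the link (optimistic).
PETABYTE_BITS = 1e15 * 8
EFFICIENCY = 0.8

for gbps in (10, 40, 100):
    seconds = PETABYTE_BITS / (gbps * 1e9 * EFFICIENCY)
    print("%3d Gbit/s: ~%.1f days" % (gbps, seconds / 86400))
```

That works out to roughly 11.6 days at 10 Gbit/s versus a bit over a day at 100 Gbit/s, which is the gap between "impractical" and "routine" for petascale collaborations.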
Slide 43: Network: Speed @ Core and Edge
‣ Remember 2004, when research storage requirements started to dwarf what the enterprise was using?
‣ The same thing is happening now for networking
‣ Research core, edge and top-of-rack networking speeds may exceed what the rest of the organization has standardized on

Slide 44: NIH Tackling this now!
Massive data movement needs are driving innovation
‣ Currently installing a 100 Gb research network
‣ Will tackle petascale data movement head-on
  • NIH gaining ground on 1 PB/month
  • Collaboration, core compute, data commons, external data sources
  • Science DMZ!

Slide 45: Network: 'Science DMZ'
‣ The "Science DMZ" concept is real and necessary
‣ BioTeam will be building them in 2014 and beyond
‣ Central premise:
  • Legacy firewall, network and security methods are architected for "many small data flows" use cases
  • They are not built to handle smaller numbers of massive data flows
  • It is also very hard to deploy 'traditional' security gear on 10 Gigabit and faster networks
‣ More details, background & documents at http://fasterdata.es.net/science-dmz/
[Diagram: background/competing traffic vs. DTN traffic with wire-speed bursts over 10GE links]

Slide 46: Network: 'Science DMZ'
‣ Start thinking/discussing this sooner rather than later
‣ Just like "the cloud", this may fundamentally change internal operations and technology
‣ It will also require conscious buy-in and support from senior network, security and risk management professionals
  • ... these talks take time. Best to plan ahead

Slide 47: Network: 'Science DMZ'
‣ A Science DMZ has 3 required components (a sketch of the third follows below):
  1. Very fast "low-friction" network links and paths, with security policy and enforcement specific to scientific workflows
  2. Dedicated, high-performance data transfer nodes ("DTNs") highly optimized for high-speed data transfer
  3. Dedicated network performance/measurement nodes
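Component 3 is often the simplest to start with: scheduled active throughput tests between the DMZ and key endpoints. The sketch below assumes the iperf3 client is installed and that the (placeholder) host runs an iperf3 server; production Science DMZs typically use perfSONAR for this.

```python
# Sketch: a periodic throughput probe of the kind a measurement node runs.
# Assumes iperf3 is installed locally and "dtn-test.example.org" (placeholder)
# runs an iperf3 server; perfSONAR is the usual production answer.
import json
import subprocess

def measured_gbps(host):
    raw = subprocess.check_output(["iperf3", "-c", host, "-J"])   # JSON output
    result = json.loads(raw.decode())
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    print("achieved ~%.2f Gbit/s" % measured_gbps("dtn-test.example.org"))
```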
Slide 48: Simple Science DMZ
[Diagram; image source: "The Science DMZ: Introduction & Architecture" -- ESnet]

Slide 49: Network: SDN Hype vs. Reality
More hype than useful reality at the moment
‣ Software-Defined Networking ("SDN") is the new buzzword
‣ It will become pervasive and will change how we build and architect things
‣ But ...
‣ It is not hugely practical at the moment for most environments
  • We need far more than APIs that control port-forwarding behavior on switches
  • More time is needed for all of the related technologies and methods to coalesce into something broadly useful and usable

Slide 50: Trends: Storage

Slide 51: Storage
‣ Still the biggest expense, the biggest headache and the scariest systems to design in modern life science informatics environments
‣ Many of the pain points we've talked about for years are still in place:
  • Explosive growth forcing trade-offs that favor capacity over performance
  • Lots of monolithic single tiers of storage
  • Critical need to actively manage data through its full life cycle (just storing data is not enough ...)
  • Need for post-POSIX solutions such as iRODS and other metadata-aware data repositories

Slide 52: Storage Trends
‣ The large but monolithic storage platforms we've built up over the years are no longer sufficient
  • Do you know how many people are running a single large scale-out NAS or parallel filesystem? Most of us!
‣ Tiered storage is now an absolute requirement
  • At a minimum we need an active storage tier plus something far cheaper/deeper for cold files (see the sweep sketch below)
‣ Expect the tiers to involve multiple vendors, products and technologies
  • The Tier 1 storage vendors tend to have unacceptable pricing for their "all-in-one" tiered data management solutions
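The "active tier plus something cheaper for cold files" requirement usually starts with a sweep like the one sketched below; the mount point and age threshold are placeholders, and a real site would hand the candidate list to its tiering/HSM or data-management layer rather than moving files from a bare script.

```python
# Sketch: find files on the "hot" tier that nobody has read in N days,
# i.e. candidates for migration to a cheap/deep cold tier.
# The mount point and threshold are placeholders.
import os
import time

HOT_TIER = "/hot/projects"
COLD_DAYS = 180

def cold_candidates(root, days):
    cutoff = time.time() - days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    yield path
            except OSError:
                continue               # file vanished mid-walk; skip it

if __name__ == "__main__":
    for path in cold_candidates(HOT_TIER, COLD_DAYS):
        print(path)                    # feed into your tiering/HSM tooling
```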
Slide 53: Storage: Disruptive stuff ahead
The Tier 1 storage vendors may be too expensive ...
‣ BioTeam has built 1-petabyte ZFS-based storage pools from commodity whitebox hardware for about $100,000 (a rough sizing sketch follows below)
‣ Infinidat's "IZbox" provides 1 petabyte of usable NAS as a turnkey appliance for roughly $375,000
  • Both of these would be a nice, cost-effective archive or "cold" tier for less-active file and data storage
  • Solutions like these cost far, far less than what Tier 1 storage vendors would charge for a petabyte of usable storage
  • ... of course they come with their own risks and operational burden. This is an area where proper research and due diligence are essential
‣ Companies like Avere Systems are producing boxes that unify disparate storage tiers and link them to cloud and object stores
  • This is a route to unifying "tier 1" storage with the "cheap & deep" storage
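A rough sanity check of what a do-it-yourself petabyte implies in raw drives, assuming 4 TB disks and 10-disk RAIDZ2 vdevs (8 data + 2 parity); every figure here is an illustrative assumption, not BioTeam's actual bill of materials.

```python
# Rough sizing of a commodity ~1 PB usable ZFS pool.
# Drive size, vdev geometry and price are assumptions for illustration only.
import math

USABLE_TB = 1000
DRIVE_TB = 4
VDEV_DISKS = 10            # RAIDZ2: 8 data + 2 parity disks per vdev
DATA_DISKS = VDEV_DISKS - 2
PRICE_PER_DRIVE = 200      # USD, illustrative

vdevs = int(math.ceil(USABLE_TB / float(DATA_DISKS * DRIVE_TB)))
drives = vdevs * VDEV_DISKS
print("%d vdevs, %d drives, ~$%d in raw disk"
      % (vdevs, drives, drives * PRICE_PER_DRIVE))
```

Under these assumptions that is roughly 320 drives and about $64,000 in raw disk before chassis, servers and support, which is consistent with the ~$100,000 whitebox figure above.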
Slide 54: The road ahead ...

Slide 55: Future Trends and Patterns
Some final thoughts
‣ Data generation is outpacing technology
‣ Cheap/easy laboratory assays are taking over
  • Researchers largely don't know what to do with all the data
  • They hold on to it until someone figures that out
  • This will cause some interesting headaches for IT
  • Huge need for real "Big Data" applications to be developed

Slide 56: Future Trends and Patterns
Some final thoughts
‣ Unless there's an investment in ultra-high-speed networking, we need to rethink how analysis is done
‣ Data commons are setting the precedent
  • Need to minimize the movement of data
  • Include compute power and an analysis platform with the data commons
‣ Move the analysis to the data; don't move the data
  • Requires sharing and large core institutional resources

Slide 57: Future Trends & Patterns
Some final thoughts
‣ Compute continues to become easier
‣ Data movement (physical & network) gets harder
‣ The cost of storage will be dwarfed by the "cost of managing stored data"
‣ We can see the end-of-life for our current IT architecture and design patterns; new patterns will start to appear over the next 2-5 years
‣ We've got a new headache to worry about ...

Slide 58: Future Trends & Patterns
A new challenge ...
‣ Responsible sharing of clinical and genomic data will be the grand challenge of the post-Human Genome Project era
‣ We HAVE to get it right
‣ The 'Global Alliance' whitepaper co-signed by 70+ organizations is a must-read:
  • Short link to whitepaper: http://biote.am/9j
  • Long link: https://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf
  • NIH will be critical in making this work for the world

Slide 59: end; Thanks!