1
ScienceDMZ: 

Industry Trends & Widespread Drivers
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag
2
I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
(and CDC …)
3
Chris & Ari: Why 2 of Us Today?
Answer: Ari concentrates on Federal/US.Gov while I deal mostly
with commercial biotech/pharma, EDU and non-profit Orgs. 

They are very different.
4
Bottom Line: Science evolves faster than
IT can refresh infrastructure & practices
5
This is why we are here today.
6
Terabyte-scale Instruments & Lab Tools
Cheap, easy to acquire and popping up EVERYWHERE
HiSeq 2500 MiSeq NextSeq 500
And this …
$1,000 human genome @ 30x coverage
* some caveats
7
Illumina HiSeq X 10
This data will be moving constantly …
Illumina HiSeq X 10
‣ Raw Instrument Data
• +13 TB every 3 days
‣ FASTQ Conversion
• +8 TB every 3 days
‣ Align -> Compressed BAM
• +2 TB every 3 days
‣ Data Distribution
• ?
8
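To put the per-cycle volumes above into network terms, here is a rough sketch that converts each figure into the average line rate needed to move it once within the same 3-day window. The decimal-terabyte unit and the function name `sustained_mbps` are assumptions for illustration; the slide does not say how or where the data is actually moved.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # assumption: decimal terabytes; the slide does not say TB vs TiB

# Volumes per 3-day cycle, taken from the slide above.
STAGES_TB_PER_CYCLE = {
    "raw instrument data": 13,
    "FASTQ conversion": 8,
    "aligned, compressed BAM": 2,
}
CYCLE_DAYS = 3


def sustained_mbps(terabytes: float, days: float) -> float:
    """Average rate in megabits per second needed to move `terabytes` in `days`."""
    bits = terabytes * TB * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


if __name__ == "__main__":
    for stage, tb in STAGES_TB_PER_CYCLE.items():
        print(f"{stage:26s} {sustained_mbps(tb, CYCLE_DAYS):7.1f} Mbps")
    total = sum(STAGES_TB_PER_CYCLE.values())
    print(f"{'all stages, moved once':26s} {sustained_mbps(total, CYCLE_DAYS):7.1f} Mbps")
```

These are averages over the whole cycle. Real transfers tend to arrive in bursts, and each extra copy (backup, collaborator, cloud) adds to the total, so the link has to be sized well above the average.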
9
Coming Soon To a Researcher Near You:
USB-attached genomic sequencing
Gulp.
10
Tipping Point #1
Effort/cost of generating or acquiring vast piles of data
in 2015 is far less than real world cost of storing and
managing that data through a realistic lifecycle.
11
Tipping Point #2
Scientists still believe storage is cheap & near-infinite.
Data triage no longer sufficient. Scientists rarely asked
to articulate a scientific/business case for storage.
12
Tipping Point #3
Centralized infrastructure models are not sufficient and
must be modified. Data & compute WILL span sites and
locations with or without active IT involvement. 



We need to start preparing now.
13
“Center Of Gravity” Problem
14
“Center Of Gravity” Problem
Current methods involving centralized storage and bringing
“users” and “compute” very close “… to the data” are going
to face significant problems in 2015 and beyond.
15
“Center Of Gravity” Pain #1
Terabyte class instruments. Everywhere. Gulp.


We cannot stop this trend: large-scale data generation will span labs,
buildings, campus sites & WANs
16
“Center Of Gravity” Pain #2
Collaborations & Peta-scale Open Access Data


The future of large scale genomics|informatics increasingly involves
multi-party / multi-site collaboration. Also: Petabytes of free data (!!)
17
“Center Of Gravity” Pain #3
Object Storage Less Effective @ Single Site


Object storage is the future of scientific data at rest. Some major side
benefits (erasure coding, etc.) can only be realized when 3 or more
sites are involved
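One way to see why the site count matters: a minimal sketch, assuming a generic "k data + m parity" erasure code in which any k fragments can rebuild the object and fragments are spread evenly across sites. The slide names no specific scheme or product, so the model and the function name `min_overhead_for_site_loss` are illustrative assumptions only.

```python
from math import ceil


def min_overhead_for_site_loss(sites: int, data_fragments: int) -> tuple[int, float]:
    """Return (parity count m, raw/usable storage ratio) for the smallest m that
    lets an object of `data_fragments` data pieces, spread evenly over `sites`,
    survive the loss of any single site.

    Assumes an idealized code where any k of the k+m fragments rebuild the object.
    """
    if sites < 2:
        raise ValueError("need at least two sites to survive a site loss")
    k = data_fragments
    m = 0
    while True:
        lost_with_one_site = ceil((k + m) / sites)  # worst-case fragments on one site
        if lost_with_one_site <= m:
            return m, (k + m) / k
        m += 1


if __name__ == "__main__":
    for n in (2, 3, 4, 6):
        m, ratio = min_overhead_for_site_loss(n, data_fragments=6)
        print(f"{n} sites: 6 data + {m} parity fragments, {ratio:.2f}x raw storage")
```

Under these assumptions, two sites need as much raw capacity as plain mirroring (2x) to survive a site loss, while three or more sites get the same protection for 1.5x or less.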
18
“Center Of Gravity” Summarized
Data spread is unavoidable. Effectively Unstoppable.


We have a WAN-scale data movement/access problem.
There are ~2 viable approaches going forward ...
19
Option 1 - “Stay Centralized”
Still totally viable but much faster connectivity to
instruments & collaborators will be essential
Nutshell: Significant investment in edge/WAN connectivity required,
likely requiring bandwidth exceeding 10Gbps
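A back-of-the-envelope sketch of why 10Gbps comes up so quickly. The 100 TB dataset size and the `efficiency` parameter are hypothetical values chosen for illustration, not figures from the talk.

```python
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `terabytes` (decimal TB) over a link of `link_gbps`.

    `efficiency` is the fraction of the nominal line rate actually achieved;
    it is a hypothetical knob, since real-world utilisation varies widely.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    dataset_tb = 100  # hypothetical dataset size
    for gbps in (1, 10, 40, 100):
        ideal = transfer_hours(dataset_tb, gbps)
        derated = transfer_hours(dataset_tb, gbps, efficiency=0.5)
        print(f"{gbps:>3} Gbps: {ideal:6.1f} h at line rate, {derated:6.1f} h at 50% efficiency")
```

Even at full line rate, a 1 Gbps link needs more than nine days for this hypothetical dataset; at 10 Gbps it is a little under one day.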
20
Option 2 - “Go With The Flow”
Embrace the distributed & “cloudy” future where
compute & storage span multiple zones
Nutshell: Still requires massive bandwidth upgrades to support
metadata-aware or location-aware access & compute
21
It all boils down to …
22
Terabyte-scale data movement is
going to be an informatics “grand
challenge” for the next 2-3+ years
And far harder/scarier than previous compute & storage challenges
23
History Time …
Long history of engagement & cooperation
Research IT vs. Enterprise IT
‣ Historically our infrastructure requirements
often surpassed what the Enterprise uses to
sustain day to day operation
‣ We’ve spent ~20 years working closely with
Enterprise IT to enable “data intensive
science”
‣ Relatively easy to align informatics IT
infrastructure with established vendor,
product, technology and architecture
standards
24
25
Computing Power
Barely worth talking about in 2015
‣ 32 CPU cores to 60,000
cores - it almost does not
matter
‣ Simple commodity
‣ Interesting & challenging
but not insanely hard.
‣ Easy to acquire & deploy
in 2015 at whatever scale
is needed (budget
permitting)
26
Storage
Still a hassle but no longer intractable
‣ Petabyte-capable storage is no
big deal in 2015
‣ Pricing slowly being
commoditized
‣ Many opportunities to do clever
stuff or waste phenomenal
amounts of money
‣ Biggest risk may be research
driving towards object storage
faster than Enterprise is willing
to commit/support
27
Data Management
Hard but not insurmountable
‣ Managing scientific data
at rest is still very hard
‣ … but we have seen a
few successful ways
forward
‣ DIY/RDBMS/LIMS
‣ iRODS
‣ Object Storage
28
Data Movement
$#%(*&@#@*&^@!*^@!(*&# !!!!!!!!!!!!!!!!!!!!!!!!
Prepare For Pain …
29
2015 Grand Challenge
Large-scale Data Movement (and why this will be very difficult …)
30
Issue #1
Current LAN/WAN stacks bad for emerging use case
Existing technology we’ve used for decades has been architected to
support many small network flows; not a single big data flow
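The single-big-flow problem can be illustrated with the often-cited Mathis et al. approximation for loss-based TCP: throughput is at most (MSS / RTT) × (C / √p), where p is the packet loss rate and C ≈ √(3/2). This is a simplified model rather than anything stated in the talk, newer congestion-control algorithms behave differently, and the MSS, RTT and loss values below are illustrative assumptions.

```python
from math import sqrt


def mathis_limit_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Approximate upper bound on a single TCP flow's throughput, in Mbps.

    Uses the Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)),
    with C = sqrt(3/2). Valid only as a rough guide for loss-based TCP.
    """
    if rtt_ms <= 0 or loss_rate <= 0:
        raise ValueError("rtt_ms and loss_rate must be positive")
    c = sqrt(3 / 2)
    bits_per_second = (mss_bytes * 8 / (rtt_ms / 1000)) * c / sqrt(loss_rate)
    return bits_per_second / 1e6


if __name__ == "__main__":
    # Assumed values: 1460-byte MSS, 0.01% packet loss.
    for rtt in (1, 10, 50, 100):
        limit = mathis_limit_mbps(1460, rtt, 1e-4)
        print(f"RTT {rtt:>3} ms: single flow limited to roughly {limit:8.1f} Mbps")
```

With the same small loss rate, a flow that can exceed 1 Gbps on a LAN-like 1 ms path drops to a few tens of Mbps at WAN-like round-trip times. Many small flows hide this effect; one big flow does not.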
31
Issue #2
Ratio of LAN:WAN bandwidth is out of whack
We will need faster links to “outside” than most organizations have
anticipated or accounted for in long-term technology planning
32
Issue #3
Core, Campus, Edge and “Top of Rack” bandwidth
Enterprise networking types can be *smug* about 10Gbps at the
network core. Boy are they in for a bad surprise.
33
Issue #4
Bigger blast radius when stuff goes wrong
Compute & storage can be logically or physically contained to
minimize disruption/risk when Research does stupid things.


Networks, however, touch EVERYTHING EVERYWHERE. Major risk.
34
What We Need:
- Ludicrous bandwidth @ network core
- Very fast (10-40Gbps) ToR, Edge, Campus links
- 1Gbps - 10Gbps connections to “outside”
- Switches/Routers/Firewalls that can support
small #s of very large data flows
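Supporting a small number of very large flows means every device and host along the path must cope with a full bandwidth-delay product (link rate × round-trip time) of in-flight data for one flow. A minimal sketch of that arithmetic follows; the RTT values and the function name `bdp_megabytes` are assumptions for illustration.

```python
def bdp_megabytes(link_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in decimal megabytes.

    This is roughly how much data one flow must keep in flight (and how much
    TCP window/buffer it needs) to fill the link at the given RTT.
    """
    bits_in_flight = link_gbps * 1e9 * (rtt_ms / 1000)
    return bits_in_flight / 8 / 1e6


if __name__ == "__main__":
    for gbps in (1, 10, 40):
        for rtt in (1, 20, 80):  # assumed campus, regional and long-haul RTTs
            print(f"{gbps:>2} Gbps @ {rtt:>2} ms RTT: {bdp_megabytes(gbps, rtt):7.2f} MB in flight")
```

At 10 Gbps and 80 ms, a single flow needs about 100 MB in flight, far more than typical default TCP buffers, which is one reason untuned hosts and middleboxes struggle with this traffic pattern.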
35
Why this will be difficult to achieve
36
Issue #5
Social, trust & cultural issues
We lack the multi-year relationship and track record we’ve built with
facility, compute & storage teams. We are “strangers” to many WAN
and SecurityOps types
37
Issue #6
Our “deep bench” of internal expertise is lacking
Research IT usually has very good “shadow IT” skills but we don’t
have homegrown experts in BGP, Firewalls, Dark Fiber, Routing etc.
38
Issue #7
Cost. Cost. Cost.
Have you seen what Cisco charges for a 100Gbps line card?
39
Issue #8
Cisco. Cisco. Cisco.
The elephant in the room. Cisco rarely 1st choice for greenfield efforts
in this space but Cisco shops often refuse to entertain any
alternatives. Massive existing install base & on-premise expertise
must be balanced, recognized & carefully handled.
40
Issue #9
Firewalls, SecOps & Incumbent Vendors
Legacy security products supporting 10Gbps can cost $150,000+ and
still utterly fail to perform without heroic tuning & deep config magic.
Alternatives exist but massive institutional inertia to overcome. 



Deeply Challenging Issue.
Wrapping Up …
41
42
‣ Peta-scale becoming the norm, not exception
‣ Compute is a commodity; Storage getting there
‣ Historically it has been pretty easy to integrate
“Research Computing” with “Enterprise”
facilities and operational standards
‣ We can no longer assume the majority of our
infrastructure will reside in a single datacenter
43
‣ We need a massive increase in end-to-end
network connectivity & bandwidth
‣ … and kit that can handle large data flows
‣ Current state of “Enterprise” LAN/WAN
networking is not aligned with emerging needs:
‣ Cost, Capability, Performance, Security …
44
‣ New hardware, reference architectures, best
practices and methods will be required
‣ There is no easy path forward …
45
‣ And this brings us to …
‣ ScienceDMZ
46
‣ Science DMZ
‣ Only viable reference architecture &
collection of operational practices /
philosophy BioTeam has seen to date
‣ In-use today. Real world. No BS.
‣ High level visibility & support within US.GOV,
grant funding agencies and supporters of
data intensive science and R&E networks
47
‣ If you did not know why you were attending this
workshop today, hopefully you do now!
‣ Enjoy the rest of the talks!
48
end; Thanks!
slideshare.net/chrisdag/ chris@bioteam.net @chris_dag

2015 CDC Workshop on ScienceDMZ

  • 1. 1 ScienceDMZ: 
 Industry Trends & Widespread Drivers slideshare.net/chrisdag/ chris@bioteam.net @chris_dag
  • 2. 2 I’m Chris. I’m an infrastructure geek. I work for the BioTeam. (and CDC …)
  • 3. 3 Chris & Ari: Why 2 of Us Today? Answer: Ari concentrates on Federal/US.Gov while I deal mostly with commercial biotech/pharma, EDU and non-profit Orgs. 
 They are very different.
  • 4. 4 Bottom Line: Science evolves faster than IT can refresh infrastructure & practices
  • 5. 5 This is why we are here today.
  • 6. 6 Terabyte-scale Instruments & Lab Tools Cheap, easy to acquire and popping up EVERYWHERE HiSeq 2500 MiSeq NextSeq 500 And this …
  • 7. $1,000 human genome @ 30x coverage * some caveats 7 Illumina HiSeq X 10
  • 8. This data will be moving constantly … Illumina HiSeq x 10 ‣ Raw Instrument Data • +13 TB every 3 days ‣ FASTQ Conversion • +8 TB every 3 days ‣ Align -> Compressed BAM • +2 TB every three days ‣ Data Distribution • ? 8
  • 9. 9 Coming Soon To a Researcher Near You: USB-attached genomic sequencing Gulp.
  • 10. 10 Tipping Point #1 Effort/cost of generating or acquiring vast piles of data in 2015 is far less than real world cost of storing and managing that data through a realistic lifecycle.
  • 11. 11 Tipping Point #2 Scientists still believe storage is cheap & near-infinite. Data triage no longer sufficient. Scientists rarely asked to articulate a scientific/business case for storage.
  • 12. 12 Tipping Point #3 Centralized infrastructure models are not sufficient and must be modified. Data & compute WILL span sites and locations with or without active IT involvement. 
 
 We need to start preparing now.
  • 14. “Center Of Gravity” Problem: Current methods involving centralized storage and bringing “users” and “compute” very close “… to the data” are going to face significant problems in 2015 and beyond.
  • 15. “Center Of Gravity” Pain #1: Terabyte-class instruments. Everywhere. Gulp. We cannot stop this trend - large scale data generation will span labs, buildings, campus sites & WANs.
  • 16. “Center Of Gravity” Pain #2: Collaborations & Peta-scale Open Access Data. The future of large scale genomics|informatics increasingly involves multi-party / multi-site collaboration. Also: Petabytes of free data (!!)
  • 17. “Center Of Gravity” Pain #3: Object Storage Less Effective @ Single Site. Object storage is the future of scientific data at rest. Some major side benefits (erasure coding, etc.) can only be realized when 3 or more sites are involved.
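
One way to read the "3 or more sites" remark on slide 17 is through erasure-coding arithmetic. The sketch below is an illustration under my own assumptions, not a description of any particular object store: data is split into `k` data fragments plus `m` parity fragments, any `k` fragments are enough to rebuild the object, and fragments are spread as evenly as possible across sites. The function names are hypothetical.

```python
from math import ceil


def survives_site_loss(k, m, sites):
    """True if losing any single site still leaves at least k fragments.

    Assumes k data + m parity fragments spread as evenly as possible
    across `sites`, and that any k fragments can rebuild the object.
    """
    worst_site = ceil((k + m) / sites)  # most fragments held by one site
    return sites > 1 and worst_site <= m


def overhead(k, m):
    """Raw capacity consumed per unit of usable data."""
    return (k + m) / k


for k, m, sites in [(4, 2, 1), (4, 2, 2), (4, 4, 2), (4, 2, 3)]:
    print(f"k={k} m={m} sites={sites}: overhead {overhead(k, m):.1f}x, "
          f"survives loss of a site: {survives_site_loss(k, m, sites)}")
```

With two sites, tolerating a site failure needs at least as many parity fragments as data fragments, so the overhead is no better than keeping two full copies. With three sites the same protection is possible at 1.5x in this example, which is consistent with the slide's point that the benefit appears once a third site is involved.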
  • 18. “Center Of Gravity” Summarized: Data spread is unavoidable. Effectively unstoppable. We have a WAN-scale data movement/access problem. There are ~2 viable approaches going forward ...
  • 19. Option 1 - “Stay Centralized”: Still totally viable, but much faster connectivity to instruments & collaborators will be essential. Nutshell: Significant investment in edge/WAN connectivity required, likely requiring bandwidth exceeding 10Gbps.
  • 20. Option 2 - “Go With The Flow”: Embrace the distributed & “cloudy” future where compute & storage span multiple zones. Nutshell: Still requires massive bandwidth upgrades to support metadata-aware or location-aware access & compute.
  • 21. It all boils down to …
  • 22. Terabyte-scale data movement is going to be an informatics “grand challenge” for the next 2-3+ years. And far harder/scarier than previous compute & storage challenges.
  • 24. Research IT vs. Enterprise IT: Long history of engagement & cooperation
     ‣ Historically our infrastructure requirements often surpassed what the Enterprise uses to sustain day to day operation
     ‣ We’ve spent ~20 years working closely with Enterprise IT to enable “data intensive science”
     ‣ Relatively easy to align informatics IT infrastructure with established vendor, product, technology and architecture standards
  • 25. Computing Power: Barely worth talking about in 2015
     ‣ 32 CPU cores to 60,000 cores - it almost does not matter
     ‣ Simple commodity
     ‣ Interesting & challenging but not insanely hard
     ‣ Easy to acquire & deploy in 2015 at whatever scale is needed (budget permitting)
  • 26. Storage: Still a hassle but no longer intractable
     ‣ Petabyte-capable storage is no big deal in 2015
     ‣ Pricing slowly being commoditized
     ‣ Many opportunities to do clever stuff or waste phenomenal amounts of money
     ‣ Biggest risk may be research driving towards object storage faster than Enterprise is willing to commit/support
  • 27. Data Management: Hard but not insurmountable
     ‣ Managing scientific data at rest is still very hard
     ‣ … but we have seen a few successful ways forward
     ‣ DIY/RDBMS/LIMS
     ‣ iRODS
     ‣ Object Storage
  • 29. 2015 Grand Challenge: Large-scale Data Movement (and why this will be very difficult …)
  • 30. Issue #1: Current LAN/WAN stacks bad for emerging use case. Existing technology we’ve used for decades has been architected to support many small network flows, not a single big data flow.
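
The slide does not name a specific mechanism, but one well-known reason a single large flow struggles on a long path is that a TCP sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The Python sketch below works through that bound. The 50 ms round-trip time is a made-up example value, and 64 KiB is the classic TCP window limit when window scaling is not in use.

```python
def max_flow_gbps(window_bytes, rtt_ms):
    """Upper bound on one TCP flow's throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e9


def window_needed_mb(target_gbps, rtt_ms):
    """Window (bandwidth-delay product, in MB) needed to fill a link of
    `target_gbps` at the given round-trip time."""
    return target_gbps * 1e9 / 8 * (rtt_ms / 1000) / 1e6


RTT_MS = 50  # hypothetical cross-country round-trip time

print(f"64 KiB window at {RTT_MS} ms RTT: "
      f"{max_flow_gbps(64 * 1024, RTT_MS) * 1000:.1f} Mbps at most")
print(f"Window to fill 10 Gbps at {RTT_MS} ms RTT: "
      f"{window_needed_mb(10, RTT_MS):.1f} MB")
```

Many small flows each stay far below this limit, which is why defaults tuned for them go unnoticed until one transfer has to fill the pipe by itself.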
  • 31. Issue #2: Ratio of LAN:WAN bandwidth is out of whack. We will need faster links to “outside” than most organizations have anticipated or accounted for in long-term technology planning.
  • 32. Issue #3: Core, Campus, Edge and “Top of Rack” bandwidth. Enterprise networking types can be *smug* about 10Gbps at the network core. Boy are they in for a bad surprise.
  • 33. Issue #4: Bigger blast radius when stuff goes wrong. Compute & storage can be logically or physically contained to minimize disruption/risk when Research does stupid things. Networks, however, touch EVERYTHING EVERYWHERE. Major risk.
  • 34. What We Need:
     ‣ Ludicrous bandwidth @ network core
     ‣ Very fast (10-40Gbps) ToR, Edge, Campus links
     ‣ 1-10Gbps connections to “outside”
     ‣ Switches/Routers/Firewalls that can support small #s of very large data flows
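
To give the link speeds on slide 34 some scale, this small Python sketch estimates how long a 13 TB batch (the raw output figure quoted on slide 8) would take at 1, 10 and 40 Gbps. The `efficiency` parameter is my own assumption standing in for protocol overhead, contention and untuned equipment; the 50% value is illustrative, not measured.

```python
def transfer_hours(terabytes, link_gbps, efficiency=1.0):
    """Hours to move `terabytes` over a `link_gbps` link.

    `efficiency` is the assumed fraction of line rate actually achieved.
    Decimal units: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s.
    """
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600


for gbps in (1, 10, 40):
    print(f"13 TB over {gbps:>2} Gbps: "
          f"{transfer_hours(13, gbps):5.1f} h at line rate, "
          f"{transfer_hours(13, gbps, efficiency=0.5):5.1f} h at 50% efficiency")
```

At 1 Gbps the batch takes more than a day even at full line rate, so a link of that size would be almost permanently busy with one instrument's raw output.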
  • 35. Why this will be difficult to achieve
  • 36. Issue #4: Social, trust & cultural issues. We lack the multi-year relationship and track record we’ve built with facility, compute & storage teams. We are “strangers” to many WAN and SecurityOps types.
  • 37. Issue #5: Our “deep bench” of internal expertise is lacking. Research IT usually has very good “shadow IT” skills but we don’t have homegrown experts in BGP, Firewalls, Dark Fiber, Routing etc.
  • 38. Issue #5: Cost. Cost. Cost. Have you seen what Cisco charges for a 100Gbps line card?
  • 39. Issue #5: Cisco. Cisco. Cisco. The elephant in the room. Cisco rarely 1st choice for greenfield efforts in this space but Cisco shops often refuse to entertain any alternatives. Massive existing install base & on-premises expertise must be balanced, recognized & carefully handled.
  • 40. Issue #5: Firewalls, SecOps & Incumbent Vendors. Legacy security products supporting 10Gbps can cost $150,000+ and still utterly fail to perform without heroic tuning & deep config magic. Alternatives exist but massive institutional inertia to overcome. Deeply Challenging Issue.
  • 42.
     ‣ Peta-scale becoming the norm, not exception
     ‣ Compute is a commodity; Storage getting there
     ‣ Historically it has been pretty easy to integrate “Research Computing” with “Enterprise” facilities and operational standards
     ‣ We can no longer assume the majority of our infrastructure will reside in a single datacenter
  • 43.
     ‣ We need a massive increase in end-to-end network connectivity & bandwidth
     ‣ … and kit that can handle large data flows
     ‣ Current state of “Enterprise” LAN/WAN networking is not aligned with emerging needs: Cost, Capability, Performance, Security …
  • 44.
     ‣ New hardware, reference architectures, best practices and methods will be required
     ‣ There is no easy path forward …
  • 45. And this brings us to … ScienceDMZ
  • 46. Science DMZ
     ‣ Only viable reference architecture & collection of operational practices / philosophy BioTeam has seen to date
     ‣ In-use today. Real world. No BS.
     ‣ High level visibility & support within US.GOV, grant funding agencies and supporters of data intensive science and R&E networks
  • 47.
     ‣ If you did not know why you were attending this workshop today, hopefully you do now!
     ‣ Enjoy the rest of the talks!