Internet2 Support for Biomedical Research

  1. AAMC 2013 Information Technology in Academic Medicine Conference, Vancouver CA, June 5-7, 2013. Michael Sullivan, M.D., Associate Director, Health Sciences, Internet2. Internet2 Support for Biomedical Research
  2. Overview. Internet2 Research Support: Community and Network; Data-intensive Science; International Collaboration; Innovation Platform. Big Data Challenges: Transport; Security; Storage and Compute. 2 – 6/7/13, © 2012 Internet2
  3. Internet2 Community: 220 Universities; 60 Corporations; 70 Government agencies; 38 Regional and state networks; 65 International R&E networks
  4. Advanced 100G Production and Research Network
  5. Data Tsunami. Physics: Large Hadron Collider (image by CERN). Life Sciences: Magnetic Resonance Imager (MRI)
  6. Visualizing Big Data. Physics: LHC, Lead Ion Collision. Source: CERN (ALICE detector). Life Sciences: MRI, Monkey Brain. Source: Van Wedeen, M.D., Martinos Center and Dept. of Radiology, Massachusetts General Hospital and Harvard University Medical School
  7. Sequencing: Smaller, Faster, Cheaper. Illumina HiSeq 2500/1500 (source: http://www.illumina.com/systems/hiseq_systems/hiseq_2500_1500.ilmn). Handheld USB Sequencer (image: Oxford Nanopore Technologies)
  8. Democratization of Sequencing. 2,386 Genome Sequencers Worldwide, 30 May 2013. Source: Map of High-throughput Sequencers
  9. North American Genome Sequencers. 998 Sequencers in NA, 30 May 2013. Source: Map of High-throughput Sequencers
  10. Sequencing in Vancouver. 13 Sequencers at the Genome Science Center. Source: Map of High-throughput Sequencers
  11. Canarie Weathermap
  12. US-based International Exchange Points: StarLight, Chicago IL; MAN LAN, New York NY; NGIX-East, College Park MD; AtlanticWave (distributed); AMPATH, Miami FL; PacificWave-S, Los Angeles CA; PacificWave-N, Seattle WA
  13. GEANT International
  14. APAN
  15. Synchronized Genomic Repositories: NCBI, EBI, DDBJ
  16. US-China 10 Gbps Link. Dr. Lin Fang; Dr. Dawei Lin. Transfer of Sample.fa (24GB): FedEx, 2 days; Internet + FTP, 26 hours; China-US 10G Link, 30 seconds
  17. Innovation Platform: 100 GigE Layer 2 Connection; Science DMZ; Software Defined Networking. Diagram labels: SDN Control Server; Internet2 innovation backbone delivered as 100G L1; High-Performance Layer 2/3 Switch/Router; Traditional regional and commodity providers; Performance Node; Switches, data stores for data-intensive science; IP Network Layer 3; GENI Experiments; Your Research; Static Layer 2; Dynamic Layer 2; GENI; Innovation Services; Traditional Switch Substrate; Traditional L3 Campus Border Security; TR-CPS; Traditional Services; Traditional Campus Border Router; Campus Enterprise Network; R&E IP; Software Defined Networking Substrate; Optical System; Dark Fiber. For more information, see fasterdata.es.net and www.internet2.edu
  18. Innovation Platform Pilot Sites
  19. Meeting the Big Data Challenges. Transport: Science DMZ; perfSONAR Toolkit; MaDDash Testing Mesh; File Transfer Tools. Security: Science DMZ Hardening; Federated IdM (InCommon and NSTIC). Storage and Compute: Storage and Compute
  20. Challenge #1: Transport. Science DMZ. http://fasterdata.es.net/science-dmz/science-dmz-security/
  21. Performance Monitoring
  22. MaDDash XSEDE Testing Mesh
  23. File Transfer Tools. Unix LAN Tools, TCP-based, Open Source: scp, sftp, rsync are poor choices for WAN (RTT > 25 ms); scp with HPN patch is better but still has limitations; Globus Online (http://www.globusonline.org) uses GridFTP with TCP optimizations and offers a friendly GUI, Fire and Forget, Galaxy integration. UDP-based: Aspera (http://www.asperasoft.com/). Commercial: Annai Systems (http://www.annaisystems.com)
  24. Tool Speeds. Berkeley, CA ↔ Argonne, IL, RTT=53
  25. Challenge #2: Security. Hardening the Science DMZ: ESnet Big Data design pattern; Internet2 Innovation Platform; NSF CC-NIE grants; University of Florida (HIPAA alignment; Efficient encryption; Comprehensive logging; Robust authentication). Source: www.securearc.com
  26. Federated Identity Management. [Chart: Number of Participants (y-axis, 0 to 450) by year, 2004 to 2012 (June)]
  27. NSTIC: National Strategy for Trusted Identities in Cyberspace. White House initiative administered by NIST. Goal is to create an “Identity Ecosystem”. IDESG: Identity Ecosystem Steering Group. Five awards for pilots spanning multiple sectors: Resilient Network Systems, AMA, Aetna, ACC, NeHC, …; Criterion Systems, ID/DataWeb, AOL, Experian, Ping Identity, …; Daon, Inc., AARP, PayPal, Purdue, …; American Assoc. of Motor Vehicle Admins, Microsoft, AT&T, etc.; Internet2, Carnegie Mellon, Brown, MIT, U. of Texas, U. of Utah, …
  28. Challenge #3: Storage and Compute. Cloud Computing, many initiatives: Private: NCI bake-off to create Cancer Knowledge Clouds; Public/Private: AWS EC2 instances -- [100G] -- NCBI repository; Open Cloud: BioNimbus Protected Data Cloud; Proprietary: BGI EasyGenomics Cloud. National Cyberinfrastructure: XSEDE; Internet2; NCGAS
  29. NCI: Cancer Knowledge Cloud - RFI. Summary of Community Input. https://wiki.nci.nih.gov/display/NCIPinput/Summary+of+Input+Request%3A+Computational+Needs+to+Support+Large-Scale+Genomics+Investigations
  30. NCBI: Four Different Approaches: Reduced Data Size; Incrementally Transfer Large Files; High Speed Network Connections; Cloud Access and Support. Source: Don Preuss, NCBI Experiences and Big Data Strategy, presented at 2013 Internet2 Annual Meeting, Arlington, VA
  31. BioNimbus: An Open Cloud with Protected Data. bionimbus.opensciencedatacloud.org
  32. EasyGenomics: BGI’s Cloud Solution. Source: Xu Xing, Managing Big Data: The Genome Center Perspective, presented at Bio-IT World Conference & Expo ‘13, Boston, MA
  33. National Cyberinfrastructure. XSEDE: NSF-funded; Supercomputers; HPC resources. Internet2: 220 universities; XSEDEnet. NCGAS: Indiana University; TACC; SDSC; PSC. Source: https://www.xsede.org/networking
  34. NCGAS Virtual Instrument. Diagram labels: Indiana University; 6 PB Storage; NSF-Funded or XSEDE Allocation; NCGAS Galaxy Portal; 5.5 PB Storage; SDSC; Mason; 5 PB; D.C. POD; Galaxy Portal; TACC; 100 Gig Internet2; POD; 4 PB Storage; Federally Funded; 10 Gig NLR; Sequencing Center; NCBI; PSC. Source: Barnett, W.K., and R.D. LeDuc, Next Generation Cyberinfrastructures for Next Generation Sequencing and Genome Science, presented at 2013 AAMC GIR Conference, Vancouver, BC
  35. Networking Issues for Life Sciences Research. Focused Technical Workshop on July 17-18, 2013, Lawrence Berkeley National Laboratory, Berkeley, California. Building on the success of Joint Techs, the meeting will bring together technical experts in a smaller setting with domain scientists. The workshop will include a slate of invited speakers and panels. The format is meant to encourage lively, interactive discussions with the goal of developing a set of tangible next steps for supporting this data-intensive science community. Four sub-topic areas: Network Architectures, Workflow Engines, Public and Private Cloud Architectures, and Data Movement Tools. See: http://events.internet2.edu/2013/ftw-life-sciences/index.cfm
  36. Resources. The Fourth Paradigm: Data-Intensive Scientific Discovery (http://research.microsoft.com/en-us/collaboration/fourthparadigm/). Internet2 Network and Innovation Platform (http://www.internet2.edu/network/). Science DMZ (http://fasterdata.es.net/science-dmz/). perfSONAR (http://www.perfsonar.net/). Contact. Internet2 Research Support Center: rs@internet2.edu. Internet2 Life Sciences, Michael Sullivan, MD, Associate Director: msullivan@internet2.edu
  37. Thank You. INTERNET2 SUPPORT FOR BIOMEDICAL RESEARCH. AAMC 2013 Information Technology in Academic Medicine Conference, Vancouver CA, June 5-7, 2013. Michael Sullivan, M.D., Associate Director, Health Sciences, Internet2
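
The transfer figures on slides 16, 23 and 24 can be sanity-checked with some back-of-envelope arithmetic. The sketch below is illustrative only and is not part of the talk. It assumes that "24GB" means decimal gigabytes, that "RTT=53" is in milliseconds, and, purely as an example, a single TCP stream with a 64 KiB window (the limit without window scaling). All function names are hypothetical.

```python
"""Back-of-envelope transfer arithmetic for slides 16, 23 and 24."""

FILE_BYTES = 24 * 10**9  # Sample.fa, assuming "24GB" is decimal gigabytes
RTT_S = 0.053            # Berkeley <-> Argonne, assuming "RTT=53" is milliseconds


def effective_bps(size_bytes: float, seconds: float) -> float:
    """Average throughput, in bits per second, of a completed transfer."""
    return size_bytes * 8 / seconds


def ideal_seconds(size_bytes: float, link_bps: float) -> float:
    """Transfer time if the link ran at full line rate with no overhead."""
    return size_bytes * 8 / link_bps


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / rtt_s


def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return link_bps * rtt_s / 8


# Transfer times quoted on slide 16, converted to seconds.
quoted_seconds = {
    "FedEx (2 days)": 2 * 24 * 3600,
    "Internet + FTP (26 hours)": 26 * 3600,
    "China-US 10G link (30 seconds)": 30,
}

for label, seconds in quoted_seconds.items():
    mbps = effective_bps(FILE_BYTES, seconds) / 1e6
    print(f"{label}: about {mbps:,.1f} Mbps effective")

print(f"Ideal time at 10 Gbps: {ideal_seconds(FILE_BYTES, 10e9):.1f} s")
print(f"64 KiB window at 53 ms RTT: "
      f"{window_limited_bps(64 * 1024, RTT_S) / 1e6:.1f} Mbps at most")
print(f"Window needed to fill 10 Gbps at 53 ms RTT: "
      f"{bdp_bytes(10e9, RTT_S) / 1e6:.2f} MB")
```

Under these assumptions, the 30-second transfer corresponds to roughly 6.4 Gbps of effective throughput, while the 26-hour FTP transfer works out to about 2 Mbps. The window calculation suggests why slide 23 calls scp, sftp and rsync poor choices above 25 ms RTT: a small fixed window caps a single stream at a few megabits per second regardless of link capacity.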
