Drowning in data
• The need to deal with and benefit from large quantities of data
  is not a new concept: it has been noted in many policy reports,
  particularly in the US and UK, over the past several years.

                              Source: Ian Foster, UoChicago
The Data Deluge
• Genomic sequencing output doubles every 9 months
• Climate model intercomparison project (CMIP) of the IPCC:
  2004: 36 TB; 2012: 2,300 TB
• Sky surveys:
   – MACHO et al.: 1 TB
   – Palomar: 3 TB
   – 2MASS: 10 TB
   – GALEX: 30 TB
   – Sloan: 40 TB
   – Pan-STARRS: 40,000 TB
• 1,330 molecular biology databases in Nucleic Acids Research (96 in Jan 2001)

Source: Ian Foster, UoChicago
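As a rough check on the growth figures above, the sketch below computes the doubling time implied by the 2004 and 2012 volumes (36 TB and 2,300 TB) and compares it with the 9-month doubling quoted for sequencing output. It assumes steady exponential growth between the two dates, which the slide does not state, and the function names are illustrative only.

```python
import math


def doubling_time_years(start_tb, end_tb, years):
    """Doubling time implied by growth from start_tb to end_tb over `years`.

    Assumes steady exponential growth (an assumption, not a claim from the slide).
    """
    return years * math.log(2) / math.log(end_tb / start_tb)


def growth_factor(doubling_months, months):
    """Total growth after `months` if volume doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


# Slide figures: 36 TB in 2004, 2,300 TB in 2012.
archive_doubling_months = 12 * doubling_time_years(36, 2300, 2012 - 2004)

# Slide figure: genomic sequencing output doubles every 9 months.
sequencing_growth_8y = growth_factor(9, 8 * 12)

print(f"Implied doubling time: {archive_doubling_months:.1f} months")
print(f"8-year growth at a 9-month doubling: about {sequencing_growth_8y:.0f}x")
```

Under that assumption the 2004 to 2012 figures imply a doubling roughly every 16 months, a 64-fold increase over eight years, while a 9-month doubling sustained for the same period would give growth of more than 1,600-fold.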
Big science has achieved big successes
• OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
• LIGO: 1 PB data in last science run, distributed worldwide
• ESG: 1.2 PB climate data delivered to 23,000 users; 600+ pubs

• Robust production solutions
• Substantial teams and expense
• Sustained, multi-year effort
• Application-specific solutions, built on common technology

Source: Ian Foster, UoChicago
Growth in sensor networks and Citizen Science
• Glacier Tracking
• Real Time Health Monitoring
• Smart Trash
NSF Vision




Critical Factors




Source: NSF
But small & medium science in Canada is struggling
• More data, more complex data
• Ad-hoc solutions
• Inadequate software, hardware
• Data plan mandates

Source: Ian Foster, UoChicago
Time-consuming tasks in science
• Run experiments
• Collect data
• Manage data
• Move data
• Acquire computers
• Analyze data
• Run simulations
• Compare experiment with simulation
• Search the literature
• Communicate with colleagues
• Publish papers
• Find, configure, install relevant software
• Find, access, analyze relevant data
• Order supplies
• Write proposals
• Write reports
• …

Source: Ian Foster, UoChicago
SaaS services in action: The XSEDE vision
[Diagram. Labels: Academic institution; InCommon; XUAS; Globus Online: Hosted persistent services (User, Team, Catalog, Transfer, Compute, ...); XSEDE service provider; Commercial provider; Data provider; Open Science Grid; ... Legend: = Standard interface]

Source: Ian Foster, UoChicago
The real cost of campus computing
• HPC represents 15-20% of campus
  electrical energy at many Canadian
  universities*

• Closet clusters consume 5-10% of
  campus electricity*

• Universities are collectively spending
  millions of dollars on the capital cost
  and electrical energy of computing

Belady, C., “In the Data Center, Power and Cooling Costs
More than IT Equipment it Supports”, Electronics Cooling
Magazine (February 2007)

Source: Christian Belady

* Studies undertaken by CANARIE of 4 universities: UBC, Dalhousie, Ottawa U, UoAlberta
Research Computing Pyramid
[Pyramid diagram, labelled “Compute, compute, compute” at the top and “Data, data, data” at the base. Tiers, top to bottom:
• Petascale/Exascale/…
• National HPC infrastructure
• University HPC infrastructure
• Closet clusters
• Mobile/Desktop computing
Side labels: Compute Canada; Role for cloud computing; Capable Users (10^2 at the top, 10^9 at the base)]

Source: Dan Reed, PCAST
USA & Europe programs: commercial clouds to support research
• US Government $200 million “Big Data for Research and Discovery”
  program involving research universities, government labs and
  commercial cloud providers
   – For example, the 1000 Genomes Project data is stored on Amazon with free
     access for researchers
   – Grants are available to researchers to use Amazon tools to undertake
     computation

• European public-private partnership on clouds for research:
  “European Cloud Partnership”
   – CERN, European Space Agency, European Molecular Biology Laboratory plus
     several Internet companies

• Network organizations in the USA, UK, Netherlands, etc. are brokering
  commercial cloud services for research and education to
  significantly reduce costs
Other Canadian initiatives
• CANARIE + Compute Canada
   – “Integrated Digital Infrastructure”
   – Integrating networks and HPC



• Research directions being determined by the infrastructure?

• Workshop in Saskatoon in June
Questions for attendees
1.   Should Canada pursue a research cyber-infrastructure and/or Big Data strategy?

2.   Do we need an organization or leadership council to promote a
     cyber-infrastructure or Big Data strategy in Canada?

3.   Given Canada is so far behind, should we partner with international groups such
     as XSEDE, NeCTAR, etc.?

4.   Should we focus on those who need the most help – small and medium science
     in Canada?

5.   Who should lead cyber-infrastructure in Canada? Researchers, infrastructure
     providers, funding councils, VPRs, CIOs, Government?

6.   Is it the role of universities to operate 1 MW power plants and massive compute
     facilities that are identical to commercial facilities?




Cifar


Editor's Notes

  • #9 So let’s look at that list again. My colleagues and I started an effort a little while ago aimed at applying SaaS to one of these tasks …