Every research field is now a data science field
• Thousand years ago: description of natural phenomena
• Last few hundred years: Newton’s laws, Maxwell’s equations…
• Last few decades: simulation of complex phenomena
• Today and the future: unify theory, experiment and simulation with large multidisciplinary data, using data exploration and data mining (from instruments, sensors, humans…); distributed communities
Melbourne
Sydney
IT PAC application building blocks
[Figure: Web Portal, User Browser, Task Queue, Executable / Data (Windows Azure Storage), Compute Nodes; the diagram numbers its steps 1 to 7]
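The components named in the diagram suggest a familiar task-queue pattern: a portal enqueues work, and compute nodes pull tasks and write results to storage. The sketch below is only an in-process illustration of that pattern under stated assumptions. It uses Python's standard `queue` and `threading` modules as stand-ins for the cloud queue and compute nodes, a plain dictionary as a stand-in for storage, and a hypothetical helper name `run_tasks`. It does not use any Windows Azure API, and the slide does not confirm the exact order of the numbered steps.

```python
import queue
import threading


def run_tasks(tasks, num_workers=3):
    """Run (name, func, arg) tasks on worker threads and collect results.

    Hypothetical stand-ins: the Queue plays the task queue, the threads
    play the compute nodes, and the results dict plays the storage.
    """
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()
            if item is None:  # sentinel: no more work for this worker
                break
            name, func, arg = item
            value = func(arg)
            with lock:  # protect the shared results dict
                results[name] = value

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for task in tasks:  # the "portal" enqueues the submitted tasks
        task_queue.put(task)
    for _ in workers:  # one sentinel per worker so every thread exits
        task_queue.put(None)
    for w in workers:
        w.join()
    return results
```

A call such as `run_tasks([("square", lambda x: x * x, 3)])` returns a dictionary keyed by task name. Decoupling submission from execution through a queue is what lets the number of workers change without touching the submitting side.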
ChronoZoom: An infinite canvas in time
• Many Examples
• The Challenge: sustainability
• Manage locality
• Keep the hot data local on cloud disk
• Manage the working set over time
• The rest is archival
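One way to picture "keep the hot data local" and "manage the working set over time" is a least-recently-used cache in front of an archive. The sketch below is a minimal illustration, not something described in the slides: the class name `HotDataCache`, the `fetch_from_archive` callback, and the choice of LRU eviction are all assumptions made for the example.

```python
from collections import OrderedDict


class HotDataCache:
    """Keep at most `capacity` items local; everything else stays archived.

    Hypothetical sketch: `fetch_from_archive` is any callable that loads
    an item from slower archival storage given its key.
    """

    def __init__(self, capacity, fetch_from_archive):
        self.capacity = capacity
        self.fetch_from_archive = fetch_from_archive
        self.local = OrderedDict()  # ordered from coldest to hottest

    def get(self, key):
        if key in self.local:
            # Hit: mark the item as most recently used.
            self.local.move_to_end(key)
            return self.local[key]
        # Miss: load from the archive and keep a local copy.
        value = self.fetch_from_archive(key)
        self.local[key] = value
        if len(self.local) > self.capacity:
            # Evict the least recently used item; it remains in the archive.
            self.local.popitem(last=False)
        return value
```

LRU is only one possible policy. A real working set might also be managed by data age, size, or cost of re-fetching.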
[Figure: Data acquisition & modelling; Collaboration and visualisation; Analysis & data mining; Dissemination & sharing; Archiving and preserving]
• A core library for science in the cloud
• Built on community tools
• IPython Notebook, Python, NumPy, SciPy, scikit-learn, Biopython
• Standard community tools
• Deploy as VM library
• Deploy data collections
• Genomic libraries, medical image libraries, geophysics, astronomy, etc.
• Build community resources
• The Genetic Causes of Disease (David Heckerman)
• Wellcome Trust genome-wide association study (GWAS) for a large population
• Looking for causes of seven common diseases (bipolar disorder, rheumatoid arthritis, coronary artery disease, hypertension, …)
• Confounding is a problem, so a new algorithm was needed
• Ran on the Azure cloud using 35,000 cores in 3 weeks
[Figure: Inputs (training data), Labels, Hidden layers; Input data, Detected features, Mona Lisa]
The Windows Azure for Research program:
· Free access to Windows Azure cloud computing and storage (submit proposals for Windows Azure Research Awards)
· Windows Azure for Research training classes (20 classes worldwide)
· Support and technical resources: azure4research.com
Keynote, IEEE International Workshop on Cloud Analytics. Dennis Gannon