Cloud hpc-bigdata-challenges
Presentation for the keynote at the NSF Cloud Workshop for Chameleon and CloudLab.

  1. gannon@Indiana.edu, Dennis.gannon@outlook.com
  2. IT PAC
  3. Melbourne, Sydney, Brazil, Beijing
  4. • Programming tools: Scala, IPython, Azure ML, …
     • Frameworks: Spark, Hadoop, Yarn, HDInsight, Reef, Twister, Brisk
     • Software Defined Storage
     • Software Defined Networks
     • Hardware Abstraction/Virtualization
  5. http://tce.technion.ac.il/files/2012/06/Scott-shenker.pdf
     www.opennetsummit.org/pdf/2013/presentations/albert_greenberg.pdf
     http://www.cs.princeton.edu/~jrex/papers/pyretic-login13.pdf
  6. The Science Perspective
  7. Every research field is now a data science field
  8. • Thousand years ago: description of natural phenomena
     • Last few hundred years: Newton's laws, Maxwell's equations… (illustrated with $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - \frac{Kc^2}{a^2}$)
     • Last few decades: simulation of complex phenomena
     • Today and the future: unify theory, experiment and simulation with large multidisciplinary data, using data exploration and data mining (from instruments, sensors, humans…); distributed communities
  9. Video link
  10. [Figure: neural network diagram; labels: inputs (training data), labels, hidden layers, input data, detected features, "Mona Lisa"]
  11. • The Genetic Causes of Disease (David Heckerman)
      • Wellcome Trust genome-wide association study (GWAS) of a large population
      • Looking for the causes of seven common diseases (bipolar disorder, rheumatoid arthritis, coronary artery disease, hypertension, …)
      • Confounding is a problem, so a new algorithm was needed.
      • Ran on the Azure cloud using 35,000 cores in 3 weeks.
  12. Chameleon Cloud, SDN, NIH data commons
  13. [Figure: Mesos, Tachyon, Docker, Spark, Reef, Twister; data analytics and ML programming tools]
  14. • Many examples
      • The challenge: sustainability
      [Figure: data acquisition & modelling; collaboration and visualisation; analysis & data mining; dissemination & sharing; archiving and preserving]
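Slide 4 lists data-parallel frameworks such as Spark and Hadoop. As a rough illustration of the map/shuffle/reduce programming model those frameworks popularized, here is a minimal single-process sketch in plain Python. It does not use or imitate the API of any framework named on the slide; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values for each key; here, by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in a single process (no distribution)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Hadoop and YARN"]
    print(word_count(docs))  # {'spark': 1, 'and': 2, 'hadoop': 2, 'yarn': 1}
```

In a real framework the map and reduce phases run on many machines and the shuffle moves data across the network; this sketch only shows how the three phases fit together.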
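Slide 10 shows a neural network in which input data passes through hidden layers that produce detected features, which are then mapped to a label such as "Mona Lisa". The sketch below shows only that forward pass (one hidden layer with ReLU, then softmax over labels) in plain Python. The weights are hypothetical, hand-picked numbers, not a trained model; real image classifiers have far more layers and learn their weights from labeled training data.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    """Zero out negative activations."""
    return [max(0.0, z) for z in v]


def softmax(v: Vector) -> Vector:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Vector, w_hidden: Matrix, b_hidden: Vector,
            w_out: Matrix, b_out: Vector) -> Vector:
    """Input data -> hidden layer ("detected features") -> one probability per label."""
    features = relu(dense(w_hidden, b_hidden, x))
    return softmax(dense(w_out, b_out, features))


if __name__ == "__main__":
    labels = ["Mona Lisa", "other"]
    # Hypothetical weights: 3 inputs -> 2 hidden units -> 2 labels.
    w_hidden = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    b_hidden = [0.0, 0.1]
    w_out = [[-1.0, 1.0], [0.5, -0.5]]
    b_out = [0.0, 0.0]
    probs = forward([1.0, 0.5, 0.2], w_hidden, b_hidden, w_out, b_out)
    best = max(range(len(labels)), key=lambda i: probs[i])
    print(labels[best], probs)
```

Training (adjusting the weights from the "inputs (training data)" and "labels" on the slide) is deliberately left out; it would typically use gradient descent on a loss over many labeled examples.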
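To put slide 11's "35,000 cores in 3 weeks" in perspective, the arithmetic below converts it to core-hours and to single-core years. It assumes every core was busy for the entire three weeks, which the slide does not state, and ignores parallel overhead, so treat the results as a rough upper bound rather than a measured figure.

```python
CORES = 35_000            # from the slide
WEEKS = 3                 # from the slide
HOURS_PER_WEEK = 7 * 24   # 168


def core_hours(cores: int, weeks: int) -> int:
    """Upper bound on compute used, assuming every core is busy the whole time."""
    return cores * weeks * HOURS_PER_WEEK


def serial_years(total_core_hours: float, hours_per_year: float = 365 * 24) -> float:
    """How long the same work would take on a single core, ignoring overhead."""
    return total_core_hours / hours_per_year


if __name__ == "__main__":
    total = core_hours(CORES, WEEKS)
    print(f"{total:,} core-hours")                          # 17,640,000 core-hours
    print(f"~{serial_years(total):,.0f} single-core years")  # ~2,014 single-core years
```

Under these assumptions the run is roughly two thousand years of single-core computation compressed into three weeks of wall-clock time.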
