gannon@Indiana.edu
Dennis.gannon@outlook.com
Every research field is now a data science field
Thousand years ago: description of natural phenomena
Last few hundred years: Newton’s laws, Maxwell’s equations…
$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - \frac{Kc^2}{a^2}$
Last few decades: simulation of complex phenomena
Today and the Future: unify theory, experiment and simulation with large multidisciplinary data, using data exploration and data mining (data from instruments, sensors, humans…)
Distributed Communities
The Long Tail of Science
Programming tools: Scala, IPython, Azure ML, …
Frameworks: Spark, Hadoop, YARN, HDInsight, REEF, Twister, Brisk
Software Defined Storage
Software Defined Networks
Hardware Abstraction/Virtualization
Container and Distributed Cluster OS (Docker, Mesosphere, K8s)
http://tce.technion.ac.il/files/2012/06/Scott-shenker.pdf
www.opennetsummit.org/pdf/2013/presentations/albert_greenberg.pdf
http://www.cs.princeton.edu/~jrex/papers/pyretic-login13.pdf
Deploying the IPython notebook on a VM:
1. Create a CoreOS VM and open HTTPS port 443.
2. Log in and issue this single command:
$ docker run -d -p 443:8888 -e "PASSWORD=****" ipython/scipyserver
3. Browse to https://yourVMaddress and log in with the password set in step 2.
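The three steps above reduce to a single `docker run`. The following is a minimal Python sketch that assembles the same command from its parts and, by default, only prints it as a dry run. It assumes Docker is installed on the VM. The helper names `build_notebook_command` and `launch` and the `NOTEBOOK_PASSWORD` environment variable are hypothetical. The image name and the 443:8888 port mapping come from the slide.

```python
import os
import shlex
import subprocess

# Image and port mapping taken from the slide; host port 443 makes the
# notebook reachable at https://yourVMaddress.
IMAGE = "ipython/scipyserver"
HOST_PORT = 443        # port opened on the CoreOS VM
CONTAINER_PORT = 8888  # port the notebook server listens on inside the container


def build_notebook_command(password: str) -> list[str]:
    """Return the argv for the one-line docker command (hypothetical helper)."""
    if not password:
        raise ValueError("a non-empty notebook password is required")
    return [
        "docker", "run", "-d",                     # run detached
        "-p", f"{HOST_PORT}:{CONTAINER_PORT}",     # map VM port to container port
        "-e", f"PASSWORD={password}",              # notebook login password
        IMAGE,
    ]


def launch(password: str, dry_run: bool = True) -> str:
    """Return the shell form of the command (dry run) or run it and return the container id."""
    argv = build_notebook_command(password)
    if dry_run:
        return shlex.join(argv)
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout.strip()  # docker run -d prints the new container id


if __name__ == "__main__":
    # NOTEBOOK_PASSWORD is a hypothetical variable so the password is not hard-coded here.
    print(launch(os.environ.get("NOTEBOOK_PASSWORD", "change-me")))
```

Passing an argument list to `subprocess.run`, rather than a shell string, avoids quoting problems if the password contains spaces or shell metacharacters.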
[Diagram: a pool of compute nodes]
Vowpal Wabbit? Datumbox?
[Diagram: Marathon running on Mesos, with a master node, a backup master, three worker nodes, and Zookeeper]
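In this cluster, Marathon runs as a framework on Mesos, and Mesos schedules tasks onto the worker nodes. Zookeeper handles leader election between the active and backup Mesos masters. The sketch below shows how a long-running service reaches such a cluster. It builds a Marathon app definition, which is the JSON that Marathon's `/v2/apps` REST endpoint accepts, and reuses the notebook image from the earlier slide. The host name, app id, resource sizes, and password value are illustrative assumptions. The field layout follows the Docker app format Marathon used around the time of this talk; later releases reorganized the networking fields, so check it against the version you run. The script only prints the JSON unless `submit` is called.

```python
import json
import urllib.request

# Hypothetical Marathon endpoint; 8080 is Marathon's default HTTP port.
MARATHON_URL = "http://marathon.example.internal:8080/v2/apps"


def notebook_app(app_id: str = "/ipython-notebook", instances: int = 1) -> dict:
    """Build a Marathon app definition that runs the notebook container on Mesos workers."""
    return {
        "id": app_id,
        "cpus": 1.0,   # CPU share reserved per instance (illustrative)
        "mem": 2048,   # memory in MB per instance (illustrative)
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "ipython/scipyserver",
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free port on the worker node
                "portMappings": [{"containerPort": 8888, "hostPort": 0, "protocol": "tcp"}],
            },
        },
        "env": {"PASSWORD": "change-me"},  # placeholder password
    }


def submit(app: dict, url: str = MARATHON_URL) -> int:
    """POST the app definition to Marathon and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Print instead of submitting, since MARATHON_URL is a placeholder.
    print(json.dumps(notebook_app(), indent=2))
```

To scale the service, submit the definition with a larger `instances` value. Marathon then asks Mesos for resources on the worker nodes and restarts failed instances.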
Cloud
SDN
NIH data commons
• Many Examples
• Challenges:
• I have only talked about analysis, but there is more:
• Sustainability
• Sharing
• Reproducible Science
Data acquisition & modelling
Collaboration and visualisation
Analysis & data mining
Dissemination & sharing
Archiving and preserving
ieee cloud 2015 keynote talk