Immersive Teaching
and Research
in Data Sciences
via Cloud Computing
Cloud Era Ltd 13 June 2013 karim.chine@cloudera.co.uk...
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese an...
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the sprea...
22
Outline
 Introduction
 The Rise of Data Science
 What is missing on the Cloud for Research and
Education?
 Elastic-...
33
Introduction
 Science and the 4th Paradigm
 The promises of Grid Computing for e-Research
 The e-Science program (UK...
4
Introduction
The Dancing Bear
The townspeople gather to see the wondrous
sight as the massive, lumbering beast shambles
...
55
Introduction
 The Data Deluge and the challenges of Big Data
* http://www.slideshare.net/AmazonWebServices/20120620-aw...
6
The Rise of Data Science
* http://en.wikipedia.org/wiki/Data_science
Data Scientists Led by NASA Star Most Sought for Ce...
77
What is missing on the Cloud
for Research and Education?
 Science-as-a-Service: (SCEs) Scientific Computing
Environmen...
8
Elastic-R: The Next Generation
Data Science Platform
Arduino / Raspberry pi
Democratizing Electronics
Elastic-R
Democrat...
99
Elastic-R: The Next Generation
Data Science Platform
 Elastic-R is:
 A link between the data scientist/the educator/t...
1010
Elastic-R: The Next Generation
Data Science Platform
 Elastic-R is:
 A hyper-dropbox allowing to run a large spectr...
11
Elastic-R: Design
and Technologies Overview
12
Elastic-R: Design
and Technologies Overview
Robot submarine dives to the deepest part
of the ocean controlled by a 7-mi...
13
Elastic-R: Design and
Technologies Overview
 Remote Java/R
Processes
 Events-driven Remote
Objects/Engines
 R, Pytho...
14
Elastic-R: Design
and Technologies Overview
15
Elastic-R: Design
and Technologies Overview
Elastic-R
AMI 1
R 2.10
BioC 2.5
Elastic-R
AMI 2
R 2.9
BioC 2.3
Elastic-R
AM...
1616
Modes of access
 Individuals with AWS accounts
 Standard AMIs (Amazon Machine Images): paid-per-use to Amazon.
For ...
1717
Rethinking Virtual Research and
Teaching
 Free researchers from the IT service dictatorship.
 Give researchers self...
1818
Rethinking Virtual Research and
Teaching
 Make the cloud an ecosystem for Open Science where
all research artifacts ...
1919
Rethinking Virtual Research and
Teaching
 Provide affordable and reliable tools for remote
education
 High-quality ...
2020
Demo
 Register to Elastic-R academic and trial portal
(www.elastic-r.org )
 Create Data Science Engines using trial...
2121
Conclusion
 Elastic -R unlocks the potential of the cloud for Data
Scientists and Educators
 With Elastic-R, the cl...
2222
What to do Next
 Register to Elastic-R and try the HTML 5 Workbench and
the collaboration
 Download the R package e...
2323
Contact details
 Karim Chine
 karim.chine@cloudera.co.uk
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/cloud-
data-teaching-research
Upcoming SlideShare
Loading in …5
×

Immersive Teaching and Research in Data Sciences via Cloud Computing

655 views

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1cYd52g.

Karim Chine discusses the meaning of cloud computing to academic teachers and researchers in data sciences and how to take advantage of it. With public cloud computing a new era for research and higher education begins. Filmed at qconnewyork.com.

Karim Chine has been designing and building large scale distributed software for over a decade. Born in Tunisia, he moved to Paris in 1997, graduated from Ecole Polytechnique and Telecom ParisTech and worked several years at Schlumberger and IBM R&D departments. In 2006, Karim moved to Cambridge and worked at EBI and at Imperial College.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
655
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Immersive Teaching and Research in Data Sciences via Cloud Computing

  1. 1. Immersive Teaching and Research in Data Sciences via Cloud Computing Cloud Era Ltd 13 June 2013 karim.chine@cloudera.co.uk Karim Chine
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /cloud-data-teaching-research
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. 22 Outline  Introduction  The Rise of Data Science  What is missing on the Cloud for Research and Education?  Elastic-R: The Next Generation Data Science Platform  Elastic-R: Design and Technologies Overview  Rethinking Virtual Research and Teaching  Demo  Conclusion
  5. 5. 33 Introduction  Science and the 4th Paradigm  The promises of Grid Computing for e-Research  The e-Science program (UK)  The NSF-funded cyber infrastructure (USA)  e-Infrastructure and ICT Calls (EC) 2 2 2. 3 4 a cG a a           Experimental Science Theoretical Science Computational Science e-Science / Data-intensive Science
  6. 6. 4 Introduction The Dancing Bear The townspeople gather to see the wondrous sight as the massive, lumbering beast shambles and shuffles from paw to paw. The bear is really a terrible dancer, and the wonder isn't that the bear dances well but that the bear dances at all The tragic failure of the Grid paradigm  No democratic access for all scientists, a tool for Particle Physicists and Geeky scientists  No sustainable infrastructures  No SLA, best effort support  No flexibility, no isolation, only admins can install software, X.509 certificates, restrictive policies  No interaction design, no user-centric approach
  7. 7. 55 Introduction  The Data Deluge and the challenges of Big Data * http://www.slideshare.net/AmazonWebServices/20120620-aws-summitberlinbigdataanalyticsonaws
  8. 8. 6 The Rise of Data Science * http://en.wikipedia.org/wiki/Data_science Data Scientists Led by NASA Star Most Sought for Century: Jobs By Aki Ito - Jun 11, 2013 12:01 AM ET (Bloomberg.com) Harvard Business Review last year called this profession “the sexiest job of the 21st century” One measure of demand: Hours billed for work in statistical analysis grew by 522 percent in the first quarter compared with the same period in 2011
  9. 9. 77 What is missing on the Cloud for Research and Education?  Science-as-a-Service: (SCEs) Scientific Computing Environments-as-a-service, Models-as-a-service…  Generalized Real-time Collaboration, including interaction with SCEs and with Data  One consistent platform for homogenous access to Science Clouds and new abstractions for stateful access to remote infrastructure/Data  Reproducibility and traceability of data analysis and computational research  Science Gateways building and publishing for everyone
  10. 10. 8 Elastic-R: The Next Generation Data Science Platform Arduino / Raspberry pi Democratizing Electronics Elastic-R Democratizing Data Science
  11. 11. 99 Elastic-R: The Next Generation Data Science Platform  Elastic-R is:  A link between the data scientist/the educator/the student and the Infrastructure-as-a-Service  A consistent set of concepts, architectures, frameworks and interaction design models to wrap and complete the existing public and private clouds with the missing capabilities to radically improve the data scientist productivity  « Super Glue » between the virtual IT capabilities, the software capabilities, the mathematical/statistical capabilities and the man- machine interaction components  A Data Operating System enabling infinite combinations for analysing data and building rapidly Data Science-centric applications and services
  12. 12. 1010 Elastic-R: The Next Generation Data Science Platform  Elastic-R is:  A hyper-dropbox allowing to run a large spectrum of capabilities next to the centrally stored and ubiquitously accessible data.  A platform using the cloud to introduce traceability and reproducibility to the data science arena.  A federation paradigm for scientific clouds  A framework for building on-the-fly user-defined software-as-a-service
  13. 13. 11 Elastic-R: Design and Technologies Overview
  14. 14. 12 Elastic-R: Design and Technologies Overview Robot submarine dives to the deepest part of the ocean controlled by a 7-mile cable as thin as single human hair
  15. 15. 13 Elastic-R: Design and Technologies Overview  Remote Java/R Processes  Events-driven Remote Objects/Engines  R, Python, Mathematica, Matlab, Scilab, ..  Collaborative Spreadsheets  Collaborative Scientific Graphics Canvas  Collaborative Dashboard with collaborative widgets
  16. 16. 14 Elastic-R: Design and Technologies Overview
  17. 17. 15 Elastic-R: Design and Technologies Overview Elastic-R AMI 1 R 2.10 BioC 2.5 Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R AMI 3 R 2.8 BioC 2.0 Elastic-R Amazon Machine Images Elastic-R EBS 1 Data Set XXX Elastic-R EBS 2 Data Set YYY Elastic-R EBS 3 Data Set ZZZ Elastic-R EBS 4 Data Set VVV Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R EBS 4 Data Set VVV Amazon Elastic Block Stores Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R.org Elastic-R EBS 4 Data Set VVV
  18. 18. 1616 Modes of access  Individuals with AWS accounts  Standard AMIs (Amazon Machine Images): paid-per-use to Amazon. For Academic use and trial purposes  Paid AMI: Paid-per-use software model. For business users  Individuals without AWS accounts  Trial tokens, purchased tokens, tokens granted by other users.  Resources (data science engines) shared by other users  Individual subscribtions  Companies/Educational & Research Institutions  Dedicated Platform and AMIs on an Amazon VPC (Virtual Private Cloud). Paid via subscribtion
  19. 19. 1717 Rethinking Virtual Research and Teaching  Free researchers from the IT service dictatorship.  Give researchers self-service access to the IT resources they need  Allow Researchers to share without restrictions  Make real-time collaboration a free and ubuiquitous service  Allow researchers to produce and publish to the web advanced applications/services without recourse to developers/admins
  20. 20. 1818 Rethinking Virtual Research and Teaching  Make the cloud an ecosystem for Open Science where all research artifacts can be produced, discovered and reused  Allow Researchers to « sell » the software/models/algorithms/techniques they invent seamlessly  Bridge the gap between the different computational research tools: interconnect SCEs, workflow workbenches, Documents editors…
  21. 21. 1919 Rethinking Virtual Research and Teaching  Provide affordable and reliable tools for remote education  High-quality voice and video chat for a large number of users  Self-service collaboration tools: Editors, White boards, IDEs, etc.  Modules for Traceability/reproducibilty  Extend existing on-line courses platforms to include capabilities such as:  Companion software environments in SaaS mode  Collaborative problem solving tools  Interactive courses  Tokens for Ready-to-run e-Learning applications  E-Learning environments’ visual designers
  22. 22. 2020 Demo  Register to Elastic-R academic and trial portal (www.elastic-r.org )  Create Data Science Engines using trial tokens  Work with R, Python and Scientific Spreadsheets in the browser  Share Data Science Engine and Collaborate  Use The Visual and Collaborative Scientific Applications designer to create and publish to the web an interactive dashboard  Connect to the remote Data Science Engine from withing a local R session, push and pull data, execute commands and show impact on the dashboard
  23. 23. 2121 Conclusion  Elastic -R unlocks the potential of the cloud for Data Scientists and Educators  With Elastic-R, the cloud becomes a cyberspace for collaborative research and sharing and an eco-system suited for open Science, open innovation and open education  Elastic-R improves dramatically the productivity of the Data Scientists: The entire Data Science Factory chain, from resources acquisition to services and applications publishing, becomes under their direct control  Elastic-R provides Analytics-as-a-Service platform that can extend any existing portal or application
  24. 24. 2222 What to do Next  Register to Elastic-R and try the HTML 5 Workbench and the collaboration  Download the R package elasticR and use it to access the cloud from local R sessions  Download the Java SDK and try to create your first Analytical application using AWS and the most advanced tools for programming with data.  Get in touch with me to explore potential partnerships and collaborations
  25. 25. 2323 Contact details  Karim Chine  karim.chine@cloudera.co.uk
  26. 26. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/cloud- data-teaching-research

×