Leveraging scriptable
infrastructures,
Towards a paradigm shift in
software for data science
Cloud Era Ltd 14 June 2013 ka...
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese an...
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the sprea...
22
Outline
 Introduction
 Fragmentation and friction in the Data Science arena
 The understated potential of Cloud scri...
33
Introduction
“The next information revolution is well underway. But it is
not happening where information scientists, i...
44
Introduction
 Science and the 4th Paradigm
 The e-Research Dancing Bears
2
2
2.
3
4
a
cG
a
a









...
5
Fragmentation and friction in the
Data Science Arena
www.scilab.org
http://root.cern.ch
www.sagemath.org
www.sas.com
off...
6
The understated potential of
Cloud scriptability
 3D printers are becoming
a common place:
 Creating three-dimensional...
7
Elastic-R: Towards a universal
platform for Data Science
Computational Components
R packages, Wrapped C,C++,Fortran code...
8
Elastic-R: Towards a universal
platform for Data Science
9
Elastic-R: Towards a universal
platform for Data Science
Public
Clouds
Private Cloud
10
Elastic-R: Design and
Technologies Overview
You are here
Your data
is here
11
Elastic-R: Design and
Technologies Overview
 Remote Java/R
Processes
 Events-driven Remote
Objects/Engines
 R, Pytho...
12
Elastic-R: Design and
Technologies Overview
Elastic-R
AMI 1
R 2.10
BioC 2.5
Elastic-R
AMI 2
R 2.9
BioC 2.3
Elastic-R
AM...
1313
Modes of access
 Individuals with AWS accounts
 Standard AMIs (Amazon Machine Images): paid-per-use to Amazon.
For ...
1414
Elastic-R: The scriptability
toolset
Command Line
Web Console
SDK
API
1515
Elastic-R: The scriptability
toolset
Command Line
Web Console
SDK
API
1616
Elastic-R: The scriptability
toolset
WS generator
rws.war
+ mapping.jar
+ pooling framework
+ R Java Bridge
+ JAX-WS
...
1717
Elastic-R: The scriptability
toolset
 API: Soap and Restful Web Services
 Data Analysis Engines control
 Data Anal...
1818
Demo
 Register to Elastic-R academic and trial portal
(www.elastic-r.org)
 Create Data Science Engines using trial ...
1919
Conclusion
 Elastic -R unlocks the potential of the cloud for Data
Scientists and analytical applications developers...
2020
What to do Next
 Register to Elastic-R and try the HTML 5 Workbench and
the collaboration capabilities
 Download th...
2121
Contact details
 Karim Chine
 karim.chine@cloudera.co.uk
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/elastic-r-
data
Upcoming SlideShare
Loading in …5
×

Leveraging Scriptable Infrastructures, Towards a Paradigm Shift in Software for Data Science

530 views
370 views

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/19tYIEk.

Karim Chine introduces Elastic-R, demonstrating some of its applications in bioinformatics and finance. Filmed at qconnewyork.com.

Karim Chine has been designing and building large scale distributed software for over a decade. Born in Tunisia, he moved to Paris in 1997, graduated from Ecole Polytechnique and Telecom ParisTech and worked several years at Schlumberger and IBM R&D departments. Karim is part of the European commission Experts group for e-Infrastructure and cloud computing projects and the author of Elastic-R.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
530
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Leveraging Scriptable Infrastructures, Towards a Paradigm Shift in Software for Data Science

  1. 1. Leveraging scriptable infrastructures, Towards a paradigm shift in software for data science Cloud Era Ltd 14 June 2013 karim.chine@cloudera.co.uk Karim Chine
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /elastic-r-data
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. 22 Outline  Introduction  Fragmentation and friction in the Data Science arena  The understated potential of Cloud scriptability  Elastic-R: Towards a universal platform for Data Science  Elastic-R: Design and Technologies Overview  Elastic-R: The scriptability toolset  Demo  Conclusion
  5. 5. 33 Introduction “The next information revolution is well underway. But it is not happening where information scientists, information executives, and the information industry in general are looking for it. It is not a revolution in technology, machinery, techniques, software, or speed. It is a revolution in concepts.” Peter Drucker. Forbes ASAP, August 24, 1998
  6. 6. 44 Introduction  Science and the 4th Paradigm  The e-Research Dancing Bears 2 2 2. 3 4 a cG a a           Experimental Science Theoretical Science Computational Science e-Science / Data-intensive Science The townspeople gather to see the wondrous sight as the massive, lumbering beast shambles and shuffles from paw to paw. The bear is really a terrible dancer, and the wonder isn't that the bear dances well but that the bear dances at all.
  7. 7. 5 Fragmentation and friction in the Data Science Arena www.scilab.org http://root.cern.ch www.sagemath.org www.sas.com office.microsoft.com www.mathworks.com www.scipy.org www.spss.com www.wolfram.com www.python.org  Open-source (GPL) software environment for statistical computing and graphics  Lingua franca of data analysis.  Repositories of contributed R packages related to a variety of problem domains in life sciences, social sciences, finance, econometrics, chemo metrics, etc. are growing at an exponential rate.  R is Super Glue
  8. 8. 6 The understated potential of Cloud scriptability  3D printers are becoming a common place:  Creating three-dimensional solid object of virtually any shape from a digital model.  Scripting the physical world  Sharing physical reality on Facebook is now as easy as sharing your holiday pictures
  9. 9. 7 Elastic-R: Towards a universal platform for Data Science Computational Components R packages, Wrapped C,C++,Fortran code, Python modules, Matlab Toolkits… Open source or commercial Computational Resources Clusters, grids, private or public clouds Free or pay-per-use Computational GUIs HTML5 and Desktop Workbench Built-in views /Plugins /Collaborative views Open source or commercial Computational Scripts R / Python / Matlab / Groovy Computational APIs Java / SOAP / REST, Stateless and stateful Computational Storage Local, NFS, FTP, Amazon S3, EBS Generated Computational Web Services Stateful or stateless, mapping of R objects/functions Elastic -R
  10. 10. 8 Elastic-R: Towards a universal platform for Data Science
  11. 11. 9 Elastic-R: Towards a universal platform for Data Science Public Clouds Private Cloud
  12. 12. 10 Elastic-R: Design and Technologies Overview You are here Your data is here
  13. 13. 11 Elastic-R: Design and Technologies Overview  Remote Java/R Processes  Events-driven Remote Objects/Engines  R, Python, Mathematica, Matlab, Scilab, ...  Collaborative Spreadsheets  Collaborative Scientific Graphics Canvas  Collaborative Dashboard with collaborative widgets
  14. 14. 12 Elastic-R: Design and Technologies Overview Elastic-R AMI 1 R 2.10 BioC 2.5 Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R AMI 3 R 2.8 BioC 2.0 Elastic-R Amazon Machine Images Elastic-R EBS 1 Data Set XXX Elastic-R EBS 2 Data Set YYY Elastic-R EBS 3 Data Set ZZZ Elastic-R EBS 4 Data Set VVV Elastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R EBS 4 Data Set VVV Amazon Elastic Block Stores Eastic-R AMI 2 R 2.9 BioC 2.3 Elastic-R.org Elastic-R EBS 4 Data Set VVV
  15. 15. 1313 Modes of access  Individuals with AWS accounts  Standard AMIs (Amazon Machine Images): paid-per-use to Amazon. For Academic use and trial purposes  Paid AMI: Paid-per-use software model. For business users  Individuals without AWS accounts  Trial tokens, purchased tokens, tokens granted by other users.  Resources (data science engines) shared by other users  Individual subscribtions  Companies/Educational & Research Institutions  Dedicated Platform and AMIs on an Amazon VPC (Virtual Private Cloud). Paid via subscribtion
  16. 16. 1414 Elastic-R: The scriptability toolset Command Line Web Console SDK API
  17. 17. 1515 Elastic-R: The scriptability toolset Command Line Web Console SDK API
  18. 18. 1616 Elastic-R: The scriptability toolset WS generator rws.war + mapping.jar + pooling framework + R Java Bridge + JAX-WS - Servlets - Generated artifacts Eclipse Web Service Client Generator public static void main(String[] args) throws Exception { RGlobalEnvFunctionWeb g=new RGlobalEnvFunctionWebServiceLocator().getrGlobalEnvFunctionWebPort(); RNumeric x=new RNumeric(); x.setValue(new Double[]{6.0}); System.out.println(g.square(x).getValue()[0]); } Web Services Generation square  function(x) {return(x^2) } typeInfo(square)  SimultaneousTypeSpecification( TypedSignature(x = "numeric"), returnType = "numeric") Script / globals.r Script / rjmap.xml <rj> <publish> <functions> <function name="square" forWeb="true"/> </functions> </publish> <scripts> <initScript name="globals.r" embed="true"/> </scripts> </rj> Deploy
  19. 19. 1717 Elastic-R: The scriptability toolset  API: Soap and Restful Web Services  Data Analysis Engines control  Data Analysis Engines Management (Life cycle, etc.)  Virtual appliances/artifacts management  Platform administartion  SDKs: Java, R, Microsoft Office (Vba)  Command line: elasticR package  Html5 Workbench  Elastic-R artifacts management interface  Engines control interface
  20. 20. 1818 Demo  Register to Elastic-R academic and trial portal (www.elastic-r.org)  Create Data Science Engines using trial tokens  Work with R, Python and Scientific Spreadsheets in the browser  Share Data Science Engine and Collaborate  Connect to the remote Data Science Engine from within a local R session, push and pull data, execute commands and show impact on spreadsheets and dashboards
  21. 21. 1919 Conclusion  Elastic -R unlocks the potential of the cloud for Data Scientists and analytical applications developers and improves dramatically their productivity  The Data Science IT Factory chain, from resources acquisition to services and applications design and publication becomes:  Fully scriptable in R  under the direct control of the Data Scientist and the developer  Fully traceable and reproducible  Elastic-R can seamlessly extend any existing application or service and can be used as a collaboration backend  Openness, security and reliability are among the key capabilities delivered
  22. 22. 2020 What to do Next  Register to Elastic-R and try the HTML 5 Workbench and the collaboration capabilities  Download the R package elasticR and use it to access the cloud from local R sessions  Download the Java SDK and create your first analytical application using AWS and advanced tools for programming with data.  Get in touch with me to explore potential partnerships and collaborations
  23. 23. 2121 Contact details  Karim Chine  karim.chine@cloudera.co.uk
  24. 24. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/elastic-r- data

×