PyData NYC 2015
November 10th 2015
Karim Chine
karim.chine@rosettahub.com
Towards a universal platform for data science on public and private clouds

A universal open platform for data science
Computational Components: R packages, wrapped C, C++ and Fortran code, Python modules, Matlab toolboxes…; open source or commercial
Computational Resources: clusters, grids, private or public clouds; free or pay-per-use
Computational GUIs: HTML5 and desktop workbench; built-in views, plugins and collaborative views; open source or commercial
Computational Scripts: R / Python / Matlab / Groovy
Computational APIs: Java / SOAP / REST; stateless and stateful
Computational Storage: local, NFS, FTP, Amazon S3, EBS
Generated Computational Web Services: stateful or stateless; mapping of R objects/functions
Elastic-R
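The stateless/stateful distinction above, which applies both to the APIs and to generated web services that map R objects and functions, can be illustrated with a minimal Python sketch. This is not Elastic-R's generated service code. `stateless_endpoint`, `StatefulEndpoint` and `RunningMean` are hypothetical names, and a plain Python object stands in for a remote R object.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


def stateless_endpoint(func: Callable[..., Any]) -> Callable[[Dict[str, Any]], Any]:
    """Expose a function as a stateless endpoint: every request carries all arguments."""

    def handle(request: Dict[str, Any]) -> Any:
        return func(**request)

    return handle


class RunningMean:
    """Hypothetical stand-in for a remote object (such as an R object) with internal state."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def add(self, value: float) -> None:
        self.total += value
        self.count += 1

    def value(self) -> float:
        return self.total / self.count


@dataclass
class StatefulEndpoint:
    """Expose an object's methods as a stateful endpoint: one live instance per session."""

    factory: Callable[[], Any]
    sessions: Dict[str, Any] = field(default_factory=dict)

    def open_session(self) -> str:
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = self.factory()
        return session_id

    def call(self, session_id: str, method: str, *args: Any) -> Any:
        # Dispatch the named method on the object that belongs to this session.
        return getattr(self.sessions[session_id], method)(*args)

    def close_session(self, session_id: str) -> None:
        self.sessions.pop(session_id, None)


# Stateless: the client resends every input with each request.
mean = stateless_endpoint(lambda values: sum(values) / len(values))
print(mean({"values": [1, 2, 3]}))  # 2.0

# Stateful: the object stays alive on the server between calls.
service = StatefulEndpoint(factory=RunningMean)
sid = service.open_session()
service.call(sid, "add", 4.0)
service.call(sid, "add", 6.0)
print(service.call(sid, "value"))  # 5.0
```

A stateless endpoint is easy to replicate and scale out, because any server can answer any request. A stateful endpoint avoids resending large inputs, but each session has to be routed back to the server that holds its object.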

Infrastructure federation: the rosetta virtual cloud
Public clouds
Private cloud

AWS: programmable infrastructure
Command Line
Web Console
SDK
API

rosettaHUB: programming with data and infrastructure
Command Line
Web Console
SDK
API

Google Docs-like real-time collaboration
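A simple way to keep every participant's view identical during real-time collaboration is to route all edits through a central hub. The hub gives each edit a global sequence number and broadcasts it to every participant. The Python sketch below assumes that approach. It uses centralized ordering, not the operational transformation or CRDT techniques of Google Docs-style editors, and the deck does not say which technique rosettaHUB uses. `CollaborationHub`, `Replica` and the cell-edit format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One edit sets a single cell of a shared sheet: (cell reference, new value).
Edit = Tuple[str, str]


@dataclass
class Replica:
    """One participant's local copy of the shared sheet."""

    name: str
    cells: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # sequence number of the last edit applied locally

    def apply(self, seq: int, edit: Edit) -> None:
        # The hub delivers edits in order, so each one must be the next expected.
        if seq != self.applied + 1:
            raise ValueError(f"{self.name}: expected edit {self.applied + 1}, got {seq}")
        cell, value = edit
        self.cells[cell] = value
        self.applied = seq


@dataclass
class CollaborationHub:
    """Central server that assigns every edit a global order and broadcasts it."""

    replicas: List[Replica] = field(default_factory=list)
    log: List[Edit] = field(default_factory=list)

    def join(self, replica: Replica) -> None:
        # A late joiner replays the whole log so it starts from the current state.
        for seq, edit in enumerate(self.log, start=1):
            replica.apply(seq, edit)
        self.replicas.append(replica)

    def submit(self, edit: Edit) -> int:
        self.log.append(edit)
        seq = len(self.log)
        for replica in self.replicas:
            replica.apply(seq, edit)
        return seq


hub = CollaborationHub()
alice, bob = Replica("alice"), Replica("bob")
hub.join(alice)
hub.join(bob)
hub.submit(("A1", "42"))  # first edit to A1
hub.submit(("A1", "43"))  # a later edit to the same cell wins because it is ordered later
carol = Replica("carol")
hub.join(carol)  # joins late and catches up from the log
print(alice.cells == bob.cells == carol.cells)  # True
print(carol.cells)  # {'A1': '43'}
```

Because every replica applies the same edits in the same order, all copies converge. The cost is that every edit makes a round trip through the hub.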

Traceable and reproducible data science
Figure: Elastic-R Amazon Machine Images (AMIs) and Amazon Elastic Block Store (EBS) volumes
AMIs: AMI 1 (R 2.10, BioC 2.5), AMI 2 (R 2.9, BioC 2.3), AMI 3 (R 2.8, BioC 2.0)
EBS volumes: EBS 1 (data set XXX), EBS 2 (data set YYY), EBS 3 (data set ZZZ), EBS 4 (data set VVV)
AMI 2 (R 2.9, BioC 2.3) is shown paired with EBS 4 (data set VVV), and the same pairing appears again alongside Elastic-R.org.
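Pairing a versioned machine image with a specific data volume can be made traceable by recording both in a small manifest and hashing it. The Python sketch below assumes that approach. `Environment` and its fields are hypothetical, and the slide's labels ("Elastic-R AMI 2", "Elastic-R EBS 4") stand in for the real AMI and EBS snapshot IDs a practical record would store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical record of what a result depends on: machine image plus data volume."""

    machine_image: str  # label standing in for a real AMI ID
    r_version: str
    bioc_version: str
    data_volume: str  # label standing in for a real EBS snapshot ID
    data_set: str

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so the same environment always hashes the same way.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = Environment("Elastic-R AMI 2", "2.9", "2.3", "Elastic-R EBS 4", "VVV")
rerun = Environment("Elastic-R AMI 2", "2.9", "2.3", "Elastic-R EBS 4", "VVV")
upgraded = Environment("Elastic-R AMI 1", "2.10", "2.5", "Elastic-R EBS 4", "VVV")

print(original.fingerprint() == rerun.fingerprint())  # True: same image, same data
print(original.fingerprint() == upgraded.fingerprint())  # False: the R/BioC stack changed
```

Storing the fingerprint next to each result lets a later reader check whether a rerun used the same image and data as the original.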

Architecture

Architecture

Universal engine for data science
• Remote Java/R processes
• Event-driven remote objects/engines
• R, Python, Mathematica, Matlab, Scilab, ...
• Collaborative spreadsheets
• Collaborative scientific graphics canvas
• Collaborative dashboard with collaborative widgets
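The event-driven model in this list can be sketched in Python. An engine publishes an event for every command it runs and every variable the command creates or rebinds, and each collaborator's view subscribes to those events. `EventDrivenEngine` and the event names `command` and `variable` are hypothetical. `exec()` on a local namespace stands in for a remote R or Python interpreter and is not how Elastic-R connects to its engines.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

# A listener receives the event name and a payload dictionary.
Listener = Callable[[str, Dict[str, Any]], None]


class EventDrivenEngine:
    """Toy, local stand-in for a remote R/Python engine that publishes events.

    exec() on a private namespace plays the interpreter's role here; it is not
    sandboxed, so only run trusted code through it.
    """

    def __init__(self) -> None:
        self.namespace: Dict[str, Any] = {}
        self.listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, event: str, listener: Listener) -> None:
        self.listeners[event].append(listener)

    def _publish(self, event: str, payload: Dict[str, Any]) -> None:
        for listener in self.listeners[event]:
            listener(event, payload)

    def run(self, code: str, user: str) -> None:
        before = dict(self.namespace)
        self._publish("command", {"user": user, "code": code})
        exec(code, self.namespace)
        # Announce every variable the command created or rebound to a different object.
        for name, value in self.namespace.items():
            if name == "__builtins__":
                continue
            if name not in before or before[name] is not value:
                self._publish("variable", {"name": name, "value": value})


# Two collaborators' views subscribe to the same engine and receive identical events.
views: Dict[str, List[str]] = {"alice": [], "bob": []}
engine = EventDrivenEngine()
for view in views.values():
    engine.subscribe("command", lambda e, p, v=view: v.append(f"{p['user']} ran: {p['code']}"))
    engine.subscribe("variable", lambda e, p, v=view: v.append(f"{p['name']} = {p['value']!r}"))

engine.run("x = [1, 2, 3]", user="alice")
engine.run("total = sum(x)", user="bob")
print(views["alice"] == views["bob"])  # True
print(views["bob"][-1])  # total = 6
```

Pushing events to subscribers, instead of having each client poll the engine, is what lets several spreadsheets, graphics canvases or dashboard widgets stay in sync with one shared session.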

www.rosettahub.com
