SlideShare a Scribd company logo
Data Science in
Ruby? Is it possible?
Is it Fast? Should we
use it?
• Rodrigo Urubatan
• rodrigo@urubatan.dev
• http://urubatan.dev
• http://twitter.com/urubatan
Anyone here work
with Data Science?
• DataScientist?
• DataEngineer?
• Developers of application that uses Data?
• Statisticians?
What exactly
is Data
Science?
The process of extractingmeaning from and interpret
data
The usage of statisticsand machine learning to clean
and manipulate data
The usage of computer software to collect, clean,
manipulate and interpret data
A cool name for the combination of Data Mining and
Business Intelligence (other buzz words thatwere used
for a long time for exactly what we call Data Science
today, but with more expensive tool sets)
Can Ruby do Data
Science?
Can Ruby do
Data Science?
(Long Answer)
• Standing on the shoulders of giants
• pycall — Bridgeinto the Python world.
• rserve-client— Ruby connector
for Rserve, R's binary server.
• Data Manipulation
• kiba — lightweight Ruby ETL (Extract-
Transform-Load) framework.
• jongleur — Workflowmanager using
DAG definitions to execute ETL tasks.
• Distributed Computing
• ruby-spark — Ruby Interface to Apache
Spark 1.x.x.
• Data Structures
• daru — Data Frame and Vector
structures with comprehensive
manipulatingand visualization
methods.
• numo-narray — n-dimensional
Numerical Array for Ruby.
• nmatrix — dense and sparselinear
algebra library for Ruby via SciRuby.
• Datasets
• rdatasets — Data sets available in R
via Rdatasets.
• red-datasets — Growing collectionof publicly
available data sets suchas CIFAR-10,Iris,MNIST
etc
• Statistics
• rb-gsl — Ruby interfacetotheGNU Scientific
Library. [dep: GLS]
• simple_stats — Enumerablepatches for
descriptive statistics.
• enumerable-statistics — fastimplementation of
descriptive statistics for theEnumerablemodule.
• Visualization
• matplotlib — Rubybased wrapper
around matplotlib. [dep: matplotlib]
• mathematical — PNGand MathML renderings for
your equations.
• daru-view — daru-view is interactive plotting
gem for web application (any Ruby web
applicationframework like
Rails/Sinatra/Nanoc/Hanami) &IRubynotebook.
It is a plugin gemfor daru.
• daru-plotly — Plotly basedvisualization for Daru.
Ruby
X
Python
Ruby
Daru
NMatrix/NArray
Python
Pandas
Numpy
The 3 Major
Ruby Data
Science
Projects
SciRuby project
Nmatrix Centric gems
Nmatrix
Daru
GnuplotRB
Stas_sample
Ruby Numo project
Numo:: NArray centric Gems
Numo:: NArray
Numo:: FFTE
Numo:: FFTW
Numo::Gnuplot
RedDataToolsproject
Apache Arrow centric gems
RedArrow
RedChainer
RedArrowGSL
RedArrowNMatrix
RedArrowNumoNArray
Other libraries
• ruby-spark — RubyInterface to Apache Spark 1.x.x.
• The project is almost dead,not commits in ages
• kiba — lightweight RubyETL (Extract-Transform-Load)framework.
• Great frameworkto load and transformdata,great performance
• enumerable-statistics — fast implementation ofdescriptivestatistics for the
Enumerable module.
• Very handyforsmall statisticalcalculations in yourapplication
• iruby — Rubykernel for Jupyter.
• The easiest wayto use Ruby in your Jupyter Notebook
• decisiontree - Decision Tree ID3 Algorithmin pure Ruby
• Easydecision tree implementation,and a very fast to train
Doing data science in
Ruby is Hard!
Ruby and Ruby on Rails are
way better to write business
web applications!
We can even do
really good Machine
Learning with Ruby
(but that is subject
for another
presentation)
And my objective is to
help ruby developers to use
the best tools for each job so
they can solve hard
problems, with less bugs and
have more free time.
pycall to the
rescue
pycall lets you use Python libraries from
your ruby code very naturally, as if you
were calling a Ruby library
pycall consists of one ruby binding
library for libpython.so and an Object-
oriented protocol for communication
between Ruby and Python
Simple pycall
code
What about some
light machine
learning? Do I
need python too?
Ok, so what
are the best
work
patterns?
Python is way better than Ruby for
Data Science
Ruby is better for web business
applications
Best patterns for integration are
(IMHO)
• Pointing both applications to the same
database
• Exchanging data through JSON or some similar
serialization
• Calling Python directly through pycall
References
• Ruby Conf 2017 – Using Ruby in DataScience by Kenta Murata (@mrkn)
• Big Data analysis in Ruby
• Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst)
• Progress of Ruby/Numo: Numerical Computing for Ruby
• SciRuby
• Ruby::Numo
• Ruby Machine Learning resources
• Ruby Data Science Resources
• PyCall
Any questions? Talk to
me!
• @urubatan
• https://urubatan.dev
• rodrigo@urubatan.dev

More Related Content

What's hot

Curse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleCurse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at Scale
Michael Goodness
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
Wes McKinney
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
LINE Corporation
 
Predictive Models at Scale
Predictive Models at ScalePredictive Models at Scale
Predictive Models at Scale
Nikhil Ketkar
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
Jason Plurad
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
Jagriti Goswami
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
Xiang Fu
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes
Ziemowit Jankowski
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraph
Jason Plurad
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
Adam Gibson
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
Alexey Zinoviev
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache Flink
Márton Balassi
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Alexey Zinoviev
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
Jason Plurad
 
Exploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphExploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraph
Jason Plurad
 
F# for Data*
F# for Data*F# for Data*
F# for Data*
Sergey Tihon
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use Case
Jason Plurad
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013
show you
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
Zhenxiao Luo
 

What's hot (20)

Curse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at ScaleCurse of Cardinality: A History and Evolution of Monitoring at Scale
Curse of Cardinality: A History and Evolution of Monitoring at Scale
 
Python for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandasPython for Financial Data Analysis with pandas
Python for Financial Data Analysis with pandas
 
Boost on!!next generation big data platform
Boost on!!next generation big data platformBoost on!!next generation big data platform
Boost on!!next generation big data platform
 
Predictive Models at Scale
Predictive Models at ScalePredictive Models at Scale
Predictive Models at Scale
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
Data Visualization in Python
Data Visualization in PythonData Visualization in Python
Data Visualization in Python
 
Pinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ UberPinot: Near Realtime Analytics @ Uber
Pinot: Near Realtime Analytics @ Uber
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes
 
Graph Computing with JanusGraph
Graph Computing with JanusGraphGraph Computing with JanusGraph
Graph Computing with JanusGraph
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache Flink
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
 
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
Thorny path to the Large-Scale Graph Processing (Highload++, 2014)
 
JanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYCJanusGraph, Jupyter Meetup NYC
JanusGraph, Jupyter Meetup NYC
 
Exploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraphExploring Graph Use Cases with JanusGraph
Exploring Graph Use Cases with JanusGraph
 
F# for Data*
F# for Data*F# for Data*
F# for Data*
 
Airline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use CaseAirline Reservations and Routing: A Graph Use Case
Airline Reservations and Routing: A Graph Use Case
 
Pycon tw 2013
Pycon tw 2013Pycon tw 2013
Pycon tw 2013
 
Machine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systemsMachine learning and big data @ uber a tale of two systems
Machine learning and big data @ uber a tale of two systems
 

Similar to Data science in ruby, is it possible? is it fast? should we use it?

Session 2
Session 2Session 2
Session 2
HarithaAshok3
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
Travis Oliphant
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
Travis Oliphant
 
Python ml
Python mlPython ml
Python ml
Shubham Sharma
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
Georg Heiler
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Benjamin Nussbaum
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
Wes McKinney
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
ROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in Ruby
Rakuten Group, Inc.
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
Radek Maciaszek
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
RightScale
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
DataWorks Summit
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
Karthik Padmanabhan ( MLE℠)
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
MapR Technologies
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Douglas Moore
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
Portland R User Group
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
Portland R User Group
 

Similar to Data science in ruby, is it possible? is it fast? should we use it? (20)

Session 2
Session 2Session 2
Session 2
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
Python ml
Python mlPython ml
Python ml
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015An Incomplete Data Tools Landscape for Hackers in 2015
An Incomplete Data Tools Landscape for Hackers in 2015
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
ROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in RubyROMA User-Customizable NoSQL Database in Ruby
ROMA User-Customizable NoSQL Database in Ruby
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Big data analytics using R
Big data analytics using RBig data analytics using R
Big data analytics using R
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
Teradata Partners Conference Oct 2014   Big Data Anti-PatternsTeradata Partners Conference Oct 2014   Big Data Anti-Patterns
Teradata Partners Conference Oct 2014 Big Data Anti-Patterns
 
"R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)""R, Hadoop, and Amazon Web Services (20 December 2011)"
"R, Hadoop, and Amazon Web Services (20 December 2011)"
 
R, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web ServicesR, Hadoop and Amazon Web Services
R, Hadoop and Amazon Web Services
 

More from Rodrigo Urubatan

Ruby code smells
Ruby code smellsRuby code smells
Ruby code smells
Rodrigo Urubatan
 
2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...
Rodrigo Urubatan
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...
Rodrigo Urubatan
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
Rodrigo Urubatan
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d framework
Rodrigo Urubatan
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuída
Rodrigo Urubatan
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rodrigo Urubatan
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bdd
Rodrigo Urubatan
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remoto
Rodrigo Urubatan
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problems
Rodrigo Urubatan
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
Rodrigo Urubatan
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Rodrigo Urubatan
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
Rodrigo Urubatan
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Rodrigo Urubatan
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores java
Rodrigo Urubatan
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HP
Rodrigo Urubatan
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Rodrigo Urubatan
 
Mini curso rails 3
Mini curso rails 3Mini curso rails 3
Mini curso rails 3
Rodrigo Urubatan
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5
Rodrigo Urubatan
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time ago
Rodrigo Urubatan
 

More from Rodrigo Urubatan (20)

Ruby code smells
Ruby code smellsRuby code smells
Ruby code smells
 
2018 the conf put git to work - increase the quality of your rails project...
2018 the conf   put git to work -  increase the quality of your rails project...2018 the conf   put git to work -  increase the quality of your rails project...
2018 the conf put git to work - increase the quality of your rails project...
 
2018 RubyHACK: put git to work - increase the quality of your rails project...
2018 RubyHACK:  put git to work -  increase the quality of your rails project...2018 RubyHACK:  put git to work -  increase the quality of your rails project...
2018 RubyHACK: put git to work - increase the quality of your rails project...
 
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
TDC2017 - POA - Aprendendo a usar Xamarin para desenvolver aplicações moveis ...
 
Your first game with unity3d framework
Your first game with unity3d frameworkYour first game with unity3d framework
Your first game with unity3d framework
 
Tdc Floripa 2017 - 8 falácias da programação distribuída
Tdc Floripa 2017 -  8 falácias da programação distribuídaTdc Floripa 2017 -  8 falácias da programação distribuída
Tdc Floripa 2017 - 8 falácias da programação distribuída
 
Rubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDDRubyconf2016 - Solving communication problems in distributed teams with BDD
Rubyconf2016 - Solving communication problems in distributed teams with BDD
 
resolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bddresolvendo problemas de comunicação em equipes distribuídas com bdd
resolvendo problemas de comunicação em equipes distribuídas com bdd
 
vantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remotovantagens e desvantagens de trabalhar remoto
vantagens e desvantagens de trabalhar remoto
 
Using BDD to Solve communication problems
Using BDD to Solve communication problemsUsing BDD to Solve communication problems
Using BDD to Solve communication problems
 
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015  Porto Alegre - Interfaces ricas com Rails e React.JSTDC2015  Porto Alegre - Interfaces ricas com Rails e React.JS
TDC2015 Porto Alegre - Interfaces ricas com Rails e React.JS
 
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015Interfaces ricas com Rails e React.JS @ Rubyconf 2015
Interfaces ricas com Rails e React.JS @ Rubyconf 2015
 
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015  - Interfaces Ricas com Rails e React.JSTDC São Paulo 2015  - Interfaces Ricas com Rails e React.JS
TDC São Paulo 2015 - Interfaces Ricas com Rails e React.JS
 
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full TextFull Text Search com Solr, MySQL Full text e PostgreSQL Full Text
Full Text Search com Solr, MySQL Full text e PostgreSQL Full Text
 
Ruby para programadores java
Ruby para programadores javaRuby para programadores java
Ruby para programadores java
 
Treinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HPTreinamento html5, css e java script apresentado na HP
Treinamento html5, css e java script apresentado na HP
 
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
Ruby on rails  impressione a você mesmo, seu chefe e seu clienteRuby on rails  impressione a você mesmo, seu chefe e seu cliente
Ruby on rails impressione a você mesmo, seu chefe e seu cliente
 
Mini curso rails 3
Mini curso rails 3Mini curso rails 3
Mini curso rails 3
 
Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5Aplicações Hibridas com Phonegap e HTML5
Aplicações Hibridas com Phonegap e HTML5
 
Git presentation to some coworkers some time ago
Git presentation to some coworkers some time agoGit presentation to some coworkers some time ago
Git presentation to some coworkers some time ago
 

Recently uploaded

Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 

Recently uploaded (20)

Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 

Data science in ruby, is it possible? is it fast? should we use it?

  • 1. Data Science in Ruby? Is it possible? Is it Fast? Should we use it? • Rodrigo Urubatan • rodrigo@urubatan.dev • http://urubatan.dev • http://twitter.com/urubatan
  • 2. Anyone here work with Data Science? • DataScientist? • DataEngineer? • Developers of application that uses Data? • Statisticians?
  • 3. What exactly is Data Science? The process of extractingmeaning from and interpret data The usage of statisticsand machine learning to clean and manipulate data The usage of computer software to collect, clean, manipulate and interpret data A cool name for the combination of Data Mining and Business Intelligence (other buzz words thatwere used for a long time for exactly what we call Data Science today, but with more expensive tool sets)
  • 4. Can Ruby do Data Science?
  • 5. Can Ruby do Data Science? (Long Answer) • Standing on the shoulders of giants • pycall — Bridgeinto the Python world. • rserve-client— Ruby connector for Rserve, R's binary server. • Data Manipulation • kiba — lightweight Ruby ETL (Extract- Transform-Load) framework. • jongleur — Workflowmanager using DAG definitions to execute ETL tasks. • Distributed Computing • ruby-spark — Ruby Interface to Apache Spark 1.x.x. • Data Structures • daru — Data Frame and Vector structures with comprehensive manipulatingand visualization methods. • numo-narray — n-dimensional Numerical Array for Ruby. • nmatrix — dense and sparselinear algebra library for Ruby via SciRuby. • Datasets • rdatasets — Data sets available in R via Rdatasets. • red-datasets — Growing collectionof publicly available data sets suchas CIFAR-10,Iris,MNIST etc • Statistics • rb-gsl — Ruby interfacetotheGNU Scientific Library. [dep: GLS] • simple_stats — Enumerablepatches for descriptive statistics. • enumerable-statistics — fastimplementation of descriptive statistics for theEnumerablemodule. • Visualization • matplotlib — Rubybased wrapper around matplotlib. [dep: matplotlib] • mathematical — PNGand MathML renderings for your equations. • daru-view — daru-view is interactive plotting gem for web application (any Ruby web applicationframework like Rails/Sinatra/Nanoc/Hanami) &IRubynotebook. It is a plugin gemfor daru. • daru-plotly — Plotly basedvisualization for Daru.
  • 7. The 3 Major Ruby Data Science Projects SciRuby project Nmatrix Centric gems Nmatrix Daru GnuplotRB Stas_sample Ruby Numo project Numo:: NArray centric Gems Numo:: NArray Numo:: FFTE Numo:: FFTW Numo::Gnuplot RedDataToolsproject Apache Arrow centric gems RedArrow RedChainer RedArrowGSL RedArrowNMatrix RedArrowNumoNArray
  • 8. Other libraries • ruby-spark — RubyInterface to Apache Spark 1.x.x. • The project is almost dead,not commits in ages • kiba — lightweight RubyETL (Extract-Transform-Load)framework. • Great frameworkto load and transformdata,great performance • enumerable-statistics — fast implementation ofdescriptivestatistics for the Enumerable module. • Very handyforsmall statisticalcalculations in yourapplication • iruby — Rubykernel for Jupyter. • The easiest wayto use Ruby in your Jupyter Notebook • decisiontree - Decision Tree ID3 Algorithmin pure Ruby • Easydecision tree implementation,and a very fast to train
  • 9. Doing data science in Ruby is Hard!
  • 10. Ruby and Ruby on Rails are way better to write business web applications!
  • 11. We can even do really good Machine Learning with Ruby (but that is subject for another presentation)
  • 12. And my objective is to help ruby developers to use the best tools for each job so they can solve hard problems, with less bugs and have more free time.
  • 13. pycall to the rescue pycall lets you use Python libraries from your ruby code very naturally, as if you were calling a Ruby library pycall consists of one ruby binding library for libpython.so and an Object- oriented protocol for communication between Ruby and Python
  • 15. What about some light machine learning? Do I need python too?
  • 16. Ok, so what are the best work patterns? Python is way better than Ruby for Data Science Ruby is better for web business applications Best patterns for integration are (IMHO) • Pointing both applications to the same database • Exchanging data through JSON or some similar serialization • Calling Python directly through pycall
  • 17. References • Ruby Conf 2017 – Using Ruby in DataScience by Kenta Murata (@mrkn) • Big Data analysis in Ruby • Lets do some (Data) Science in Ruby by Dan Carpenter (@dan_alyst) • Progress of Ruby/Numo: Numerical Computing for Ruby • SciRuby • Ruby::Numo • Ruby Machine Learning resources • Ruby Data Science Resources • PyCall
  • 18. Any questions? Talk to me! • @urubatan • https://urubatan.dev • rodrigo@urubatan.dev