Dataiku Data Science Studio
BP Town Hall
20 March 2019
Norman.Poh@bjss.com
What is Dataiku?
An online data science platform
Runs on a server or a local VM. You work entirely
in the browser, e.g. at http://127.0.0.1:10000
For business users (‘clickers’):
Doing data science with very little coding
experience
Another attempt to democratise machine
learning
For data scientists (‘coders’):
Define a data (wrangling) pipeline once and
use everywhere
More efficient code
Focus on algorithm development
Easy scoring (provided you use one of the
platform's algorithms)
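
A minimal sketch of the "define once, use everywhere" idea, in plain Python rather than Dataiku's own API. The step functions, the `wrangle` helper, and the column names (`price`, `units`) are hypothetical, chosen only for illustration. The point is that the same ordered list of wrangling steps is applied to the training data and, later, to the data being scored.

```python
from functools import reduce


def drop_incomplete(rows):
    """Discard records that contain any missing (None) value."""
    return [r for r in rows if None not in r.values()]


def add_price_per_unit(rows):
    """Derive a new column from two existing ones."""
    return [{**r, "price_per_unit": r["price"] / r["units"]} for r in rows]


# The pipeline is declared once, as an ordered list of steps.
WRANGLING_STEPS = [drop_incomplete, add_price_per_unit]


def wrangle(rows, steps=WRANGLING_STEPS):
    """Apply every step in order and return the cleaned records."""
    return reduce(lambda data, step: step(data), steps, rows)


# Hypothetical data: one set used for training, one arriving later for scoring.
training_rows = [
    {"price": 10.0, "units": 4},
    {"price": None, "units": 2},  # incomplete, so it is dropped
]
scoring_rows = [{"price": 9.0, "units": 3}]

clean_training = wrangle(training_rows)
clean_scoring = wrangle(scoring_rows)
```

Because both datasets pass through the same `WRANGLING_STEPS`, a change to the preparation logic is made in one place and reaches training and scoring alike.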
The platform
Data import
Project contains Flow, Lab, Dashboard,…
Flow
Produced by Lab
Recipes
Produced by Visual Analyses or Labs
Recipe
(Lab)
Where you define your ‘function’
or ‘procedure’
(A Lab produces a Recipe)
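
One way to picture this, as a sketch only: treat each Recipe as a function from an input dataset to an output dataset, and the Flow as the ordered chain of those functions. This is plain Python, not Dataiku's API; the recipe names, the dataset names, and the `run_flow` helper are all hypothetical.

```python
def filter_recipe(dataset):
    """Recipe 1: keep only adult customers (a made-up rule)."""
    return [row for row in dataset if row["age"] >= 18]


def aggregate_recipe(dataset):
    """Recipe 2: count the remaining rows per country."""
    counts = {}
    for row in dataset:
        counts[row["country"]] = counts.get(row["country"], 0) + 1
    return counts


# A Flow, in this simplified view: (recipe, input dataset, output dataset).
FLOW = [
    (filter_recipe, "customers", "adults"),
    (aggregate_recipe, "adults", "adults_by_country"),
]


def run_flow(flow, datasets):
    """Run each recipe in order, storing its output under the target name."""
    datasets = dict(datasets)  # copy so the caller's mapping is untouched
    for recipe, source, target in flow:
        datasets[target] = recipe(datasets[source])
    return datasets


results = run_flow(FLOW, {
    "customers": [
        {"age": 34, "country": "UK"},
        {"age": 15, "country": "UK"},
        {"age": 52, "country": "FR"},
    ]
})
```

Here `results["adults_by_country"]` holds the final dataset, and every intermediate dataset stays available under its own name, much as each step of a Flow can be inspected.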
What about coders?
My opinion?
Another effort to democratise machine learning technologies
• Good news for anyone who wants to access ML technologies
• Embrace the technology and adapt quickly
• Go T-shape – Be industry-specific whilst remaining generalist
• Versatile – Team working: data engineers and analytics developers
https://www.forbes.com/sites/forbestechcouncil/2019/03/01/radical-change-is-coming-to-data-science-jobs/#1696e1a4dfcc
“I believe the job of data scientist as we know it today will be
barely recognizable in five to 10 years.”
Examples of similar efforts
reducing tedious data
preparation work:
• Trifacta, Element Analytics, Kylo
automating algorithm selection
and parameter tuning:
• Auto-sklearn, DataRobot
doing data science with a
graphical user interface:
• Orange, KNIME, SPSS Modeler, Azure ML
Studio
working with massive data:
• Kafka, Spark, Hive/Impala
building dashboards quickly:
• Tableau, Power BI, QlikView
