DATA SCIENTIST is NOT a defined term
This is not… a Data Scientist
www.dataiku.com - @dataiku - @battymarc
  
MACHINE LEARNING EXPERT
DATA CLEANER
DATA LEAK FIXER
END OF (HADOOP) JOB DATA WAITER
  
How can we HELP DATA SCIENTISTS to FOCUS on the REAL PROBLEMS?
  
Pain points
• Data preparation is time-consuming
• Machine learning is hard to understand
• Insights and models (almost) never reach production
  
Data Science Studio
• A democratic & ready to use Data Science Studio to start innovating with data!
Ready to Use Data Science Platform
Common playground for innovation
Accessible Statistics & Machine Learning for everyone
Handle real-life data
  
Data Science Studio
Visual and Interactive Data Preparation: For Data Cleaners
Guided Machine Learning: For non Machine Learning Experts
Production ready: For Data Leak Fixers
  
Visual Data Preparation
  
Visual Data Preparation
• Interactive UI with instant feedback and suggestions
• Reversibility of the script, data integrity
• Exploration of data: quick analysis, facets
• Cleansing: missing values, outliers, parsing
• Enrichment: GeoIP, Holidays, joins
• Production-ready: integration within a flow
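
To make the "Cleansing: missing values, outliers, parsing" bullet concrete, here is a minimal sketch in plain Python. It is not Data Science Studio's implementation or API. The function names `parse_number` and `clean_column`, the median fill, the 1.5 x IQR outlier rule, and the treatment of a comma as a decimal separator are all assumptions chosen for illustration.

```python
from statistics import median, quantiles


def parse_number(raw):
    """Parse a raw cell into a float; return None if it is missing or unparseable."""
    if raw is None:
        return None
    # Assumption: a comma is a decimal separator (this would misread "1,000").
    text = str(raw).strip().replace(",", ".")
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def clean_column(raw_values):
    """Parse a column, fill missing cells with the median, and flag outliers.

    Hypothetical helper: needs at least two parseable values in the column.
    """
    parsed = [parse_number(v) for v in raw_values]
    known = [v for v in parsed if v is not None]
    fill = median(known)
    # Quartiles of the known values; the IQR rule below is one common choice.
    q1, _, q3 = quantiles(known, n=4)
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return [
        {
            "value": fill if v is None else v,
            "imputed": v is None,
            "outlier": v is not None and not (low <= v <= high),
        }
        for v in parsed
    ]


rows = clean_column(["10", "12", "", "11", "n/a", "13", "250", "12,5"])
for row in rows:
    print(row)
```

Each output row keeps the `imputed` and `outlier` flags next to the value, so a filled or suspicious cell stays traceable rather than being silently overwritten.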
  
Guided Machine Learning
  
Production & orchestration
  
Data Science Studio: benefits
• Real-time and interactive
– Transformation effects can be previewed in real time
• Transparent and traceable
– Keep the full history of your data transformation logic and model designs
• Easy access to machine learning
– Get started with our app templates, bootstrap your model and feature selection, then go further!
• Scalable and Production Ready
– Apply your recipes on your cluster, on terabytes of data
  
Dataiku at a glance
• Founded in 2013 by Data and Search Engine veterans
• From “data” and “haïku”
    “data can be big
    solution would be small
    feel the hot wind”
• 1 goal: make Data Science accessible to anyone!

Contact: marc.batty@dataiku.com - @battymarc - github.com/dataiku
  

Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
