Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
The document outlines the challenges faced by data scientists, such as time-consuming data preparation and the complexity of machine learning. It introduces a data science studio that provides a user-friendly platform for visualization and guided machine learning, aimed at making data science more accessible. The studio offers real-time feedback, data integrity, and production-ready solutions, promoting innovation and efficiency in data handling.
Explores the differing roles of Data Scientists including Machine Learning Expert, Data Cleaner, Data Leak Fixer, and the Data Waiter.
Identifies challenges faced by Data Scientists like time-consuming data preparation, difficulty in machine learning, and issues in production insights.
Presents the Data Science Studio as an accessible platform for innovation, guiding non-experts in data preparation and machine learning.
Discusses features of visual data preparation such as interactive UI, data integrity, data cleansing, and production readiness.
Highlights the benefits of the Data Science Studio, including real-time insights, transparency, scalability, and user-friendly access to machine learning.
Provides a brief history of Dataiku, founded in 2013, with the goal of making Data Science accessible to everyone.
How can we HELP DATA SCIENTISTS to FOCUS on the REAL PROBLEMS?
7. Pain points
• Data preparation is time-consuming
• Machine learning is hard to understand
• Insights and models (almost) never reach production
8. Data Science Studio
• A democratic & ready to use Data Science Studio to start innovating with data!
Ready to Use Data Science Platform
Common playground for innovation
Accessible Statistics & Machine Learning for everyone
Handle real-life data
9. Data Science Studio
Visual and Interactive Data Preparation: For Data Cleaners
Guided Machine Learning: For non Machine Learning Experts
Production ready: For Data Leak Fixers
Visual Data Preparation
• Interactive UI with instant feedback and suggestions
• Reversibility of the script, data integrity
• Exploration of data: quick analysis, facets
• Cleansing: missing values, outliers, parsing
• Enrichment: GeoIP, Holidays, joins
• Production-ready: integration within a flow
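As a rough illustration of what the cleansing bullet covers (parsing, missing values, outliers), here is a minimal Python sketch. It is not Data Science Studio code: the function names, the median fill, and the 1.5 × IQR clipping rule are assumptions chosen only to make the three steps concrete.

```python
from statistics import median, quantiles


def parse_number(raw):
    """Parse a raw string into a float; return None if it is empty or malformed."""
    try:
        return float(raw.strip())
    except (AttributeError, ValueError):
        return None


def fill_missing(values):
    """Replace None entries with the median of the values that did parse."""
    present = [v for v in values if v is not None]
    fallback = median(present)
    return [fallback if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] back to the nearest bound."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


# Hypothetical raw column with blanks, a malformed entry, and one extreme value.
raw_column = ["10", " 12 ", "", "n/a", "11", "13", "900"]

parsed = [parse_number(r) for r in raw_column]   # parsing
filled = fill_missing(parsed)                    # missing values
cleaned = clip_outliers(filled)                  # outliers
print(cleaned)
```

Each step is a separate function that returns a new list rather than modifying its input, so the original column stays intact and any step can be dropped or reordered. That loosely mirrors the "reversibility of the script" idea on the slide.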
Data Science Studio: benefits
• Real-time and interactive
– Transformation effects can be previewed in real time
• Transparent and traceable
– Keep the full history of your data transformation logic and model designs
• Easy access to machine learning
– Get started with our app templates, bootstrap your model and feature selection, then go further!
• Scalable and Production Ready
– Apply your recipes on your cluster on terabytes of data
15. Dataiku at a glance
• Founded in 2013 by Data and Search Engine veterans
• From “data” and “haïku”: “data can be big / solution would be small / feel the hot wind”
• 1 goal: make Data Science accessible to anyone!
Contact: marc.batty@dataiku.com - @battymarc - github.com/dataiku