A data layer in Clojure

@sbelak
simon@goopti.com

• Started in machine learning
• Turned to data science and helped 20+ companies become data-driven
• Now leading the data science department at GoOpti

Self-service infrastructure for data scientists

The analytics chasm
• < 2 min: ideal. Almost real-time; can be done during brainstorming without disrupting flow
• < 20 min: squeeze in somewhere in the day
• Project: fail (roadmap ahoy!)

My go-to architecture
[Diagram: Kafka, DB, events, Onyx jobs]
Persist all events to S3:
• time travel
• query with AWS Athena
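Keeping every event makes "time travel" a fold over the log: the state at any past moment is rebuilt by replaying only the events up to that moment. A minimal sketch, assuming events are maps with hypothetical `:event/ts`, `:event/type` and `:order/id` keys (none of these come from the talk) and an in-memory vector standing in for the S3 archive.

```clojure
;; Hypothetical event log; in the architecture above this would live in S3
(def events
  [{:event/ts 100 :event/type :order-placed    :order/id 1}
   {:event/ts 200 :event/type :order-placed    :order/id 2}
   {:event/ts 300 :event/type :order-cancelled :order/id 1}])

(defn apply-event
  "Folds one event into the state, a map of order id -> status."
  [state {etype :event/type oid :order/id}]
  (case etype
    :order-placed    (assoc state oid :open)
    :order-cancelled (assoc state oid :cancelled)
    state))

(defn state-as-of
  "Rebuilds the state as it was at time t by replaying events with ts <= t."
  [events t]
  (->> events
       (filter #(<= (:event/ts %) t))
       (sort-by :event/ts)
       (reduce apply-event {})))

(state-as-of events 250) ;; => {1 :open, 2 :open}
(state-as-of events 300) ;; => {1 :cancelled, 2 :open}
```

Because `apply-event` is a pure function, the same replay works for any cut-off time without touching stored data.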

Onyx: a masterless, cloud-scale, fault-tolerant, high-performance distributed computation system
… written entirely in Clojure

Clojure at a glance
• Lisp running on the JVM
• Functional, dynamic, immutable
• Excellent support for concurrency and state management
• Unparalleled data manipulation
• Good Java interoperability
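A small illustration of the "immutable" and "state management" points in plain Clojure: updating a nested map returns a new value and leaves the original untouched, while an atom gives one thread-safe reference for state that has to change. The booking data is made up.

```clojure
;; Made-up data
(def booking {:passenger {:name "Ana" :seats 1} :status :pending})

;; update-in returns a new map; `booking` itself is unchanged
(def bigger (update-in booking [:passenger :seats] inc))

;; An atom is a reference to a changing value; swap! applies a pure
;; function to the current value atomically (retrying on contention)
(def bookings (atom []))
(swap! bookings conj booking)
(swap! bookings conj bigger)

(get-in booking [:passenger :seats]) ;; => 1
(get-in bigger [:passenger :seats])  ;; => 2
(count @bookings)                    ;; => 2
```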

Onyx at GoOpti
• In production for almost a year
• ETL
• online machine learning
• offline (batch) machine learning
• ad-hoc analysis

Job = workflow + flow conditions + catalogue

Workflow:
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions:
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

Catalogue:
;; batch-size must be bound elsewhere, e.g. (def batch-size 10)
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]
  :onyx/batch-size batch-size}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/doc "Writes segments to a core.async channel"}]
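Flow conditions are plain data, so the routing rule they express can be imitated outside Onyx. A toy sketch, not Onyx's implementation: the `:flow/predicate` keyword is looked up in a local map (a hypothetical stand-in for resolving `:my.ns/adult?`), the `:age` key and threshold are assumptions, and real Onyx predicates are called with more arguments than just the segment.

```clojure
(def flow-conditions
  [{:flow/from :input-stream
    :flow/to [:process-adults]
    :flow/predicate :my.ns/adult?
    :flow/doc "Emits segment if an adult."}])

;; Hypothetical stand-in for resolving predicate keywords to functions
(def predicates
  {:my.ns/adult? (fn [segment] (>= (:age segment) 18))})

(defn route
  "Returns the set of downstream tasks a segment leaving task `from` is sent to."
  [conditions from segment]
  (->> conditions
       (filter #(= from (:flow/from %)))
       (filter #((predicates (:flow/predicate %)) segment))
       (mapcat :flow/to)
       set))

(route flow-conditions :input-stream {:age 30}) ;; => #{:process-adults}
(route flow-conditions :input-stream {:age 12}) ;; => #{}
```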

Catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]
  :onyx/batch-size batch-size}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/doc "Writes segments to a core.async channel"}]
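Vanilla Clojure function:
;; n is supplied from :my/n via :onyx/params; segment is the incoming map
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

Plugins (I/O): seq, async, Kafka, Datomic, SQL, S3, SQS, …
Slide callouts: :my/n is a parameter; :onyx/doc makes the catalogue self-documenting.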
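The `:onyx/params` entry is what connects `:my/n 5` to the `n` argument of `adder`: Onyx passes the values of the listed keys to the task function ahead of the segment. The sketch below imitates that call in plain Clojure; `invoke-task` is a hypothetical helper, not part of Onyx.

```clojure
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(def add-5-entry
  {:onyx/name :add-5
   :onyx/fn :my/adder
   :onyx/type :function
   :my/n 5
   :onyx/params [:my/n]})

(defn invoke-task
  "Hypothetical: calls f with the entry's :onyx/params values, then the segment."
  [entry f segment]
  (apply f (concat (map entry (:onyx/params entry)) [segment])))

(invoke-task add-5-entry adder {:x 1 :id 7}) ;; => {:x 6, :id 7}
```

Keeping the constant in the catalogue rather than in the function means the same `adder` can back several tasks with different `:my/n` values.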

Computation entirely described with data: data is code!
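Because a workflow is just a vector of edges, ordinary sequence functions can answer questions about a job. A minimal sketch using the workflow from the job slide: tasks with no incoming edge are inputs, tasks with no outgoing edge are outputs. The helper names are hypothetical.

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn roots
  "Tasks that never appear as an edge target."
  [wf]
  (let [targets (set (map second wf))]
    (set (remove targets (map first wf)))))

(defn leaves
  "Tasks that never appear as an edge source."
  [wf]
  (let [sources (set (map first wf))]
    (set (remove sources (map second wf)))))

(defn downstream
  "Map of task -> set of tasks it feeds."
  [wf]
  (reduce (fn [m [from to]] (update m from (fnil conj #{}) to)) {} wf))

(roots workflow)  ;; => #{:input}
(leaves workflow) ;; => #{:output-1 :output-2}
```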

Everything can be run locally!
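To show why a job described as data is easy to execute in-process, here is a toy interpreter: it walks the workflow from the input task, applies each task's function, and collects what reaches the leaf tasks. This is only an illustration, not how Onyx's runtime works; `run-local` and the task functions are hypothetical.

```clojure
(def workflow
  [[:in :add-5]
   [:add-5 :out]])

;; Hypothetical task implementations, keyed by task name.
;; Tasks without an entry (here :in and :out) pass segments through unchanged.
(def task-fns
  {:add-5 (fn [segment] (update segment :x + 5))})

(defn run-local
  "Pushes each segment from the input task through an acyclic workflow and
  returns a map of leaf task -> segments that reached it."
  [wf fns input segments]
  (let [children (group-by first wf)]
    (loop [queue   (into clojure.lang.PersistentQueue/EMPTY
                         (map (fn [s] [input s]) segments))
           outputs {}]
      (if-let [[task seg] (peek queue)]
        (let [seg'  ((get fns task identity) seg)
              nexts (map second (children task))]
          (if (empty? nexts)
            (recur (pop queue) (update outputs task (fnil conj []) seg'))
            (recur (into (pop queue) (map (fn [t] [t seg']) nexts)) outputs)))
        outputs))))

(run-local workflow task-fns :in [{:x 1} {:x 2}])
;; => {:out [{:x 6} {:x 7}]}
```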

Resilience and handling state
• Activity log
• Window and trigger states checkpointed
• Resume points
• Configurable flux policies
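Checkpointed state and resume points rest on one idea: if the aggregate is saved periodically together with the position reached in the input, a restarted job can continue from the last snapshot instead of starting over. A plain-Clojure sketch of that idea only; the checkpoint interval, the crash simulation and the atom standing in for durable storage are all assumptions, and Onyx's actual checkpointing is considerably more involved.

```clojure
;; An atom stands in for durable checkpoint storage (hypothetical)
(def store (atom nil))

(defn sum-with-checkpoints
  "Sums :amount over segments starting from a checkpoint {:offset :total}.
  Saves a checkpoint after every n segments. Throws on reaching index
  crash-at, to simulate a failed peer (pass nil for no crash)."
  [segments {:keys [offset total]} n crash-at]
  (loop [i offset, total total]
    (if (= i (count segments))
      total
      (do
        (when (= i crash-at)
          (throw (ex-info "simulated crash" {:at i})))
        (let [total' (+ total (:amount (nth segments i)))
              i'     (inc i)]
          (when (zero? (mod i' n))
            (reset! store {:offset i' :total total'}))
          (recur i' total'))))))

(def segments (mapv (fn [a] {:amount a}) (range 1 11))) ; amounts 1..10

;; First run fails at index 7, after checkpoints at offsets 3 and 6
(def first-run
  (try (sum-with-checkpoints segments {:offset 0 :total 0} 3 7)
       (catch clojure.lang.ExceptionInfo _ :crashed)))

(def checkpoint @store) ; holds {:offset 6, :total 21}

;; Resuming from the checkpoint gives the same answer as an uninterrupted run
(def resumed (sum-with-checkpoints segments checkpoint 3 nil)) ; 55
```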

It’s not about scaling, but about clean architecture

Machine learning with Onyx
• Hyperparameter server built on top of Onyx
• Batch & streaming mode
• Monitoring: performance metrics, side channels for partial results/introspection into computation
• Everything is data, so it is easy to build tools around it
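Since task parameters live in the catalogue, a hyperparameter sweep can be expressed as a function that returns modified copies of the job data. A sketch only: the job map is modelled loosely on an Onyx job (`:workflow`, `:catalog`), and the `:train` task and `:model/*` keys are assumptions, not the hyperparameter server mentioned above.

```clojure
(def base-job
  {:workflow [[:in :train] [:train :out]]
   :catalog  [{:onyx/name :train
               :onyx/fn :my/train
               :onyx/type :function
               :model/learning-rate 0.1
               :model/depth 3
               :onyx/params [:model/learning-rate :model/depth]}]})

(defn set-task-params
  "Returns job with params merged into the catalog entry named task."
  [job task params]
  (update job :catalog
          (fn [entries]
            (mapv #(if (= task (:onyx/name %)) (merge % params) %) entries))))

;; One job variant per combination; base-job itself is never modified
(def sweep
  (for [lr [0.01 0.1] depth [3 5]]
    (set-task-params base-job :train {:model/learning-rate lr :model/depth depth})))

(count sweep) ;; => 4
```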

Putting “data is code” to work

Describing data with clojure.spec: composing smaller parts into the whole
code is data!
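A minimal clojure.spec sketch of composing small specs into a whole, reusing the entities from the talk's order example (order, promo code, user, account age, country). The exact keys, predicates and country codes are assumptions. The last line shows the other direction: a spec can be read back as data with `s/form`.

```clojure
(ns example.order
  (:require [clojure.spec.alpha :as s]))

;; Small, reusable parts
(s/def :user/account-age nat-int?)      ; days since sign-up (assumed unit)
(s/def :user/country #{"SI" "IT" "AT"}) ; hypothetical set of codes
(s/def :order/promo-code string?)

;; Composed into larger specs
(s/def :order/user (s/keys :req [:user/account-age :user/country]))
(s/def ::order (s/keys :req [:order/user] :opt [:order/promo-code]))

(s/valid? ::order {:order/user {:user/account-age 42 :user/country "SI"}}) ;; => true
(s/valid? ::order {:order/user {:user/account-age -1 :user/country "SI"}}) ;; => false

;; The spec itself is data
(s/form ::order)
```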

Queryable data descriptions
• Turn spec into a graph
• A fully interactive and open type system!
[Diagram: spec graph with nodes order, promo code, user, account age and country; three edges labelled "always", one labelled "maybe"]
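A graph like this can be derived mechanically: `s/form` returns an `s/keys` spec as data, so each `:req` key can become an "always" edge and each `:opt` key a "maybe" edge. A sketch that only understands `s/keys` with `:req`/`:opt` (not `:req-un`, `or` inside `:req`, or other spec combinators); the order and user specs are assumptions chosen to mirror the slide.

```clojure
(ns example.spec-graph
  (:require [clojure.spec.alpha :as s]))

(s/def :user/account-age nat-int?)
(s/def :user/country string?)
(s/def :order/promo-code string?)
(s/def :order/user (s/keys :req [:user/account-age :user/country]))
(s/def :order/order (s/keys :req [:order/user] :opt [:order/promo-code]))

(defn keys-edges
  "Edges [from to kind] for one s/keys spec (kind is :always or :maybe);
  nil for any other kind of spec."
  [spec-name]
  (let [form (s/form spec-name)]
    (when (and (seq? form) (= `s/keys (first form)))
      (let [{:keys [req opt]} (apply hash-map (rest form))]
        (concat (for [k req] [spec-name k :always])
                (for [k opt] [spec-name k :maybe]))))))

(defn spec-graph
  "Walks keys specs depth-first from root and returns the set of edges."
  [root]
  (loop [todo [root], seen #{}, edges #{}]
    (if-let [spec-name (peek todo)]
      (if (seen spec-name)
        (recur (pop todo) seen edges)
        (let [es (keys-edges spec-name)]
          (recur (into (pop todo) (map second es))
                 (conj seen spec-name)
                 (into edges es))))
      edges)))

(spec-graph :order/order)
;; => #{[:order/order :order/user :always]
;;      [:order/order :order/promo-code :maybe]
;;      [:order/user :user/account-age :always]
;;      [:order/user :user/country :always]}
```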

“Composition is about
decomposing.”
— E. Normand

Case study: autogenerating materialised views
[Diagram: Kafka, events, external data, Onyx jobs, materialised views]
Automatic view generation
• Event & attribute ontology
  • Manual (via spec)
  • Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
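As an example of the statistical step, outlier removal can be done with the modified z-score, 0.6745 × (x − median) / MAD, where MAD is the median absolute deviation; values scoring above 3.5 in absolute value are a commonly used cut-off. This is a swapped-in illustrative technique: the talk does not say which method is used.

```clojure
(defn median
  "Median of a non-empty collection of numbers."
  [xs]
  (let [v (vec (sort xs))
        n (count v)
        m (quot n 2)]
    (if (odd? n)
      (v m)
      (/ (+ (v (dec m)) (v m)) 2))))

(defn remove-outliers
  "Drops values whose modified z-score, 0.6745 * (x - median) / MAD,
  exceeds threshold in absolute value. Returns xs unchanged when MAD is 0."
  ([xs] (remove-outliers xs 3.5))
  ([xs threshold]
   (let [med (median xs)
         mad (median (map #(Math/abs (double (- % med))) xs))]
     (if (zero? mad)
       xs
       (filter #(<= (Math/abs (double (/ (* 0.6745 (- % med)) mad))) threshold)
               xs)))))

(remove-outliers [10 12 11 13 12 11 250]) ;; => (10 12 11 13 12 11)
```

The median-based score is used instead of a mean/standard-deviation z-score because a single extreme value inflates the standard deviation and can hide itself in small samples.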

Automatic view generation
1. Walk spec registry
2. Apply rules
   1. Define new view (spec)
   2. Trigger Onyx job that creates the view
⤾ (repeat from step 1)
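The loop above can be sketched with `s/registry`, which returns the spec registry as a map. Everything else is hypothetical: the rule (any keys spec requiring `:event/timestamp` gets a daily-count view), the view naming scheme, restricting the walk to chosen namespaces, and the job descriptor, which is returned as data rather than submitted to Onyx.

```clojure
(ns example.views
  (:require [clojure.spec.alpha :as s]))

;; Hypothetical domain specs
(s/def :event/timestamp inst?)
(s/def :booking/id pos-int?)
(s/def :booking/created (s/keys :req [:event/timestamp :booking/id]))
(s/def :user/name string?)
(s/def :user/profile (s/keys :req [:user/name])) ; not an event: no view

(defn required-keys
  "The :req keys of an s/keys spec, or nil for any other kind of spec."
  [spec-name]
  (let [form (s/form spec-name)]
    (when (and (seq? form) (= `s/keys (first form)))
      (:req (apply hash-map (rest form))))))

(defn daily-count-rule
  "Hypothetical rule: every spec requiring :event/timestamp gets a daily-count view."
  [spec-name]
  (when (some #{:event/timestamp} (required-keys spec-name))
    {:view/name      (keyword (str (namespace spec-name) "." (name spec-name)) "daily-count")
     :view/source    spec-name
     :view/aggregate :count
     :view/bucket    :day}))

(defn derive-views
  "Walks the spec registry, limited to keyword specs in the given namespaces,
  and keeps whatever views the rule produces."
  [rule namespaces]
  (->> (keys (s/registry))
       (filter #(and (keyword? %) (contains? namespaces (namespace %))))
       (keep rule)))

(defn view->job
  "Describes the job that would materialise a view; returned as data only."
  [view]
  {:workflow [[:in :aggregate] [:aggregate :out]]
   :view     view})

(def jobs (map view->job (derive-views daily-count-rule #{"booking" "user"})))
;; one job, for the view :booking.created/daily-count
```

Limiting the walk to chosen namespaces keeps library specs in the registry out of the rules.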

Everything should be live and interactive

Computation graphs are a great way to structure data processing code

Queryable data and computation descriptions supercharge interactive development and are a great building block for automation

Questions
@sbelak
simon@goopti.com

viebel.github.io/klipse/examples/onyx.html
onyxplatform.org
onyxplatform.org/jekyll/update/2017/02/08/Pyroclast-Preview-Simulation.html
