Onyx
Distributed Computation for the Cloud

Different Computation Engines
● Apache Spark
○ Large Scale Data Processing
○ Streaming Support
○ Micro Batch Support
○ Written in Scala
○ Supports Python and Java too

● Apache Storm
○ Stream Data Processing
○ Written in Java (originally largely in Clojure)

● Apache Flink
○ Stream first and then batch
○ Streaming Data Flow Engine
○ Written in Java and Scala

Why Clojure?
Clojure is...
● A dialect of Lisp
● Interoperable with Java
● Emphasizes functional programming
● Runs on the Java Virtual Machine
● Designed for Concurrency

Truthiness
What is truthy?
● true
● "False" (a string, so it is truthy)
● [ ]
● (= 1 1)
● 0
● 1
What is falsey?
● false
● nil
● (= 1 2)
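The two lists above can be checked at a REPL. This is a minimal sketch; `truthy?` and `examples` are hypothetical names introduced only for illustration.

```clojure
;; In Clojure only false and nil are falsey; every other value is truthy.
;; truthy? is a hypothetical helper, not part of clojure.core.
(defn truthy? [x]
  (if x true false))

;; The values from the slide, in the same order.
(def examples
  [true "False" [] (= 1 1) 0 1 false nil (= 1 2)])

;; Print each value next to its truthiness.
(doseq [x examples]
  (println (pr-str x) "->" (truthy? x)))
```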

Clojure Collections
Lists
● Singly linked lists
● First item is in the calling position when the list is evaluated
● Heterogeneous elements
'(1 2 3 4 "foo" :bar)
Vectors
● Each item is simply evaluated in order
● Fast lookups by index
● Heterogeneous elements
[1 2 3 4 "foo" :bar]

Clojure Collections
Maps
● Maps store keys and values
{:name "abhishek"}
Sets
● Store zero or more unique items
#{:a :b 1 2 3}
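A short sketch of the four collection types in use. The var names (`a-list`, `a-vector`, `a-map`, `a-set`) are made up for this example; the literals are the ones from the slides.

```clojure
(def a-list   '(1 2 3 4 "foo" :bar))
(def a-vector [1 2 3 4 "foo" :bar])
(def a-map    {:name "abhishek"})
(def a-set    #{:a :b 1 2 3})

(first a-list)        ; lists are accessed from the front
(nth a-vector 4)      ; vectors support fast lookup by index
(get a-map :name)     ; look up a value by key
(:name a-map)         ; keywords also act as lookup functions
(contains? a-set :a)  ; membership test on a set

;; conj adds where it is cheapest for each collection type.
(conj a-list 0)       ; => (0 1 2 3 4 "foo" :bar), added at the front
(conj a-vector 0)     ; => [1 2 3 4 "foo" :bar 0], added at the end
(conj a-set :a)       ; :a is already present, so the set is unchanged
```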

Clojure Functions
● First Class
● Higher-Order
● Pure Functions
def or defn?
● Both bind a value to a symbol (name)
● def evaluates its value expression once, when the var is defined
● defn defines a function; its body is evaluated every time the function is called
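The difference shows up when the expression has a side effect. In this sketch `counter`, `once` and `each-time` are hypothetical names chosen for the illustration.

```clojure
;; A mutable counter so each evaluation is observable.
(def counter (atom 0))

;; def: the expression (swap! counter inc) runs once, at definition time,
;; and its result is bound to `once`.
(def once (swap! counter inc))

;; defn: the body runs again on every call.
(defn each-time []
  (swap! counter inc))

(println once)         ; the value captured at definition time
(println (each-time))  ; increments the counter
(println (each-time))  ; increments it again
(println once)         ; still the value captured at definition time
```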

What does Onyx bring?
● The Clojure philosophy
● A different way to think about data
● A way to structure programs for distributed systems

What is Onyx?
● Masterless
● Cloud Scale
● Fault Tolerant
● High Performance Distributed Computation System
● Batch and Stream hybrid processing model
● Written in Clojure

Onyx Program
● Read data from the source
● Transform the data in one or more steps
● Write the data into the target

Workflow
:in
|
:split-sentence
|
:count-words
|
:out
(def workflow
  [[:in :split-sentence]
   [:split-sentence :count-words]
   [:count-words :out]])
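A workflow is plain data: a vector of [from to] edges that describe a directed graph of tasks. The sketch below walks that data with ordinary Clojure functions. `tasks`, `downstream` and `inputs` are hypothetical helpers, not part of the Onyx API.

```clojure
(require '[clojure.set :as set])

(def workflow
  [[:in :split-sentence]
   [:split-sentence :count-words]
   [:count-words :out]])

;; Every task name mentioned anywhere in the workflow.
(defn tasks [wf]
  (set (mapcat identity wf)))

;; Tasks that receive segments directly from `task`.
(defn downstream [wf task]
  (mapv second (filter #(= task (first %)) wf)))

;; Tasks with no incoming edge, i.e. where data enters the graph.
(defn inputs [wf]
  (set/difference (set (map first wf)) (set (map second wf))))

(println (tasks workflow))
(println (downstream workflow :split-sentence))
(println (inputs workflow))
```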

Segment
● Segments are Clojure maps
{ :increment 42}
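Because segments are maps, a task function is an ordinary function over maps. The sketch below assumes that a function task receives one segment and returns either one segment or a vector of segments. The names `my-inc` and `split-sentence` and the `:sentence` key are assumptions for illustration; `:word` is chosen to match the `:onyx/group-by-key` used in the catalog.

```clojure
(require '[clojure.string :as str])

;; One segment in, one segment out.
(defn my-inc [segment]
  (update segment :increment inc))

;; One segment in, many segments out: one {:word ...} map per word.
;; Assumes the incoming segment carries its text under :sentence.
(defn split-sentence [{:keys [sentence]}]
  (mapv (fn [word] {:word word})
        (str/split sentence #"\s+")))

(println (my-inc {:increment 42}))
(println (split-sentence {:sentence "hello onyx"}))
```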

Task
● input
● processing
● output

Catalog
(def catalog
  [{:onyx/name :in
    :onyx/plugin :onyx.plugin.core-async/input
    :onyx/type :input
    :onyx/medium :core.async
    :onyx/batch-size batch-size
    :onyx/max-peers 1
    :onyx/doc "Reads segments from a core.async channel"}

Catalog
   {:onyx/name :split-sentence
    :onyx/fn :aggregation.core/split-sentence
    :onyx/type :function
    :onyx/batch-size batch-size}

Catalog
   {:onyx/name :count-words
    :onyx/fn :clojure.core/identity
    :onyx/type :function
    :onyx/group-by-key :word
    :onyx/flux-policy :kill
    :onyx/min-peers 1
    :onyx/batch-size 1000}

Catalog
   {:onyx/name :out
    :onyx/plugin :onyx.plugin.core-async/output
    :onyx/type :output
    :onyx/medium :core.async
    :onyx/max-peers 1
    :onyx/batch-size batch-size
    :onyx/doc "Writes segments to a core.async channel"}])
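Every task named in the workflow needs a catalog entry with the same :onyx/name. A quick check with plain Clojure could look like the sketch below. `missing-tasks` is a hypothetical helper, and `catalog-names` keeps only the :onyx/name key of each entry so that the example stays self-contained.

```clojure
(require '[clojure.set :as set])

(def workflow
  [[:in :split-sentence]
   [:split-sentence :count-words]
   [:count-words :out]])

;; Trimmed-down catalog: only the names matter for this check.
(def catalog-names
  [{:onyx/name :in}
   {:onyx/name :split-sentence}
   {:onyx/name :count-words}
   {:onyx/name :out}])

;; Task names used in the workflow that have no catalog entry.
(defn missing-tasks [workflow catalog]
  (set/difference (set (mapcat identity workflow))
                  (set (map :onyx/name catalog))))

(println (missing-tasks workflow catalog-names))
```

A misspelled task name, for example :output where the catalog says :out, shows up in the returned set.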

Flow Conditions
[{:flow/from :input-stream
  :flow/to [:process-children]
  :my/max-child-age 17
  :flow/predicate [:my.ns/child? :my/max-child-age]
  :flow/doc "Emits segment if this segment is a child."}]
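A sketch of what the predicate named above might look like. It assumes the Onyx convention that a flow predicate receives the event map, the old segment, the new segment and the collection of all new segments, followed by the parameters listed in :flow/predicate (here the value of :my/max-child-age). The `:age` key on the segment is an assumption made for this example.

```clojure
;; Returns true when the new segment should flow to :process-children.
;; max-age is filled in from :my/max-child-age in the flow condition.
(defn child?
  [event old-segment new-segment all-new max-age]
  (<= (:age new-segment) max-age))

;; Called by hand with dummy values for the arguments it ignores.
(println (child? {} {} {:age 12} [] 17))
(println (child? {} {} {:age 30} [] 17))
```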

Windows
● Fixed
● Sliding
● Global
● Session
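To make the first two window types concrete, here is a conceptual sketch of how a timestamp maps to fixed and sliding windows. It is not Onyx's implementation. `fixed-window-start` and `sliding-window-starts` are hypothetical helpers, and the sketch assumes non-negative millisecond timestamps with windows aligned at 0.

```clojure
;; Fixed windows: each timestamp falls into exactly one window.
;; Returns the start of the window that contains t.
(defn fixed-window-start [range-ms t]
  (* range-ms (quot t range-ms)))

;; Sliding windows: windows start every slide-ms and span range-ms,
;; so one timestamp can fall into several overlapping windows.
;; Returns the starts of every window that contains t.
(defn sliding-window-starts [range-ms slide-ms t]
  (let [latest (* slide-ms (quot t slide-ms))]
    (->> (iterate #(- % slide-ms) latest)
         (take-while #(and (>= % 0) (< t (+ % range-ms))))
         reverse
         vec)))

(println (fixed-window-start 5000 12345))
(println (sliding-window-starts 10000 5000 12345))
```

A global window would put every segment into a single window, and a session window would group segments by gaps in activity; both are left out of this sketch.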

Triggers
● :timer
● :segment
● :punctuation
● :watermark

Job
{:workflow workflow
 :catalog catalog
 :lifecycles lifecycles
 :windows windows
 :triggers triggers
 :task-scheduler :onyx.task-scheduler/balanced}

Onyx Architecture Overview
High Level Components
● Peer
● Zookeeper
● Aeron

Peer
● A peer is a node in the cluster that executes tasks.

Sneak Peek into Zookeeper
● Apache Zookeeper is an open source tool from Apache.
● Originally developed at Yahoo.
● Zookeeper is written in Java and is platform independent.
● The Zookeeper service can run in two modes
○ Standalone
○ Quorum

How to interact with Zookeeper?
● Zookeeper CLI
○ create /avengers "infinitywar"
○ get /avengers
○ get /avengers [watch] 1
○ set /avengers endgame
○ delete /avengers
○ ls /
○ stat /avengers

How to interact with Zookeeper?
● Exhibitor
○ git@github.com:soabase/exhibitor.git

Aeron
● Messaging layer in Onyx
● Takes care of transfer of segments between peers.

Job Schedulers
● Greedy
● Balanced
● Percentage

Task Schedulers
● Balanced
● Percentage
● Colocated

Tags
{...
:onyx/tenancy-id "my-cluster"
:onyx.peer/tags [:datomic]
...
}

Official Plugins (in/out)
● onyx-seq
● onyx-durable-queue
● onyx-elasticsearch
● onyx-http
● onyx-amazon-sqs
● onyx-amazon-s3

Official Plugins (in/out)
● onyx-core-async
● onyx-kafka
● onyx-kafka-0.8
● onyx-datomic
● onyx-redis
● onyx-sql
● onyx-bookkeeper

Onyx Deployment
● Docker
● Kubernetes
● Apache Mesos
● DCOS
● Shared storage such as AWS S3
● Any Cloud VM

About Me
(def about-me
  {:name "Abhishek Anand Amralkar"
   :shortname "@aaa"
   :from "Talentica Software Pvt. Ltd"
   :social {:blog "https://medium.com/@aamralkar"
            :twitter "https://twitter.com/aamralkar"
            :github "https://github.com/abhishekamralkar"}})
