DataTorrent Presentation @ Big Data Application Meetup

Thomas Weise <thomas@datatorrent.com>
Dec 2nd, 2015
Introduction to Open Source Unified Streaming and Fast Batch Platform
Apache Apex (incubating)

© 2015 DataTorrent
Apex Platform Overview

Apache Malhar Library

Native Hadoop Integration
• YARN is the resource manager
• HDFS used for storing any persistent state

Application Programming Model
• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations, and emits one or more output streams
– Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
– An Operator has many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of operators and streams
[Figure: Directed Acyclic Graph (DAG) of Operators connected by Streams of Tuples, ending in an Output Stream]
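
The model above can be sketched in plain Java. This is only an illustration of operators connected by streams. It does not use the Apex API, and every name in it (DagSketch, Operator, LineSplitter, LowerCase, run) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical names throughout; this models the concepts only and is not the Apex API.
class DagSketch {

    /** An operator receives a tuple on its input and emits tuples to a downstream consumer. */
    interface Operator<I, O> {
        void process(I tuple, Consumer<O> output);
    }

    /** Splits each line into words: one input tuple may yield several output tuples. */
    static class LineSplitter implements Operator<String, String> {
        @Override
        public void process(String line, Consumer<String> output) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    output.accept(word);
                }
            }
        }
    }

    /** Normalizes a word to lower case: one tuple in, one tuple out. */
    static class LowerCase implements Operator<String, String> {
        @Override
        public void process(String word, Consumer<String> output) {
            output.accept(word.toLowerCase());
        }
    }

    /** Wires source -> LineSplitter -> LowerCase -> sink and returns what reaches the sink. */
    static List<String> run(List<String> source) {
        List<String> sink = new ArrayList<>();
        Operator<String, String> splitter = new LineSplitter();
        Operator<String, String> lower = new LowerCase();
        for (String line : source) {
            // Each lambda plays the role of a stream connecting two operators.
            splitter.process(line, word -> lower.process(word, sink::add));
        }
        return sink;
    }
}
```

Each operator here holds only its own logic and knows nothing about its neighbours; the wiring in run is what forms the graph.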

Application Specification

Partitioning and Scaling Out
• Operators can be dynamically scaled
• Flexible Streams split
• Parallel partitioning
• MxN partitioning
• Unifiers
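
A minimal sketch of the partition-and-unify idea, in plain Java rather than the Apex API. It assumes tuples are routed to partitions by key hash and that a unifier merges the partial results; the names (PartitionSketch, CounterPartition, partitionFor, unify, count) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; a toy model of partitions plus a unifier, not the Apex API.
class PartitionSketch {

    /** One partition of a counting operator; it only sees the keys routed to it. */
    static class CounterPartition {
        final Map<String, Integer> counts = new HashMap<>();

        void process(String key) {
            counts.merge(key, 1, Integer::sum);
        }
    }

    /** Routes a key by hash, so equal keys always land in the same partition. */
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Unifier: merges the partial results of all partitions into one result. */
    static Map<String, Integer> unify(List<CounterPartition> parts) {
        Map<String, Integer> merged = new HashMap<>();
        for (CounterPartition p : parts) {
            p.counts.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    /** Counts keys using the given number of partitions, then unifies the partial counts. */
    static Map<String, Integer> count(List<String> tuples, int partitions) {
        List<CounterPartition> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new CounterPartition());
        }
        for (String t : tuples) {
            parts.get(partitionFor(t, partitions)).process(t);
        }
        return unify(parts);
    }
}
```

The unified result does not depend on the partition count, which is what makes it safe to change the number of partitions while the logic stays the same.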

Advanced Windowing Support
• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
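
The slide does not define these window types, so the following is a sketch under an assumption: a larger window spans a whole number of small, fixed-size windows, and each small window has already been reduced to one value. It contrasts tumbling (non-overlapping) with sliding (overlapping) aggregation in plain Java; WindowSketch and its methods are hypothetical names, not Apex API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; illustrates tumbling vs. sliding aggregation only.
class WindowSketch {

    /** Sliding: a window of `size` small windows that advances by `slide` each step. */
    static List<Integer> slidingSums(int[] perWindow, int size, int slide) {
        if (size <= 0 || slide <= 0) {
            throw new IllegalArgumentException("size and slide must be positive");
        }
        List<Integer> out = new ArrayList<>();
        // Only complete windows are emitted; a trailing partial window is dropped.
        for (int start = 0; start + size <= perWindow.length; start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += perWindow[i];
            }
            out.add(sum);
        }
        return out;
    }

    /** Tumbling: the special case where the window advances by its own size. */
    static List<Integer> tumblingSums(int[] perWindow, int size) {
        return slidingSums(perWindow, size, size);
    }
}
```

With per-window values 1 to 6, a tumbling window of size 2 covers each value once, while a sliding window of size 3 advancing by 1 covers most values several times.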

Guarantees and Performance
Stateful Fault Tolerance:
• Supported out of the box
– Application state
– Application master state
– No data loss
• Automatic recovery
• Lunch test
• Buffer server
Processing Semantics:
• At least once
• At most once
• Exactly once
Data Locality:
• Stream locality for placement of operators
• Rack local: Distributed deployment
• Node local: Data does not traverse NIC
• Container local: Data doesn't need to be serialized
• Thread local: Operators run in same thread
• Data locality
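
The three processing semantics listed above can be contrasted with a toy recovery model in plain Java. It reflects the general meaning of the terms, not how Apex implements recovery. The assumptions: work is numbered in windows, state is checkpointed after some window, the operator fails after a later window, and some windows pass during the outage. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model; it does not describe Apex internals.
class SemanticsSketch {

    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    /**
     * Returns the window ids a downstream sink observes, in order.
     * total: last window id; checkpoint: last checkpointed window;
     * failedAfter: last window emitted before the failure;
     * missed: windows that passed while the operator was down.
     */
    static List<Integer> delivered(Mode mode, int total, int checkpoint,
                                   int failedAfter, int missed) {
        List<Integer> sink = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        // Normal processing up to the failure.
        for (int w = 1; w <= failedAfter; w++) {
            sink.add(w);
            seen.add(w);
        }
        // At most once skips ahead to the current window; the other modes replay
        // from the window after the checkpoint.
        int resume = (mode == Mode.AT_MOST_ONCE) ? failedAfter + missed + 1 : checkpoint + 1;
        for (int w = resume; w <= total; w++) {
            // Exactly once is modeled as replay plus suppression of windows already seen.
            if (mode == Mode.EXACTLY_ONCE && !seen.add(w)) {
                continue;
            }
            sink.add(w);
        }
        return sink;
    }
}
```

In this model, at least once shows replayed windows twice, at most once shows a gap, and exactly once shows every window a single time.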

Dynamic Updates
• Dynamic topology updates
– Properties of operators can be changed
– New operators can be added

Resources
Apache Apex Community Page - http://apex.incubator.apache.org/
Apache Apex LinkedIn Group
