4th Athens Big Data Meetup - 1st Talk - Big Data Streaming Processing Using Apache Storm

Big Data - Meetup
Big Data Stream processing using Apache Storm
Athens - May 2016

Who are we?
● Adrianos Dadis (@qiozas)
● Patroclos Christou (@christoupat)
● Eleftheria Chavelia
● Sofia Nomikou

Agenda
● Apache Storm
● Apache Kafka
● Streaming application demo

Why stream processing?
● Growth in available real-time data
● Extract actionable intelligence in real time
● Act in real time

Use case examples
● Fraud detection
● Network monitoring
● Smart order routing
● E-commerce
● Bandwidth allocation optimization
● Algorithmic trading

End-to-End Deployment
[Diagram labels: Real-Time Data Stream, Streaming Processing Solution, Dashboards, Data Store, Applications, Alerts, Batch Processing]

Apache Storm
● Creator: Nathan Marz (2011)
● Distributed real-time computation system for processing large volumes of high-velocity data
● Characteristics:
– Fast
– Scalable
– Fault-tolerant
– Reliable
– Easy to operate
– Easy to develop

Storm core concepts
● Tuple : Storm uses tuples as its data model
● Stream : An unbounded sequence of tuples
● Spout : A source of streams in a topology
● Bolt : All processing in topologies is done in bolts
● Topology : DAG of spouts and bolts
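
The concepts above can be sketched in a few lines of plain Python. This is not the Storm API: it is a single-process toy model over a finite list (a real stream is unbounded), and every name in it (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) is a hypothetical one chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Toy model only: plain Python, not the Storm API. All names are hypothetical.


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the source of a stream; emits one-field tuples."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterator[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream: Iterator[Tuple[str]]) -> Counter:
    """Bolt: terminal step that keeps a running count per word."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


def run_topology(sentences: Iterable[str]) -> Counter:
    """Topology: wires spout -> split bolt -> count bolt (a linear DAG)."""
    return count_bolt(split_bolt(sentence_spout(sentences)))


if __name__ == "__main__":
    print(run_topology(["storm processes streams", "Storm is fast"]))
```

In a real cluster each spout and bolt would run as many parallel tasks on different workers; the generator chain here only shows how tuples flow along the edges of the DAG.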

Storm Architecture
[Diagram: cluster layout]
● Nimbus (x3) : master node
● ZooKeeper (x3) : cluster coordination
● Supervisor (x4) : node coordination
● Worker (two per Supervisor) : processing

Worker Internal Messaging
[Diagram: tuple path inside a worker. Labels: Worker Port, Worker Receiver Thread, Receiver Buffer (List<Tuple>), Router, Inbound Queue (Disruptor), Executor Thread running a Task, Outbound Queue (Disruptor), Send Thread, Transfer Buffer (List<Tuple>), Worker Transfer Thread, Worker Port]

Stream Grouping
● Shuffle
● LocalOrShuffle
● All
● Global
● Fields
● Partial Key
● Direct
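
A grouping decides which downstream task receives each tuple. The sketch below models four of the groupings listed above as functions that return target task indexes. It is a toy model, not Storm's implementation: the function names are hypothetical, and `zlib.crc32` stands in for whatever hash the real fields grouping applies to the key values.

```python
import random
import zlib
from typing import List, Sequence

# Toy model only, not the Storm API. Each function returns the indexes of the
# downstream tasks that should receive a tuple. All names are hypothetical.


def shuffle_grouping(num_tasks: int, rng: random.Random) -> List[int]:
    """Shuffle: pick one task at random to spread load evenly."""
    return [rng.randrange(num_tasks)]


def fields_grouping(key_fields: Sequence[str], num_tasks: int) -> List[int]:
    """Fields: hash the grouping fields so equal keys reach the same task."""
    # crc32 is an assumed stand-in hash; it is stable across runs.
    digest = zlib.crc32("\x00".join(key_fields).encode("utf-8"))
    return [digest % num_tasks]


def all_grouping(num_tasks: int) -> List[int]:
    """All: replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(num_tasks: int) -> List[int]:
    """Global: send the whole stream to one task (the lowest index here)."""
    return [0]


if __name__ == "__main__":
    rng = random.Random(42)
    print("shuffle:", shuffle_grouping(4, rng))
    print("fields :", fields_grouping(["user-1"], 4))
    print("all    :", all_grouping(4))
    print("global :", global_grouping(4))
```

The fields grouping is the one to reach for when a bolt keeps per-key state (for example a counter per user), because all tuples with the same key land on the same task.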

Reliable Processing
[Diagram: tuple tree with tuples {A}, {B}, {C}, {D}, {E}, {F}, {G}, {H}, {X}, labeled ACK and FAIL]
● Acking
● Anchoring
● Failures
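
Storm tracks each tuple tree with a single 64-bit value per spout message: the XOR of the ids of every tuple anchored into the tree and every tuple acked. When the value returns to zero, the whole tree has been processed. The sketch below is a toy model of that idea, not Storm's acker code; `AckTracker`, its methods, and `new_tuple_id` are hypothetical names, and it assumes a bolt anchors its children before acking its input.

```python
import random
from typing import Dict, List

# Toy model of XOR-based ack tracking; not Storm's actual acker code.
# All class and function names are hypothetical.


class AckTracker:
    def __init__(self) -> None:
        self._pending: Dict[str, int] = {}  # spout message id -> XOR of ids
        self.completed: List[str] = []      # trees that were fully acked
        self.failed: List[str] = []         # trees the spout should replay

    def emit_root(self, msg_id: str, tuple_id: int) -> None:
        """Spout emits a root tuple: start tracking its tree."""
        self._pending[msg_id] = tuple_id

    def anchor(self, msg_id: str, child_id: int) -> None:
        """Anchoring: a bolt emits a child tuple tied to the tree."""
        if msg_id in self._pending:
            self._pending[msg_id] ^= child_id

    def ack(self, msg_id: str, tuple_id: int) -> None:
        """Acking: XOR the id in again; zero means every tuple was acked."""
        if msg_id not in self._pending:
            return
        self._pending[msg_id] ^= tuple_id
        if self._pending[msg_id] == 0:
            del self._pending[msg_id]
            self.completed.append(msg_id)

    def fail(self, msg_id: str) -> None:
        """Failure: drop the tree so the spout can replay the message."""
        if msg_id in self._pending:
            del self._pending[msg_id]
            self.failed.append(msg_id)


def new_tuple_id(rng: random.Random) -> int:
    """Random non-zero 64-bit id, so an accidental zero is very unlikely."""
    return rng.getrandbits(64) or 1


if __name__ == "__main__":
    rng = random.Random(0)
    tracker = AckTracker()
    root, child = new_tuple_id(rng), new_tuple_id(rng)
    tracker.emit_root("msg-1", root)
    tracker.anchor("msg-1", child)
    tracker.ack("msg-1", root)
    tracker.ack("msg-1", child)
    print(tracker.completed)
```

The appeal of the XOR trick is constant memory per spout message, regardless of how large the tuple tree grows.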

Streaming Windows
● Sliding Windows
● Tumbling Windows
[Diagram: two timelines of tuples {...} along a Time axis, one per window type]
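
The difference between the two window types can be shown with a small sketch. It assumes time-based windows aligned at timestamp 0 over a finite, non-negative-timestamp event list, which is a simplification of how a streaming engine evaluates windows; `sliding_windows` and `tumbling_windows` are hypothetical helper names, not Storm API calls.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, payload)

# Toy model only: windows are aligned at t=0 and computed over a finite list.


def sliding_windows(
    events: Sequence[Event], length: int, interval: int
) -> List[List[str]]:
    """Windows of `length` seconds that start every `interval` seconds.

    With interval < length the windows overlap, so one event can appear
    in several windows.
    """
    if not events:
        return []
    last_ts = max(ts for ts, _ in events)
    windows: List[List[str]] = []
    start = 0
    while start <= last_ts:
        end = start + length
        windows.append([p for ts, p in events if start <= ts < end])
        start += interval
    return windows


def tumbling_windows(events: Sequence[Event], length: int) -> List[List[str]]:
    """Tumbling windows never overlap: the interval equals the length."""
    return sliding_windows(events, length, interval=length)


if __name__ == "__main__":
    events = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
    print("tumbling:", tumbling_windows(events, 2))
    print("sliding :", sliding_windows(events, 2, 1))
```

Treating a tumbling window as a sliding window whose interval equals its length keeps the two definitions consistent.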

Storm Trident
● High level abstraction on top of Storm
● Micro Batching
● Stateful
● Built-in support:
– Functions
– Filters
– Merges and Joins
– Aggregations
– Grouping

Storm 1.x Features
● HA Nimbus
● Distributed Cache API
● Pacemaker - Heartbeat Server
● Automatic Backpressure
● Resource Aware Scheduler
● State Management
● Native Streaming Windows

Storm Integrations
● Kafka
● Redis
● Hive, HDFS
● HBase, Cassandra
● MongoDB
● Elasticsearch, Solr
● JDBC
● MQTT

Storm modes
● One-at-a-time processing (pure Storm)
– Very low latency
– Very simple development model
– At-Most-Once and At-Least-Once semantics
● Micro batch processing (Storm Trident)
– Higher latency per event
– Better throughput at high event rates
– More complex development model
– Exactly-Once semantics
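
The gap between at-least-once and exactly-once shows up when a batch is replayed after a failure. The sketch below contrasts a counter that applies every delivery with one that stores the last applied batch id next to its state and skips replays. It is a toy model, not the Trident API: the class names are hypothetical, and it assumes batches arrive in increasing id order and that a replayed batch has the same content as the original.

```python
from typing import Dict, List, Optional

# Toy model only, not the Trident API. All names are hypothetical.
# Assumes batch ids (txid) increase and replays carry identical content.


class AtLeastOnceCounter:
    """Applies every delivery, so a replayed batch is counted twice."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def apply(self, txid: int, words: List[str]) -> None:
        for word in words:
            self.counts[word] = self.counts.get(word, 0) + 1


class ExactlyOnceCounter:
    """Stores the last applied batch id with the state and skips replays."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.last_txid: Optional[int] = None

    def apply(self, txid: int, words: List[str]) -> None:
        if self.last_txid is not None and txid <= self.last_txid:
            return  # batch already applied: the replay is a no-op
        for word in words:
            self.counts[word] = self.counts.get(word, 0) + 1
        self.last_txid = txid


if __name__ == "__main__":
    # Batch 2 is delivered twice, as it would be after a failure and replay.
    batches = [(1, ["a", "b"]), (2, ["a"]), (2, ["a"])]
    at_least_once, exactly_once = AtLeastOnceCounter(), ExactlyOnceCounter()
    for txid, words in batches:
        at_least_once.apply(txid, words)
        exactly_once.apply(txid, words)
    print("at-least-once:", at_least_once.counts)
    print("exactly-once :", exactly_once.counts)
```

Messages are still delivered more than once in the second case; what is exactly-once is the effect on the stored state.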

Messaging Systems
● Core needs:
– Decouple processing from data producers
– Buffer unprocessed messages
● Models:
– Queuing
– Publish-Subscribe
● Frameworks:
– Kafka
– RabbitMQ
– ActiveMQ

Apache Kafka
● Distributed, partitioned, replicated commit log service
● Publish-Subscribe model
● Maintains feeds of messages in Topics
● Automatic Replication and Retention
● Brokers

Apache Kafka
● Offset uniquely identifies each message within the partition
● Consumers coordinate what to read
● Consumer & Consumer Group
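
The offset and consumer-group ideas can be modeled with an in-memory partitioned log. This is a toy sketch, not a Kafka client: `ToyTopic`, `produce`, and `poll` are hypothetical names, `zlib.crc32` is an assumed stand-in for the real partitioner's hash, and there is no replication, retention, or rebalancing.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy in-memory model only; not a Kafka client. All names are hypothetical.


class ToyTopic:
    def __init__(self, num_partitions: int) -> None:
        # Each partition is an append-only list; the list index is the offset.
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]
        # (group, partition) -> next offset that group will read
        self.committed: Dict[Tuple[str, int], int] = defaultdict(int)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append to the partition chosen by key hash; return (partition, offset)."""
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1

    def poll(self, group: str, partition: int, max_records: int = 10) -> List[str]:
        """Read from the group's committed offset and advance it."""
        start = self.committed[(group, partition)]
        records = self.partitions[partition][start:start + max_records]
        self.committed[(group, partition)] = start + len(records)
        return records


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=1)
    topic.produce("sensor-1", "m0")
    topic.produce("sensor-1", "m1")
    print(topic.poll("dashboard", 0))  # first group reads both messages
    print(topic.poll("alerts", 0))     # second group reads them again
```

Reading does not remove messages from the log. Each consumer group keeps its own offset per partition, which is what lets several independent applications consume the same topic.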

Implementing Big Data Apps
● Design for scalability from day one
● Queries drive schema design
● Failure (hardware or data) is the normal case
● Continuous Integration
● Metrics & Monitoring from day one
● Appropriate people

Sentiment Analysis Demo
[Diagram: demo topology. Components:]
● Random Sentence Spout
● Kafka Topic
● Kafka Spout
● Stemming Bolt
● Positive Scoring Bolt
● Negative Scoring Bolt
● Final Scoring Bolt
● Persistence Bolt
● Kafka Topic
● NoSQL
src => https://github.com/qiozas/sentiment-analysis-storm
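
The bolt names in the demo suggest a stem, score, and persist pipeline. The sketch below shows one way such a pipeline could work in plain Python. It is not taken from the linked repository: the word lists, the naive suffix-stripping stemmer, the scoring rule, and the dict used as a store are all assumptions made for illustration, and the real demo may differ in every one of them.

```python
from typing import Dict, List

# Hypothetical sketch, not the code from the linked repository.
POSITIVE_WORDS = {"good", "great", "love"}  # assumed lexicon
NEGATIVE_WORDS = {"bad", "slow", "hate"}    # assumed lexicon


def stemming_bolt(sentence: str) -> List[str]:
    """Lowercase, split, and strip a few suffixes (naive stand-in stemmer)."""
    stems = []
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        if word:
            stems.append(word)
    return stems


def positive_scoring_bolt(stems: List[str]) -> int:
    """Count stems found in the positive lexicon."""
    return sum(1 for stem in stems if stem in POSITIVE_WORDS)


def negative_scoring_bolt(stems: List[str]) -> int:
    """Count stems found in the negative lexicon."""
    return sum(1 for stem in stems if stem in NEGATIVE_WORDS)


def final_scoring_bolt(positive: int, negative: int) -> str:
    """Compare the two scores and label the sentence."""
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


def persistence_bolt(store: Dict[str, str], sentence: str, label: str) -> None:
    """Write the result to a store (a dict stands in for the NoSQL database)."""
    store[sentence] = label


def analyze(sentence: str) -> str:
    """Run one sentence through the stem and score steps."""
    stems = stemming_bolt(sentence)
    return final_scoring_bolt(
        positive_scoring_bolt(stems), negative_scoring_bolt(stems)
    )


if __name__ == "__main__":
    store: Dict[str, str] = {}
    for text in ["Storm is great, I love it", "Slow builds are bad"]:
        persistence_bolt(store, text, analyze(text))
    print(store)
```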

Athens Big Data - Meetup - 2016
THANK YOU :-)
[ Updates / Questions / Comments ]
@qiozas
@christoupat
