SMACK Stack 1.1

Elodina is a big-data-as-a-service platform built on top of open source software.
The Elodina platform addresses today’s data analytics needs by providing the tools and support necessary to make use of open source technologies.
http://www.elodina.net/

What's the SMACK Stack?
SMACK stack 1.0 has traditionally meant Spark, Mesos, Akka, Cassandra and Kafka. There is a lot written about it (https://dzone.com/articles/smack-stack-guide) and lots more besides (https://www.google.com/webhp?q=smack%20stack).
Now we are going to introduce SMACK Stack 1.1 and talk more about dynamic compute, microservices, orchestration and micro segmentation, all part of what you can now do with Streaming, Mesos, Analytics, Cassandra and Kafka.

The free lunch is over!
http://www.gotw.ca/publications/concurrency-ddj.htm

Many industries still don’t get it.
XML is everywhere, but we have alternatives!
We can support an XML interface without taking on the burden of the extra data. You can save A LOT of overhead just by adding a pre-processing step that takes the XML, turns it into Avro, and then processes and stores the Avro.
It works: https://github.com/elodina/xml-avro
You can even process the response in Avro but return the result in XML; more on that later, though!
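To make the overhead argument concrete, here is a minimal Go sketch; it is not the xml-avro project's code. It assumes a hypothetical `<order>` document with a string `id` and a long `qty`, parses it with `encoding/xml`, and hand-encodes it using Avro's binary encoding for a record schema `{id: string, qty: long}`: fields are written in schema order with no field names, strings as a zig-zag varint length followed by UTF-8 bytes, and longs as zig-zag varints (the same scheme Go's `binary.PutVarint` implements). A real pipeline would use an Avro library and a schema rather than hand-rolled encoding.

```go
package main

import (
	"encoding/binary"
	"encoding/xml"
	"fmt"
)

// Order is a hypothetical record used only for illustration.
type Order struct {
	XMLName xml.Name `xml:"order"`
	ID      string   `xml:"id"`
	Qty     int64    `xml:"qty"`
}

// appendLong writes an Avro long: zig-zag encoded, then a base-128 varint.
// binary.PutVarint implements exactly this encoding.
func appendLong(buf []byte, v int64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutVarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendString writes an Avro string: its byte length as a long, then the UTF-8 bytes.
func appendString(buf []byte, s string) []byte {
	buf = appendLong(buf, int64(len(s)))
	return append(buf, s...)
}

// toAvro encodes an Order as the Avro record {id: string, qty: long}.
// Avro records are their fields concatenated in schema order; no names or tags are written.
func toAvro(o Order) []byte {
	var buf []byte
	buf = appendString(buf, o.ID)
	buf = appendLong(buf, o.Qty)
	return buf
}

func main() {
	doc := []byte(`<order><id>A-1001</id><qty>3</qty></order>`)
	var o Order
	if err := xml.Unmarshal(doc, &o); err != nil {
		panic(err)
	}
	avro := toAvro(o)
	fmt.Printf("xml: %d bytes, avro: %d bytes\n", len(doc), len(avro))
}
```

The savings come from the schema living outside the data: every XML record repeats its tag names, while the Avro record carries only values.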

You need to be running Mesos. There are lots of options here!
What is most important is that you abstract your “Provider” from your “Grid”.
What is “The Grid”?
It is the PaaS layer you deploy to that runs your software (aka your new awesome supercomputer).
The grid is your Mesos cluster. You are likely going to have more than one, so plan accordingly. Think of it as immutable infrastructure, the computer does.
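One way to read the Provider/Grid split is as two separate interfaces, so the code that brings up the grid never depends on where the machines came from. The Go sketch below is purely illustrative: `Provider`, `Grid`, `Machine`, `staticProvider` and `mesosGrid` are hypothetical names, not part of Mesos or any Elodina project.

```go
package main

import "fmt"

// Machine is a hypothetical description of one node handed over by a provider.
type Machine struct {
	Host  string
	CPUs  float64
	MemMB int
}

// Provider supplies raw compute (cloud, bare metal, local VMs...).
type Provider interface {
	Provision(n int) ([]Machine, error)
}

// Grid is the PaaS layer that runs software on whatever machines it is given.
type Grid interface {
	Join(m Machine)
	Capacity() (cpus float64, memMB int)
}

// staticProvider is a stand-in provider that fabricates identical machines.
type staticProvider struct {
	cpus  float64
	memMB int
}

func (p staticProvider) Provision(n int) ([]Machine, error) {
	ms := make([]Machine, n)
	for i := range ms {
		ms[i] = Machine{Host: fmt.Sprintf("node-%d", i), CPUs: p.cpus, MemMB: p.memMB}
	}
	return ms, nil
}

// mesosGrid stands in for a Mesos cluster; here it only tracks pooled capacity.
type mesosGrid struct{ nodes []Machine }

func (g *mesosGrid) Join(m Machine) { g.nodes = append(g.nodes, m) }

func (g *mesosGrid) Capacity() (float64, int) {
	var cpus float64
	var mem int
	for _, m := range g.nodes {
		cpus += m.CPUs
		mem += m.MemMB
	}
	return cpus, mem
}

// bootstrap wires any Provider to any Grid; neither knows about the other.
func bootstrap(p Provider, g Grid, n int) error {
	ms, err := p.Provision(n)
	if err != nil {
		return err
	}
	for _, m := range ms {
		g.Join(m)
	}
	return nil
}

func main() {
	g := &mesosGrid{}
	if err := bootstrap(staticProvider{cpus: 4, memMB: 16384}, g, 3); err != nil {
		panic(err)
	}
	cpus, mem := g.Capacity()
	fmt.Printf("grid capacity: %.1f CPUs, %d MB\n", cpus, mem)
}
```

Swapping cloud providers then means writing a new `Provider`, while everything built against `Grid` stays unchanged.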
Step 1

“Provider” of compute resources

The Grid … 2.0 ...
https://github.com/elodina/sawfly/blob/master/cloud-deploy-grid.md
Program against your datacenter like it’s a single pool of resources. Apache Mesos abstracts CPU,
memory, storage, and other compute resources away from machines (physical or virtual), enabling
fault-tolerant and elastic distributed systems to be built and run effectively. Mesosphere’s Data
Center Operating System (DCOS) spans all of the machines in a datacenter or cloud and treats them
as a single computer, providing a highly elastic and highly scalable way of deploying applications,
services, and big data infrastructure on shared resources. DCOS is based on Apache Mesos and
includes a distributed systems kernel with enterprise-grade security.
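To illustrate “a single pool of resources”, here is a toy first-fit placer in Go. It is not how Mesos works internally (Mesos offers resources to framework schedulers, which make their own placement decisions); it only shows that a task asks for CPUs and memory rather than for a specific machine. The task sizes reuse the Storm Nimbus (1.0 CPU, 1024 MB) and Storm UI (0.2 CPU, 512 MB) definitions in this deck; the agent sizes and the `too-big` task are made up.

```go
package main

import "fmt"

// Agent is a hypothetical machine with free resources.
type Agent struct {
	Name  string
	CPUs  float64
	MemMB int
}

// Task is a resource request, like the "cpus"/"mem" fields of an app definition.
type Task struct {
	ID    string
	CPUs  float64
	MemMB int
}

// place assigns each task to the first agent with enough free CPU and memory,
// decrementing that agent's free resources. Unplaceable tasks map to "".
func place(agents []Agent, tasks []Task) map[string]string {
	out := make(map[string]string, len(tasks))
	for _, t := range tasks {
		out[t.ID] = ""
		for i := range agents {
			a := &agents[i]
			if a.CPUs >= t.CPUs && a.MemMB >= t.MemMB {
				a.CPUs -= t.CPUs
				a.MemMB -= t.MemMB
				out[t.ID] = a.Name
				break
			}
		}
	}
	return out
}

func main() {
	agents := []Agent{{"agent-1", 1.0, 2048}, {"agent-2", 2.0, 4096}}
	tasks := []Task{
		{"storm-nimbus", 1.0, 1024},
		{"storm-ui", 0.2, 512},
		{"too-big", 1.9, 615},
	}
	placed := place(agents, tasks)
	for _, t := range tasks {
		fmt.Printf("%s -> %q\n", t.ID, placed[t.ID])
	}
}
```

`too-big` fails even though the pool has 1.8 free CPUs in total, because no single agent can hold it; real schedulers face the same fragmentation problem.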

But there is more!
● Provisioning
● Micro Segmentation
● Orchestration
● Configuration Management
● Service Discovery
● Deployment Isolation and Identification
● Telemetry, Tracing, Ops Stuff, Etc
● Oh My!
It all boils down to stacks (https://github.com/elodina/stack-deploy) and, ultimately, to how you
work with the schedulers in your cluster.

In the Grid you need Schedulers!
● Kafka – Producer/Consumer-based message queue management
● Exhibitor – Supervisor for distributed persistence (like ZooKeeper)
● Cassandra/DSE – HA, scalable, distributed NoSQL data storage
● Storm – Topology-based, real-time distributed data streaming
● Monarch – Distributed Remote Procedure Calls, Kafka REST interface and schema repository
● Zipkin – Configure, launch and manage Zipkin distributed trace on Mesos
● HDFS – Configure, launch and manage HDFS on Mesos (coming soon)
● Stockpile – Consumer to “stock pile” data into persistent storage (the Mesos scheduler currently supports only Cassandra/C*)
● MirrorMaker – Consumer to make a mirror copy of data to destination
● StatsD – Producer to pump StatsD on Mesos into Kafka for consumption, preserves layers
● SysLog – Producer to pump Syslog on Mesos into Kafka for consumption, preserves layers
https://github.com/elodina/

Virtual Telemetry “Data Center” In the Grid
ZipkinQATeamBuild92
● 1x Exhibitor-Mesos
● 1x Exhibitor
● 1x DSE-Mesos
● 1x Cassandra node
● 1x Kafka-Mesos
● 1x Kafka 0.8 broker
● 1x Zipkin-Mesos
● 1x Zipkin Collector
● 1x Zipkin Query
● 1x Zipkin Web
Diagram layers: “data center”, “cluster”, “zone”; “Stack”: defaultSimpleZipkinFull

Stack Deploy In Action
./stack-deploy addlayer --file stacks/cassandra_dc.stack --level datacenter
./stack-deploy addlayer --file stacks/cassandra_cluster.stack --level cluster --parent cassandra_dc
./stack-deploy addlayer --file stacks/cassandra_zone1.stack --level zone --parent cassandra_cluster
./stack-deploy addlayer --file stacks/cassandra_zone2.stack --level zone --parent cassandra_cluster
./stack-deploy add --file stacks/cassandra.stack
./stack-deploy run cassandra --zone cassandra_zone1

Cassandra (DataStax Enterprise on Mesos): https://github.com/elodina/datastax-enterprise-mesos

Apache Kafka
● Apache Kafka: http://kafka.apache.org
● Source code: https://github.com/apache/kafka
● Documentation: http://kafka.apache.org/documentation.html
● Wiki: https://cwiki.apache.org/confluence/display/KAFKA/Index

It often starts with just one data pipeline

Reuse of data pipelines for new producers

Reuse of existing providers for new consumers

Eventually the solution becomes the problem

Kafka decouples data-pipelines

A high-throughput distributed messaging system
rethought as a distributed commit log.

Mesos Kafka http://github.com/mesos/kafka

Streaming & Analytics
● The landscape of streaming is about to get more fragmented and harder to
navigate. This is not all bad news and it is not much different than where we
were with NoSQL 6 years ago or so.
● Different systems are getting really (really (really)) good at different things.
○ DAG-based systems
○ Event-based systems
○ Query & Execution Engines
○ Streaming Engines
○ Etc!

Storm (and Storm topology-based systems)

Storm Nimbus
{
  "id": "storm-nimbus",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm-mesos nimbus -c mesos.master.url=zk://zookeeper.service:2181/mesos -c storm.zookeeper.servers='[\"zookeeper.service\"]' -c nimbus.thrift.port=$PORT0 -c topology.mesos.worker.cpu=0.5 -c topology.mesos.worker.mem.mb=615 -c worker.childopts=-Xmx512m -c topology.mesos.executor.cpu=0.1 -c topology.mesos.executor.mem.mb=160 -c supervisor.childopts=-Xmx128m -c mesos.executor.uri=http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz -c storm.log.dir=$(pwd)/logs",
  "cpus": 1.0,
  "mem": 1024,
  "ports": [31056],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ]
}
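The Nimbus command sizes each Storm worker's Mesos container at 615 MB while capping its JVM heap at 512 MB (`-Xmx512m`), and the executor at 160 MB against a 128 MB supervisor heap. The gap is presumably headroom for non-heap JVM memory such as thread stacks and class metadata, so the container is not killed for exceeding its memory limit. Below is a small Go check of that arithmetic; `xmxMB` and `headroom` are hypothetical helpers (only the `m` and `g` suffixes are handled), and pairing the executor memory with `supervisor.childopts` assumes the storm-mesos layout in which the supervisor runs inside the Mesos executor.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// xmxMB parses a JVM -Xmx option such as "-Xmx512m" or "-Xmx1g" into megabytes.
// Only the m/M and g/G suffixes are handled in this sketch.
func xmxMB(opt string) (int, error) {
	v := strings.TrimPrefix(opt, "-Xmx")
	if v == opt || v == "" {
		return 0, fmt.Errorf("not an -Xmx option: %q", opt)
	}
	mult := 1
	switch v[len(v)-1] {
	case 'm', 'M':
	case 'g', 'G':
		mult = 1024
	default:
		return 0, fmt.Errorf("unsupported unit in %q", opt)
	}
	n, err := strconv.Atoi(v[:len(v)-1])
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

// headroom returns container memory minus JVM heap, in MB.
func headroom(containerMB int, xmx string) (int, error) {
	heap, err := xmxMB(xmx)
	if err != nil {
		return 0, err
	}
	return containerMB - heap, nil
}

func main() {
	for _, c := range []struct {
		name      string
		container int
		xmx       string
	}{
		{"worker", 615, "-Xmx512m"},   // topology.mesos.worker.mem.mb / worker.childopts
		{"executor", 160, "-Xmx128m"}, // topology.mesos.executor.mem.mb / supervisor.childopts
	} {
		h, err := headroom(c.container, c.xmx)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d MB headroom (%.0f%% of heap)\n", c.name, h, 100*float64(h)/float64(c.container-h))
	}
}
```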

Storm UI
{
  "id": "storm-ui",
  "cmd": "cp storm.yaml storm-mesos-0.9.6/conf && cd storm-mesos-0.9.6 && ./bin/storm ui -c ui.port=$PORT0 -c nimbus.thrift.port=31056 -c nimbus.host=storm-nimbus.service -c storm.log.dir=$(pwd)/logs",
  "cpus": 0.2,
  "mem": 512,
  "ports": [31067],
  "requirePorts": true,
  "instances": 1,
  "uris": [
    "http://repo.elodina.s3.amazonaws.com/storm-mesos-0.9.6.tgz",
    "http://repo.elodina.s3.amazonaws.com/storm.yaml"
  ],
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 120,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 3
    }
  ]
}
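The `healthChecks` block has Marathon poll `/` on the UI port every 20 seconds, ignore failures during the first 120 seconds (or until the task first reports healthy), and kill the task after 3 consecutive failures. The Go sketch below is a simplified model of just those three settings, assuming one check per interval starting one interval after launch and ignoring per-check timeouts; `HealthCheck` and `killTime` are hypothetical names, not Marathon's implementation.

```go
package main

import "fmt"

// HealthCheck holds the three Marathon settings modelled here.
type HealthCheck struct {
	GracePeriodSeconds     int
	IntervalSeconds        int
	MaxConsecutiveFailures int
}

// killTime replays a sequence of check results (true = healthy). Check i runs
// at (i+1)*IntervalSeconds after launch. It returns the time the task would be
// killed and true, or 0 and false if the task survives the whole sequence.
func (hc HealthCheck) killTime(results []bool) (int, bool) {
	healthyOnce := false
	failures := 0
	for i, ok := range results {
		t := (i + 1) * hc.IntervalSeconds
		if ok {
			healthyOnce = true
			failures = 0
			continue
		}
		// Failures inside the grace period are ignored until the first healthy result.
		if !healthyOnce && t < hc.GracePeriodSeconds {
			continue
		}
		failures++
		if failures >= hc.MaxConsecutiveFailures {
			return t, true
		}
	}
	return 0, false
}

func main() {
	ui := HealthCheck{GracePeriodSeconds: 120, IntervalSeconds: 20, MaxConsecutiveFailures: 3}

	neverUp := make([]bool, 10) // the UI never answers
	if t, killed := ui.killTime(neverUp); killed {
		fmt.Printf("never healthy: killed at t=%ds\n", t)
	}

	upThenDown := []bool{true, false, false, false}
	if t, killed := ui.killTime(upThenDown); killed {
		fmt.Printf("healthy then failing: killed at t=%ds\n", t)
	}
}
```

In this model a UI that never comes up survives well past the 120-second grace period, while one that was healthy and then breaks is killed after about a minute.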

Storm Kafka - new spouts & bolts for Kafka 0.8, 0.9, ...

Go Kafka Client - Fan Out Processing
https://github.com/elodina/go-kafka-client-mesos
● Dynamic Kafka Log workers
● Blue/Green Deploy Support
● Fan Out Processing
● Auditable
● Batches
● Scalable/Auto-Scalable

Questions?
http://www.elodina.net

More from Joe Stein