
Kick-Start with SMACK Stack


SMACK is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. It is used for pipelined data architectures that support real-time data analysis, with each technology placed where it fits best to form an efficient data pipeline.

Published in: Software

  1. Kick-Start with SMACK Stack. Sandeep Purohit, Software Consultant, Knoldus Software LLP
  2. Agenda:
     ● What is SMACK?
     ● Why SMACK?
     ● Brief introduction to the technologies
     ● How to integrate all the technologies to create the data pipeline
     ● Demo
  3. What is SMACK?
     ● Spark: Apache Spark is a fast and general-purpose cluster computing system.
     ● Mesos: A cluster resource management system that provides efficient resource allocation.
     ● Akka: A toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
     ● Cassandra: The Apache Cassandra database is the right choice when you need scalability and high availability.
     ● Kafka: A distributed messaging system for handling real-time data.
  4. Why SMACK?
     ● SMACK is used for pipelined data architectures, which real-time data analysis requires.
     ● SMACK integrates each technology at the right place to form an efficient data pipeline.
     ● SMACK lets you scale the whole cluster linearly without hassle.
  5. SMACK Pipeline Architecture
  6. Why Spark?
     ● It is a general-purpose big data processing engine with four main components: Spark Core, Spark Streaming, Spark ML, and GraphX.
     ● We can process data with any of these components in real time.
     ● It provides fault tolerance for real-time applications.
  7. Why Cassandra?
     ● Cassandra has no single point of failure.
     ● Cassandra's write path is fast, so it can handle real-time data easily.
     ● It supports a multi-datacenter architecture, so different datacenters (DCs) can serve different purposes.
     (Diagram labels: Cassandra Cluster, Ingestion DC, Analysis DC.)
  8. Why Mesos? (Diagram labels: Mesos Master, two Mesos Master Standby nodes, ZooKeeper, three Mesos Slaves.)
  9. Models in SMACK
     ● In SMACK, the models are Scala and Akka.
     ● We can use these models to write highly concurrent and parallel applications.
     ● Example: we can pick Akka modules according to our use case, such as akka-http, the Akka scheduler, or Akka priority mailboxes.
  10. Models used in SMACK: Akka-Http, Akka-Scheduler
  11. Why Kafka?
     ● It handles streams of data efficiently and in real time.
     ● It provides fault tolerance.
     ● It acts as a bridge between two applications.
     (Diagram labels: Streaming Source, Kafka Broker, Spark Receiver.)
  12. Architecture of Spark and Cassandra. Spark worker nodes read data from the local node, which reduces latency. (Diagram labels: Cassandra Cluster, four Spark Workers.)
  13. Spark, Mesos, Cassandra. Mesos slaves and Cassandra nodes are colocated to give Spark better data locality. (Diagram labels: Driver Program, Mesos Master, and three Mesos slaves, each paired with a Cassandra node.)
  14. Demo Application Architecture
     ● Tweets
     ● Store tweets in a Kafka topic
     ● Retrieve hashtags
     ● Evaluate the top hashtag every 10 seconds
     ● Store tweets in a Cassandra table
  15. Demo: SMACK_Tweets
  16. Thank You!!
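
The demo architecture on slide 14 (retrieve hashtags from tweets, then evaluate the top hashtag every 10 seconds) can be illustrated with a small standalone sketch. This is not the code from the SMACK_Tweets demo, which the transcript does not include. It is a plain-Python approximation of the windowing logic only, with no Kafka, Spark, or Cassandra involved. The function names, the hashtag pattern, and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")
WINDOW_SECONDS = 10  # window length taken from slide 14


def extract_hashtags(text):
    """Return the lowercase hashtags found in one tweet's text."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def top_hashtag_per_window(tweets, window=WINDOW_SECONDS):
    """Group (timestamp_seconds, text) pairs into fixed windows and
    return {window_start: (hashtag, count)} for the most frequent tag."""
    counts = defaultdict(Counter)
    for timestamp, text in tweets:
        window_start = int(timestamp // window) * window
        counts[window_start].update(extract_hashtags(text))
    return {
        start: counter.most_common(1)[0]
        for start, counter in sorted(counts.items())
        if counter  # skip windows whose tweets had no hashtags
    }


# Hypothetical sample data standing in for messages read from a Kafka topic.
sample_tweets = [
    (0.5, "Trying out #Spark with #Kafka"),
    (3.0, "#spark streaming is neat"),
    (9.9, "Storing results in #Cassandra"),
    (12.0, "#Akka actors everywhere"),
    (15.0, "More #akka and #mesos"),
]

result = top_hashtag_per_window(sample_tweets)
for start, (tag, count) in result.items():
    print(f"window starting at {start}s: #{tag} ({count})")
```

In the pipeline the slides describe, the tweets would arrive through a Kafka topic, a Spark streaming job would do the windowed counting, and the tweets would be written to a Cassandra table. The sketch keeps only the counting step so that it runs without a cluster.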
