www.eleks.com
Azure Real-Time Analytics And Kappa Architecture
with Kafka and Cassandra clusters
Vitalii Bondarenko
vitaliy.bondarenko@eleks.com
Agenda
• Streaming analytics in Azure
• Apache Cassandra
• Apache Kafka
• Confluent Platform and Kafka Streams
• Examples
Big Data Approach
RDBMS Approach
• Massive Parallel Processing (Scalability)
• In-memory DB (Streaming and compression)
• Column stores (BI)
Big Data Approach
• Hadoop (HDFS + MapReduce)
• SQL on HDFS
• Scalable NoSQL
• Batch-only processing (high latency)
Lambda architecture
Batch & Stream Processing
• Batch layer
• Stores master dataset
• Compute arbitrary views
• Horizontally Scalable
• Speed layer (Streaming)
• Fast, incremental algorithms
• Batch layer eventually overrides speed layer
• Serving layer
• Random access to batch views
• Updated by the batch and speed layers
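To make the interplay of the layers concrete, here is a minimal Python sketch under simplifying assumptions: views are per-key counters held in memory, and `run_batch` and `serve` are hypothetical names, not part of any framework. The serving query adds the batch view to the speed-layer view; after the next batch run, the batch view absorbs the recent events and the speed view is reset, which is one way to read "the batch layer eventually overrides the speed layer".

```python
from collections import Counter

def run_batch(master_dataset):
    """Recompute the batch view from scratch over the full, immutable master dataset."""
    return Counter(event["device"] for event in master_dataset)

def serve(key, batch_view, speed_view):
    """Serving layer: combine the precomputed batch view with the
    incremental speed view for events the last batch run has not covered."""
    return batch_view.get(key, 0) + speed_view.get(key, 0)

master = [{"device": "a"}, {"device": "b"}, {"device": "a"}]
batch_view = run_batch(master)
speed_view = Counter()

# New events: appended to the master dataset and counted incrementally by the speed layer.
for event in [{"device": "a"}, {"device": "c"}]:
    master.append(event)
    speed_view[event["device"]] += 1

print(serve("a", batch_view, speed_view))  # 3 (2 from batch + 1 from speed)

# Next batch run absorbs the new events, so the speed view for that period is discarded.
batch_view = run_batch(master)
speed_view.clear()
print(serve("a", batch_view, speed_view))  # 3 (all from batch)
```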
Kappa architecture
Stream Processing with Scalable Storages
• Everything is a stream
• Immutable unstructured data sources
• Single analytics framework
• Windows on Streaming Layer
• Linearly scalable Serving Layer
• Interactive querying
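A minimal sketch of the Kappa idea, assuming an in-memory list in place of a Kafka topic. There is one immutable log; every view is derived by replaying it; and changing the logic means replaying the same log from offset 0 with the new job rather than maintaining a separate batch pipeline. `build_view`, `v1` and `v2` are hypothetical names.

```python
# A single append-only event log stands in for a Kafka topic (in-memory assumption).
event_log = [
    {"device": "a", "temp": 20.0},
    {"device": "b", "temp": 25.0},
    {"device": "a", "temp": 22.0},
]

def build_view(log, process, from_offset=0):
    """Derive a view by replaying the log from an offset through one processing function."""
    view = {}
    for offset in range(from_offset, len(log)):
        process(view, log[offset])
    return view

def v1(view, event):
    # Job version 1: keep only the latest reading per device.
    view[event["device"]] = event["temp"]

def v2(view, event):
    # Job version 2: keep every reading per device so averages can be computed.
    view.setdefault(event["device"], []).append(event["temp"])

view_v1 = build_view(event_log, v1)
# Reprocessing: run the new job over the same log from offset 0, then switch readers over.
view_v2 = build_view(event_log, v2)
mean_a = sum(view_v2["a"]) / len(view_v2["a"])

print(view_v1)  # {'a': 22.0, 'b': 25.0}
print(mean_a)   # 21.0
```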
Azure Stream Analytics
• Easy to use
• Scalable
• Connectivity
• SQL, UDF, Reference Data
Stream processing
• Windows
• Stateful
• Stateless
• Fault tolerance
• Scalability
• Low latency
SELECT
    Make,
    System.Timestamp() AS Time,
    COUNT(*) AS [Count]
INTO
    AlertOutput
FROM
    Input TIMESTAMP BY Time
GROUP BY
    Make,
    TumblingWindow(second, 10)
HAVING
    COUNT(*) >= 3
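The query above counts events per `Make` in non-overlapping 10-second windows and keeps the groups with at least 3 events. The Python sketch below reproduces that logic for illustration only. It assumes integer timestamps in seconds and half-open windows [start, start + 10), and it stamps each result with the window end, which the query does with `System.Timestamp()`. Exact boundary handling in Azure Stream Analytics may differ; `tumbling_alerts` is a hypothetical name.

```python
from collections import Counter

def tumbling_alerts(events, size_s=10, threshold=3):
    """Count events per (make, window) and keep groups with count >= threshold.

    Assumes half-open windows [k*size_s, (k+1)*size_s); each result carries
    the window end time, similar to System.Timestamp() in the query.
    """
    counts = Counter()
    for make, t in events:
        window_start = (t // size_s) * size_s
        counts[(make, window_start)] += 1
    return sorted(
        (make, start + size_s, n)
        for (make, start), n in counts.items()
        if n >= threshold
    )

events = [("Honda", 1), ("Honda", 3), ("Honda", 7), ("Toyota", 4),
          ("Honda", 12), ("Toyota", 13), ("Toyota", 15), ("Toyota", 18)]
print(tumbling_alerts(events))  # [('Honda', 10, 3), ('Toyota', 20, 3)]
```

Because tumbling windows never overlap, each event is counted exactly once, which is why a simple per-window counter is enough here.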
Demo: Azure Stream Analytics
• Data from Event Hub
• Geo-Analytics on Streaming
• Visualization in Power BI
• Demo for streaming analytics projects
Fast Data Platform
• Real-time processing
• Fast writes of raw data
• Scalable
• Distributed
Demo: Azure Stream Analytics
• Demo for streaming analytics projects
• Platform deployment
Apache Cassandra
• Masterless (every node accepts reads and writes), low-latency, shared-nothing
• Distributed
• No single point of failure
• Linearly Scalable
• Multi-datacenter configuration
• AP with tunable consistency
Nodes and distributions
• Data distributed by token, in the range -2^63 to 2^63-1
• Token = Murmur3 hash of the partition key
• Virtual nodes (vnodes)
• Data centers and racks; gossip every second
• Replication strategy (SimpleStrategy, NetworkTopologyStrategy)
• Replication factor (usually 3); gossip and coordinators
• Tunable consistency: strong or eventual
• Consistency levels (ONE, TWO, THREE, ANY, ALL, QUORUM, LOCAL_QUORUM, LOCAL_ONE…)
• R + W > N gives strong consistency (the read and write replica sets overlap)
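A simplified Python sketch of token-based placement and the quorum rule. It makes several assumptions. Each node has one token (no vnodes). MD5 stands in for Murmur3, which is not in the Python standard library. Placement is SimpleStrategy-style: the replicas are the owning node plus the next RF - 1 nodes clockwise. `token`, `replicas`, `quorum`, `is_strongly_consistent` and the four-node ring are all hypothetical.

```python
import bisect
import hashlib

def token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token in [-2**63, 2**63 - 1].

    Stand-in only: Cassandra's default Murmur3Partitioner uses Murmur3, not MD5.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def replicas(partition_key: str, ring: dict, rf: int = 3) -> list:
    """SimpleStrategy-style placement: the first node whose token is >= the
    key's token owns it; the next rf - 1 nodes clockwise hold the copies."""
    tokens = sorted(ring)  # ring maps node token -> node name
    i = bisect.bisect_left(tokens, token(partition_key)) % len(tokens)
    return [ring[tokens[(i + k) % len(tokens)]] for k in range(rf)]

def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1

def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """R + W > N guarantees the read set and write set share at least one replica."""
    return r + w > n

# Four nodes with evenly spaced tokens (hypothetical cluster).
ring = {-2**62: "node1", 0: "node2", 2**62: "node3", 2**63 - 1: "node4"}

print(replicas("sensor-42", ring))                      # three distinct, adjacent nodes
print(quorum(3))                                        # 2
print(is_strongly_consistent(quorum(3), quorum(3), 3))  # True (QUORUM writes + QUORUM reads)
print(is_strongly_consistent(1, 1, 3))                  # False (ONE + ONE)
```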
Cassandra Objects
• Column: a name/value pair
• Row: a container for columns, referenced by a primary key
• Table: a container for rows
• Keyspace: a container for tables
• Cluster: a container for keyspaces, spanning one or more nodes
• Tombstones for deleted rows
• TTL for expiring rows
• Compaction for merging SSTables
• Secondary indexes for filtering
CQL (Cassandra Query Language)
• Similar to SQL
• No joins; supports counters and static columns
• Keyspaces define the replication strategy and factor
• Collection types (SET, LIST, MAP) and tuples
• TTL: INSERT INTO myTable (id, myField) VALUES (2, 9) USING TTL 86400; /* 24 h */
• ORDER BY works only on clustering columns within a partition, and filtering on other columns requires ALLOW FILTERING (always query by partition key)
CREATE TABLE loads (
machine inet,
cpu int,
mtime timeuuid,
load float,
PRIMARY KEY ((machine, cpu), mtime)
) WITH CLUSTERING ORDER BY (mtime DESC);
/* Select Data within a range */
SELECT * FROM myTable WHERE myField > 5000
AND myField < 100000;
Bad Request: Cannot execute this query as it might involve data
filtering and thus may have unpredictable performance. If you want
to execute this query despite the performance unpredictability,
use ALLOW FILTERING.
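Cassandra rejects that query because rows are located by partition key. A predicate on a non-key column can only be answered by scanning every partition, and that scan is what ALLOW FILTERING permits. The toy Python model below illustrates the difference using the `loads` table defined above. It is not how Cassandra actually stores data: integer timestamps replace `timeuuid`, and the helper names are hypothetical.

```python
from collections import defaultdict

# Toy model of the `loads` table: partition key (machine, cpu) -> rows in
# clustering order (mtime DESC). Integer timestamps stand in for timeuuid.
partitions = defaultdict(list)

def insert(machine: str, cpu: int, mtime: int, load: float) -> None:
    rows = partitions[(machine, cpu)]
    rows.append((mtime, load))
    rows.sort(key=lambda row: row[0], reverse=True)  # CLUSTERING ORDER BY (mtime DESC)

def select_partition(machine: str, cpu: int, since: int) -> list:
    """Efficient pattern: jump to one partition, then read a clustering-key range.
    (Rows are sorted, so a real store can seek to the range; here we just filter.)"""
    return [row for row in partitions.get((machine, cpu), []) if row[0] >= since]

def select_by_load(low: float, high: float):
    """What ALLOW FILTERING implies: visit every partition and test every row."""
    scanned, hits = 0, []
    for key, rows in partitions.items():
        for mtime, load in rows:
            scanned += 1
            if low < load < high:
                hits.append((key, mtime, load))
    return hits, scanned

insert("10.0.0.1", 0, 100, 0.5)
insert("10.0.0.1", 0, 200, 0.9)
insert("10.0.0.2", 1, 150, 0.7)

print(select_partition("10.0.0.1", 0, since=150))  # [(200, 0.9)]
print(select_by_load(0.6, 1.0))  # two matches, found only by scanning all 3 rows
```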
Data Modeling: Query-First Design
RDBMS: Data > Models > Application
Cassandra: Application > Models > Data
RDBMS
Device
User
Location
Values (Timestamp, Values)
Cassandra (no joins)
raw_values(Timestamp, Device, User, Location, Values)
day_values
hour_location_values
hour_location_device_values
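Query-first design usually means writing each event once per query table (write fan-out), so that every read hits a single partition. The Python sketch below shows that pattern for the tables listed above. The partition key chosen for each table, the column names, and the sample values are assumptions for illustration, not the schema used in the talk.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical in-memory stand-ins for the query tables on the slide.
# Each table is keyed by the partition key its query needs, so reads need no joins.
tables = {name: defaultdict(list) for name in (
    "raw_values", "day_values", "hour_location_values", "hour_location_device_values")}

def write_event(ts: datetime, device: str, user: str, location: str, value: float) -> None:
    """Write one reading into every query table (write fan-out / denormalization)."""
    row = {"ts": ts, "device": device, "user": user, "location": location, "value": value}
    day = ts.strftime("%Y-%m-%d")
    hour = ts.strftime("%Y-%m-%d %H")
    tables["raw_values"][(device, day)].append(row)  # assumed key: device + day
    tables["day_values"][(day,)].append(row)
    tables["hour_location_values"][(hour, location)].append(row)
    tables["hour_location_device_values"][(hour, location, device)].append(row)

write_event(datetime(2017, 4, 1, 10, 5), "d1", "u1", "site-1", 21.5)
write_event(datetime(2017, 4, 1, 10, 40), "d2", "u2", "site-1", 22.0)

# "All values for site-1 between 10:00 and 11:00" is a single-partition read:
print(len(tables["hour_location_values"][("2017-04-01 10", "site-1")]))  # 2
```

The trade-off is more writes and duplicated storage in exchange for reads that never need a join, which fits Cassandra's cheap writes.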
Demo: Apache Cassandra in use
• Demo for Apache Cassandra
Apache Kafka
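• Developed at LinkedIn; open-sourced in 2011, Apache top-level project since 2012
• Distributed message bus (publish/subscribe)
• Small messages (events)
• Scalable broker system
• Durable and distributed
• Very fast (parallelism across partitions)
• Messages are not removed when consumed; they expire according to the retention policy
• Stream processing capability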
LinkedIn:
• 1400 brokers
• 13M+ messages/sec
• 2.75GB per second
Brokers, Topics
• Distributed message bus
• Brokers are the servers that make up the cluster
• Topics are logical data stores (split into partitions)
Writes and reads
• Append-only
• Commit log
• Consumers track their offset (and can start from the beginning)
• Offsets are committed to the internal Kafka topic __consumer_offsets
• Offsets are committed after the data has been read
• Default retention period: 7 days
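Below is a minimal in-memory sketch of these read and write semantics. Records are only appended, consumers read by offset, and each consumer group commits the next offset to read, which is the role `__consumer_offsets` plays. `PartitionLog`, `OffsetStore` and `consume` are hypothetical names. The sketch starts a new group from the beginning of the log to match the slide; Kafka's own default for `auto.offset.reset` is `latest`.

```python
class PartitionLog:
    """Toy single-partition commit log: append-only, read by offset."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int) -> list:
        # Reading never removes records; only retention would delete them.
        return self._records[offset:offset + max_records]


class OffsetStore:
    """Plays the role of __consumer_offsets: (group, topic, partition) -> next offset to read."""

    def __init__(self):
        self._committed = {}

    def commit(self, group: str, topic: str, partition: int, next_offset: int) -> None:
        self._committed[(group, topic, partition)] = next_offset

    def position(self, group: str, topic: str, partition: int) -> int:
        # No commit yet: start from the beginning (like auto.offset.reset=earliest).
        return self._committed.get((group, topic, partition), 0)


def consume(log, offsets, group, topic="events", partition=0, max_records=3):
    """Read a batch from the committed position, then commit the position after it."""
    start = offsets.position(group, topic, partition)
    batch = log.read(start, max_records)
    offsets.commit(group, topic, partition, start + len(batch))
    return batch


log, offsets = PartitionLog(), OffsetStore()
for i in range(5):
    log.append(f"event-{i}")

print(consume(log, offsets, "g1"))  # ['event-0', 'event-1', 'event-2']
print(consume(log, offsets, "g1"))  # ['event-3', 'event-4']
print(consume(log, offsets, "g2"))  # ['event-0', 'event-1', 'event-2'] (independent group)
```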
Partitions, Replication, ZooKeeper
• Workers, tasks, statuses
• Find leaders for a partition
• Distribute tasks
• Monitor results
• Configuration
• Health statuses
• Group memberships (elections)
• Durable and Scalable
Message (timestamp, ID, binary payload)
Replication factor
Partitions spread across brokers
Partition = commit log
Partition leader elected among the replicas and recorded in ZooKeeper
Consumer Groups
• Consumers continuously poll the brokers for a topic
• Consumers sharing a group.id split the partitions, for parallelism and scalability
• Consumers registered in ZooKeeper
• A coordinator assigns partitions to consumers
• The coordinator rebalances partitions between consumers
• Ideally, number of consumers = number of partitions (each partition goes to one consumer in the group, so extra consumers sit idle)
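The Python sketch below shows how a topic's partitions can be split among the members of a group. It follows the idea behind Kafka's range assignment strategy: each member gets a contiguous chunk, and when the division is uneven, the first members get one extra partition. It is a simplified, single-topic illustration with a hypothetical `range_assign` function; the real assignor runs inside the consumer group protocol.

```python
def range_assign(partitions, consumers):
    """Split one topic's partitions among the members of a consumer group.

    Range-style: sort both lists, hand out contiguous chunks, and give the first
    (partitions % consumers) members one extra partition. Members beyond the
    partition count receive nothing, which is why more consumers than
    partitions adds no parallelism.
    """
    members = sorted(consumers)
    parts = sorted(partitions)
    per_member, extra = divmod(len(parts), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


print(range_assign(range(6), ["c1", "c2", "c3"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}

# A rebalance is simply a new assignment over the current members, e.g. after c3 leaves:
print(range_assign(range(6), ["c1", "c2"]))
# {'c1': [0, 1, 2], 'c2': [3, 4, 5]}
```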
Demo: Apache Kafka in use
• Apache Kafka CLI
• Fault Tolerance
Confluent Platform
• Schema registry
• Kafka Connect
• REST Proxy
• Kafka Streams
• Confluent (founded by Kafka's original creators) is the main contributor to Apache Kafka
• Streaming platform
• Main components are open source
Kafka Streams
• Easy to deploy and maintain
• Integrated with Confluent Platform
• Processor topology
• Micro-services
• State store
KStream and KTable
• Execution topology for processing
• Stateful: no need to call a database for every event
• KStreams and KTables
• A KTable is a local, partitioned table backed by a state store
• Data locality: no network round trips
• Elastic and scalable
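The KStream/KTable duality can be sketched in plain Python. A table is the latest value per key taken from a changelog stream, and aggregating a stream produces a table whose updates form a new changelog. This illustrates the semantics only and is not the Kafka Streams API. `to_table` and `count_by_key` are hypothetical names, and a `None` value is treated as a tombstone (delete).

```python
def to_table(changelog):
    """KTable view of a changelog stream: each record upserts its key;
    a None value is treated as a tombstone that deletes the key."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def count_by_key(stream, store=None):
    """KStream -> KTable aggregation (in the spirit of groupByKey().count()).

    Counts live in a local dict standing in for a state store, so no remote
    database call is needed per event. Every update is also emitted as a
    changelog record for downstream consumers.
    """
    store = {} if store is None else store
    changelog = []
    for key, _value in stream:
        store[key] = store.get(key, 0) + 1
        changelog.append((key, store[key]))
    return store, changelog


clicks = [("alice", "/home"), ("bob", "/cart"), ("alice", "/cart")]
counts, updates = count_by_key(clicks)
print(updates)            # [('alice', 1), ('bob', 1), ('alice', 2)]
print(to_table(updates))  # {'alice': 2, 'bob': 1}
```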
Demo: Streaming App Examples
• Cassandra schema
• Connectors
• Kafka Streams
• DS/OS
Cluster and results
Kafka Cluster
Nodes: 9
AWS instance type: m4.2xlarge
CPU: 8 vCPU
Memory: 32 GB
SSD: 100 GB
Topics: 3
Partitions: 6 per topic
Replication factor: 3
Producers: 6
Average message size: 1 KB
Throughput: 440,000 messages/second
Cassandra Cluster
Nodes: 12
AWS instance type: m4.2xlarge
CPU: 8 vCPU
Memory: 32 GB
SSD: 800 GB
Replication factor: 3
Average write latency: 9 ms
Average read latency: 52 ms
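Here is a back-of-the-envelope check of what the Kafka numbers imply. It assumes 1 KB = 1024 bytes, that every message is written to all 3 replicas, and that load is spread evenly across the 9 brokers (real clusters are rarely perfectly even). The results are derived arithmetic, not measurements from the talk.

```python
# Inputs taken from the Kafka cluster figures above.
MESSAGES_PER_SEC = 440_000
MESSAGE_BYTES = 1024          # assumption: 1 KB = 1024 bytes
REPLICATION_FACTOR = 3
BROKERS = 9
TOPICS, PARTITIONS_PER_TOPIC = 3, 6

ingress = MESSAGES_PER_SEC * MESSAGE_BYTES    # bytes/s sent by producers
replicated = ingress * REPLICATION_FACTOR     # bytes/s written cluster-wide, all replicas
per_broker = replicated / BROKERS             # assumes perfectly even spread
partition_replicas = TOPICS * PARTITIONS_PER_TOPIC * REPLICATION_FACTOR

print(f"producer ingress: {ingress / 1e6:.1f} MB/s")      # producer ingress: 450.6 MB/s
print(f"replicated writes: {replicated / 1e9:.2f} GB/s")  # replicated writes: 1.35 GB/s
print(f"per broker: {per_broker / 1e6:.1f} MB/s")         # per broker: 150.2 MB/s
print(f"partition replicas: {partition_replicas} "
      f"({partition_replicas // BROKERS} per broker)")    # partition replicas: 54 (6 per broker)
```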
Lessons Learned
• Impressive capabilities: more than 1M/sec
• A new mindset for stream processing
• Time is important
• SQL interfaces for streaming are not mature yet
• Difficult to manage and scale
• Difficult to debug
• Lack of documentation and community support
• Design your DB carefully up front, around the queries
• Cassandra is not an RDBMS: select by partition key
• Eventual consistency
• Very Expensive! (lots of nodes)
Q&A
Vitalii Bondarenko
vitaliy.bondarenko@eleks.com


Vitalii Bondarenko, “Azure real-time analytics and kappa architecture with Kafka and Cassandra clusters”, AI&BigDataDay 2017

Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
Petro Nikolaiev & Dmytro Kisov: ТОП-5 методів дослідження клієнтів для успіху...
 
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
Maksym Stelmakh : Державні електронні послуги та сервіси: чому бізнесу варто ...
 

Recently uploaded

Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptSourabh Kumar
 
size separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticssize separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticspragatimahajan3
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17Celine George
 
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya - UEM Kolkata Quiz Club
 
Liberal & Redical Feminism presentation.pptx
Liberal & Redical Feminism presentation.pptxLiberal & Redical Feminism presentation.pptx
Liberal & Redical Feminism presentation.pptxRizwan Abbas
 
slides CapTechTalks Webinar May 2024 Alexander Perry.pptx
slides CapTechTalks Webinar May 2024 Alexander Perry.pptxslides CapTechTalks Webinar May 2024 Alexander Perry.pptx
slides CapTechTalks Webinar May 2024 Alexander Perry.pptxCapitolTechU
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryEugene Lysak
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...Nguyen Thanh Tu Collection
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the lifeNitinDeodare
 
2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptx2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptxmansk2
 
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17Celine George
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/siemaillard
 
Keeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security ServicesKeeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security ServicesTechSoup
 
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...Sayali Powar
 
Gyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxGyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxShibin Azad
 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxCeline George
 
Morse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxMorse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxjmorse8
 

Recently uploaded (20)

Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.pptBasic_QTL_Marker-assisted_Selection_Sourabh.ppt
Basic_QTL_Marker-assisted_Selection_Sourabh.ppt
 
“O BEIJO” EM ARTE .
“O BEIJO” EM ARTE                       .“O BEIJO” EM ARTE                       .
“O BEIJO” EM ARTE .
 
size separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceuticssize separation d pharm 1st year pharmaceutics
size separation d pharm 1st year pharmaceutics
 
How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17How to the fix Attribute Error in odoo 17
How to the fix Attribute Error in odoo 17
 
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General QuizPragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
Pragya Champions Chalice 2024 Prelims & Finals Q/A set, General Quiz
 
Liberal & Redical Feminism presentation.pptx
Liberal & Redical Feminism presentation.pptxLiberal & Redical Feminism presentation.pptx
Liberal & Redical Feminism presentation.pptx
 
slides CapTechTalks Webinar May 2024 Alexander Perry.pptx
slides CapTechTalks Webinar May 2024 Alexander Perry.pptxslides CapTechTalks Webinar May 2024 Alexander Perry.pptx
slides CapTechTalks Webinar May 2024 Alexander Perry.pptx
 
The Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. HenryThe Last Leaf, a short story by O. Henry
The Last Leaf, a short story by O. Henry
 
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
50 ĐỀ LUYỆN THI IOE LỚP 9 - NĂM HỌC 2022-2023 (CÓ LINK HÌNH, FILE AUDIO VÀ ĐÁ...
 
Word Stress rules esl .pptx
Word Stress rules esl               .pptxWord Stress rules esl               .pptx
Word Stress rules esl .pptx
 
philosophy and it's principles based on the life
philosophy and it's principles based on the lifephilosophy and it's principles based on the life
philosophy and it's principles based on the life
 
2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptx2024_Student Session 2_ Set Plan Preparation.pptx
2024_Student Session 2_ Set Plan Preparation.pptx
 
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
Removal Strategy _ FEFO _ Working with Perishable Products in Odoo 17
 
Championnat de France de Tennis de table/
Championnat de France de Tennis de table/Championnat de France de Tennis de table/
Championnat de France de Tennis de table/
 
Keeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security ServicesKeeping Your Information Safe with Centralized Security Services
Keeping Your Information Safe with Centralized Security Services
 
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
UNIT – IV_PCI Complaints: Complaints and evaluation of complaints, Handling o...
 
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
Operations Management - Book1.p  - Dr. Abdulfatah A. SalemOperations Management - Book1.p  - Dr. Abdulfatah A. Salem
Operations Management - Book1.p - Dr. Abdulfatah A. Salem
 
Gyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptxGyanartha SciBizTech Quiz slideshare.pptx
Gyanartha SciBizTech Quiz slideshare.pptx
 
An Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptxAn Overview of the Odoo 17 Discuss App.pptx
An Overview of the Odoo 17 Discuss App.pptx
 
Morse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptxMorse OER Some Benefits and Challenges.pptx
Morse OER Some Benefits and Challenges.pptx
 

Vitalii Bondarenko - “Azure real-time analytics and kappa architecture with Kafka and Cassandra clusters” AI&BigDataDay 2017

  • 1. www.eleks.com
    Azure Real-Time Analytics and Kappa Architecture with Kafka and Cassandra Clusters
    Vitalii Bondarenko, vitaliy.bondarenko@eleks.com
  • 2. Agenda
    • Streaming analytics in Azure
    • Apache Cassandra
    • Apache Kafka
    • Confluent Platform and Kafka Streams
    • Examples
  • 3. Big Data Approach
    RDBMS approach
    • Massively parallel processing (scalability)
    • In-memory databases (streaming and compression)
    • Column stores (BI)
    Big Data approach
    • Hadoop (HDFS + MapReduce)
    • SQL on HDFS
    • Scalable NoSQL
    • Batch issue: results are only as fresh as the last batch run
  • 4. Lambda architecture: batch and stream processing
    • Batch layer
      • Stores the master dataset
      • Computes arbitrary views
      • Horizontally scalable
    • Speed layer (streaming)
      • Fast, incremental algorithms
      • The batch layer eventually overrides the speed layer
    • Serving layer
      • Random access to batch views
      • Updated by the batch and streaming layers
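A minimal sketch of how the three Lambda layers fit together, assuming a hypothetical page-view counter (the event shape and function names are illustrative, not from the talk). The batch view is recomputed from the full master dataset, the speed view covers only events newer than the last batch run, and the serving layer adds the two. When the next batch run moves the cutoff forward, the speed view for those events is discarded.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event shape (not from the talk): (timestamp_seconds, page_id).
Event = Tuple[int, str]


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute page-view counts from the master dataset up to `cutoff`."""
    return Counter(page for ts, page in master if ts < cutoff)


def speed_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: incremental counts for events newer than the last batch run."""
    view: Counter = Counter()
    for ts, page in events:
        if ts >= cutoff:
            view[page] += 1
    return view


def serve(batch: Counter, speed: Counter, page: str) -> int:
    """Serving layer: answer a query by combining the batch and real-time views."""
    return batch[page] + speed[page]


events = [(1, "home"), (2, "home"), (3, "cart"), (11, "home"), (12, "cart")]

# The last batch run covered ts < 10; the speed layer fills in the rest.
b, s = batch_view(events, 10), speed_view(events, 10)
print(serve(b, s, "home"))  # 3

# The next batch run moves the cutoff forward and the speed view becomes empty:
# the batch layer has "overridden" the speed layer for those events.
b2, s2 = batch_view(events, 20), speed_view(events, 20)
print(serve(b2, s2, "home"), len(s2))  # 3 0
```

The cost this illustrates is that the same logic lives in two code paths, batch and speed, which must be kept in agreement.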
  • 5. Kappa architecture: stream processing with scalable storage
    • Everything is a stream
    • Immutable, unstructured data sources
    • Single analytics framework
    • Windows on the streaming layer
    • Linearly scalable serving layer
    • Interactive querying
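A minimal sketch of the "single analytics framework" idea, assuming a hypothetical append-only log of `(key, amount)` records. One processing function serves both live processing and reprocessing. To change the logic, the same function replays the log from offset 0 into a new view instead of running a separate batch job.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical append-only event log: (key, amount) records; amounts assumed >= 0.
Log = List[Tuple[str, int]]


def run_job(log: Log, from_offset: int, state: Dict[str, int],
            update: Callable[[int, int], int]) -> int:
    """The single processing code path: fold records from `from_offset` into `state`.
    Returns the next offset to read, like a consumer's position in a topic."""
    for key, amount in log[from_offset:]:
        state[key] = update(state.get(key, 0), amount)
    return len(log)


log: Log = [("a", 5), ("b", 2), ("a", 1)]

# Version 1 of the job keeps running sums and follows the live stream.
v1: Dict[str, int] = {}
pos = run_job(log, 0, v1, lambda acc, x: acc + x)
log.append(("b", 7))                        # new events keep arriving
pos = run_job(log, pos, v1, lambda acc, x: acc + x)

# Version 2 changes the logic (running maximum). There is no separate batch
# system: the same code path replays the log from offset 0 into a new view,
# and readers switch to it once it has caught up.
v2: Dict[str, int] = {}
run_job(log, 0, v2, max)
print(v1, v2)  # {'a': 6, 'b': 9} {'a': 5, 'b': 7}
```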
  • 6. Azure Stream Analytics
    • Easy to use
    • Scalable
    • Connectivity
    • SQL, UDFs, reference data
  • 7. Stream processing
    • Windows
    • Stateful and stateless processing
    • Fault tolerance
    • Scalability
    • Low latency
    Example query:
    SELECT
        Make,
        System.TimeStamp AS Time,
        COUNT(*) AS [Count]
    INTO
        AlertOutput
    FROM
        Input TIMESTAMP BY Time
    GROUP BY
        Make,
        TumblingWindow(second, 10)
    HAVING
        [Count] >= 3
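The query above counts events per `Make` in fixed, non-overlapping 10-second windows and emits only groups with at least 3 events. The Python sketch below approximates those semantics; it is not how Stream Analytics executes the query. It assumes events arrive as hypothetical `(event_time_seconds, make)` pairs and uses half-open `[start, end)` buckets; the service's exact boundary handling may differ. The window end stands in for `System.TimeStamp`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_alerts(events: Iterable[Tuple[float, str]], size_s: int = 10,
                    threshold: int = 3) -> List[Tuple[str, int, int]]:
    """Rough equivalent of GROUP BY Make, TumblingWindow(second, size_s)
    HAVING COUNT(*) >= threshold, with events timestamped by their own time field.
    Returns (make, window_end, count); the window end stands in for System.TimeStamp."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for ts, make in events:
        # Fixed-size, non-overlapping windows: each event falls into exactly one.
        window_start = int(ts // size_s) * size_s
        counts[(make, window_start)] += 1
    return sorted(
        (make, start + size_s, n)
        for (make, start), n in counts.items()
        if n >= threshold
    )


events = [(1, "Honda"), (3, "Honda"), (9, "Honda"), (4, "Toyota"), (12, "Honda")]
print(tumbling_alerts(events))  # [('Honda', 10, 3)]
```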
  • 8. Demo: Azure Stream Analytics
    • Data from Event Hubs
    • Geo-analytics on streams
    • Visualization in Power BI
    • Demo for streaming analytics projects
  • 9. Fast Data Platform
    • Real-time processing
    • Fast writes of raw data
    • Scalable
    • Distributed
  • 10. Demo: Azure Stream Analytics
    • Demo for streaming analytics projects
    • Platform deployment
  • 11. Apache Cassandra
    • Multi-master, low-latency, shared-nothing
    • Distributed
    • No single point of failure
    • Linearly scalable
    • Multi-datacenter configuration
    • AP (in CAP terms) with tunable consistency
  • 12. Nodes and distributions
    • Data is distributed by tokens in the range -2^63 to 2^63-1
    • Token = Murmur3 hash of the partition key
    • Virtual nodes
    • Data centers and racks; gossip (every second)
    • Replication strategy (SimpleStrategy, NetworkTopologyStrategy)
    • Replication factor (usually 3); gossip and coordinators
    • Tunable consistency: strong or eventual
    • Consistency levels (ONE, TWO, THREE, ANY, ALL, QUORUM, LOCAL_QUORUM, LOCAL_ONE…)
    • Strong consistency when R + W > N (replicas read + replicas written > replication factor)
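A minimal sketch of token-based placement and the R + W > N rule. It assumes a hypothetical 4-node ring with evenly spaced tokens and SimpleStrategy-style placement. The stand-in hash is MD5 from the standard library, not Cassandra's Murmur3, so the tokens will not match a real cluster. The owner of a key is the first node whose token is greater than or equal to the key's token, and the next distinct nodes clockwise hold the other replicas.

```python
import bisect
import hashlib
from typing import List, Tuple

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1


def token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token in [MIN_TOKEN, MAX_TOKEN].
    Stand-in only: Cassandra's Murmur3Partitioner uses MurmurHash3, not MD5,
    so these tokens will not match a real cluster."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def replicas(ring: List[Tuple[int, str]], key: str, rf: int) -> List[str]:
    """SimpleStrategy-style placement on a sorted ring of (token, node) pairs:
    the first node whose token is >= the key's token owns the key, and the next
    distinct nodes clockwise hold the remaining rf - 1 replicas."""
    rf = min(rf, len({node for _, node in ring}))
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key))  # may equal len(ring): wraps below
    owners: List[str] = []
    while len(owners) < rf:
        node = ring[i % len(ring)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners


def quorum(rf: int) -> int:
    return rf // 2 + 1


def overlaps(r: int, w: int, n: int) -> bool:
    """R + W > N: every read replica set intersects every write replica set,
    so each read contacts at least one replica holding the latest acknowledged write."""
    return r + w > n


# Hypothetical 4-node ring with evenly spaced tokens.
ring = sorted((MIN_TOKEN + i * 2**62, f"node{i}") for i in range(4))
print(replicas(ring, "sensor-42", 3))
print(overlaps(quorum(3), quorum(3), 3), overlaps(1, 1, 3))  # True False
```

With a replication factor of 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap, while ONE/ONE (1 + 1 = 2) gives only eventual consistency.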
  • 13. Cassandra Objects
    • Column: a name/value pair
    • Row: a container for columns, referenced by a primary key
    • Table: a container for rows
    • Keyspace: a container for tables
    • Cluster: a container for keyspaces, spanning one or more nodes
    • Tombstones mark deleted data
    • TTL expires data automatically
    • Compaction merges SSTables
    • Secondary indexes for filtering
    • CQL (Cassandra Query Language)
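How tombstones, TTL, and compaction interact can be shown with a heavily simplified model of one row spread across immutable SSTables. The cell layout, plain-second timestamps, and the `gc_grace` parameter are assumptions for illustration; the parameter is named after Cassandra's `gc_grace_seconds` table option. In this sketch the latest write wins, an expired TTL cell turns into a tombstone, and tombstones older than `gc_grace` are purged.

```python
from typing import Dict, List, Optional, Tuple

# Simplified cell: (write_time, value or None for a tombstone, expires_at or None).
# All times are plain integer seconds here; real Cassandra uses microsecond write timestamps.
Cell = Tuple[int, Optional[str], Optional[int]]
SSTable = Dict[str, Cell]  # column name -> cell, for a single row


def compact(sstables: List[SSTable], now: int, gc_grace: int) -> SSTable:
    """Merge immutable SSTables for one row into a new one.
    - last write wins (highest write_time per column)
    - a cell whose TTL has elapsed is treated as a tombstone from its expiry time
    - tombstones older than gc_grace are purged
    Real compaction has extra safety rules, e.g. it only purges a tombstone when
    no older data it shadows can remain in SSTables outside the compaction."""
    merged: SSTable = {}
    for table in sstables:
        for column, cell in table.items():
            if column not in merged or cell[0] > merged[column][0]:
                merged[column] = cell

    result: SSTable = {}
    for column, (write_time, value, expires_at) in merged.items():
        deleted_at: Optional[int] = write_time if value is None else None
        if value is not None and expires_at is not None and expires_at <= now:
            value, deleted_at = None, expires_at     # TTL elapsed
        if deleted_at is not None and deleted_at + gc_grace <= now:
            continue                                 # old tombstone: drop it
        result[column] = (write_time, value, expires_at)
    return result


older = {"temp": (100, "20.5", None), "status": (100, "ok", None)}
newer = {"temp": (200, "21.0", None), "status": (150, None, None),
         "note": (120, "check", 160)}
print(compact([older, newer], now=200, gc_grace=100))
print(compact([older, newer], now=300, gc_grace=100))
```

At `now=200` the tombstones are still kept so that replicas which missed the delete cannot resurrect the data. At `now=300` they are past the grace period and disappear.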
  • 14. CQL (Cassandra Query Language)
    • Similar to SQL
    • No joins; supports counters and static columns
    • Keyspaces are defined with a replication factor
    • Collection types SET, LIST, MAP, and tuples
    • TTL:
      INSERT INTO myTable (id, myField) VALUES (2, 9) USING TTL 86400; /* 24 h */
    • Ordering and filtering are restricted; always query by the partition key:
      CREATE TABLE loads (
          machine inet,
          cpu int,
          mtime timeuuid,
          load float,
          PRIMARY KEY ((machine, cpu), mtime)
      ) WITH CLUSTERING ORDER BY (mtime DESC);

      /* Select data within a range */
      SELECT * FROM myTable WHERE myField > 5000 AND myField < 100000;

      Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING.
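The `loads` table explains why the partition key matters. The toy in-memory model below is an illustration of access paths only, with integers standing in for `timeuuid` values. A range query on `mtime` inside one `(machine, cpu)` partition is a single lookup plus a slice of rows that are already sorted newest-first. A filter on the non-key column `load` has to scan every partition, which is the cost `ALLOW FILTERING` opts into.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy model of: PRIMARY KEY ((machine, cpu), mtime) WITH CLUSTERING ORDER BY (mtime DESC).
# Integers stand in for timeuuid values; this illustrates access paths, not storage.
PartitionKey = Tuple[str, int]
Row = Tuple[int, float]  # (mtime, load)
partitions: Dict[PartitionKey, List[Row]] = defaultdict(list)


def insert(machine: str, cpu: int, mtime: int, load: float) -> None:
    rows = partitions[(machine, cpu)]
    keys = [-t for t, _ in rows]            # rows are kept newest-first
    i = bisect.bisect_left(keys, -mtime)
    if i < len(rows) and rows[i][0] == mtime:
        rows[i] = (mtime, load)             # same primary key: upsert
    else:
        rows.insert(i, (mtime, load))


def select_range(machine: str, cpu: int, t_from: int, t_to: int) -> List[Row]:
    """WHERE machine = ? AND cpu = ? AND mtime > ? AND mtime < ?:
    one partition lookup, then a contiguous slice of already-sorted rows."""
    rows = partitions.get((machine, cpu), [])
    keys = [-t for t, _ in rows]
    return rows[bisect.bisect_right(keys, -t_to):bisect.bisect_left(keys, -t_from)]


def select_by_load(min_load: float) -> List[Tuple[PartitionKey, int, float]]:
    """WHERE load > ?: load is not part of the primary key, so every row of every
    partition must be scanned; this is what ALLOW FILTERING opts into."""
    return [(pk, t, load) for pk, rows in partitions.items()
            for t, load in rows if load > min_load]


insert("10.0.0.1", 0, 1, 0.5)
insert("10.0.0.1", 0, 3, 0.9)
insert("10.0.0.1", 0, 2, 0.7)
insert("10.0.0.2", 1, 2, 0.95)
print(select_range("10.0.0.1", 0, 1, 4))  # [(3, 0.9), (2, 0.7)]
print(select_by_load(0.8))
```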
  • 15. Data Modeling: Query-First Design
    • RDBMS: Data > Models > Application
    • Cassandra: Application > Models > Data
    RDBMS tables: Device, User, Location, Values (Timestamp, Values)
    Cassandra tables (no joins): raw_values (Timestamp, Device, User, Location, Values), day_values, hour_location_values, hour_location_device_values
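In a query-first design, each query gets its own table, and the application writes every reading into all of them. The sketch below keeps the table names from the slide. Their partition-key layouts and the sample readings are hypothetical, and plain dictionaries stand in for Cassandra tables.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List

# One in-memory "table" per query. Table names come from the slide; the
# partition-key layouts below are hypothetical.
tables: Dict[str, Dict[tuple, List[tuple]]] = {
    name: defaultdict(list)
    for name in ("raw_values", "day_values", "hour_location_values",
                 "hour_location_device_values")
}


def write_reading(ts: datetime, device: str, user: str, location: str,
                  value: float) -> None:
    """There are no joins at read time, so each reading is written once per query
    table, under the partition key that table's query filters on."""
    day, hour = ts.strftime("%Y-%m-%d"), ts.strftime("%Y-%m-%d %H")
    tables["raw_values"][(device,)].append((ts, user, location, value))
    tables["day_values"][(day,)].append((ts, device, value))
    tables["hour_location_values"][(hour, location)].append((ts, device, value))
    tables["hour_location_device_values"][(hour, location, device)].append((ts, value))


write_reading(datetime(2017, 4, 1, 10, 5), "dev-1", "alice", "Lviv", 21.5)
write_reading(datetime(2017, 4, 1, 10, 40), "dev-2", "bob", "Lviv", 22.0)
write_reading(datetime(2017, 4, 1, 11, 0), "dev-1", "alice", "Kyiv", 19.0)

# "All readings in Lviv during 10:00-11:00" is a single-partition read.
print(tables["hour_location_values"][("2017-04-01 10", "Lviv")])
```

The trade-off is extra writes and storage in exchange for reads that never join and never scan across partitions.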
  • 16. Demo: Apache Cassandra in use
    • Demo for Apache Cassandra
  • 17. Apache Kafka
    • Created at LinkedIn; open source since 2011, Apache top-level project since 2012
    • Message bus
    • Small messages (events)
    • Scalable broker system
    • Durable and distributed
    • Very fast (parallelism across partitions)
    • Messages are not removed when consumed; they are kept for a retention period
    • Stream processing capability
    LinkedIn:
    • 1,400 brokers
    • 13M+ messages/sec
    • 2.75 GB per second
  • 18. Brokers, Topics
    • Distributed message bus
    • Brokers are the (virtual) servers of the cluster
    • Topics are the logical data storage
  • 19. Writes and reads
    • Append-only
    • Commit log
    • Consumer offsets (consumers can read from the beginning)
    • Consumers commit read offsets to the internal Kafka topic __consumer_offsets
    • Commit after the data has been read
    • Retention period: 7 days by default
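A toy model of one partition's commit log shows how appends, offsets, commits, and retention relate. The class and method names are hypothetical; this is not Kafka's client or broker API. Reads never delete anything. Each consumer group tracks its own committed position, and old records go only when retention expires them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[int, int, bytes]  # (offset, timestamp, payload)


@dataclass
class PartitionLog:
    """Toy append-only commit log for a single partition (not Kafka's API)."""
    records: List[Record] = field(default_factory=list)
    next_offset: int = 0
    committed: Dict[str, int] = field(default_factory=dict)  # group id -> next offset

    def append(self, timestamp: int, payload: bytes) -> int:
        offset = self.next_offset
        self.records.append((offset, timestamp, payload))
        self.next_offset += 1
        return offset

    def position(self, group: str) -> int:
        # No committed offset yet: start from the beginning (like auto.offset.reset=earliest).
        # A committed offset that retention already deleted also falls back to the earliest.
        earliest = self.records[0][0] if self.records else self.next_offset
        return max(self.committed.get(group, earliest), earliest)

    def read(self, from_offset: int, max_records: int = 100) -> List[Record]:
        # Reading removes nothing; any number of groups can re-read the same records.
        return [r for r in self.records if r[0] >= from_offset][:max_records]

    def commit(self, group: str, next_offset: int) -> None:
        # Kafka keeps committed positions in the internal __consumer_offsets topic.
        self.committed[group] = next_offset

    def enforce_retention(self, now: int, retention: int) -> None:
        # Records are deleted by age, never because they were consumed.
        # (Kafka deletes whole log segments; this sketch drops individual records.)
        self.records = [r for r in self.records if now - r[1] < retention]


log = PartitionLog()
for i in range(5):
    log.append(timestamp=i * 10, payload=f"event-{i}".encode())

batch = log.read(log.position("analytics"), max_records=4)   # offsets 0..3
log.commit("analytics", batch[-1][0] + 1)                     # commit after reading
log.enforce_retention(now=45, retention=20)                   # keeps offsets 3 and 4
print(log.position("analytics"), log.position("audit"))       # 4 3
```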
  • 20. Partitions, Replication, ZooKeeper
    ZooKeeper is used for:
    • Workers, tasks, statuses
    • Finding the leader for a partition
    • Distributing tasks
    • Monitoring results
    • Configuration
    • Health statuses
    • Group membership (elections)
    Durable and scalable:
    • Message (timestamp, id, payload (binary))
    • Replication factor
    • Partitions spread across brokers
    • Partition = commit log
    • The replica leader is elected and saved to ZooKeeper
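A simplified sketch of spreading partition replicas across brokers and choosing a new leader when a broker fails. It uses the cluster figures from the results slide (9 brokers, 6 partitions per topic, replication factor 3). The round-robin placement is a deliberate simplification: Kafka's real assignment also randomizes the starting broker, shifts follower positions, and can be rack-aware. The in-sync replica check is not modeled.

```python
from typing import Dict, List, Set


def assign_replicas(brokers: List[int], partitions: int, rf: int) -> Dict[int, List[int]]:
    """Simplified round-robin placement: partition p gets rf consecutive brokers
    starting at index p % n; the first one listed is the preferred leader.
    Kafka's real assignment also randomizes the starting broker, shifts follower
    positions and can be rack-aware; this only shows the idea."""
    n = len(brokers)
    if rf > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {p: [brokers[(p + i) % n] for i in range(rf)] for p in range(partitions)}


def elect_leader(replicas: List[int], alive: Set[int]) -> int:
    """Pick the first live replica in assignment order as the new leader.
    (Kafka's controller also requires it to be in the in-sync replica set,
    which this sketch does not model.)"""
    for broker in replicas:
        if broker in alive:
            return broker
    raise RuntimeError("no live replica for this partition")


# Figures from the talk's cluster: 9 brokers, 6 partitions per topic, RF 3.
assignment = assign_replicas(list(range(1, 10)), partitions=6, rf=3)
print(assignment[0], assignment[5])                    # [1, 2, 3] [6, 7, 8]
print(elect_leader(assignment[0], set(range(2, 10))))  # 2 (broker 1 is down)
```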
  • 21. Consumer Groups
    • Consumers continuously poll the brokers for a topic
    • Consumers share a group.id for parallelism and scalability
    • Consumers are registered in ZooKeeper
    • The coordinator assigns partitions to consumers
    • The coordinator rebalances partitions between consumers
    • Maximum parallelism: number of consumers = number of partitions (extra consumers stay idle)
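This is a sketch of range-style partition assignment within one consumer group, similar in spirit to Kafka's `RangeAssignor` for a single topic. The function itself is illustrative. A rebalance simply recomputes the assignment for the new member list. Consumers beyond the number of partitions receive nothing, which is why the partition count caps a group's parallelism.

```python
from typing import Dict, List


def range_assign(consumers: List[str], partitions: List[int]) -> Dict[str, List[int]]:
    """Range-style assignment for one topic, similar in spirit to Kafka's RangeAssignor:
    sort both lists, give each consumer a contiguous block, and hand the remainder
    to the first consumers. A rebalance simply recomputes this for the new member list."""
    if not consumers:
        return {}
    consumers, partitions = sorted(consumers), sorted(partitions)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(consumers):
        size = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment


four = range_assign(["c1", "c2", "c3", "c4"], list(range(6)))
print(four)  # {'c1': [0, 1], 'c2': [2, 3], 'c3': [4], 'c4': [5]}

# Rebalance after c4 leaves: its partition moves to the remaining consumers.
three = range_assign(["c1", "c2", "c3"], list(range(6)))
print(three)  # {'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}

# More consumers than partitions: the extra consumers get nothing and stay idle.
eight = range_assign([f"c{i}" for i in range(1, 9)], list(range(6)))
print([c for c, p in eight.items() if not p])  # ['c7', 'c8']
```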
  • 22. Demo: Apache Kafka in use
    • Apache Kafka CLI
    • Fault tolerance
  • 23. Confluent Platform
    • Schema Registry
    • Kafka Connect
    • REST Proxy
    • Kafka Streams
    • Confluent is a main contributor to Apache Kafka
    • Streaming platform
    • The main parts are open source
  • 24. Kafka Streams
    • Easy to deploy and maintain
    • Integrated with Confluent Platform
    • Processing topology
    • Microservices
    • State store
  • 25. KStream and KTable
    • Execution topology for processing
    • Stateful: no need to query a database for every event
    • KStreams and KTables
    • A KTable is a local, distributed database
    • Data locality: no network round trips
    • Elastic and scalable
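The relationship between a KStream and a KTable can be illustrated in plain Python; this is not the Kafka Streams API. The aggregation mimics what `stream.groupByKey().count()` produces. A dictionary stands in for the local state store, so each event is counted without a database round trip. The emitted changelog, folded by key, gives the table: the latest value per key.

```python
from typing import Dict, Iterable, Iterator, Tuple


def count_by_key(stream: Iterable[Tuple[str, str]],
                 store: Dict[str, int]) -> Iterator[Tuple[str, int]]:
    """Toy stateful aggregation in the spirit of stream.groupByKey().count().
    `store` stands in for the local state store: each event updates it in place,
    with no database round trip, and every update is emitted downstream as a
    changelog record (the KTable viewed as a stream of updates)."""
    for key, _value in stream:
        store[key] = store.get(key, 0) + 1
        yield key, store[key]


def to_table(changelog: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """A KTable keeps only the latest value per key: later records overwrite earlier ones."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        table[key] = value
    return table


clicks = [("user1", "/home"), ("user2", "/cart"), ("user1", "/cart")]  # a KStream of events
store: Dict[str, int] = {}
changelog = list(count_by_key(clicks, store))
print(changelog)            # [('user1', 1), ('user2', 1), ('user1', 2)]
print(to_table(changelog))  # {'user1': 2, 'user2': 1}
```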
  • 26. Demo: Streaming App Examples
    • Cassandra schema
    • Connectors
    • Kafka Streams
    • DS/OS
  • 27. Cluster and results
    Kafka cluster
    • Nodes: 9
    • Amazon instance type: m4.2xlarge
    • CPUs: 8
    • Memory: 32 GB
    • SSD: 100 GB
    • Topics: 3
    • Partitions: 6 per topic
    • Replication factor: 3
    • Producers: 6
    • Average message size: 1 KB
    • Throughput: 440,000 messages/second
    Cassandra cluster
    • Nodes: 12
    • Amazon instance type: m4.2xlarge
    • CPUs: 8
    • Memory: 32 GB
    • SSD: 800 GB
    • Replication factor: 3
    • Average write latency: 9 ms
    • Average read latency: 52 ms
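The Kafka figures above translate into rough per-partition and per-broker load. The arithmetic below assumes "1Kb" means 1 KB = 1,000 bytes, that load is spread evenly across partitions, and that protocol overhead, batching, and compression can be ignored. The results are estimates derived from the slide, not measurements.

```python
# Back-of-the-envelope load per node, derived only from the slide's Kafka figures.
# Assumptions: "1Kb" means 1 KB = 1,000 bytes; load is spread evenly over partitions;
# protocol overhead, batching and compression are ignored.
brokers, topics, partitions_per_topic, rf = 9, 3, 6, 3
msgs_per_s, avg_msg_bytes = 440_000, 1_000

partitions = topics * partitions_per_topic          # 18 partitions in total
replicas_per_broker = partitions * rf / brokers     # 6.0 partition replicas per broker
msgs_per_partition = msgs_per_s / partitions        # ~24,444 msg/s per partition leader
ingest_mb_s = msgs_per_s * avg_msg_bytes / 1e6      # 440 MB/s arriving from producers
replicated_mb_s = ingest_mb_s * rf                  # 1,320 MB/s written across all replicas
per_broker_mb_s = replicated_mb_s / brokers         # ~147 MB/s of log writes per broker

print(partitions, replicas_per_broker, round(msgs_per_partition),
      ingest_mb_s, replicated_mb_s, round(per_broker_mb_s, 1))
# 18 6.0 24444 440.0 1320.0 146.7
```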
  • 28. Lessons Learned
    • Amazing capabilities: more than 1M/sec
    • A new mindset for stream processing
    • Time is important
    • The SQL interface for streaming is not mature yet
    • Difficulties in management and scaling
    • Difficult to debug
    • Lack of documentation and community support
    • Design your database carefully at the start, around its queries
    • Cassandra is not an RDBMS: select by partition key
    • Eventual consistency
    • Very expensive (lots of nodes)