Apache Kafka - A High-Performance Distributed Messaging System (Turkish) - Emre Akış
A presentation on Apache Kafka, one of the first open-source solutions that comes to mind for high-performance, scalable messaging systems.
Introducing Apache Kafka - a visual overview. Presented at the Canberra Big Data Meetup 7 February 2019. We build a Kafka "postal service" to explain the main Kafka concepts, and explain how consumers receive different messages depending on whether there's a key or not.
Jay Kreps is a Principal Staff Engineer at LinkedIn where he is the lead architect for online data infrastructure. He is among the original authors of several open source projects including a distributed key-value store called Project Voldemort, a messaging system called Kafka, and a stream processing system called Samza. This talk gives an introduction to Apache Kafka, a distributed messaging system. It will cover both how Kafka works, as well as how it is used at LinkedIn for log aggregation, messaging, ETL, and real-time stream processing.
Apache Kafka is becoming the message bus for transferring huge volumes of data from various sources into Hadoop.
It is also enabling many real-time frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to pick topic partitions, upgrading to newer versions, and migrating to the new Kafka producer and consumer APIs.
We will also cover best practices for running producers and consumers.
In the Kafka 0.9 release, we added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now allows authenticating users and controlling who can read from and write to a Kafka topic. Apache Ranger also uses the pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
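For readers who want to see what these security features look like from the client side, here is a minimal, hypothetical producer configuration sketch; the broker address, truststore path, and topic name are placeholder assumptions, not values from the talk. The Kerberos credentials themselves are normally supplied via a JAAS configuration file referenced by the java.security.auth.login.config system property.

```java
// Sketch only: assumes a broker listener on SASL_SSL and a client truststore.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SecureProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9093");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Encrypt traffic and authenticate the client via Kerberos (GSSAPI).
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.kerberos.service.name", "kafka");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "changeit");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("secured-topic", "key", "value"));
        }
    }
}
```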
We will also showcase an open-sourced Kafka REST API and an admin UI that help users create topics, reassign partitions, issue Kafka ACLs, and monitor consumer offsets.
Apache Kafka's rise in popularity as a streaming platform has demanded a revisit of its traditional at-least-once message delivery semantics.
In this talk, we present the recent additions to Kafka to achieve exactly-once semantics (EoS) including support for idempotence and transactions in the Kafka clients. The main focus will be the specific semantics that Kafka distributed transactions enable and the underlying mechanics which allow them to scale efficiently.
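As a rough illustration of the client-side APIs this abstract refers to, the following sketch enables idempotence and wraps two sends in a single transaction; the topic name, transactional.id, and broker address are illustrative assumptions rather than details from the talk.

```java
// Sketch only: the two records become visible atomically to read_committed consumers.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");          // retries cannot create duplicates
        props.put("transactional.id", "payments-tx-1");   // identifies this producer across restarts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "order-42", "debit"));
                producer.send(new ProducerRecord<>("payments", "order-42", "credit"));
                producer.commitTransaction();              // both writes commit together
            } catch (Exception e) {
                producer.abortTransaction();               // neither write is exposed
                throw e;
            }
        }
    }
}
```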
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
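To make the compaction point concrete, here is a hedged sketch of creating a compacted topic with the Java AdminClient; the topic name, partition count, and replication factor are illustrative and not taken from the talk.

```java
// Sketch only: a compacted topic keeps the latest record per key instead of expiring by age.
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("user-profiles", 3, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```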
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
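A minimal sketch of the Kafka Streams DSL described above, using the classic word-count example; the application id and topic names are assumptions for illustration only.

```java
// Sketch only: counts words from "text-input" and writes the counts to "word-counts".
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("text-input");
        KTable<String, Long> counts = lines
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word)
                .count();
        counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();   // runs as a plain Java application, no processing cluster required
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```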
Kafka Tutorial - Introduction to the Kafka Streaming Platform - Jean-Paul Azar
Why is Kafka so fast? Why is Kafka so popular? Why Kafka?
An introduction to the Kafka streaming platform. It covers Kafka architecture with some small examples from the command line, then expands on this with a multi-server example. Lastly, we added some simple Java client examples for a Kafka producer and a Kafka consumer. We have started to expand on the Java examples to correlate with the design discussion of Kafka, and we have also expanded the Kafka design section and added references.
Kafka's basic terminology, its architecture, its protocol, and how it works.
Kafka at scale: its caveats, the guarantees it offers, and its use cases.
How we use it @ZaprMediaLabs.
Kafka is becoming an ever more popular choice for enabling fast data and streaming. Kafka provides a wide landscape of configuration options that let you tweak its performance profile, and understanding Kafka's internals is critical for picking your ideal configuration. Depending on your use case and data needs, different settings will perform very differently. Let's walk through the performance essentials of Kafka: how your consumer configuration can speed up or slow down the flow of messages to brokers; message keys, their implications, and their impact on partition performance; how to figure out how many partitions and how many brokers you should have; and what affects consumer performance. How do you combine all of these choices and develop the best strategy moving forward? How do you test the performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.
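To make the point about message keys concrete, here is a small, hypothetical producer sketch that prints which partition each keyed record lands on; the topic name, keys, and broker address are illustrative assumptions.

```java
// Sketch only: records with the same key hash to the same partition,
// which preserves per-key ordering but can skew load onto a "hot" partition.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String user : new String[]{"user-1", "user-2", "user-1"}) {
                RecordMetadata md = producer
                        .send(new ProducerRecord<>("clicks", user, "page-view"))
                        .get();
                System.out.printf("key=%s -> partition=%d offset=%d%n",
                        user, md.partition(), md.offset());
            }
        }
    }
}
```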
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ... - Spark Summit
One of the key challenges in working with real-time and streaming data is that the format used to capture data is not necessarily the optimal format for ad hoc analytic queries. For example, Avro is a convenient and popular serialization service that is great for initially bringing data into HDFS, and its native integration with Flume and other tools makes it a good choice for landing data in Hadoop. But columnar file formats, such as Parquet and ORC, are much better optimized for ad hoc queries that aggregate over a large number of similar rows.
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis - Hosted by Confluent
"There's little talk about capacity planning Kafka clusters, it's very much learn as you go, every cluster is different. In this talk Kafka DevOps Engineer Jason Bell takes you through the things that will help you, from broker capacity, thinking about topics and how the other Confluent components can affect throughput and performance. With a number of production deployments under his watchful gaze for over six years Jason has plenty of experience, stories and useful information that will help you.
By the end of the talk you'll have a good understanding of designing the cluster for various scenarios, where the points of latency are to watch and monitor. And also how to prevent teams breaking the cluster behind your back.
This talk is designed for everyone, anyone who is just starting to those who are operating Kafka on a daily basis."
Producer Performance Tuning for Apache Kafka - Jiangjie Qin
Kafka is well known for high throughput ingestion. However, to get the best latency characteristics without compromising on throughput and durability, we need to tune Kafka. In this talk, we share our experiences to achieve the optimal combination of latency, throughput and durability for different scenarios.
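The following sketch shows the kind of producer settings such tuning revolves around, with comments on which axis each one trades off; the concrete values are illustrative starting points under assumed conditions, not recommendations from the talk.

```java
// Sketch only: batching and compression favor throughput, linger.ms bounds added latency,
// acks=all favors durability at the cost of latency.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class TunedProducerExample {
    public static KafkaProducer<byte[], byte[]> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        // Throughput: batch more records per request and compress the batch.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Latency: how long to wait for a batch to fill before sending anyway.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        // Durability: wait for all in-sync replicas to acknowledge each write.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        return new KafkaProducer<>(props);
    }
}
```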
Introduction to Apache Kafka, Confluent, and Why They Matter - Paolo Castagna
This is a short, introductory presentation on Apache Kafka (including the Kafka Connect and Kafka Streams APIs, both part of Apache Kafka) and other open source components that are part of the Confluent platform (such as KSQL).
This was the first Kafka Meetup in South Africa.
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O... - ScyllaDB
Doing performance tuning on a massively distributed database is never an easy task. This is especially true for TiDB, an open-source, cloud-native NewSQL database for elastic scale and real-time analytics, because it consists of multiple components and each component has plenty of metrics.
Like many distributed systems, TiDB uses Prometheus to store the monitoring and performance metrics and Grafana to visualize these metrics. Thanks to these two open source projects, it is easy for TiDB developers to add monitoring and performance metrics. However, as the metrics increase, the learning curve becomes steeper for TiDB users to gain performance insights. In this talk, we will share how we measure latency in a distributed system using a top-down (holistic) approach, and why we introduced "tuning by database time" and "tuning by color" into TiDB. The new methodologies and Grafana dashboard help reduce the time and the requirement of expertise in performance tuning by orders of magnitude.
OpenStack Türkiye 14th Meetup Ankara: OpenStack for Beginners - Hüseyin Çotuk
Topic: What Is the OpenStack Cloud Platform? Where Is It Used?
Presenter: Dr. Hüseyin Çotuk
Contents:
• Cloud transformation around the world
• Why cloud?
• What is OpenStack?
• OpenStack components
• Services that can be offered as a service
• Why OpenStack?
• OpenStack adoption around the world
• Storage alternatives in OpenStack
• Why OpenStack and Ceph?
• Demo
• Q&A
Blockchain: Decentralized Application Development (Turkish) - Cihan Özhan
www.cihanozhan.com
*This is the presentation from the blockchain event I gave in 2019.
Teknopark Istanbul announcement: https://www.teknoparkistanbul.com.tr/egitimler/blockchain-decentralized-uygulama-gelistirme-sunumu
Hepsistream Real-Time Click-Stream Data Analytics Platform - Hepsiburada
The Hepsistream data analytics platform collects, in real time, actions such as product views, page views, and add-to-cart events performed by users reaching the Hepsiburada platform through the desktop, mobile, and mobile-site channels, and processes them on a big data infrastructure using a lambda architecture. The presentation covers the Hepsistream big data infrastructure and shares the technologies used and the lessons learned while building a real-time data exploration and monitoring tool at a scale as large as Efsane Cuma (Black Friday).
Microservice Messaging with PHP and NATS - Erhan Yakut
A presentation on how to build a microservice communication channel with NATS and how to connect to it from PHP to send and receive messages.
Abstract:
• What is Kubernetes?
• How does it work?
• What does it offer?
• What are the alternatives?
Bio:
Ahmet Üstün is one of the founders of the AI startup Comind. Before that, he spent three years on the Cybersoft R&D team working on cloud systems; during that time he focused on Docker, Kubernetes, OpenStack, and AWS and built projects with them. He completed his master's degree at METU on artificial intelligence and natural language processing, and he continues his research there as a PhD student.
Replication in the Wild - Ankara Cloud Meetup, Feb 2017 - AnkaraCloud
Replication is a technique used to access large volumes of data with good performance and to prevent data loss in failure scenarios. In this presentation we look at the replication methods frequently used in NoSQL databases in particular. We classify replication methods by their basic characteristics and examine their advantages and disadvantages relative to each other, which needs they suit, and which problems they solve and which they introduce. The presentation has an easy-to-follow, step-by-step, tutorial-like structure.
3. Why Use It?
• Big data
• Real-time data: data streams
• Events: triggered by alarms, presses, ticks, etc.
• Sensors
• IoT
• Distributed systems: scalable
4. What Is It?
• Open source: Apache, Confluent
• A messaging system: MQ, streaming data, real time
• Distributed
• Partitioned
• Replicated
• Pub-sub
• Clusterable: >= 1 server
• Fault-tolerant
• Stores records as 'topics'
kafka.apache.org
5. What Is It?
• Topic: the messages published by producers (key, value, timestamp)
• Producer: publishes messages to the topic(s) it writes to
• Consumer: reads messages from the topic(s) it subscribes to
• Broker: the Kafka servers (>= 1) that make up the Kafka cluster
• Partition: an ordered, immutable, append-only sequence of records; each record in it has an 'offset' value (enables parallelism and scaling)
• Replica: a copy of a partition
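To tie these terms together, here is a small, hypothetical AdminClient sketch that describes a topic and prints its partitions, leaders, and replicas; the topic name and broker address are assumptions for illustration.

```java
// Sketch only: each partition has one leader broker and a list of replica brokers.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("events"))
                    .all().get().get("events");
            desc.partitions().forEach(p ->
                    System.out.printf("partition=%d leader=broker-%d replicas=%d%n",
                            p.partition(), p.leader().id(), p.replicas().size()));
        }
    }
}
```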
10. How Does It Work?
• Retention (keeping/storing records)
• Offset: past → now
• Consumer
• Consumer group
kafka.apache.org
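A minimal consumer-group sketch, assuming an 'events' topic and a 'demo-group' group id (both illustrative), to show how consumer groups and offsets appear in client code.

```java
// Sketch only: partitions of the topic are shared among members of the same group,
// and the committed offset marks the boundary between consumed and unconsumed records.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerGroupExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");   // commit offsets explicitly after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync();
            }
        }
    }
}
```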
11. So What?
• Messaging && storage && real-time (streaming) data
• Like a DFS: historical data
• Like 'subscribe': future data
• (A streaming data pipeline)
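As a sketch of the "historical data vs. future data" distinction in client terms, the snippet below starts one consumer from the beginning of the retained log and another from the end; the topic and group ids are illustrative, and auto.offset.reset only applies when a group has no committed offsets yet.

```java
// Sketch only: "earliest" replays retained history, "latest" sees only records produced from now on.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HistoricalVsFutureExample {
    static KafkaConsumer<String, String> newConsumer(String groupId, String offsetReset) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", offsetReset);
        return new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        try (KafkaConsumer<String, String> history = newConsumer("replay-group", "earliest");
             KafkaConsumer<String, String> live = newConsumer("live-group", "latest")) {
            history.subscribe(Collections.singleton("events"));
            live.subscribe(Collections.singleton("events"));
            history.poll(Duration.ofSeconds(1));   // starts from the oldest retained offset
            live.poll(Duration.ofSeconds(1));      // starts from the end of the log
        }
    }
}
```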
12. Kafka Connect
• Integration with other systems
• Adding new systems to the data flow
• Connector sources: i) Confluent, ii) Certified, iii) Community
[Diagram: Data Source → Kafka Connect → Kafka Cluster → Kafka Connect → Data Sink → App]
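A minimal sketch of what a Kafka Connect source connector configuration can look like, using the FileStreamSource connector that ships with Kafka; the file path and topic name are illustrative assumptions, and in a real deployment the connector is started via the standalone worker or the Connect REST API.

```properties
# Sketch only: tails a local file and publishes each line to a Kafka topic.
# Run with: bin/connect-standalone.sh config/connect-standalone.properties file-source.properties
name=file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/var/log/app.log
topic=app-logs
```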