In this Cartoon Style Visual Introduction to Apache Kafka we’re going to build a “Postal Service” to deliver party invitations to two groups, Nerds and Pugsters – find out who goes to the party. Along the way we’ll learn about Kafka Producers, Consumers, Groups, Topics, Partitions, Keys, Records, Delivery Semantics (Guaranteed delivery, and who gets what messages). We’ll also have a quick look at Streams (mail sorting) and Connectors (how does mail get delivered between post offices).
Presented at Open Source 101 2022: https://opensource101.com/sessions/a-visual-introduction-to-apache-kafka/
Video: https://youtu.be/NUnsHFn52sE
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
In this session, Netflix provides an overview of Keystone, their new data pipeline. The session covers how Netflix migrated from Suro to Keystone, including the reasons behind the transition and the challenges of zero loss while processing over 400 billion events daily. The session covers in detail how they deploy, operate, and scale Kafka, Samza, Docker, and Apache Mesos in AWS to manage 8 million events & 17 GB per second during peak.
Doug Bateman, a principal data engineering instructor at Databricks, presented on how to build a Lakehouse architecture. He began by introducing himself and his background. He then discussed the goals of describing key Lakehouse features, explaining how Delta Lake enables it, and developing a sample Lakehouse using Databricks. The key aspects of a Lakehouse are that it supports diverse data types and workloads while enabling using BI tools directly on source data. Delta Lake provides reliability, consistency, and performance through its ACID transactions, automatic file consolidation, and integration with Spark. Bateman concluded with a demo of creating a Lakehouse.
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
To operate PostgreSQL efficiently, you need to have insight into database performance and make sure it is at optimal levels.
With that in mind, we dive into monitoring PostgreSQL for performance in this webinar replay.
PostgreSQL offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics? And what are some of the most common causes for performance problems in production?
We discuss this and more in ordinary, plain DBA language. We also have a look at some of the tools available for PostgreSQL monitoring and trending; and we’ll show you how to leverage ClusterControl’s PostgreSQL metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.
AGENDA
- PostgreSQL architecture overview
- Performance problems in production
- Common causes
- Key PostgreSQL metrics and their meaning
- Tuning for performance
- Performance monitoring tools
- Impact of monitoring on performance
- How to use ClusterControl to identify performance issues
- Demo
SPEAKER
Sebastian Insausti, Support Engineer at Severalnines, has loved technology since his childhood, when he did his first computer course (Windows 3.11). And from that moment he was decided on what his profession would be. He has since built up experience with MySQL, PostgreSQL, HAProxy, WAF (ModSecurity), Linux (RedHat, CentOS, OL, Ubuntu server), Monitoring (Nagios), Networking and Virtualization (VMWare, Proxmox, Hyper-V, RHEV).
Prior to joining Severalnines, Sebastian worked as a consultant to state companies in security, database replication and high availability scenarios. He’s also a speaker and has given a few talks locally on InnoDB Cluster and MySQL Enterprise together with an Oracle team. Previous to that, he worked for a Mexican company as chief of sysadmin department as well as for a local ISP (Internet Service Provider), where he managed customers' servers and connectivity.
This webinar builds upon a related blog post by Sebastian: https://severalnines.com/blog/performance-cheat-sheet-postgresql.
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
Dustin Vannoy presented on using Delta Lake with Azure Databricks. He began with an introduction to Spark and Databricks, demonstrating how to set up a workspace. He then discussed limitations of Spark including lack of ACID compliance and small file problems. Delta Lake addresses these issues with transaction logs for ACID transactions, schema enforcement, automatic file compaction, and performance optimizations like time travel. The presentation included demos of Delta Lake capabilities like schema validation, merging, and querying past versions of data.
Ranger’s pluggable architecture allows resource access policy administration and enforcement for standard and custom services from a “single pane of glass”. Apache Ranger has a rich Authorization Model, which provides the mechanism to author Policy in a Ranger Admin Server and serves as policy decision and audit point in authorizing user’s resource access within various components of Hadoop ecosystem.
This session will provide a deep dive into Ranger framework and a cook-book for extending Ranger to do authorization / auditing on resource access to external applications, including technical details of Rest APIs, Ranger policy engine and enriching authorization requests, with a demo of a sample application.We will then demonstrate a real-world example of how Ranger has simplified security enforcement for Hadoop-native MPP SQL engine like Apache HAWQ (incubating),which previously used its built-in Postgres-like authorization mechanisms. The integration design includes a Ranger Plugin Service that allows transparent authorization API calls between C-based Apache HAWQ and Java-based Apache Ranger.
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
In this session, Netflix provides an overview of Keystone, their new data pipeline. The session covers how Netflix migrated from Suro to Keystone, including the reasons behind the transition and the challenges of zero loss while processing over 400 billion events daily. The session covers in detail how they deploy, operate, and scale Kafka, Samza, Docker, and Apache Mesos in AWS to manage 8 million events & 17 GB per second during peak.
Doug Bateman, a principal data engineering instructor at Databricks, presented on how to build a Lakehouse architecture. He began by introducing himself and his background. He then discussed the goals of describing key Lakehouse features, explaining how Delta Lake enables it, and developing a sample Lakehouse using Databricks. The key aspects of a Lakehouse are that it supports diverse data types and workloads while enabling using BI tools directly on source data. Delta Lake provides reliability, consistency, and performance through its ACID transactions, automatic file consolidation, and integration with Spark. Bateman concluded with a demo of creating a Lakehouse.
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
To operate PostgreSQL efficiently, you need to have insight into database performance and make sure it is at optimal levels.
With that in mind, we dive into monitoring PostgreSQL for performance in this webinar replay.
PostgreSQL offers many metrics through various status overviews and commands, but which ones really matter to you? How do you trend and alert on them? What is the meaning behind the metrics? And what are some of the most common causes for performance problems in production?
We discuss this and more in ordinary, plain DBA language. We also have a look at some of the tools available for PostgreSQL monitoring and trending; and we’ll show you how to leverage ClusterControl’s PostgreSQL metrics, dashboards, custom alerting and other features to track and optimize the performance of your system.
AGENDA
- PostgreSQL architecture overview
- Performance problems in production
- Common causes
- Key PostgreSQL metrics and their meaning
- Tuning for performance
- Performance monitoring tools
- Impact of monitoring on performance
- How to use ClusterControl to identify performance issues
- Demo
SPEAKER
Sebastian Insausti, Support Engineer at Severalnines, has loved technology since his childhood, when he did his first computer course (Windows 3.11). And from that moment he was decided on what his profession would be. He has since built up experience with MySQL, PostgreSQL, HAProxy, WAF (ModSecurity), Linux (RedHat, CentOS, OL, Ubuntu server), Monitoring (Nagios), Networking and Virtualization (VMWare, Proxmox, Hyper-V, RHEV).
Prior to joining Severalnines, Sebastian worked as a consultant to state companies in security, database replication and high availability scenarios. He’s also a speaker and has given a few talks locally on InnoDB Cluster and MySQL Enterprise together with an Oracle team. Previous to that, he worked for a Mexican company as chief of sysadmin department as well as for a local ISP (Internet Service Provider), where he managed customers' servers and connectivity.
This webinar builds upon a related blog post by Sebastian: https://severalnines.com/blog/performance-cheat-sheet-postgresql.
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
Dustin Vannoy presented on using Delta Lake with Azure Databricks. He began with an introduction to Spark and Databricks, demonstrating how to set up a workspace. He then discussed limitations of Spark including lack of ACID compliance and small file problems. Delta Lake addresses these issues with transaction logs for ACID transactions, schema enforcement, automatic file compaction, and performance optimizations like time travel. The presentation included demos of Delta Lake capabilities like schema validation, merging, and querying past versions of data.
Ranger’s pluggable architecture allows resource access policy administration and enforcement for standard and custom services from a “single pane of glass”. Apache Ranger has a rich Authorization Model, which provides the mechanism to author Policy in a Ranger Admin Server and serves as policy decision and audit point in authorizing user’s resource access within various components of Hadoop ecosystem.
This session will provide a deep dive into Ranger framework and a cook-book for extending Ranger to do authorization / auditing on resource access to external applications, including technical details of Rest APIs, Ranger policy engine and enriching authorization requests, with a demo of a sample application.We will then demonstrate a real-world example of how Ranger has simplified security enforcement for Hadoop-native MPP SQL engine like Apache HAWQ (incubating),which previously used its built-in Postgres-like authorization mechanisms. The integration design includes a Ranger Plugin Service that allows transparent authorization API calls between C-based Apache HAWQ and Java-based Apache Ranger.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Ingesting Data at Blazing Speed Using Apache OrcDataWorks Summit
Big SQL is a SQL engine for Hadoop that excels at performance and scalability at high concurrency. Big SQL complements and integrates with Apache Hive for both data and metadata. An architecture that separates compute from storage allows Big SQL to support multiple open data formats natively. Until recently, Parquet provided a significant performance advantage over other data formats for SQL on Hadoop. The landscape changed when ORC became a top level Apache project independent from Hive. Gone were the days of reading ORC files using slow, single-row-at-a-time Hive Serdes. The new vectorized APIs in the Apache ORC libraries make it possible to ingest ORC data at blazing speed. This talk is about the journey leading to ORC taking the crown of best performing data format for Big SQL away from Parquet. We'll have a look under the hood at the architecture of Big SQL ORC readers, and how to tune them. We'll share lessons learned in walking the fine line between maximizing performance at scale and avoiding dreaded Java OOMs . You'll learn the techniques that SQL engines use for fast data ingestion, so that you can leverage the full potential of Apache ORC in any application.
Speaker:
Gustavo Arocena, Big Data Architect, IBM
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Hige Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - B...Flink Forward
Getting data in and out of Flink is by far the most important aspect, and an everyday typical requirement of building Flink applications. Doing so in an end-to-end exactly-once manner, however, can be tricky. Being able to reliably consume data from the outside world without any duplicate processing and guaranteeing consistent distributed state, and at the same time provide computed results back to the outside world also without introducing duplicates, is crucial for the consistency and correctness of applications built upon stream processors. In this talk, we will talk about how end-to-end exactly-once guarantees can be achieved with Apache Flink. We will talk about Flink’s checkpointing mechanism, and how exactly to leverage it when consuming and producing data from your Flink streaming pipelines. In particular, we will be having a detailed review on how our supported connectors do so, with the aim to provide reference implementations for your own custom consumers and sinks.
主に論文 "Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions" の紹介。
https://pmg.csail.mit.edu/pubs/adya99__weak_consis-abstract.html
Delta Lake brings reliability, performance, and security to data lakes. It provides ACID transactions, schema enforcement, and unified handling of batch and streaming data to make data lakes more reliable. Delta Lake also features lightning fast query performance through its optimized Delta Engine. It enables security and compliance at scale through access controls and versioning of data. Delta Lake further offers an open approach and avoids vendor lock-in by using open formats like Parquet that can integrate with various ecosystems.
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataJignesh Shah
This document discusses PostgreSQL performance on multi-core systems with multi-terabyte data. It covers current market trends towards more cores and larger data sizes. Benchmark results show that PostgreSQL scales well on inserts up to a certain number of clients/cores but struggles with OLTP and TPC-E workloads due to lock contention. Issues are identified with sequential scans, index scans, and maintenance tasks like VACUUM as data sizes increase. The document proposes making PostgreSQL utilities and tools able to leverage multiple cores/processes to improve performance on modern hardware.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
The document summarizes Quantcast File System (QFS), an alternative to HDFS that provides petabyte storage at half the disk space of HDFS. QFS offers significantly faster I/O than HDFS through the use of Reed-Solomon encoding, requiring only 1.5x disk space compared to HDFS's 3x. It has been production hardened at Quantcast under massive processing loads and is fully compatible with Apache Hadoop. Benchmark results show QFS writes are half the disk I/O of HDFS writes and reads require accessing the network versus HDFS's focus on data locality.
This document introduces two operational support tools for making Cloud Spanner more convenient: Spanner Warming Guy and Spanner Up-Down Guy.
Spanner Warming Guy is a tool that performs warmup of Cloud Spanner before releases. It can add load to Spanner through SELECT queries alone and automatically adjusts the number of SELECT bots.
Spanner Up-Down Guy is a tool that auto-scales the number of nodes in Cloud Spanner. It periodically executes Cloud Functions to adjust the node count according to a pre-defined schedule or the load situation. It can gradually decrease nodes to reduce costs.
This document summarizes Corda's cryptographic agility features:
- Corda supports multiple signature schemes so it can adapt to new cryptographic standards over time without requiring a global upgrade. This allows "cryptographic agility".
- It also allows for custom cryptographic implementations through "pluggability" but custom algorithms that are not part of the base specification would result in invalid transactions.
- The document discusses various signature schemes like RSA, ECDSA, and Sphincs-256 that Corda supports and their tradeoffs in terms of speed, security, and other factors. It provides recommendations on which schemes to use based on different criteria.
Making Cassandra more capable, faster, and more reliable (at ApacheCon@Home 2...Scalar, Inc.
Cassandra is widely adopted in real-world applications and used by large and sometimes mission-critical applications because of its high performance, high availability and high scalability. However, there is still some room for improvement to take Cassandra to the next level. We have been contributing to Cassandra to make it more capable, faster, and more reliable by, for example, proposing a non-invasive ACID transaction library, adding GroupCommitLogService, and maintaining and conducting Jepsen testing for lightweight transactions. This talk will present the contributions we have done including the latest updates in more detail, and the reasons why we made such contributions. This talk will be one of the good starting points for discussing the next generation Cassandra.
Delta from a Data Engineer's PerspectiveDatabricks
This document describes the Delta architecture, which unifies batch and streaming data processing. Delta achieves this through a continuous data flow model using structured streaming. It allows data engineers to read consistent data while being written, incrementally read large tables at scale, rollback in case of errors, replay and process historical data along with new data, and handle late arriving data without delays. Delta uses transaction logging, optimistic concurrency, and Spark to scale metadata handling for large tables. This provides a simplified solution to common challenges data engineers face.
Real-time Analytics with Trino and Apache PinotXiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
Hello, kafka! (an introduction to apache kafka)Timothy Spann
Hello ApacheKafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby Cloudera Principal engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
Direct mail is still an important fundraising tool but is changing with new digital technologies. Variable data printing (VDP) allows direct mail to be highly personalized with different content and images for each recipient. This integrated direct mail can then drive recipients to engage with the organization across other channels through personalized QR codes, text-to-give campaigns, and individualized landing pages. VDP printing can also combine mailing and printing into one process for reduced costs and improved postal discounts.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Ingesting Data at Blazing Speed Using Apache OrcDataWorks Summit
Big SQL is a SQL engine for Hadoop that excels at performance and scalability at high concurrency. Big SQL complements and integrates with Apache Hive for both data and metadata. An architecture that separates compute from storage allows Big SQL to support multiple open data formats natively. Until recently, Parquet provided a significant performance advantage over other data formats for SQL on Hadoop. The landscape changed when ORC became a top level Apache project independent from Hive. Gone were the days of reading ORC files using slow, single-row-at-a-time Hive Serdes. The new vectorized APIs in the Apache ORC libraries make it possible to ingest ORC data at blazing speed. This talk is about the journey leading to ORC taking the crown of best performing data format for Big SQL away from Parquet. We'll have a look under the hood at the architecture of Big SQL ORC readers, and how to tune them. We'll share lessons learned in walking the fine line between maximizing performance at scale and avoiding dreaded Java OOMs . You'll learn the techniques that SQL engines use for fast data ingestion, so that you can leverage the full potential of Apache ORC in any application.
Speaker:
Gustavo Arocena, Big Data Architect, IBM
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Hige Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - B...Flink Forward
Getting data in and out of Flink is by far the most important aspect, and an everyday typical requirement of building Flink applications. Doing so in an end-to-end exactly-once manner, however, can be tricky. Being able to reliably consume data from the outside world without any duplicate processing and guaranteeing consistent distributed state, and at the same time provide computed results back to the outside world also without introducing duplicates, is crucial for the consistency and correctness of applications built upon stream processors. In this talk, we will talk about how end-to-end exactly-once guarantees can be achieved with Apache Flink. We will talk about Flink’s checkpointing mechanism, and how exactly to leverage it when consuming and producing data from your Flink streaming pipelines. In particular, we will be having a detailed review on how our supported connectors do so, with the aim to provide reference implementations for your own custom consumers and sinks.
主に論文 "Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions" の紹介。
https://pmg.csail.mit.edu/pubs/adya99__weak_consis-abstract.html
Delta Lake brings reliability, performance, and security to data lakes. It provides ACID transactions, schema enforcement, and unified handling of batch and streaming data to make data lakes more reliable. Delta Lake also features lightning fast query performance through its optimized Delta Engine. It enables security and compliance at scale through access controls and versioning of data. Delta Lake further offers an open approach and avoids vendor lock-in by using open formats like Parquet that can integrate with various ecosystems.
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataJignesh Shah
This document discusses PostgreSQL performance on multi-core systems with multi-terabyte data. It covers current market trends towards more cores and larger data sizes. Benchmark results show that PostgreSQL scales well on inserts up to a certain number of clients/cores but struggles with OLTP and TPC-E workloads due to lock contention. Issues are identified with sequential scans, index scans, and maintenance tasks like VACUUM as data sizes increase. The document proposes making PostgreSQL utilities and tools able to leverage multiple cores/processes to improve performance on modern hardware.
Apache Kafka becoming the message bus to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through the best practices in deploying Apache Kafka
in production. How to Secure a Kafka Cluster, How to pick topic-partitions and upgrading to newer versions. Migrating to new Kafka Producer and Consumer API.
Also talk about the best practices involved in running a producer/consumer.
In Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Now Kafka allows authentication of users, access control on who can read and write to a Kafka topic. Apache Ranger also uses pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase open sourced Kafka REST API and an Admin UI that will help users in creating topics, re-assign partitions, Issuing
Kafka ACLs and monitoring Consumer offsets.
The document summarizes Quantcast File System (QFS), an alternative to HDFS that provides petabyte storage at half the disk space of HDFS. QFS offers significantly faster I/O than HDFS through the use of Reed-Solomon encoding, requiring only 1.5x disk space compared to HDFS's 3x. It has been production hardened at Quantcast under massive processing loads and is fully compatible with Apache Hadoop. Benchmark results show QFS writes are half the disk I/O of HDFS writes and reads require accessing the network versus HDFS's focus on data locality.
This document introduces two operational support tools for making Cloud Spanner more convenient: Spanner Warming Guy and Spanner Up-Down Guy.
Spanner Warming Guy is a tool that performs warmup of Cloud Spanner before releases. It can add load to Spanner through SELECT queries alone and automatically adjusts the number of SELECT bots.
Spanner Up-Down Guy is a tool that auto-scales the number of nodes in Cloud Spanner. It periodically executes Cloud Functions to adjust the node count according to a pre-defined schedule or the load situation. It can gradually decrease nodes to reduce costs.
This document summarizes Corda's cryptographic agility features:
- Corda supports multiple signature schemes so it can adapt to new cryptographic standards over time without requiring a global upgrade. This allows "cryptographic agility".
- It also allows for custom cryptographic implementations through "pluggability" but custom algorithms that are not part of the base specification would result in invalid transactions.
- The document discusses various signature schemes like RSA, ECDSA, and Sphincs-256 that Corda supports and their tradeoffs in terms of speed, security, and other factors. It provides recommendations on which schemes to use based on different criteria.
Making Cassandra more capable, faster, and more reliable (at ApacheCon@Home 2...Scalar, Inc.
Cassandra is widely adopted in real-world applications and used by large and sometimes mission-critical applications because of its high performance, high availability and high scalability. However, there is still some room for improvement to take Cassandra to the next level. We have been contributing to Cassandra to make it more capable, faster, and more reliable by, for example, proposing a non-invasive ACID transaction library, adding GroupCommitLogService, and maintaining and conducting Jepsen testing for lightweight transactions. This talk will present the contributions we have done including the latest updates in more detail, and the reasons why we made such contributions. This talk will be one of the good starting points for discussing the next generation Cassandra.
Delta from a Data Engineer's PerspectiveDatabricks
This document describes the Delta architecture, which unifies batch and streaming data processing. Delta achieves this through a continuous data flow model using structured streaming. It allows data engineers to read consistent data while being written, incrementally read large tables at scale, rollback in case of errors, replay and process historical data along with new data, and handle late arriving data without delays. Delta uses transaction logging, optimistic concurrency, and Spark to scale metadata handling for large tables. This provides a simplified solution to common challenges data engineers face.
Real-time Analytics with Trino and Apache PinotXiang Fu
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
Hello, kafka! (an introduction to apache kafka)Timothy Spann
Hello ApacheKafka
An Introduction to Apache Kafka with Timothy Spann and Carolyn Duby Cloudera Principal engineers.
We also demo Flink SQL, SMM, SSB, Schema Registry, Apache Kafka, Apache NiFi and Public Cloud - AWS.
Direct mail is still an important fundraising tool but is changing with new digital technologies. Variable data printing (VDP) allows direct mail to be highly personalized with different content and images for each recipient. This integrated direct mail can then drive recipients to engage with the organization across other channels through personalized QR codes, text-to-give campaigns, and individualized landing pages. VDP printing can also combine mailing and printing into one process for reduced costs and improved postal discounts.
MuleSoft Meetup Singapore #8 March 2021Julian Douch
The document summarizes an agenda for the Singapore MuleSoft Meetup #8 on March 30th 2021. It includes:
- Welcome and introductions from the co-host Julian Douch and co-organizer Royston Lobo
- Two technical presentations on "MuleSoft & Kafka" by Senthil Ramu and "IoT & MuleSoft" by Giap Hui Tan
- An events roundup and quiz
- Details are provided on future meetups and opportunities for participants to propose presentation topics.
Open Source Telecom Software Survey 2022, Alan QuayleAlan Quayle
The survey gathered responses from 120 participants in 2022 compared to 114 in 2021. It covered general topics such as DDoS attacks, security practices, STIR/SHAKEN implementation, IPv6 deployment, and expectations for major tech companies over the next two years. Key findings included that around half of participants have experienced a DDoS attack in 2022 and application-level attacks were as common as volumetric attacks. Most organizations take both reactive and proactive security approaches but more so reactive. STIR/SHAKEN implementation is ongoing with international carriers needing it to terminate traffic in North America. IPv6 deployment remains steady with no major differences between regions. This survey provides insights into trends in the open source tele
This document proposes moving the Access Grid (AG) to a standards-based, message-oriented architecture using the XMPP protocol and Jabber instant messaging technology. The current AGTK2 implementation has problems including being proprietary, slow, and having limited communication abilities. By mapping the AG to open XMPP standards and an existing Jabber server, it could gain improved scalability, integration, and interoperability without being tied to a specific software implementation. The author has developed a prototype XMPP client called "ShutUp" to demonstrate this new messaging-oriented approach and how the AG's features could be supported through XMPP extensions and protocols.
Building a Streaming Microservices Architecture - Data + AI Summit EU 2020Databricks
Jules Damji and Denny Lee from Databricks Developer Relations will recap some keynote highlights, and each will briefly present personal picks from sessions that resonated well with them. Next, Jacek Laskowski, an independent consultant, will speak about Spark 3.0 internals, and Scott Haines from Twilio, Inc. will give a talk about structured streaming microservice architectures. This live coding session and technical deep dive are not to be missed!
TADSummit EMEA 2019, Challenges Consuming Programmable Telecoms from the Deve...Alan Quayle
Sebastian Schumann, Technology & Innovation at Deutsche Telekom
App development: I just want to make a call
Consuming APIs: It’s all fine as long as you know what you want and have done it before
App development vs. Telecom App development
Challenges Consuming Programmable Telecoms from the Developer’s PerspectiveSebastian Schumann
Challenges Consuming Programmable Telecoms from the Developer’s Perspective
- App development: I just want to make a call
- Consuming APIs: It’s all fine as long as you know what you want and have done it before
- App development vs. Telecom App development
Presented 2019-11-19 at TADSummit in London, UK
The document discusses potential disruption to technology supply chains from the rise of cloud computing. It notes that cloud threatens existing revenue streams and profitability for vendors. The relationship between vendors and the sales channel needs to address how cloud impacts current and future business. The document explores how cloud could create new business models and partnerships outside of traditional IT players. It raises questions about how different industries may enter the IT market by reselling cloud services to their existing customers.
The document discusses the future of television becoming "all-on-demand" where all video and audio content can be watched whenever requested. It mentions that fiber optic networks will become more important and that Google is providing 1GB internet speeds in Kansas City. Interactive experiences and sensing technologies will become key factors in differentiating services. The author is Netflix founder Reed Hastings and the source is Digital Times from July 2011.
IBM MQ and Kafka, what is the difference?David Ware
Message queueing solutions used to be the one general purpose tool used for all asynchronous application patterns, then along came event streaming as an application model. To support this effectively needed a whole new approach to how messages are handled by the messaging technology. Now the tables are turned and many are wondering if an event streaming solution can be used for all their asynchronous application patterns from now on. But just as message queueing solutions work in a way to optimize for their core use cases, so do event streaming solutions, and these behaviors directly affect the applications that use them. This session picks IBM MQ and Kafka to look at how they compare and, more importantly, differ in their behavior so that you can decide which application scenarios are best suited by each. Spoiler -they're both good in their own way!
The document discusses how digital publishing and new formats are changing the publishing industry. It summarizes codeMantra's services including:
- Hosted digital asset management (DAM) system called collectionPoint that allows publishers to organize and distribute content in various formats.
- Production facilities in India that can convert files to formats like ePub, PDF, and more.
- Metadata services to help publishers distribute content to partners.
- Add-on applications that can generate content samples, catalogs, and more using the DAM system.
Cloud Computing-The Challenges for Data Networks-Final PosterCharles Edwards
The document discusses challenges facing data networks as cloud computing demand increases. It notes that the communications industry expects data transfer demands to rise 100-300% annually. Researchers are examining competing network technologies to help advance capabilities. Future-proofing networks will require investments like expanding fibre infrastructure, improving spectral efficiency, and developing new standards like 5G. However, network operators face difficulties investing as much as needed due to falling revenues despite higher consumer demands and expectations. Overall network capacity will need to greatly increase to accommodate growing uses of cloud services, online content, and business/education needs.
This document summarizes strategies for digital marketing and mobile websites. It discusses the advantages of mobile websites such as immediacy, compatibility, upgradability, findability, shareability, and broader reach compared to mobile apps. Mobile websites are also easier and less expensive to create and maintain than mobile apps. However, mobile apps are preferable when interactivity, regular usage, complex calculations, native device functionality or offline usage are required. The document provides examples of SMS marketing strategies and discusses options for SMS gateways and beacon technologies.
Cubeacon Smart Retail Industry with iBeacon TechnologyAvianto Tiyo
In today's modern era, million people have smartphone. High mobility makes em feel hungry for e facilities and conveniences.
Mobile application is what people need most ese days. It can access web from eir mobile device and facilitate eir daily activities. Their expectations for more relevant information will continue to evolve. Not much of developers who finally success. No matter how beautifully designed, todays app must be combined wi a 'powerful tool' to connect directly to e customer for more effective connectivity.
Consumers no longer want to rely on emails and expect eir apps to notify em e moment relevant events occur, and not a second later.
A Set-top-Box (STB) is a very common name heard in the consumer electronics market. It is a device that is attached to a Television for enhancing its functions or the quality of its functions. On the other side, the STB is connected to an external source of signal, like satellite, cable, terrestrial or internet. The STB processes the signal it receives, turns it into content, which is then displayed on the television screen or other display device. There are different types of STBs based on what kind of signals it can receive and what kind of processing it can do. The most widely used STBs are DVB STBs, which receive DVB (Digital Video Broadcast) transmission.
VMblog - 2018 Containers Predictions from 16 Industry Expertsvmblog
Find out what's going on in the world of #containers in 2018. Read #predictions from 16 of the industry's leading experts to learn more about Docker, Kubernetes, Microservices and more! Hear from industry thought leaders from companies like Datadog, Hedvig, ManageEngine, Mirantis, Red Hat, SUSE and more. Make sure to also read the more than 280+ other expert predictions from technologies across virtualization, cloud computing, hyperconverged, IoT, security, etc. here: http://bit.ly/2DQi2OT
Data Con LA 2022 - Data Streaming with KafkaData Con LA
Jie Chen, Manager Advisory, KPMG
Data is the new oil. However, many organizations have fragmented data in siloed line of businesses. In this topic, we will focus on identifying the legacy patterns and their limitations and introducing the new patterns packed by Kafka's core design ideas. The goal is to tirelessly pursue better solutions for organizations to overcome the bottleneck in data pipelines and modernize the digital assets for ready to scale their businesses. In summary, we will walk through three uses cases, recommend Dos and Donts, Take aways for Data Engineers, Data Scientist, Data architect in developing forefront data oriented skills.
The document discusses an outage of electronic communications at the author's company that temporarily disrupted business operations. It summarizes:
1) The author's company relies heavily on email and internet for business but recently experienced a few days where their electronic systems were down, disrupting work.
2) While electronic communications have made their work more efficient, the outage highlighted the risks of overreliance on technology and how "antiquated" backup methods like phone and fax had fallen out of regular use.
3) The author recommends companies have up-to-date backup contact info printed for inevitable future outages, since correspondences default to electronic over time.
(rev) PT Link Net Tbk - FY21 Link Net Company Presentation.pdfBradJared
The document provides definitions and explanations of key terms used by PT Link Net Tbk, an Indonesian broadband provider, in their FY21 company presentation. It defines terms like homes passed, gross subscribers, net subscribers, backbone, last mile, hybrid fiber coaxial (HFC) network, and fiber to the home (FTTH) network. It also provides details on Link Net's network infrastructure, including that their backbone is completely fiber and their last mile is a mixture of HFC and FTTH technologies. The total length of Link Net's cables is 34,724kms, with 18,499kms being fiber and 16,225kms being HFC.
Similar to A Visual Introduction to Apache Kafka.pdf (20)
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Tatiana Kojar
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
A Comprehensive Guide to DeFi Development Services in 2024Intelisync
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.