Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Event-first thinking and streaming help organizations transition from followers to leaders in the market. A reliable, scalable, and economical streaming architecture helps them get there.
This talk first explores the ""classic streaming stack,"" based on the Lambda architecture, its origin, and why it didn't pick up amongst data-driven organizations. The modern streaming stack (MSS) is a lean, cloud-native, and economical alternative to classic streaming architectures, where it aims to make event-driven real-time applications viable for organizations.
The second half of the talk explores the MSS in detail, including its core components, their purposes, and how Kappa architecture has influenced it. Moreover, the talk lays out a few considerations before planning a new streaming application within an organization. The talk concludes by discussing the challenges in the streaming world and how vendors are trying to overcome them in the future.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest are sitting in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone to moving data around doesn’t solve the problem of needing to grab changes from the source pushing them into Kafka and consuming the data from Kafka to be processed. If something unexpected happens – like connectivity is lost on either the source or the target side, you don’t want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
Introduction of streaming data, difference between batch processing and stream processing, Research issues in streaming data processing, Performance evaluation metrics , tools for stream processing.
As part of this session, I will be giving an introduction to Data Engineering and Big Data. It covers up to date trends.
* Introduction to Data Engineering
* Role of Big Data in Data Engineering
* Key Skills related to Data Engineering
* Role of Big Data in Data Engineering
* Overview of Data Engineering Certifications
* Free Content and ITVersity Paid Resources
Don't worry if you miss the video - you can click on the below link to go through the video after the schedule.
https://youtu.be/dj565kgP1Ss
* Upcoming Live Session - Overview of Big Data Certifications (Spark Based) - https://www.meetup.com/itversityin/events/271739702/
Relevant Playlists:
* Apache Spark using Python for Certifications - https://www.youtube.com/playlist?list=PLf0swTFhTI8rMmW7GZv1-z4iu_-TAv3bi
* Free Data Engineering Bootcamp - https://www.youtube.com/playlist?list=PLf0swTFhTI8pBe2Vr2neQV7shh9Rus8rl
* Join our Meetup group - https://www.meetup.com/itversityin/
* Enroll for our labs - https://labs.itversity.com/plans
* Subscribe to our YouTube Channel for Videos - http://youtube.com/itversityin/?sub_confirmation=1
* Access Content via our GitHub - https://github.com/dgadiraju/itversity-books
* Lab and Content Support using Slack
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Digital Business Transformation in the Streaming EraAttunity
Enterprises are rapidly adopting stream computing backbones, in-memory data stores, change data capture, and other low-latency approaches for end-to-end applications. As businesses modernize their data architectures over the next several years, they will begin to evolve toward all-streaming architectures. In this webcast, Wikibon, Attunity, and MemSQL will discuss how enterprise data professionals should migrate their legacy architectures in this direction. They will provide guidance for migrating data lakes, data warehouses, data governance, and transactional databases to support all-streaming architectures for complex cloud and edge applications. They will discuss how this new architecture will drive enterprise strategies for operationalizing artificial intelligence, mobile computing, the Internet of Things, and cloud-native microservices.
Link to the Wikibon report - wikibon.com/wikibons-2018-big-data-analytics-trends-forecast
Link to Attunity Streaming CDC Book Download - http://www.bit.ly/cdcbook
Link to MemSQL's Free Data Pipeline Book - http://go.memsql.com/oreilly-data-pipelines
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you can not afford to first store the data and doing the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, are available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products appeared, mostly out of the Big Data Technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink, Kafka Streams as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations for Stream Processing, discuss the core properties a Stream Processing platform should provide and highlight what differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
Tackling the challenge of designing a machine learning model and putting it into production is the key to getting value back – and the roadblock that stops many promising machine learning projects. After the data scientists have done their part, engineering robust production data pipelines has its own set of challenges. Syncsort software helps the data engineer every step of the way.
Building on the process of finding and matching duplicates to resolve entities, the next step is to set up a continuous streaming flow of data from data sources so that as the sources change, new data automatically gets pushed through the same transformation and cleansing data flow – into the arms of machine learning models.
Some of your sources may already be streaming, but the rest are sitting in transactional databases that change hundreds or thousands of times a day. The challenge is that you can’t affect performance of data sources that run key applications, so putting something like database triggers in place is not the best idea. Using Apache Kafka or similar technologies as the backbone to moving data around doesn’t solve the problem of needing to grab changes from the source pushing them into Kafka and consuming the data from Kafka to be processed. If something unexpected happens – like connectivity is lost on either the source or the target side, you don’t want to have to fix it or start over because the data is out of sync.
View this 15-minute webcast on-demand to learn how to tackle these challenges in large scale production implementations.
Introduction of streaming data, difference between batch processing and stream processing, Research issues in streaming data processing, Performance evaluation metrics , tools for stream processing.
As part of this session, I will be giving an introduction to Data Engineering and Big Data. It covers up to date trends.
* Introduction to Data Engineering
* Role of Big Data in Data Engineering
* Key Skills related to Data Engineering
* Role of Big Data in Data Engineering
* Overview of Data Engineering Certifications
* Free Content and ITVersity Paid Resources
Don't worry if you miss the video - you can click on the below link to go through the video after the schedule.
https://youtu.be/dj565kgP1Ss
* Upcoming Live Session - Overview of Big Data Certifications (Spark Based) - https://www.meetup.com/itversityin/events/271739702/
Relevant Playlists:
* Apache Spark using Python for Certifications - https://www.youtube.com/playlist?list=PLf0swTFhTI8rMmW7GZv1-z4iu_-TAv3bi
* Free Data Engineering Bootcamp - https://www.youtube.com/playlist?list=PLf0swTFhTI8pBe2Vr2neQV7shh9Rus8rl
* Join our Meetup group - https://www.meetup.com/itversityin/
* Enroll for our labs - https://labs.itversity.com/plans
* Subscribe to our YouTube Channel for Videos - http://youtube.com/itversityin/?sub_confirmation=1
* Access Content via our GitHub - https://github.com/dgadiraju/itversity-books
* Lab and Content Support using Slack
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Digital Business Transformation in the Streaming EraAttunity
Enterprises are rapidly adopting stream computing backbones, in-memory data stores, change data capture, and other low-latency approaches for end-to-end applications. As businesses modernize their data architectures over the next several years, they will begin to evolve toward all-streaming architectures. In this webcast, Wikibon, Attunity, and MemSQL will discuss how enterprise data professionals should migrate their legacy architectures in this direction. They will provide guidance for migrating data lakes, data warehouses, data governance, and transactional databases to support all-streaming architectures for complex cloud and edge applications. They will discuss how this new architecture will drive enterprise strategies for operationalizing artificial intelligence, mobile computing, the Internet of Things, and cloud-native microservices.
Link to the Wikibon report - wikibon.com/wikibons-2018-big-data-analytics-trends-forecast
Link to Attunity Streaming CDC Book Download - http://www.bit.ly/cdcbook
Link to MemSQL's Free Data Pipeline Book - http://go.memsql.com/oreilly-data-pipelines
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and not such a challenge anymore. But if you want to be able to react fast, with minimal latency, you can not afford to first store the data and doing the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for doing event processing, such as Oracle Event Processing or Esper, are available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products appeared, mostly out of the Big Data Technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming, Flink, Kafka Streams as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations for Stream Processing, discuss the core properties a Stream Processing platform should provide and highlight what differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
While cloud computing offers virtually unlimited capacity, harnessing that capacity in an efficient, cost effective fashion can be cumbersome and difficult at the workload level. At the organizational level, it can quickly become chaos.
You must make choices around cloud deployment, and these choices could have a long-lasting impact on your organization. It is important to understand your options and avoid incomplete, complicated, locked-in scenarios. Data management and placement challenges make having the ability to automate workflows and processes across multiple clouds a requirement.
In this webinar, you will:
• Learn how to leverage cloud services as part of an overall computation approach
• Understand data management in a cloud-based world
• Hear what options you have to orchestrate HPC in the cloud
• Learn how cloud orchestration works to automate and align computing with specific goals and objectives
• See an example of an orchestrated HPC workload using on-premises data
From computational research to financial back testing, and research simulations to IoT processing frameworks, decisions made now will not only impact future manageability, but also your sanity.
(Big) Data Processing for Next Generation Business Value. Presented at the Leaders Buildings Leaders Conference, held at Union College on April 3, 2015.
https://www.ucollege.edu/academics/business-and-computer-science/leaders-building-leaders
This presentation is a semi-technical overview of big data and related use-cases, the Apache Hadoop software stack, and some example data-science / analysis models.
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One was is to first persist the data into a data store and then use a traditional data visualisation solution to present the data.
If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Memoori
Memoori's 10th Webinar in the 2019 Smart Buildings Series. We spoke with Chris Irwin, VP Sales EMEA & Asia at J2 Innovations about the FIN 5 software framework and “Simplifying Building Automation by Leveraging Semantic Tagging with a New Breed of Software”.
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
Machine Learning for Smarter Apps with Tom Kraljevic
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of why real-time applications is lag may be due to an application problem – like poor data partitioning or load imbalance – or due to a Kafka problem – like resource exhaustion or suboptimal configuration. Therefore getting the best performance, predictability, and reliability for Kafka-based applications can be difficult. In the end, the operation of your Kafka powered analytics pipelines could themselves benefit from machine learning (ML).
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
Mit der Architektur steht und fällt jedes IT-Projekt. Das gilt in noch stärkerem Maße für Big-Data-Projekte, denn hier konnten noch keine Standards über Jahrzehnte ihre Tauglichkeit beweisen. Dennoch verbreiten und etablieren sich auch hier gute und effektive Lösungen. Der Vortrag erklärt, welche Bausteine wichtig für die verschiedenen Einsatzmöglichkeiten im Big-Data-Umfeld sind, und wie sie in konkrete Lösungen gegossen werden können. Dabei beleuchtet er sowohl traditionelle Big-Data-Architekturen als auch aktuelle Ansätze, wie z. B. die Lambda- und die Kappa-Architektur. Ebenfalls ein Thema sind Stream-Processing-Infrastrukturen und ihre Kombination mit Big-Data-Technologien. Ausgehend von einer produkt- und technologieunabhängigen Referenzarchitektur stellt dieser Vortrag verschiedene Lösungsmöglichkeiten auf Basis von Open-Source-Komponenten vor.
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
This presentation will describe how to go beyond a "Hello world" stream application and build a real-time data-driven product. We will present architectural patterns, go through tradeoffs and considerations when deciding on technology and implementation strategy, and describe how to put the pieces together. We will also cover necessary practical pieces for building real products: testing streaming applications, and how to evolve products over time.
Presented at highloadstrategy.com 2016 by Øyvind Løkling (Schibsted Products & Technology), joint work with Lars Albertsson (independent, www.mapflat.com).
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
Architecture matters. That's why today's innovators are taking a hard look at streaming data, an increasingly attractive option that can transform business in several ways: replacing aging data ingestion techniques like ETL; solving long-standing data quality challenges; improving business processes ranging from sales and marketing to logistics and procurement; or any number of activities related to accelerating data warehousing, business intelligence and analytics.
Register for this DM Radio Deep Dive Webinar to learn how streaming data can rejuvenate or supplant traditional data management practices. Host Eric Kavanagh will explain how streaming-first architectures can relieve data engineers from time-consuming, error-prone processes, ideally bidding farewell to those unpleasant batch windows. He'll be joined by Kevin Petrie of Attunity, who will explain why (with real-world story successes) streaming data solutions can keep the business fueled with trusted data in a timely, efficient manner for improved business outcomes.
Horses for Courses: Database RoundtableEric Kavanagh
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges."
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog Post : https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03"
More Related Content
Similar to Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Deliver Best-in-Class HPC Cloud Solutions Without Losing Your MindAvere Systems
While cloud computing offers virtually unlimited capacity, harnessing that capacity in an efficient, cost effective fashion can be cumbersome and difficult at the workload level. At the organizational level, it can quickly become chaos.
You must make choices around cloud deployment, and these choices could have a long-lasting impact on your organization. It is important to understand your options and avoid incomplete, complicated, locked-in scenarios. Data management and placement challenges make having the ability to automate workflows and processes across multiple clouds a requirement.
In this webinar, you will:
• Learn how to leverage cloud services as part of an overall computation approach
• Understand data management in a cloud-based world
• Hear what options you have to orchestrate HPC in the cloud
• Learn how cloud orchestration works to automate and align computing with specific goals and objectives
• See an example of an orchestrated HPC workload using on-premises data
From computational research to financial back testing, and research simulations to IoT processing frameworks, decisions made now will not only impact future manageability, but also your sanity.
(Big) Data Processing for Next Generation Business Value. Presented at the Leaders Buildings Leaders Conference, held at Union College on April 3, 2015.
https://www.ucollege.edu/academics/business-and-computer-science/leaders-building-leaders
This presentation is a semi-technical overview of big data and related use-cases, the Apache Hadoop software stack, and some example data-science / analysis models.
Most data visualisation solutions today still work on data sources which are stored persistently in a data store, using the so called “data at rest” paradigms. More and more data sources today provide a constant stream of data, from IoT devices to Social Media streams. These data stream publish with high velocity and messages often have to be processed as quick as possible. For the processing and analytics on the data, so called stream processing solutions are available. But these only provide minimal or no visualisation capabilities. One was is to first persist the data into a data store and then use a traditional data visualisation solution to present the data.
If latency is not an issue, such a solution might be good enough. An other question is which data store solution is necessary to keep up with the high load on write and read. If it is not an RDBMS but an NoSQL database, then not all traditional visualisation tools might already integrate with the specific data store. An other option is to use a Streaming Visualisation solution. They are specially built for streaming data and often do not support batch data. A much better solution would be to have one tool capable of handling both, batch and streaming data. This talk presents different architecture blueprints for integrating data visualisation into a fast data solution and highlights some of the products available to implement these blueprints.
Simplifying Building Automation: Leveraging Semantic Tagging with a New Breed...Memoori
Memoori's 10th Webinar in the 2019 Smart Buildings Series. We spoke with Chris Irwin, VP Sales EMEA & Asia at J2 Innovations about the FIN 5 software framework and “Simplifying Building Automation by Leveraging Semantic Tagging with a New Breed of Software”.
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
Machine Learning for Smarter Apps with Tom Kraljevic
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of why real-time applications is lag may be due to an application problem – like poor data partitioning or load imbalance – or due to a Kafka problem – like resource exhaustion or suboptimal configuration. Therefore getting the best performance, predictability, and reliability for Kafka-based applications can be difficult. In the end, the operation of your Kafka powered analytics pipelines could themselves benefit from machine learning (ML).
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
Mit der Architektur steht und fällt jedes IT-Projekt. Das gilt in noch stärkerem Maße für Big-Data-Projekte, denn hier konnten noch keine Standards über Jahrzehnte ihre Tauglichkeit beweisen. Dennoch verbreiten und etablieren sich auch hier gute und effektive Lösungen. Der Vortrag erklärt, welche Bausteine wichtig für die verschiedenen Einsatzmöglichkeiten im Big-Data-Umfeld sind, und wie sie in konkrete Lösungen gegossen werden können. Dabei beleuchtet er sowohl traditionelle Big-Data-Architekturen als auch aktuelle Ansätze, wie z. B. die Lambda- und die Kappa-Architektur. Ebenfalls ein Thema sind Stream-Processing-Infrastrukturen und ihre Kombination mit Big-Data-Technologien. Ausgehend von einer produkt- und technologieunabhängigen Referenzarchitektur stellt dieser Vortrag verschiedene Lösungsmöglichkeiten auf Basis von Open-Source-Komponenten vor.
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
This presentation will describe how to go beyond a "Hello world" stream application and build a real-time data-driven product. We will present architectural patterns, go through tradeoffs and considerations when deciding on technology and implementation strategy, and describe how to put the pieces together. We will also cover necessary practical pieces for building real products: testing streaming applications, and how to evolve products over time.
Presented at highloadstrategy.com 2016 by Øyvind Løkling (Schibsted Products & Technology), joint work with Lars Albertsson (independent, www.mapflat.com).
DM Radio Webinar: Adopting a Streaming-Enabled ArchitectureDATAVERSITY
Architecture matters. That's why today's innovators are taking a hard look at streaming data, an increasingly attractive option that can transform business in several ways: replacing aging data ingestion techniques like ETL; solving long-standing data quality challenges; improving business processes ranging from sales and marketing to logistics and procurement; or any number of activities related to accelerating data warehousing, business intelligence and analytics.
Register for this DM Radio Deep Dive Webinar to learn how streaming data can rejuvenate or supplant traditional data management practices. Host Eric Kavanagh will explain how streaming-first architectures can relieve data engineers from time-consuming, error-prone processes, ideally bidding farewell to those unpleasant batch windows. He'll be joined by Kevin Petrie of Attunity, who will explain why (with real-world story successes) streaming data solutions can keep the business fueled with trusted data in a timely, efficient manner for improved business outcomes.
Horses for Courses: Database RoundtableEric Kavanagh
The blessing and curse of today's database market? So many choices! While relational databases still dominate the day-to-day business, a host of alternatives has evolved around very specific use cases: graph, document, NoSQL, hybrid (HTAP), column store, the list goes on. And the database tools market is teeming with activity as well. Register for this special Research Webcast to hear Dr. Robin Bloor share his early findings about the evolving database market. He'll be joined by Steve Sarsfield of HPE Vertica, and Robert Reeves of Datical in a roundtable discussion with Bloor Group CEO Eric Kavanagh. Send any questions to info@insideanalysis.com, or tweet with #DBSurvival.
Similar to Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022 (20)
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
"In this talk, attendees will be provided with an introduction to Kafka Connect and the basics of Single Message Transforms (SMTs) and how they can be used to transform data streams in a simple and efficient way. SMTs are a powerful feature of Kafka Connect that allow custom logic to be applied to individual messages as they pass through the data pipeline. The session will explain how SMTs work, the types of transformations they can be used for, and how they can be applied in a modular and composable way.
Further, the session will discuss where SMTs fit in with Kafka Connect and when they should be used. Examples will be provided of how SMTs can be used to solve common data integration challenges, such as data enrichment, filtering, and restructuring. Attendees will also learn about the limitations of SMTs and when it might be more appropriate to use other tools or frameworks.
Additionally, an overview of the alternatives to SMTs, such as Kafka Streams and KSQL, will be provided. This will help attendees make an informed decision about which approach is best for their specific use case.
Whether attendees are developers, data engineers, or data scientists, this talk will provide valuable insights into how Kafka Connect and SMTs can help streamline data processing workflows. Attendees will come away with a better understanding of how these tools work and how they can be used to solve common data integration challenges."
"While Apache Kafka lacks native support for topic renaming, there are scenarios where renaming topics becomes necessary. This presentation will delve into the utilization of MirrorMaker 2.0 as a solution for renaming Kafka topics. It will illustrate how MirrorMaker 2.0 can efficiently facilitate the migration of messages from the old topic to the new one and how Kafka Connect Metrics can be employed to monitor the mirroring progress. The discussion will encompass the complexity of renaming Kafka topics, addressing certain limitations, and exploring potential workarounds when using MirrorMaker 2.0 for this purpose. Despite not being originally designed for topic renaming, MirrorMaker 2.0 has a suitable solution for renaming Kafka topics.
Blog Post : https://engineering.hellofresh.com/renaming-a-kafka-topic-d6ff3aaf3f03"
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
"Trendyol, Turkey's leading e-commerce company, is committed to positively impacting the lives of millions of customers. Our decision-making processes are entirely driven by data. As a data warehouse team, our primary goal is to provide accurate and up-to-date data, enabling the extraction of valuable business insights.
We utilize the benefits provided by Kafka and Kafka Connect to facilitate the transfer of data from the source to our analytical environment. We recently transitioned our Kafka Connect clusters from on-premise VMs to Kubernetes. This shift was driven by our desire to effectively manage rapid growth(marked by a growing number of producers, consumers, and daily messages), ensuring proper monitoring and consistency. Consistency is crucial, especially in instances where we employ Single Message Transforms to manipulate records like filtering based on their keys or converting a JSON Object into a JSON string.
Monitoring our cluster's health is key and we achieve this through Grafana dashboards and alerts generated through kube-state-metrics. Additionally, Kafka Connect's JMX metrics, coupled with NewRelic, are employed for comprehensive monitoring.
The session will aim to explain our approach to NRT data ingestion, outlining the role of Kafka and Kafka Connect, our transition journey to K8s, and methods employed to monitor the health of our clusters."
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
"Join our lightning talk to delve into the strategies vital for maintaining a resilient Kafka service.
While proactive monitoring is key for issue prevention, failures will still occur. Rapid detection tools will enable you to identify and resolve problems before they impact end-users. This session explores the techniques employed by Kafka cloud providers for this detection, many of which are also applicable if you are managing independent Kafka clusters or applications.
The talk focuses on health-checking, a powerful tool that encompasses an application and its monitoring to validate Kafka environment availability. The session navigates through Kafka health-check methods, sharing best practices, identifying common pitfalls, and highlighting the monitoring of critical performance metrics like throughput and latency for early issue detection.
Attendees will gain valuable insights into the art of health-checking their Kafka environment, equipping them with the tools to identify and address issues before they escalate into critical problems. We invite all Kafka enthusiasts to join us in this talk to foster a deeper understanding of Kafka health-checking and ensure the continued smooth operation of your Kafka environment."
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
"Stream processing systems traditionally gave their users the choice between at least once processing and at most once processing: accepting duplicate data or missing data. But ideally we would provide exactly-once processing, where every event in the input data is represented exactly once in the output.
Kafka provides a transaction API that enables exactly-once when using Kafka as your source and sink. But this API has turned out to not be well suited for use by high level streaming systems, requiring various work arounds to still provide transactional processing.
In this talk, I’ll cover how the transaction API works, and how systems like Arroyo and Flink have used it to build exactly-once support, and how improvements to the transactional API will enable better end-to-end support for consistent stream processing."
"In this talk, we will explore the exciting world of IoT and computer vision by presenting a unique project: Fish Plays Pokemon. Using an ESP Eye camera connected to an ESP32 and other IoT devices, to monitor fish's movements in an aquarium.
This project showcases the power of IoT and computer vision, demonstrating how even a fish can play a popular video game. We will discuss the challenges we faced during development, including real-time processing, IoT device integration, and Kafka message consumption.
By the end of the talk, attendees will have a better understanding of how to combine IoT, computer vision, and the usage of a serverless cloud to create innovative projects. They will also learn how to integrate IoT devices with Kafka to simulate keyboard behavior, opening up endless possibilities for real-time interactions between the physical and digital worlds."
What is tiered storage and what is it good for? After this session you will know how to leverage the tiered storage feature to enable longer retention than the storage attached to brokers allows. You will get acquainted with the different configuration options and know what to expect when you enable the feature, like for example when will the first upload to the remote object storage take place.
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
"Real-time 24/7 monitoring and verification of massive data is challenging – even more so for the world’s second largest manufacturer of memory chips and semiconductors. Tolerance levels are incredibly small, any small defect needs to be identified and dealt with immediately. The goal of semiconductor manufacturing is to improve yield and minimize unnecessary work.
However, even with real-time data collection, the data was not easy to manipulate by users and it took many days to enable stream processing requests – limiting its usefulness and value to the business.
You’ll hear why SK hynix switched to Confluent and how we developed a self-service stream process portal on top of it. Now users have an easy-to-use service to manipulate the data they want.
Results have been impressive, stream processing requests are available the same day – previously taking 5 days! We were also able to drive down costs by 10% as stream processing requests no longer require additional hardware.
What you’ll take away from our talk:
- What were the pain points in the previous environment
- How we transitioned to Confluent without service downtime
- Creating a self-service stream processing portal built on top of Connect and ksqlDB
- Use case of stream process portal"
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
"Discover how default configurations might impact ingestion times, especially when dealing with large files. We'll explore a real-world scenario with a 20,000,000+ line file, assessing metrics and exploring the bottleneck in the default setup. Understand the intricacies of batch size calculations and how to optimize them based on your unique data characteristics.
Walk away with actionable insights as we showcase a practical example, turning a 7-hour ingestion process into a mere 30 minutes for over 30,000,000 records in a Kafka topic. Uncover metrics, configurations, and best practices to elevate the performance of your Kafka Connect CSV source connectors. Don't miss this opportunity to optimize your data pipeline and ensure smooth, efficient data flow."
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
"In order to meet the current and ever-increasing demand for near-zero RPO/RTO systems, a focus on resiliency is critical. While Kafka offers built-in resiliency features, a perfect blend of client and cluster resiliency is necessary in order to achieve a highly resilient Kafka client application.
At Fidelity Investments, Kafka is used for a variety of event streaming needs such as core brokerage trading platforms, log aggregation, communication platforms, and data migrations. In this lightening talk, we will discuss the governance framework that has enabled producers and consumers to achieve their SLAs during unprecedented failure scenarios. We will highlight how we automated resiliency tests through chaos engineering and tightly integrated observability dashboards for Kafka clients to analyze and optimize client configurations. And finally, we will summarize the chaos test suite and the ""test, test and test"" mantra that are helping Fidelity Investments reach its goal of a future with zero down-time."
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
"There are various strategies for securely connecting to Kafka clusters between different networks or over the public internet. Many cloud providers even offer endpoints that privately route traffic between networks and are not exposed to the internet. But, depending on your network setup and how you are running Kafka, these options ... might not be an option!
In this session, we’ll discuss how you can use SSH bastions or a self managed PrivateLink endpoint to establish connectivity to your Kafka clusters without exposing brokers directly to the internet. We explain the required network configuration, and show how we at Materialize have contributed to librdkafka to simplify these scenarios and avoid fragile workarounds."
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform.0 characters0 characters"
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
"Almost everyone has heard about large language models, and tens of millions of people have tried out OpenAI ChatGPT and Google Bard. However, the intricate architecture and underlying mathematics driving these remarkable systems remain elusive to many.
LLM's are fascinating - so let's grab a drink and find out how these systems are built and dive deep into their inner workings. In the length of time it to enjoy a round of drinks, you'll understand the inner workings of these models. We'll take our first sip of word vectors, enjoy the refreshing taste of the transformer, and drain a glass understanding how these models are trained on phenomenally large quantities of data.
Large language models for your streaming application - explained with a little maths and a lot of pub stories"
"Monitoring is a fundamental operation when running Kafka and Kafka applications in production. There are numerous metrics available when using Kafka, however the sheer number is overwhelming, making it challenging to know where to start and how to properly utilise them.
This session will introduce you to some of the key metrics that should be monitored and best practices in fine tuning your monitoring. We will delve into which metrics are the indicators for cluster’s availability and performance and are the most helpful when debugging client applications."
Kafka Streams relies on state restoration for maintaining standby tasks as failure recovery mechanism as well as for restoring the state after rebalance scenarios. When you are scaling up or down your application instances, it is necessary to know the current state of the restoration process for each active and standby task in order to prevent a long restoration process as much as possible. During this presentation, you will get an understanding of how KIP-869 provides valuable information about the current active task restoration after a rebalance and KIP-988 opens a window to the continuous process of standby restoration. When you encounter a situation in which you need to choose whether or not to scale up or down your application instances, both KIPs will be an invaluable ally for you.
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
"In this talk, we will dive into the world of Kafka producer configs and explore how to understand and optimize them for better performance. We will cover the different types of configs, their impact on performance, and how to tune them to achieve the best results. Whether you're new to Kafka or a seasoned pro, this session will provide valuable insights and practical tips for improving your Kafka producer performance.
- Introduction to Kafka producer internal and workflow
- Understanding the producer configs like linger.ms, batch.size, buffer.memory and their impact on performance
- Learning about producer configs like max.block.ms, delivery.timeout.ms, request.timeout.ms and retries to make producer more resilient.
- Discuss configs like enable.idempotence, max.in.flight.requests.per.connection and transaction related configs to achieve delivery guarantees.
- Q&A session with attendees to address specific questions and concerns."
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
"Data contracts are one of the hottest topics in the data management community. A data contract is a formal agreement between a data producer and its consumers, aimed at reducing data downtime and improving data quality. Schemas are an important part of data contracts, but they are not the only relevant element.
In this talk, we’ll:
1. see why data contracts are so important but also difficult to implement;
2. identify the characteristics of a well-designed data contract:
discuss the anatomy of a data contract, its main elements and, how to formally describe them;
3. show how to manage the lifecycle of a data contract leveraging Confluent Platform's services."
"In the realm of stateful stream processing, Apache Flink has emerged as a powerful and versatile platform. However, the conventional SQL-based approach often limits the full potential of Flink applications.
We will delve into the benefits of adopting a code-first approach, which provides developers with greater control over application logic, facilitates complex transformations, and enables more efficient handling of state and time. We will also discuss how the code-first approach can lead to more maintainable and testable code, ultimately improving the overall quality of your Flink applications.
Whether you're a seasoned Flink developer or just starting your journey, this talk will provide valuable insights into how a code-first approach can revolutionize your stream processing applications."
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
"Change Data Capture (CDC) has become a commodity in data engineering, much in part due to the ever-rising success of Debezium [1]. But is that all there is? In this lightning talk, we’ll outline the current state of the CDC ecosystem, and understand why adopting a Debezium alternative is still a hard sell. If you’ve ever wondered what else is out there, but can’t keep up with the sprawling of new tools in the ecosystem; we’ll wrap it up for you!
[1] https://debezium.io/"
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
"Separation of compute and storage has become the de-facto standard in the data industry for batch processing.
The addition of tiered storage to open source Apache Kafka is the first step in bringing true separation of compute and storage to the streaming world.
In this talk, we'll discuss in technical detail how to take the concept of tiered storage to its logical extreme by building an Apache Kafka protocol compatible system that has zero local disks.
Eliminating all local disks in the system requires not only separating storage from compute, but also separating data from metadata. This is a monumental task that requires reimagining Kafka's architecture from the ground up, but the benefits are worth it.
This approach enables a stateless, elastic, and serverless deployment model that minimizes operational overhead and also drives inter-zone networking costs to almost zero."
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
3. Background
• This talk is based on my blog
that I published in April, 2022.
• This talk has been updated
with a few new things since
then.
• Enjoy!
4. Goal of the Talk
What Are We Going To Talk About Today?
Introduce you to the things
required to build real-time applications
that harness value from streaming data
5. The Plan
The Order of Things
1. A refresher on streaming data
2. The classic streaming stack
3. The modern streaming stack
4. Current trends and the future outlook
7. Streaming Data
What Is a stream?
A stream is a continuous, never-ending data
f
low with no beginning or
end. The data is incrementally made available over time, enabling you to
act upon it without needing to be downloaded
f
irst.
8. Events
Streams are made of events
A data stream consists of a series of data points ordered in time.
Each data point represents an “event” or a change in the state of the
business.
T4 T3 T2 T1 T0
Event source
Event stream
Time
Events
9. Event First Thinking
Modelling State Changes in Systems
A user with ID 1234 purchased item 567 for $3.99 on 2022/06/12 at Austin, TX
Fact Value
User ID 1234
Item ID 567
Price Paid $3.99
Date 2022/06/12
Place Austin, TX
• Events represents facts.
• Events are immutable.
• Events belong to the past.
11. Events Have A Shelf Life
Act Fast Before You Lose Their Value
Image credit - https://d3i71xaburhd42.cloudfront.net/8cb6c2711afd3e504400ee12d3b582cc06348b08/7-Figure2-1.png
16. The Beginning
• Real-time analytics dates back to decades, existed in the forms of
Complex Event Processing (CEP) and Event Stream Processing (ESP).
• Most of the work has been academic. But few vendors like Progress
Apama, Esper, Tibco, and Streambase tried bringing it to the mass
market.
22. • Overly complicated technology:
Specialised skillset of distributed
systems and performance
engineering.
• Limited only to the JVM: Non-
JVM developers had no option
rather than adapting.
• Higher footprint on
infrastructure: Stream
processors tax heavily on the CPU
and RAM.
• Maintenance overhead: Having
to maintain both speed and batch
layers.
24. Modern Streaming Stack
Modern Cloud-native tools
Managed and Serverless platforms
Rich tooling and developer experience
Expressive programming model
25. MSS is the classic streaming stack reimagined with
self-service cloud-native tools
providing a simpli
f
ied yet powerful developer experience
to build real-time analytics applications.
26. Modern Streaming Stack
STREAMING DATA
PLATFORM
STREAM PROCESSING
EVENT
PRODUCERS
TIERED
STORAGE
DATA API,
METADATA &
GOVERNANCE
Data-driven
Applications
Operational
Systems
Real-time
Analytics
SERVING LAYER
30. • Ingest events from sources in a
scalable manner, and store
them durably until they are
processed.
• Based on an immutable,
distributed log
f
ile. Events are
appended to the log and
partitioned across multiple
servers for durability and
scalability.
EVENT
PRODUCERS
Streaming Data Platform
TOPIC
TOPIC
TOPIC
TOPIC
TOPIC
TOPIC
37. Serving Layer
Expectations
• Serve queries with sub-second latency to provide a better user experience.
• Support a throughput of hundreds of thousands of queries per second to
serve an Internet-scale user base.
• Ensure data freshness — serve analytics from data ingested a few seconds
ago.
• Run complex OLAP queries, supporting joins, aggregations, and
f
iltering on
large data sets.
40. Serving Layer
STREAMING DATA
PLATFORM
New Events
Older Events
Tiered Storage
• Back
f
illing
• Hydrating new applications
• Experimentation (ad-hoc querying)
• Archival/regulatory compliance
• Training ML models
O
ff
line Use Cases
42. Analytics must be democratised
and accessible across the board…
Image credits - https://www.datanami.com/2022/01/21/data-meshes-set-to-spread-in-2022/, https://www.con
f
luent.io/blog/how-to-build-a-data-mesh-using-
event-streams/
43. Event Mesh
EVENT CATALOG SCHEMA REGISTRY
STREAMING API GRAPHQL API
Serving Layer
STREAM PROCESSOR
EVENT STREAMING
PLATFORM
Decision makers Data applications Regulatory bodies Business partners
Real-time Insights
46. Convergence of Stream Processing and Serving Layer
Streaming databases takes the stateful stream processing to the next level.
SaaS o
ff
errings Integrated serving layer Write logic with SQL
Pluggable integrations
A
ff
ordable Developer friendly
Pay-as-you-go
Less components to manage
Integrated tooling
Caters to non-JVM developers
Self-serve
47. Rise of The Lakehouse Architecture
A Lakehouse combines a data warehouse, data lake, and an event streaming platform
together.
High-throughput
streaming ingestion
Change Data Capture
Upserts
Transactions
Table formats
49. Takeaways
There’s No Silver Bullet
• Start small, build the critical path, and iterate.
• Pick components based on the need and know their limitations.
• Experiment, fail fast, and fail cheap.
• Go for managed services, if the team is small and new to streaming
technologies.
• Learn from mistakes, establish processes, and share wisdom!!