This document summarizes a cloud-native stream processor. It discusses how the stream processor is lightweight, open source, and supports distributed deployment on Docker and Kubernetes. It also outlines key features like real-time data integration, complex pattern detection, online machine learning, and integration with databases and services. Use cases like fraud detection, IoT analytics, and real-time decision making are provided.
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since its launch five years ago, AWS customers have launched more than 5.5 million Hadoop clusters.
In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost efficient.
Speakers:
Ian Meyers, AWS Solutions Architect
Ian McDonald, IT Director, SwiftKey
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka (Kai Wähner)
Streaming all over the World: Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka.
Learn about various case studies for event streaming with Apache Kafka across industries. The talk explores architectures for real-world deployments from Audi, BMW, Disney, Generali, Paypal, Tesla, Unity, Walmart, William Hill, and more. Use cases include fraud detection, mainframe offloading, predictive maintenance, cybersecurity, edge computing, track&trace, live betting, and much more.
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies? (Kai Wähner)
The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary approaches to solving business problems.
Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. Let’s explore this dilemma in a presentation.
The slides cover technologies such as Apache Kafka, Apache Spark, Confluent, Databricks, Snowflake, Elasticsearch, AWS Redshift, Google BigQuery on GCP, and Azure Synapse.
An overview of Amazon Kinesis Firehose, Amazon Kinesis Analytics, and Amazon Kinesis Streams so you can quickly get started with real-time, streaming data.
Kafka Streams State Stores Being Persistent (Confluent)
Being Persistent: A Look Into Kafka Streams State Stores, Neil Buesing, Principal Solutions Architect, Rill Data
Meetup link: https://www.meetup.com/TwinCities-Apache-Kafka/events/284002062/
Kafka for Real-Time Replication between Edge and Hybrid Cloud (Kai Wähner)
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
Speaker: Ivan Cheng, Solution Architect, AWS
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture (Kai Wähner)
Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable.
Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.
This session explores different architectures to build serverless Apache Kafka and Apache Spark multi-cloud architectures across regions and continents.
We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern Data Lakehouse.
Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Kappa vs Lambda Architectures and Technology Comparison (Kai Wähner)
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter explore the benefits of Kappa but also show how batch processing fits into this discussion positively without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM, Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub/Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
Amazon Kinesis provides services for you to work with streaming data on AWS. Learn how to load streaming data continuously and cost-effectively to Amazon S3 and Amazon Redshift using Amazon Kinesis Firehose without writing custom stream processing code. Get an introduction to building custom stream processing applications with Amazon Kinesis Streams for specialized needs.
Apache Kafka as Event Streaming Platform for Microservice Architectures (Kai Wähner)
This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The Confluent Platform adds further components such as a Schema Registry, REST Proxy, KSQL, Clients for different programming languages and Connectors for different technologies.
The session discusses how tech giants like LinkedIn, eBay, or Airbnb leverage Apache Kafka as an event streaming platform to solve various business problems and how to create a scalable, flexible microservice architecture. A live demo shows how you can easily process and analyze streams of events using Apache Kafka and KSQL.
Amazon EMR enables fast processing of large structured or unstructured datasets, and in this presentation we'll show you how to set up an Amazon EMR job flow to analyse application logs, and perform Hive queries against it. We also review best practices around data file organisation on Amazon Simple Storage Service (S3), how clusters can be started from the AWS web console and command line, and how to monitor the status of a Map/Reduce job.
Finally we take a look at Hadoop ecosystem tools you can use with Amazon EMR and the additional features of the service.
See a recording of the webinar based on this presentation on YouTube here:
Check out the rest of the Masterclass webinars for 2015 here: http://aws.amazon.com/campaigns/emea/masterclass/
See the Journey Through the Cloud webinar series here: http://aws.amazon.com/campaigns/emea/journey/
This is my presentation at SQLBits 8, Brighton, 9th April 2011. This session is about advanced dimensional modelling topics such as Fact Table Primary Key, Vertical Fact Tables, Aggregate Fact Tables, SCD Type 6, Snapshotting Transaction Fact Tables, 1 or 2 Dimensions, Dealing with Currency Rates, When to Snowflake, Dimensions with Multi Valued Attributes, Transaction-Level Dimensions, Very Large Dimensions, A Dimension With Only 1 Attribute, Rapidly Changing Dimensions, Banding Dimension Rows, Stamping Dimension Rows and Real Time Fact Table. Prerequisites: You need to have a basic knowledge of dimensional modelling and relational database design.
My name is Vincent Rainardi. I am a data warehouse & BI architect. I wrote a book on SQL Server data warehousing & BI, as well as many articles on my blog, www.datawarehouse.org.uk. I welcome questions and discussions on data warehousing on vrainardi@gmail.com. Enjoy the presentation.
Apache Kafka in Transportation and Logistics (Kai Wähner)
Event Streaming with Apache Kafka in Transportation and Logistics.
Track & Trace, Real-time Locating System, Customer 360, Open API, and more…
Examples include Swiss Post, SBB, Deutsche Bahn, Hermes, Migros, Here Technologies, Otonomo, Lyft, Uber, Free Now, Lufthansa, Air France, Singapore Airlines, Amadeus Group, and more.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
Kafka Streams Windowing Behind the Curtain (Confluent)
Kafka Streams Windowing Behind the Curtain, Neil Buesing, Principal Solutions Architect, Rill
https://www.meetup.com/TwinCities-Apache-Kafka/events/279316299/
DevOps for Applications in Azure Databricks: Creating Continuous Integration ... (Databricks)
Working with our customers, developers and partners around the world, it's clear DevOps has become increasingly critical to a team's success. Continuous integration (CI) and continuous delivery (CD), which are part of DevOps, embody a culture, set of operating principles, and collection of practices that enable application development teams to deliver code changes more frequently and reliably. In this session, we will cover how you can automate your entire process from code commit to production using CI/CD pipelines in Azure DevOps for Azure Databricks applications. Using CI/CD practices, you can simplify, speed up, and improve your cloud development to deliver features to your customers as soon as they're ready.
Training for AWS Solutions Architect at http://zekelabs.com/courses/amazon-web-services-training-bangalore/. This slide deck describes AWS CloudTrail key concepts, workflow, and event history.
___________________________________________________
zekeLabs is a technology training platform. We provide instructor-led corporate training and classroom training on industry-relevant cutting-edge technologies like Big Data, Machine Learning, Natural Language Processing, Artificial Intelligence, Data Science, Amazon Web Services, DevOps, Cloud Computing and frameworks like Django, Spring, Ruby on Rails, Angular 2 and many more to professionals.
Reach out to us at www.zekelabs.com or call us at +91 8095465880 or drop a mail at info@zekelabs.com
[WSO2Con Asia 2018] Patterns for Building Streaming Apps (WSO2)
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented
Learn more: https://wso2.com/library/conference/2018/08/wso2con-asia-2018-patterns-for-building-streaming-apps/
[WSO2Con EU 2017] Streaming Analytics Patterns for Your Digital Enterprise (WSO2)
The WSO2 analytics platform provides a high performance, lean, enterprise-ready, streaming solution to solve data integration and analytics challenges faced by connected businesses. This platform offers real-time, interactive, machine learning and batch processing technologies that empower enterprises to build a digital business. This session explores how to enable digital transformation by building a data analytics platform.
This slide deck explores WSO2 Stream Processor’s new features and improvements and explains how they make an organization excel in the current competitive marketplace.
Today’s highly connected world is flooding businesses with big and fast-moving data. The ability to trawl this data ocean and identify actionable insights can deliver a competitive advantage to any organization. The WSO2 Analytics Platform enables businesses to do just that by providing batch, real-time, interactive and predictive analysis capabilities all in one place.
In this tutorial we will
* Plug in the WSO2 Analytics Platform to some common business use cases
* Showcase the numerous capabilities of the platform
* Demonstrate how to collect data, analyze, predict and communicate effectively
* Demonstrate how it can analyze integration, security and IoT scenarios
Stick around till the end and you will walk away with the necessary skills to create a winning data strategy for your organization to stay ahead of its competition.
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis (Amazon Web Services)
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
Reasons to attend:
- This session will provide you with an overview of Amazon Kinesis.
- Learn about sample use cases and real life case studies.
- Learn how Amazon Kinesis can be integrated into your own applications.
Characteristics of cloud native apps.
Problems in implementing event-driven stateful applications.
Siddhi: Cloud-Native Stream Processor.
Patterns of implementing event-driven applications.
Deploying event-driven applications on Kubernetes with Siddhi and NATS.
Logisland is an event-mining open source platform based on Kafka/Spark for handling huge amounts of event and temporal data to find patterns and detect correlations. It is useful for log mining in security, fraud detection, IoT, and performance & system supervision.
Implementing and Visualizing Clickstream Data with MongoDB (MongoDB)
Having recently implemented a new framework for the real-time collection, aggregation and visualization of web and mobile generated Clickstream traffic (realizing daily click-stream volumes of 1M+ events), this walkthrough is about the motivations, thought process and key decisions made, as well as an in-depth look at how to build out a data-collection, analytics and visualization framework using MongoDB. Technologies covered in this presentation (as well as MongoDB) are Java, Spring, Django and Pymongo.
#JaxLondon: Building microservices with Scala, functional domain models and S... (Chris Richardson)
In this talk you will learn about a modern way of designing applications that’s very different from the traditional approach of building monolithic applications that persist mutable domain objects in a relational database. We will talk about the microservice architecture, its benefits and drawbacks, and how Spring Boot can help. You will learn about implementing business logic using functional, immutable domain models written in Scala. We will describe event sourcing and how it’s an extremely useful persistence mechanism for persisting functional domain objects in a microservices architecture.
Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
This introductory webinar, presented by Adi Krishnan, Senior Product Manager for Amazon Kinesis, will provide you with an overview of the service, sample use cases, and some examples of customer experiences with the service so you can better understand its capabilities and see how it might be integrated into your own applications.
Slidedeck presented at http://devternity.com/ around MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models as well as the definition of documents and general data structures.
[WSO2Con USA 2018] Patterns for Building Streaming Apps (WSO2)
This slide deck explains how to enable digital transformation through streaming analytics and how easily streaming applications can be implemented.
Watch video: https://wso2.com/library/conference/2018/07/wso2con-usa-2018-patterns-for-building-streaming-apps/
Kalix: Tackling the Cloud to Edge Continuum (Jonas Bonér)
Read this blog for an overview of Kalix:
https://www.kalix.io/blog/kalix-move-to-the-cloud-extend-to-the-edge-go-beyond
Abstract:
What will the future of the Cloud and Edge look like for us as developers? We have great infrastructure nowadays, but that only solves half of the problem. The Serverless developer experience shows the way, but it’s clear that FaaS is not the final answer. What we need is a programming model and developer UX that takes full advantage of new Cloud and Edge infrastructure, allowing us to build general-purpose applications, without needless complexity.
What if you only had to think about your business logic, public API, and how your domain data is structured, not worry about how to store and manage it? What if you could not only be serverless but become “databaseless” and forget about databases, storage APIs, and message brokers?
Instead, what if your data just existed wherever it needed to be, co-located with the service and its user, at the edge, in the cloud, or in your own private network—always there and available, always correct and consistent? Where the data is injected into your services on an as-needed basis, automatically, timely, efficiently, and intelligently.
Services, powered with this “data plane” of application state—attached to and available throughout the network—can run anywhere in the world: from the public Cloud to 10,000s of PoPs out at the Edge of the network, in close physical proximity to its users, where the co-location of state, processing, and end-user ensures ultra-low latency and high throughput.
Sounds exciting? Let me show you how we are making this vision a reality building a distributed real-time Data Plane PaaS using technologies like Akka, Kubernetes, gRPC, Linkerd, and more.
Organizations that can make sense out of massive amounts of data produced by systems, customers, or partners will have a competitive edge. Ballerina Stream Processing provides real-time event stream processing capabilities to microservices, with intuitive SQL queries allowing users to filter, aggregate and correlate data to make sense, take decisions and act in real-time in a distributed manner.
In this talk, we will discuss the following:
* Ballerina’s Stream Processing capability.
* How can it be used for real-time decision making?
* Building highly scalable data pipelines with data processing at the edge.
* Building event-driven architecture with stream processing.
* The roadmap.
What is stream processing? The evolution of streaming SQL. Its advantages & challenges, and how we can overcome them. Presented at WSO2Con 2018 USA.
Organizational success depends on our ability to sense the environment, grab opportunities and eliminate threats that are present in real-time. Such real-time processing is now available to all organizations (with or without a big data background) through the new WSO2 Stream Processor.
This slides presents WSO2 Stream Processor’s new features and improvements and explains how they make an organization excel in the current competitive marketplace. Some key features we will consider are:
* WSO2 Stream Processor’s highly productive developer environment, with graphical drag-and-drop, and the Streaming SQL query editor
* The ability to process real-time queries that span from seconds to years
* Its interactive visualization and dashboarding features with improved widget generation
* Its ability to process at scale via distributed deployments with full observability
* Default support for HTTP analytics, distributed message trace analytics, and Twitter analytics
2. Adaptation of Microservices for Stream Processing
Organizations try to:
1. Port legacy big data solutions to the cloud.
- Not designed to work in a microservice architecture.
- Massive (need 5 - 6 large nodes).
- Need multiple tools for integration and analytics.
2. Build microservices using on-premise libraries such as
Siddhi or Esper.
- Do not support scalability and fault tolerance.
3. Introducing A Cloud Native Stream Processor
● Lightweight (low memory footprint and quick startup)
● 100% open source (no commercial features) with Apache License v2.
● Native support for Docker and Kubernetes.
● Support agile DevOps workflows and full CI/CD pipelines.
● Allow event processing logic to be written in a SQL-like query language or
via a graphical tool.
● A single tool for data collection, ingestion, processing, analysis, integration
(with services and databases), and notification management.
5. Key features
● Native distributed deployment in Kubernetes
● Native CDC support for Oracle, MySQL, MSSQL, and Postgres (see the sketch after this list)
● Long-running aggregations from seconds to years
● Complex pattern detection
● Online machine learning
● Synchronous decision making
● DB integration with caching
● Service integration with error handling
● Multiple built-in connectors (file, Kafka, NATS, gRPC, ...)
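As an illustration of the CDC feature mentioned in this list, a minimal sketch assuming the siddhi-io-cdc extension; the connection details, table, and attribute names are assumptions, not taken from the deck.

@source(type = 'cdc', url = 'jdbc:mysql://localhost:3306/shop',
        username = 'shop_user', password = 'shop_pass',
        table.name = 'orders', operation = 'insert',
        @map(type = 'keyvalue'))
define stream OrderInsertStream (custId string, item string, amount int);

-- Forward each newly inserted database row as an event for downstream queries.
from OrderInsertStream
select custId, item, amount
insert into OrderStream;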
6. Scenarios and Use Cases Supported by Siddhi
1. Realtime Policy Enforcement Engine
2. Notification Management
3. Streaming Data Integration
4. Fraud Detection
5. Stream Processing at Scale on Kubernetes
6. Embedded Decision Making
7. Monitoring and Time Series Data Analytics
8. IoT, Geo and Edge Analytics
9. Realtime Decision as a Service
10. Realtime Predictions With Machine Learning
Find out more about the Supported Siddhi Scenarios here
8. Success Stories
Experian makes real-time marketing channel decisions in under 200
milliseconds using Siddhi.
Eurecat built their next generation shopping experience by
integrating iBeacons and IoT devices with WSO2.
Cleveland Clinic and Hospital Corporation of America detect critical
patient conditions, alert nurses, and automate decisions
during emergencies.
BNY Mellon uses Siddhi as a notification management engine.
9. Success Stories ...
TfL used WSO2 real-time streaming to create next-generation
transport systems.
Uber detected fraud in real time, processing over 400K events per
second.
WSO2 uses Siddhi as its API Management throttling engine and as the
policy manager for its Identity and Access platform.
eBay and PayPal use Siddhi as part of Apache Eagle as a policy
enforcement engine.
10. Working with Siddhi
● Develop apps using Siddhi Editor.
● CI/CD with build integration and
Siddhi Test Framework.
● Running modes
○ Embedded in Java/Python apps.
○ Microservice in bare metal/VM.
○ Microservice in Docker.
○ Microservice in Kubernetes
(distributed deployment with NATS)
11. Streaming SQL
@app:name('Alert-Processor')
@source(type='kafka', ..., @map(type='json'))
define stream TemperatureStream (roomNo string, temp double);
@info(name='AlertQuery')
from TemperatureStream#window.time(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert into AvgTemperatureStream;
Source/Sink & Streams
Window Query
with Rate Limiting
15. Reference CI/CD Pipeline of Siddhi
https://medium.com/siddhi-io/building-an-efficient-ci-cd-pipeline-for-siddhi-c33150721b5d
16. Supported Data Processing Patterns
17. Supported Data Processing Patterns
1. Consume and publish events with various data formats.
2. Data filtering and preprocessing.
3. Data transformation.
4. Database integration and caching.
5. Service integration and error handling.
6. Data Summarization.
7. Rule processing.
8. Serving online and predefined ML models.
9. Scatter-gather and data pipelining.
10. Realtime decisions as a service (On-demand processing).
18. Scenario: Order Processing
Customers place orders.
Shipments are made.
Customers pay for the order.
Tasks (see the preprocessing sketch after this list):
● Process order fulfillment.
● Alerts sent on abnormal conditions.
● Send recommendations.
● Throttle order requests when limit
exceeded.
● Provide order analytics over time.
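A minimal preprocessing sketch for this scenario (attribute names are assumed), filtering out malformed orders and feeding the CleansedOrderStream used in the enrichment queries on later slides.

define stream OrderStream (custId string, item string, amount int);

-- Drop events with a missing customer id or a non-positive amount.
from OrderStream[not (custId is null) and amount > 0]
select custId, item, amount
insert into CleansedOrderStream;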
19. Consume and Publish Events With Various Data Formats
20. Consume and Publish Events With Various Data Formats
Supported transports
● NATS, Kafka, RabbitMQ, JMS, IBMMQ, MQTT
● Amazon SQS, Google Pub/Sub
● HTTP, gRPC, TCP, Email, WebSocket,
● Change Data Capture (CDC)
● File, S3, Google Cloud Storage
Supported data formats
● JSON, XML, Avro, Protobuf, Text, Binary, Key-value, CSV
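As a sketch of how transports and formats are swapped purely through annotations (endpoint, topic, and attribute names are illustrative), the same events can be consumed as JSON over HTTP and republished to Kafka as XML.

@source(type = 'http', receiver.url = 'http://0.0.0.0:8006/orders',
        @map(type = 'json'))
define stream OrderStream (custId string, item string, amount int);

@sink(type = 'kafka', topic = 'orders', bootstrap.servers = 'localhost:9092',
      @map(type = 'xml'))
define stream PublishedOrderStream (custId string, item string, amount int);

-- The query itself is transport- and format-agnostic.
from OrderStream
select *
insert into PublishedOrderStream;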
25. Data Transformation
Data extraction
● JSON, Text
Reconstruct messages
● JSON, Text
Inline operations
● Math, Logical operations
Inbuilt functions
● 60+ extensions
Custom functions
● Java, JS
json:getDouble(json,"$.amount") as amount
str:concat(‘Hello ’,name) as greeting
amount * price as cost
time:extract('DAY', datetime) as day
myFunction(item, price) as discount
26. Database Integration and Caching
28. In-memory Table
Joining stream with a table.
define stream CleansedOrderStream
(custId string, item string, amount int);
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable (name string, unitPrice double);
from CleansedOrderStream as O join ItemPriceTable as T
on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
In-memory Table
Join Query
29. Database Integration
Joining stream and table.
define stream CleansedOrderStream
(custId string, item string, amount int);
@store(type='rdbms', …)
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable(name string, unitPrice double);
from CleansedOrderStream as O join ItemPriceTable as T
on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
Table backed with DB
Join Query
30. Database Caching
Joining table with cache (preloads data for high read performance).
define stream CleansedOrderStream
(custId string, item string, amount int);
@store(type='rdbms', …, @cache(cache.policy='LRU', …))
@primaryKey('name')
@index('unitPrice')
define table ItemPriceTable(name string, unitPrice double);
from CleansedOrderStream as O join ItemPriceTable as T
on O.item == T.name
select O.custId, O.item, O.amount * T.unitPrice as price
insert into EnrichedOrderStream;
Table with Cache
Join Query
31. Service Integration and Error Handling
32. Enriching data with HTTP and gRPC service Calls
● Non blocking
● Handle responses based on status
codes
Service Integration and Error Handling
33. SQL for HTTP Service Integration
Calling external HTTP service and consuming the response.
@sink(type='http-call', publisher.url="http://mystore.com/discount",
sink.id="discount", @map(type='json'))
define stream EnrichedOrderStream (custId string, item string, price double);
@source(type='http-call-response', http.status.code="200",
sink.id="discount", @map(type='json',
@attributes(custId ="trp:custId", ..., price="$.discountedPrice")))
define stream DiscountedOrderStream (custId string, item string, price double);
Call service
Consume Response
34. Error Handling Options
Options when endpoint is not available.
● Log and drop the events
● Wait and apply back pressure until the service becomes available.
● Divert events to another stream for error handling.
In all cases, the system continuously retries the connection.
35. Events Diverted Into Error Stream
@onError(action='stream')
@sink(type='http', publisher.url = 'http://localhost:8080/logger',
on.error='stream', @map(type = 'json'))
define stream DiscountedOrderStream (custId string, item string, price double);
from !DiscountedOrderStream
select custId, item, price, _error
insert into FailedEventsTable;
Diverting connection failure
events into table.
37. Data Summarization
Type of data summarization
● Time based
○ Sliding time window
○ Tumbling time window
○ On time granularities (secs to years)
● Event count based
○ Sliding length window
○ Tumbling length window
● Session based (see the sketch after this list)
● Frequency based
Type of aggregations
● Sum
● Count
● Avg
● Min
● Max
● DistinctCount
● StdDev
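For the session-based case, a hedged sketch assuming the built-in session window; the stream and attribute names are illustrative.

define stream PageVisitStream (custId string, pageId string);

-- Count visits per customer session; a session closes after 5 minutes of inactivity.
from PageVisitStream#window.session(5 min, custId)
select custId, count() as visits
group by custId
insert into SessionSummaryStream;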
38. Summarizing Data Over Shorter Period Of Time
Use window query to aggregate orders over time for each customer.
define stream DiscountedOrderStream (custId string, item string, price double);
from DiscountedOrderStream#window.time(10 min)
select custId, sum(price) as totalPrice
group by custId
insert into AlertStream;
Window query
with aggregation and
rate limiting
39. Aggregation Over Multiple Time Granularities
Aggregation on every second, minute, hour, … , year
Built using 𝝀 architecture
● In-memory real-time data
● RDBMS-based historical data
define aggregation OrderAggregation
from OrderStream
select custId, itemId, sum(price) as total, avg(price) as avgPrice
group by custId, itemId
aggregate every sec ... year;
Query
Speed Layer &
Serving Layer
Batch Layer
40. Data Retrieval from Aggregations
Query to retrieve data for relevant time interval and granularity.
Data is retrieved from both memory and the DB with millisecond accuracy.
from OrderAggregation
within "2019-10-06 00:00:00",
"2019-10-30 00:00:00"
per "days"
select total as orders;
42. Rule Processing
Type of predefined rules
● Rules on single event
○ Filter, If-then-else, Match, etc.
● Rules on collection of events
○ Summarization
○ Join with window or table
● Rules based on event occurrence order
○ Pattern detection
○ Trend (sequence) detection
○ Non-occurrence of event
43. Alert Based On Event Occurrence Order
Use a pattern query to detect event occurrence order and non-occurrence.
define stream OrderStream (custId string, orderId string, ...);
define stream PaymentStream (orderId string, ...);
from every (e1=OrderStream) ->
not PaymentStream[e1.orderId==orderId] for 15 min
select e1.custId, e1.orderId, ...
insert into PaymentDelayedStream;
Non occurrence of event
44. Serving Online and Predefined ML Models
45. Serving Online and Predefined ML Models
Type of Machine Learning and Artificial Intelligence processing
● Anomaly detection
○ Markov model
● Serving pre-created ML models
○ PMML (built from Python, R, Spark, H2O.ai, etc.)
○ TensorFlow
● Online machine learning
○ Clustering
○ Classification
○ Regression

from OrderStream
#pmml:predict("/home/user/ml.model", custId, itemId)
insert into RecommendationStream;

Find recommendations
46. Scatter-gather and Data Pipelining
47. Scatter-gather and Data Pipelining
Divide into sub-elements, process each and combine the results
Example :
json:tokenize() -> process -> window.batch() -> json:group()
str:tokenize() -> process -> window.batch() -> str:groupConcat()
{x,x,x} {x},{x},{x} {y},{y},{y} {y,y,y}
48. ● Create a Siddhi App per use case (Collection of queries).
● Connect multiple Siddhi Apps using in-memory source and sink (see the sketch after this slide).
● Allow rules addition and deletion at runtime.
Modularization
(Diagram: a Siddhi runtime hosting a Siddhi app for data capture and preprocessing, Siddhi apps for each use case, and a Siddhi app for common data publishing logic.)
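A minimal sketch of wiring two Siddhi apps together over the in-memory transport; the app and topic names are assumptions, and each app would live in its own .siddhi file within the same runtime.

-- App 1: data capture and preprocessing, publishing to an in-memory topic.
@app:name('Data-Capture-App')
@sink(type = 'inMemory', topic = 'cleansed-orders', @map(type = 'passThrough'))
define stream CleansedOrderStream (custId string, item string, amount int);

-- App 2: a use-case app consuming the same in-memory topic.
@app:name('Order-Alert-App')
@source(type = 'inMemory', topic = 'cleansed-orders', @map(type = 'passThrough'))
define stream CleansedOrderStream (custId string, item string, amount int);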
49. Periodically Trigger Events
Periodic events can be generated to initialize data pipelines
● Time interval
● Cron expression
● At start
define trigger FiveMinTrigger at every 5 min;
define trigger WorkStartTrigger at '0 15 10 ? * MON-FRI';
define trigger InitTrigger at 'start';
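As a sketch of how such a trigger initializes a pipeline, the periodic event can drive a join against a table so reference data is re-published on a schedule; the table and stream names are assumed.

define trigger FiveMinTrigger at every 5 min;

define table ItemPriceTable (name string, unitPrice double);

-- Every five minutes the trigger fires once and the table is scanned.
from FiveMinTrigger join ItemPriceTable as T
select T.name, T.unitPrice
insert into PriceSnapshotStream;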
50. Realtime Decisions As A Service
51. Realtime Decisions As A Service
Query Data Stores using REST APIs
● Database backed stores (RDBMS, NoSQL)
● Named aggregations
● In-memory windows & tables
Call HTTP and gRPC service using REST APIs
● Use Service and Service-Response
loopbacks
● Process Siddhi query chain
and send response synchronously
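For example, an on-demand (store) query like the one below, mirroring the retrieval on slide 40, can be submitted through the runtime's REST query API against the named aggregation instead of being embedded in an app; the exact endpoint depends on the runtime distribution.

-- On-demand retrieval from the OrderAggregation defined earlier,
-- executed only when the REST call arrives.
from OrderAggregation
within "2019-10-06 00:00:00", "2019-10-30 00:00:00"
per "days"
select custId, total;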
53. Deployment of Scalable Stateful Apps
● Data kept in memory.
● Perform periodic
state snapshots
and replay data from
NATS.
● Scalability is achieved by partitioning data by key.
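A minimal sketch of key-based partitioning (stream and attribute names follow earlier slides), where each key gets its own partition instance whose state can be snapshotted and scaled independently.

define stream OrderStream (custId string, item string, price double);

-- Each distinct custId is processed in its own partition with isolated window state.
partition with (custId of OrderStream)
begin
    from OrderStream#window.time(10 min)
    select custId, sum(price) as totalPrice
    insert into CustomerSpendStream;
end;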