This document discusses using Cadence workflows, Apache Kafka, and TensorFlow to orchestrate drone deliveries and perform machine learning on streaming delivery data.
Part 1 describes how Cadence workflows can be used to orchestrate drone delivery tasks and coordinate with order workflows, using Kafka for asynchronous communication. Part 2 discusses using Kafka streams to aggregate delivery data and send it to TensorFlow for incremental machine learning to predict busy shops in real-time. The document provides examples of how these technologies can be integrated and addresses challenges of streaming machine learning.
Spinning your Drones with Cadence Workflows, Apache Kafka, and TensforFlowPaulBrebner2
This talk introduces a realistic real-time geospatial drone delivery demonstration application using two complementary open-source technologies working at different timescales – Uber’s Cadence (for long-running/scheduled workflows) and Apache Kafka (for fast streaming data). With up to 2,000 (simulated) drones and deliveries in progress at once this application generates a vast flow of spatiotemporal data. The second part of the talk uses this platform and data to explore Machine Learning over streaming data from Kafka. We will explore some of the challenges and solutions of incremental Machine Learning over fast-moving data, particularly when the data is spatiotemporal with concept drifts, and reveal how well a selection of open-source Kafka Machine Learning frameworks perform and scale at this demanding task.
Talk in the Geospatial track at Community over Code Halifax 2023 https://communityovercode.org/schedule-list/#GEO006
Spinning your Drones with Cadence Workflows and Apache KafkaPaul Brebner
The rapid rise in Big Data use cases over the last decade has been accelerated by popular massively scalable open-source technologies such as Apache Cassandra® for storage, Apache Kafka® for streaming, and OpenSearch® for search. Now there’s a new member of the peloton, Cadence, for orchestration - code-based scalable fault-tolerant workflow orchestration. To illustrate the most important Cadence concepts (and more) we’ll build a realistic drone delivery service demonstration application. We’ll also explore what happens when orchestration meets choreography, and use the drone application to illustrate different ways to integrate Cadence with Apache Kafka, including reusing Kafka microservices. But how scalable is Cadence in practice? We’ll fill the sky with drones - how many drones can we get flying at once?
apidays Australia 2022 - Spinning Your Drones with Cadence Workflows and Apac...apidays
apidays Australia 2022 - Enabling Business Networks
September 14 & 15, 2022
Spinning Your Drones with Cadence Workflows and Apache Kafka
Paul Brebner, Technology Evangelist at Instaclustr
------------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Deep dive into the API industry with our reports:
https://www.apidays.global/industry-reports/
Subscribe to our global newsletter:
https://apidays.typeform.com/to/i1MPEW
Fast Open Source Software - Without The FuryPaulBrebner2
Developing fast scalable Big Data applications has been made significantly easier over the last decade with horizontally scalable open-source databases and streaming technologies such as Apache Cassandra and Apache Kafka. Cloud-native trends have also accelerated the uptake and ease of use of these technologies, and they are available as managed services on multiple cloud platforms.
But maybe it has become too easy to embark on building complex distributed applications using multiple massively scalable open-source technologies, as there are still many performance and scalability issues to be aware of. In this talk, I will give a high-level overview of some of the performance and scalability challenges I’ve overcome over the last six years building realistic demonstration applications using Apache Cassandra and Apache Kafka (and more), supplemented with performance insights from our operation of thousands of production clusters. Keynote talk for the Performance Engineering track at Community Over Code Halifax 2023, https://communityovercode.org/schedule-list/#PE001
A GitOps model for High Availability and Disaster Recovery on EKSWeaveworks
Enterprises today require high availability and disaster recovery for critical business systems. One of the advantages Kubernetes can bring to the table is greater reliability and stability. When disaster strikes, cluster or application recovery should be quick and dependable.
Paul Curtis, Principal Solutions Architect at Weaveworks will demonstrate how to leverage Weave Kubernetes Platform and GitOps to create disaster recovery plans and highly available clusters with minimal effort on EKS.
In this webinar you will learn:
The 4 principles of GitOps (operations by pull request)
How to build for reproducibility, security and scale with EKS from the start
GitOps driven cluster and cluster lifecycle management with WKP
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Paul Brebner
With the rapid onset of the global Covid-19 Pandemic from the start of this year the USA Centers for Disease Control and Prevention (CDC) had to quickly implement a new Covid-19 specific pipeline to collect testing data from all of the USA’s states and territories, and carry out other critical steps including integration, cleaning, checking, enrichment, analysis, and enforcing data governance and privacy etc. The pipeline then produces multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka. In this presentation we'll build a similar (but simpler) pipeline for ingesting, integrating, indexing, searching/analysing and visualising some publicly available tidal data. We'll briefly introduce each technology and component, and walk through the steps of using Apache Kafka, Kafka Connect, Elasticsearch and Kibana to build the pipeline and visualise the results.
Modern DevOps with Spinnaker/Concourse and MicrometerJesse Tate Pulfer
Learn how you can leverage the recent addition of Micrometer to the Spring ecosystem and Cloud Foundry to the Spinnaker ecosystem to help you deliver code quickly and safely. Some highlights include:
Micrometer’s real-time application monitoring capabilities
Spinnaker’s visibility into what is going on in the system
Spinnaker’s safety of deployments and rollbacks
Spinnaker’s deployment scalability
Observability & Continuous Deployment, The Big Picture with Adib Saikali
10:15-11:00am Micrometer: Four Key Performance Indicators for Every Java Service with Jon Schneider
11:00-11:15am Micrometer & PCF with Victor Szoltysek
11:15-11:25am Break
11:25-12:15pm Spinnaker 101 with Olga Kundzich
12:15-12:45pm Lunch
12:45-1:30pm Concourse 101: Container Based CI with Concourse with Jamil Shamy
1:30-1:40pm Break
1:40-2:40pm Putting all the tools together Continuous Deployment with Concourse / Spinnaker / Micrometer & PCF with Jon Schneider
2:40-2:50pm Break
2:50-3:30pm Panel with Concourse / Spinnaker & Micrometer Team
3:30pm Wrap Up
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
Modern event-based/streaming distributed systems embrace the idea that change is inevitable and actually desirable! Without being change-aware, systems are inflexible, can’t evolve or react, and are simply incapable of keeping up with real-time real-world data. But how can we speed up an “Elephant” (PostgreSQL) to be as fast as a “Cheetah” (Kafka)? In this talk, we'll introduce the Debezium PostgreSQL Connector, and explain how to deploy, configure and run it on a Kafka Connect cluster, explore the semantics and format of the change data events (including Schemas and Table/Topic mapping), and test the performance. Finally, we'll show how to stream the change data events into an example downstream system, Elasticsearch, using an open source sink connector.
Presentation for PostgresConf.CN and PGConf.Asia 2021 https://www.highgo.ca/2022/01/19/2021-pg-asia-conference-delivered-another-successful-online-conference-again/
Spinning your Drones with Cadence Workflows, Apache Kafka, and TensforFlowPaulBrebner2
This talk introduces a realistic real-time geospatial drone delivery demonstration application using two complementary open-source technologies working at different timescales – Uber’s Cadence (for long-running/scheduled workflows) and Apache Kafka (for fast streaming data). With up to 2,000 (simulated) drones and deliveries in progress at once this application generates a vast flow of spatiotemporal data. The second part of the talk uses this platform and data to explore Machine Learning over streaming data from Kafka. We will explore some of the challenges and solutions of incremental Machine Learning over fast-moving data, particularly when the data is spatiotemporal with concept drifts, and reveal how well a selection of open-source Kafka Machine Learning frameworks perform and scale at this demanding task.
Talk in the Geospatial track at Community over Code Halifax 2023 https://communityovercode.org/schedule-list/#GEO006
Spinning your Drones with Cadence Workflows and Apache KafkaPaul Brebner
The rapid rise in Big Data use cases over the last decade has been accelerated by popular massively scalable open-source technologies such as Apache Cassandra® for storage, Apache Kafka® for streaming, and OpenSearch® for search. Now there’s a new member of the peloton, Cadence, for orchestration - code-based scalable fault-tolerant workflow orchestration. To illustrate the most important Cadence concepts (and more) we’ll build a realistic drone delivery service demonstration application. We’ll also explore what happens when orchestration meets choreography, and use the drone application to illustrate different ways to integrate Cadence with Apache Kafka, including reusing Kafka microservices. But how scalable is Cadence in practice? We’ll fill the sky with drones - how many drones can we get flying at once?
apidays Australia 2022 - Spinning Your Drones with Cadence Workflows and Apac...apidays
apidays Australia 2022 - Enabling Business Networks
September 14 & 15, 2022
Spinning Your Drones with Cadence Workflows and Apache Kafka
Paul Brebner, Technology Evangelist at Instaclustr
------------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
Deep dive into the API industry with our reports:
https://www.apidays.global/industry-reports/
Subscribe to our global newsletter:
https://apidays.typeform.com/to/i1MPEW
Fast Open Source Software - Without The FuryPaulBrebner2
Developing fast scalable Big Data applications has been made significantly easier over the last decade with horizontally scalable open-source databases and streaming technologies such as Apache Cassandra and Apache Kafka. Cloud-native trends have also accelerated the uptake and ease of use of these technologies, and they are available as managed services on multiple cloud platforms.
But maybe it has become too easy to embark on building complex distributed applications using multiple massively scalable open-source technologies, as there are still many performance and scalability issues to be aware of. In this talk, I will give a high-level overview of some of the performance and scalability challenges I’ve overcome over the last six years building realistic demonstration applications using Apache Cassandra and Apache Kafka (and more), supplemented with performance insights from our operation of thousands of production clusters. Keynote talk for the Performance Engineering track at Community Over Code Halifax 2023, https://communityovercode.org/schedule-list/#PE001
A GitOps model for High Availability and Disaster Recovery on EKSWeaveworks
Enterprises today require high availability and disaster recovery for critical business systems. One of the advantages Kubernetes can bring to the table is greater reliability and stability. When disaster strikes, cluster or application recovery should be quick and dependable.
Paul Curtis, Principal Solutions Architect at Weaveworks will demonstrate how to leverage Weave Kubernetes Platform and GitOps to create disaster recovery plans and highly available clusters with minimal effort on EKS.
In this webinar you will learn:
The 4 principles of GitOps (operations by pull request)
How to build for reproducibility, security and scale with EKS from the start
GitOps driven cluster and cluster lifecycle management with WKP
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Paul Brebner
With the rapid onset of the global Covid-19 Pandemic from the start of this year the USA Centers for Disease Control and Prevention (CDC) had to quickly implement a new Covid-19 specific pipeline to collect testing data from all of the USA’s states and territories, and carry out other critical steps including integration, cleaning, checking, enrichment, analysis, and enforcing data governance and privacy etc. The pipeline then produces multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka. In this presentation we'll build a similar (but simpler) pipeline for ingesting, integrating, indexing, searching/analysing and visualising some publicly available tidal data. We'll briefly introduce each technology and component, and walk through the steps of using Apache Kafka, Kafka Connect, Elasticsearch and Kibana to build the pipeline and visualise the results.
Modern DevOps with Spinnaker/Concourse and MicrometerJesse Tate Pulfer
Learn how you can leverage the recent addition of Micrometer to the Spring ecosystem and Cloud Foundry to the Spinnaker ecosystem to help you deliver code quickly and safely. Some highlights include:
Micrometer’s real-time application monitoring capabilities
Spinnaker’s visibility into what is going on in the system
Spinnaker’s safety of deployments and rollbacks
Spinnaker’s deployment scalability
Observability & Continuous Deployment, The Big Picture with Adib Saikali
10:15-11:00am Micrometer: Four Key Performance Indicators for Every Java Service with Jon Schneider
11:00-11:15am Micrometer & PCF with Victor Szoltysek
11:15-11:25am Break
11:25-12:15pm Spinnaker 101 with Olga Kundzich
12:15-12:45pm Lunch
12:45-1:30pm Concourse 101: Container Based CI with Concourse with Jamil Shamy
1:30-1:40pm Break
1:40-2:40pm Putting all the tools together Continuous Deployment with Concourse / Spinnaker / Micrometer & PCF with Jon Schneider
2:40-2:50pm Break
2:50-3:30pm Panel with Concourse / Spinnaker & Micrometer Team
3:30pm Wrap Up
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
Modern event-based/streaming distributed systems embrace the idea that change is inevitable and actually desirable! Without being change-aware, systems are inflexible, can’t evolve or react, and are simply incapable of keeping up with real-time real-world data. But how can we speed up an “Elephant” (PostgreSQL) to be as fast as a “Cheetah” (Kafka)? In this talk, we'll introduce the Debezium PostgreSQL Connector, and explain how to deploy, configure and run it on a Kafka Connect cluster, explore the semantics and format of the change data events (including Schemas and Table/Topic mapping), and test the performance. Finally, we'll show how to stream the change data events into an example downstream system, Elasticsearch, using an open source sink connector.
Presentation for PostgresConf.CN and PGConf.Asia 2021 https://www.highgo.ca/2022/01/19/2021-pg-asia-conference-delivered-another-successful-online-conference-again/
Day 2 Kubernetes - Tools for Operability (KubeCon)bridgetkromhout
The document discusses tools for operating Kubernetes clusters, including Terraform for deploying clusters, Helm for managing applications, and Brigade and Kashti for event-driven scripting. It also covers Azure Kubernetes Service (AKS) for easily deploying and managing Kubernetes on Azure, and CNAB and Duffle for packaging distributed applications. Future topics that may be covered are changes in Helm 3 and using Virtual Kubelet to run containers across different infrastructure.
This document discusses microservices architecture using Spring Cloud and related technologies. It provides an overview of microservices and cloud native applications. It then covers Spring Boot, Spring Cloud, and Netflix OSS projects that can be used to build microservices. Specific Spring Cloud features like service registration, circuit breakers, and API gateways are demonstrated. The role of Pivotal in contributing to open source projects and providing Spring Cloud services is also mentioned.
SpringBoot and Spring Cloud Service for MSAOracle Korea
Cloud 환경에서 MSA를 하기 위해서 Service Discovery, Circuit Breaker 등을 사용하여 Application을 개발하는 방법과 SpringBoot 와 Spring Cloud Service 를 사용하는데, Cloud에서 Kubernetes를 위시한 Container 생태계가 어떻게 MSA에 영향을 미치는지 알아봅니다.
Addressing data plane performance measurement on OpenStack clouds using VMTPSuhail Syed
VMTP is an open source tool released by Cisco that measures data plane performance on OpenStack clouds. It automates the process of measuring VM throughput, latency, and CPU usage for different network traffic flows like intra-tenant, inter-tenant, and north-south traffic. VMTP addresses the need for simple and automated performance measurement on Neutron, the OpenStack networking component, to help with the migration from Nova networking to Neutron. It generates results stored in MongoDB and can create charts from multiple test runs for easy analysis.
Vector Search / Generative AI introduction at Pulsar MeetupDevin Bost
Presentation at the Pulsar Meetup in Bangalore on the use of vector search and generative AI when integrated with streaming. Explores mechanics of how vector embeddings work, the backdrop of retrieval augmented generation, and how vector databases like AstraDB enable a rich generative experience for AI applications.
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
26Oct2023_ Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup.pdf
## Details
**Important**
Please complete your registration in this short form.
For on-site we have limited room, so please confirm if you are attending in-person in Manhattan, NYC.
--------------------------------------------------------------------------------------------
We're at StarTree, excited to join forces with our friends at Cloudera, for a meetup that is all about The Latest in Real-Time Analytics: Generative AI and LLM, featuring Apache Pinot and Apache NiFi.
Join us for an insightful discussion about cutting-edge analytics, meet the community in person, and catch up over drinks and snacks.
What's the plan ?
05:30-06:00 Pizza and Networking
06:00-06:35 Adding Generative AI to Real-Time Streaming Pipelines | Tim Spann, Principal Developer Advocate, Cloudera
06:35-07:10 Apache Pinot and Kafka an excellent pairing for refined palates | Tim Veil, VP of Solutions Engineering and Enablement, StarTree
07:10-07:20 QNA
07:20- 07:30 More Snacks and Networking ;)
**Important**
Seats are limited
Please complete your registration in this short form.
Adding Generative AI to Real-Time Streaming Pipelines | Timothy Spann
In this talk, Tim will discuss the basics of real-time streaming, walk through the tools used including Apache NiFi, Apache Kafka and Apache Flink and show how to build a real-time streaming pipeline that sends prompts to LLMs hosted by the likes of Hugging Face, IBM and Cloudera. He will also discuss where real-time data stores like Apache Pinot come into play.
He will show a detailed demonstration of a few use cases involving different sources of data including Kafka, Medium Articles and interactive Question and Response in Slack. He will then show you how you can build your own and where areas of growth exist.
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
https://github.com/tspannhw/SpeakerProfile
Apache Pinot and Kafka an excellent pairing for refined palates | Tim Veil
The other Tim, Tim Veil, will dive into the history and architect
Kubernetes is great for deploying stateless containers, but what about the big data ecosystem? Episode 3 of our Kubernetes series covers how DC/OS enables you to connect your Kubernetes-based applications to co-located big data services.
Slides cover:
1. Why persistence is challenging in distributed architectures
How DC/OS helps you take advantage of the services available in the big data ecosystem
2. How to connect Kubernetes to your data services through networking
3. How Apache Flink and Apache Spark work with Kubernetes to enable real-time data processing on DC/OS
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
The document discusses streaming data pipelines and includes information about:
- The FLaNK stack which is comprised of Apache NiFi, Apache Kafka, Apache Flink, and Java.
- SQL Stream Builder which allows developers, analysts, and data scientists to write streaming applications using standard SQL without needing to write Java or Scala code.
- Apache Kafka as a distributed, partitioned, and replicated publish-subscribe messaging system.
- Apache Flink which is a framework for distributed stream and batch data processing.
A Review: Metaheuristic Technique in Cloud ComputingIRJET Journal
This document reviews various meta-heuristic techniques that have been applied to problems in cloud computing, such as task scheduling, load balancing, ant colony optimization (ACO), particle swarm optimization (PSO), and gravitational search algorithm (GSA). It first provides background on cloud computing and defines common cloud computing concepts. It then surveys literature applying meta-heuristics like ACO, GA, PSO, and GSA to solve problems related to load balancing and scheduling in cloud environments. The document concludes that meta-heuristic techniques are effective for optimizing resource utilization and management in cloud computing systems.
Building Scalable Stateless Applications with RxJavaRick Warren
RxJava is a lightweight open-source library, originally from Netflix, that makes it easy to compose asynchronous data sources and operations. This presentation is a high-level intro to this library and how it can fit into your application.
Manage your kubernetes cluster with cluster api, azure and git opsJorge Arteiro
In this session we are going to Introduce Cluster API, a Kubernetes subproject that allows you to manage Kubernetes clusters lifecycle running anywhere using only Kubernetes YAML files. Let’s see how Azure Arc GitOps approach improves and simplify the day-2 operations of these clusters, where your Git repo is now the source of truth. Do you have problems managing identities and Network connection for your current CI/CD process? You don’t know how to manage multiple Kubernetes clusters in production? Then this talk is for you!
This document provides an overview of service mesh and Istio on Kubernetes. It discusses microservices and the need for visibility, monitoring, and traffic management which a service mesh can provide. It then describes Kubernetes, Istio architecture including Pilot, Envoy proxy, and Mixer components. It covers how Istio provides mutual TLS, ingress/egress traffic routing, request routing to service versions, observability with metrics and tracing, and application resilience through features like timeouts and retries. The document concludes with instructions for deploying Kubernetes and getting started with Istio.
This document discusses functional reactive programming and RxJava. It begins with an overview of functional reactive programming principles like being responsive, resilient, elastic and message-driven. It then covers architectural styles like hexagonal architecture and onion architecture. The rest of the document dives deeper into RxJava concepts like Observables, Observers, Operators, and Schedulers. It provides code examples to demonstrate merging, filtering and transforming streams of data asynchronously using RxJava.
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...HostedbyConfluent
Combining Apache Kafka and InfluxDB can provide a powerful data pipeline for processing and analyzing real-time data. Kafka can be used to ingest data from various sources and stream it to InfluxDB for storage and processing. InfluxDB can then be used to analyze and visualize the data, providing insights and actionable information in real-time. This architecture can be especially useful for IoT applications, where large volumes of sensor data are generated in real-time and need to be processed and analyzed quickly. InfluxDB now offers storage in a parquet file format built on top of the Apache Arrow project that allows for querying in SQL and integration to a larger variety of visualization and analysis tools that Kafka users can now take advantage of. This talk will go into connecting the two platforms and the why, how, and what you can accomplish by doing so.
CN Asturias - Stateful application for kubernetes Cédrick Lunven
The document discusses running Apache Cassandra on Kubernetes with K8ssandra. K8ssandra combines Kubernetes and Cassandra to provide a scalable data store with an API layer and administration tools. It addresses challenges of running stateful applications in containers by providing scaling, consistency and resilience. K8ssandra allows Cassandra to be deployed in a cloud-native way on Kubernetes and provides easy and secure data access.
Stephane Lapointe, Frank Boucher & Alexandre Brisebois: Les micro-services et...MSDEVMTL
16 Avril 2016
Groupe Azure
Sujet: Les micro-services et Azure Service Fabric
Conférenciers: Alexandre Brisebois, Microsoft, Stéphane Lapointe, Orckestra et Frank Boucher, Lixar IT
Nous vous proposons une journée complète sur les micro-services et Azure Service Fabric, le but étant d'appendre la théorie avec une série de présentations pour ensuite concrétiser le tout avec une partie pratique "hands-on" et des labs.
Pour participer, vous devrez obligatoirement apporter votre ordinateur portable, avoir installé Visual Studio 2015 Update 2 et Service Fabric SDK 2.0.135.
Building Continuous Application with Structured Streaming and Real-Time Data ...Databricks
This document summarizes a presentation about building a structured streaming connector for continuous applications using Azure Event Hubs as the streaming data source. It discusses key design considerations like representing offsets, implementing the getOffset and getBatch methods required by structured streaming sources, and challenges with testing asynchronous behavior. It also outlines issues contributed back to the Apache Spark community around streaming checkpoints and recovery.
Kubernetes Operability Tooling (GOTO Chicago 2019)bridgetkromhout
The document is a transcript of a presentation about containers, Kubernetes, and cloud native tooling. It discusses what containers are, the history and basics of Kubernetes, tools for managing Kubernetes clusters like Helm and Terraform, event-driven scripting with Brigade and Kashti, packaging apps with CNAB/Duffle/Porter, and a vision for the future of containers and cloud native technologies.
Meetup Streaming Data Pipeline DevelopmentTimothy Spann
Meetup Streaming Data Pipeline Development
28 June 2023 6pm EST
Milwaukee meetup
https://www.meetup.com/futureofdata-princeton/events/292976004/
Details
This will be a hybrid event with a Zoom. The in-person event will be in Milwaukee.
In this interactive session, Tim will lead participants through how to best build streaming data pipelines. He will cover how to build applications from some common use cases and highlight tips, tricks, best practices and patterns.
He will show how to build the easy way and then dive deep into the underlying open source technologies including Apache NiFi, Apache Flink, Apache Kafka and Apache Iceberg.
If you wish to follow along, please download open source projects beforehand. You can also download this helpful streaming platform: https://docs.cloudera.com/csp-ce/latest/installation/topics/csp-ce-installing-ce.html
All source code and slides will be shared for those interested in building their own FLaNK Apps. https://www.flankstack.dev/
https://www.thecapitalgrille.com/locations/wi/milwaukee/milwaukee/8027
The Capital Grille 310 W Wisconsin Ave, Milwaukee, WI 53203
limited seating, preference will be given to NLIT attendees
A peak at the menu (Not Pizza)
RISOTTO FRITTERS WITH FRESH MOZZARELLA AND PROSCIUTTO
SLICED SIRLOIN WITH ROQUEFORT AND BALSAMIC ONIONS
MINIATURE LOBSTER AND CRAB CAKES
WILD MUSHROOM AND HERBED CHEESE
You can join the meeting virtually here (no meat or cheese virtually):
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
More Related Content
Similar to Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Day 2 Kubernetes - Tools for Operability (KubeCon)bridgetkromhout
The document discusses tools for operating Kubernetes clusters, including Terraform for deploying clusters, Helm for managing applications, and Brigade and Kashti for event-driven scripting. It also covers Azure Kubernetes Service (AKS) for easily deploying and managing Kubernetes on Azure, and CNAB and Duffle for packaging distributed applications. Future topics that may be covered are changes in Helm 3 and using Virtual Kubelet to run containers across different infrastructure.
This document discusses microservices architecture using Spring Cloud and related technologies. It provides an overview of microservices and cloud native applications. It then covers Spring Boot, Spring Cloud, and Netflix OSS projects that can be used to build microservices. Specific Spring Cloud features like service registration, circuit breakers, and API gateways are demonstrated. The role of Pivotal in contributing to open source projects and providing Spring Cloud services is also mentioned.
SpringBoot and Spring Cloud Service for MSAOracle Korea
Cloud 환경에서 MSA를 하기 위해서 Service Discovery, Circuit Breaker 등을 사용하여 Application을 개발하는 방법과 SpringBoot 와 Spring Cloud Service 를 사용하는데, Cloud에서 Kubernetes를 위시한 Container 생태계가 어떻게 MSA에 영향을 미치는지 알아봅니다.
Addressing data plane performance measurement on OpenStack clouds using VMTPSuhail Syed
VMTP is an open source tool released by Cisco that measures data plane performance on OpenStack clouds. It automates the process of measuring VM throughput, latency, and CPU usage for different network traffic flows like intra-tenant, inter-tenant, and north-south traffic. VMTP addresses the need for simple and automated performance measurement on Neutron, the OpenStack networking component, to help with the migration from Nova networking to Neutron. It generates results stored in MongoDB and can create charts from multiple test runs for easy analysis.
Vector Search / Generative AI introduction at Pulsar MeetupDevin Bost
Presentation at the Pulsar Meetup in Bangalore on the use of vector search and generative AI when integrated with streaming. Explores mechanics of how vector embeddings work, the backdrop of retrieval augmented generation, and how vector databases like AstraDB enable a rich generative experience for AI applications.
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
26Oct2023_ Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup.pdf
## Details
**Important**
Please complete your registration in this short form.
For on-site we have limited room, so please confirm if you are attending in-person in Manhattan, NYC.
--------------------------------------------------------------------------------------------
We're at StarTree, excited to join forces with our friends at Cloudera, for a meetup that is all about The Latest in Real-Time Analytics: Generative AI and LLM, featuring Apache Pinot and Apache NiFi.
Join us for an insightful discussion about cutting-edge analytics, meet the community in person, and catch up over drinks and snacks.
What's the plan ?
05:30-06:00 Pizza and Networking
06:00-06:35 Adding Generative AI to Real-Time Streaming Pipelines | Tim Spann, Principal Developer Advocate, Cloudera
06:35-07:10 Apache Pinot and Kafka an excellent pairing for refined palates | Tim Veil, VP of Solutions Engineering and Enablement, StarTree
07:10-07:20 QNA
07:20- 07:30 More Snacks and Networking ;)
**Important**
Seats are limited
Please complete your registration in this short form.
Adding Generative AI to Real-Time Streaming Pipelines | Timothy Spann
In this talk, Tim will discuss the basics of real-time streaming, walk through the tools used including Apache NiFi, Apache Kafka and Apache Flink and show how to build a real-time streaming pipeline that sends prompts to LLMs hosted by the likes of Hugging Face, IBM and Cloudera. He will also discuss where real-time data stores like Apache Pinot come into play.
He will show a detailed demonstration of a few use cases involving different sources of data including Kafka, Medium Articles and interactive Question and Response in Slack. He will then show you how you can build your own and where areas of growth exist.
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.
https://github.com/tspannhw/SpeakerProfile
Apache Pinot and Kafka an excellent pairing for refined palates | Tim Veil
The other Tim, Tim Veil, will dive into the history and architect
Kubernetes is great for deploying stateless containers, but what about the big data ecosystem? Episode 3 of our Kubernetes series covers how DC/OS enables you to connect your Kubernetes-based applications to co-located big data services.
Slides cover:
1. Why persistence is challenging in distributed architectures
How DC/OS helps you take advantage of the services available in the big data ecosystem
2. How to connect Kubernetes to your data services through networking
3. How Apache Flink and Apache Spark work with Kubernetes to enable real-time data processing on DC/OS
Meetup: Streaming Data Pipeline DevelopmentTimothy Spann
The document discusses streaming data pipelines and includes information about:
- The FLaNK stack which is comprised of Apache NiFi, Apache Kafka, Apache Flink, and Java.
- SQL Stream Builder which allows developers, analysts, and data scientists to write streaming applications using standard SQL without needing to write Java or Scala code.
- Apache Kafka as a distributed, partitioned, and replicated publish-subscribe messaging system.
- Apache Flink which is a framework for distributed stream and batch data processing.
A Review: Metaheuristic Technique in Cloud ComputingIRJET Journal
This document reviews various meta-heuristic techniques that have been applied to problems in cloud computing, such as task scheduling, load balancing, ant colony optimization (ACO), particle swarm optimization (PSO), and gravitational search algorithm (GSA). It first provides background on cloud computing and defines common cloud computing concepts. It then surveys literature applying meta-heuristics like ACO, GA, PSO, and GSA to solve problems related to load balancing and scheduling in cloud environments. The document concludes that meta-heuristic techniques are effective for optimizing resource utilization and management in cloud computing systems.
Building Scalable Stateless Applications with RxJavaRick Warren
RxJava is a lightweight open-source library, originally from Netflix, that makes it easy to compose asynchronous data sources and operations. This presentation is a high-level intro to this library and how it can fit into your application.
Manage your kubernetes cluster with cluster api, azure and git opsJorge Arteiro
In this session we are going to Introduce Cluster API, a Kubernetes subproject that allows you to manage Kubernetes clusters lifecycle running anywhere using only Kubernetes YAML files. Let’s see how Azure Arc GitOps approach improves and simplify the day-2 operations of these clusters, where your Git repo is now the source of truth. Do you have problems managing identities and Network connection for your current CI/CD process? You don’t know how to manage multiple Kubernetes clusters in production? Then this talk is for you!
This document provides an overview of service mesh and Istio on Kubernetes. It discusses microservices and the need for visibility, monitoring, and traffic management which a service mesh can provide. It then describes Kubernetes, Istio architecture including Pilot, Envoy proxy, and Mixer components. It covers how Istio provides mutual TLS, ingress/egress traffic routing, request routing to service versions, observability with metrics and tracing, and application resilience through features like timeouts and retries. The document concludes with instructions for deploying Kubernetes and getting started with Istio.
This document discusses functional reactive programming and RxJava. It begins with an overview of functional reactive programming principles like being responsive, resilient, elastic and message-driven. It then covers architectural styles like hexagonal architecture and onion architecture. The rest of the document dives deeper into RxJava concepts like Observables, Observers, Operators, and Schedulers. It provides code examples to demonstrate merging, filtering and transforming streams of data asynchronously using RxJava.
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...HostedbyConfluent
Combining Apache Kafka and InfluxDB can provide a powerful data pipeline for processing and analyzing real-time data. Kafka can be used to ingest data from various sources and stream it to InfluxDB for storage and processing. InfluxDB can then be used to analyze and visualize the data, providing insights and actionable information in real-time. This architecture can be especially useful for IoT applications, where large volumes of sensor data are generated in real-time and need to be processed and analyzed quickly. InfluxDB now offers storage in a parquet file format built on top of the Apache Arrow project that allows for querying in SQL and integration to a larger variety of visualization and analysis tools that Kafka users can now take advantage of. This talk will go into connecting the two platforms and the why, how, and what you can accomplish by doing so.
CN Asturias - Stateful application for kubernetes Cédrick Lunven
The document discusses running Apache Cassandra on Kubernetes with K8ssandra. K8ssandra combines Kubernetes and Cassandra to provide a scalable data store with an API layer and administration tools. It addresses challenges of running stateful applications in containers by providing scaling, consistency and resilience. K8ssandra allows Cassandra to be deployed in a cloud-native way on Kubernetes and provides easy and secure data access.
Stephane Lapointe, Frank Boucher & Alexandre Brisebois: Les micro-services et...MSDEVMTL
16 Avril 2016
Groupe Azure
Sujet: Les micro-services et Azure Service Fabric
Conférenciers: Alexandre Brisebois, Microsoft, Stéphane Lapointe, Orckestra et Frank Boucher, Lixar IT
Nous vous proposons une journée complète sur les micro-services et Azure Service Fabric, le but étant d'appendre la théorie avec une série de présentations pour ensuite concrétiser le tout avec une partie pratique "hands-on" et des labs.
Pour participer, vous devrez obligatoirement apporter votre ordinateur portable, avoir installé Visual Studio 2015 Update 2 et Service Fabric SDK 2.0.135.
Building Continuous Application with Structured Streaming and Real-Time Data ...Databricks
This document summarizes a presentation about building a structured streaming connector for continuous applications using Azure Event Hubs as the streaming data source. It discusses key design considerations like representing offsets, implementing the getOffset and getBatch methods required by structured streaming sources, and challenges with testing asynchronous behavior. It also outlines issues contributed back to the Apache Spark community around streaming checkpoints and recovery.
Kubernetes Operability Tooling (GOTO Chicago 2019)bridgetkromhout
The document is a transcript of a presentation about containers, Kubernetes, and cloud native tooling. It discusses what containers are, the history and basics of Kubernetes, tools for managing Kubernetes clusters like Helm and Terraform, event-driven scripting with Brigade and Kashti, packaging apps with CNAB/Duffle/Porter, and a vision for the future of containers and cloud native technologies.
Meetup Streaming Data Pipeline DevelopmentTimothy Spann
Meetup Streaming Data Pipeline Development
28 June 2023 6pm EST
Milwaukee meetup
https://www.meetup.com/futureofdata-princeton/events/292976004/
Details
This will be a hybrid event with a Zoom. The in-person event will be in Milwaukee.
In this interactive session, Tim will lead participants through how to best build streaming data pipelines. He will cover how to build applications from some common use cases and highlight tips, tricks, best practices and patterns.
He will show how to build the easy way and then dive deep into the underlying open source technologies including Apache NiFi, Apache Flink, Apache Kafka and Apache Iceberg.
If you wish to follow along, please download open source projects beforehand. You can also download this helpful streaming platform: https://docs.cloudera.com/csp-ce/latest/installation/topics/csp-ce-installing-ce.html
All source code and slides will be shared for those interested in building their own FLaNK Apps. https://www.flankstack.dev/
https://www.thecapitalgrille.com/locations/wi/milwaukee/milwaukee/8027
The Capital Grille 310 W Wisconsin Ave, Milwaukee, WI 53203
limited seating, preference will be given to NLIT attendees
A peak at the menu (Not Pizza)
RISOTTO FRITTERS WITH FRESH MOZZARELLA AND PROSCIUTTO
SLICED SIRLOIN WITH ROQUEFORT AND BALSAMIC ONIONS
MINIATURE LOBSTER AND CRAB CAKES
WILD MUSHROOM AND HERBED CHEESE
You can join the meeting virtually here (no meat or cheese virtually):
Similar to Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow (20)
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
SMS API Integration in Saudi Arabia| Best SMS API ServiceYara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
When it is all about ERP solutions, companies typically meet their needs with common ERP solutions like SAP, Oracle, and Microsoft Dynamics. These big players have demonstrated that ERP systems can be either simple or highly comprehensive. This remains true today, but there are new factors to consider, including a promising new contender in the market that’s Odoo. This blog compares Odoo ERP with traditional ERP systems and explains why many companies now see Odoo ERP as the best choice.
What are ERP Systems?
An ERP, or Enterprise Resource Planning, system provides your company with valuable information to help you make better decisions and boost your ROI. You should choose an ERP system based on your company’s specific needs. For instance, if you run a manufacturing or retail business, you will need an ERP system that efficiently manages inventory. A consulting firm, on the other hand, would benefit from an ERP system that enhances daily operations. Similarly, eCommerce stores would select an ERP system tailored to their needs.
Because different businesses have different requirements, ERP system functionalities can vary. Among the various ERP systems available, Odoo ERP is considered one of the best in the ERp market with more than 12 million global users today.
Odoo is an open-source ERP system initially designed for small to medium-sized businesses but now suitable for a wide range of companies. Odoo offers a scalable and configurable point-of-sale management solution and allows you to create customised modules for specific industries. Odoo is gaining more popularity because it is built in a way that allows easy customisation, has a user-friendly interface, and is affordable. Here, you will cover the main differences and get to know why Odoo is gaining attention despite the many other ERP systems available in the market.
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfVALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374