Pivotal CF and Continuous Delivery

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi

Hadoop Security

Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs

Drone Data Flowing Through Apache NiFi

Ingesting Drone Data into Big Data Platforms

SpringOne Platform 2016 Speaker: Amit Gupta; Product Manager, Pivotal. Find out how Cloud Foundry does continuous integration, from a GitHub pull request against a small repository to an official final release. See how we're striving to raise the bar for open source projects when it comes to rigor, automation, and transparency of our CI. We’ll talk about how we:
- integrate work from community contributors and core Foundation contributors, spread across multiple teams and continents;
- test at multiple layers, from fast, tightly-scoped unit tests, to full blown deployments and acceptance tests across multiple IaaSes; and
- keep the full end-to-end process transparent to the community; not just the source code, but also the build pipelines and the discussions that surround artifact promotion.
The audience will come away with strategies for continuously integrating and deploying their own Cloud Foundry installations or other distributed systems.

Postgres Open 2014 - A Performance Characterization of Postgres on Different ...

Transforming Culture at Bloomberg

Continuous Delivery for Microservice Architectures with Concourse & Cloud Fou...

SpringOne Platform 2016 Speaker: Alex Ley; Product Manager, Pivotal. Building a continuous delivery pipeline for your microservice-based architecture can be a real challenge when using more conventional CI systems like Jenkins and GoCD. How do you get a clear picture of the CI workflow and status? What artifact was deployed and when? How is this all configured? Introducing Concourse (https://concourse.ci), an open source pipeline-based CI system that focuses on simplicity, usability and reproducibility. It offers isolated builds, a range of integrations and is built upon a proven technology stack from Cloud Foundry. This talk will demonstrate creating a continuous delivery pipeline for a Spring microservice-based application that uses Spring Cloud. You will see how the pipeline tests services, integrates and then blue/green deploys to Cloud Foundry. Expect to rush to your laptop to try out Concourse after this session!

Auto-scaled Concourse CI on AWS w/o BOSH

佑介九岡

PostgresOpen 2013 A Comparison of PostgreSQL Encryption Options

Are you looking to encrypt your data within PostgreSQL? We will review the various options available for encrypting data with PostgreSQL. We will also look at various options available to employ encryption and review various configuration and performance for using encryption. There are a number of options available when encrypting data with PostgreSQL. When determining the mechanisms to use, it is important to understand the data, the application and how it is being used. We will compare different methods of encrypting data in their feature-sets and performance. We will try to answer the following questions: Where do I enable the encryption? Where is my data safe and where is it exposed? Why should I use the various encryption modules available?

Yace 3.0

Atul Ashar

Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER

Indrajit Poddar

Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Paris Carbone

Large-scale data stream processing has come a long way to where it is today. It combines all the essential requirements of modern data analytics: subsecond latency, high throughput and, impressively, strong consistency. Apache Flink is a system that serves as a proof-of-concept of these characteristics and it is best known for its lightweight fault tolerance. Data engineers and analysts can now let the system handle terabytes of computational state without worrying about failures that can potentially occur. This presentation describes all the fundamental challenges behind exactly-once processing guarantees in large-scale streaming in a simple and intuitive way. Furthermore, it demonstrates the basic and extended versions of Flink's state-of-the-art snapshotting algorithm tailored to the needs of a dataflow graph.
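To make the snapshotting idea concrete, here is a minimal sketch in plain Python of barrier alignment at a single operator with several input channels. It is not Flink's code: the class name `SummingOperator` and its methods are hypothetical, the operator's only state is a running sum, and only one checkpoint is assumed to be in flight at a time. A channel that has delivered its barrier is held back until every other channel has delivered the same barrier, so the snapshot reflects exactly the records that came before the barrier on every input.

```python
class SummingOperator:
    """Toy operator (hypothetical, not Flink API) that sums records from several inputs."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.total = 0          # the operator state being checkpointed
        self.blocked = set()    # channels whose barrier has already arrived
        self.buffered = []      # (channel, value) records held back during alignment
        self.snapshots = {}     # checkpoint_id -> state captured at that checkpoint

    def on_record(self, channel, value):
        if channel in self.blocked:
            # Record arrived after this channel's barrier: it belongs to the next epoch.
            self.buffered.append((channel, value))
        else:
            self.total += value

    def on_barrier(self, channel, checkpoint_id):
        self.blocked.add(channel)
        if len(self.blocked) == self.num_inputs:
            # Barriers from all inputs are aligned: capture state, then resume.
            self.snapshots[checkpoint_id] = self.total
            self.blocked.clear()
            pending, self.buffered = self.buffered, []
            for ch, v in pending:
                self.on_record(ch, v)


op = SummingOperator(num_inputs=2)
op.on_record(0, 1)
op.on_barrier(0, checkpoint_id=1)   # channel 0 is now held back
op.on_record(0, 10)                 # buffered, not yet applied
op.on_record(1, 2)                  # channel 1 is still pre-barrier
op.on_barrier(1, checkpoint_id=1)   # alignment completes, snapshot is taken
```

After the last call the snapshot for checkpoint 1 holds the sum of pre-barrier records only, and the buffered record has then been applied to the live state. A real system would also forward the barrier downstream and persist the snapshot, which this sketch leaves out.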

Building the Ideal Stack for Machine Learning

Machine Learning is not new, but its application across memory-optimized distributed systems has led to an explosion in both the number and capability of its uses. Pandora develops personalized content recommendations with machine learning algorithms, Tesla has produced the first widely distributed autonomous vehicle, and Amazon uses autonomous robots to move packages within its warehouses and even deliver packages. When coupled with real-time data, advanced analytics approaches like machine learning and deep learning create immediate business opportunities. Machine learning has never been more accessible—if your data pipelines support real-time analysis. Attendees will learn tools and techniques for integrating machine learning models across industries and organizations. Steven Camiña, MemSQL Product Manager, will walk through critical technologies needed in your technology ecosystem, including Python, Apache Kafka, Apache Spark, and a real-time database.

Streaming with Oracle Data Integration

Michael Rainey

As a data integration professional, it’s almost a guarantee that you’ve heard of real-time stream processing of Big Data. The usual players in the open source world are Apache Kafka, used to move data in real-time, and Spark Streaming, built for in-flight transformations. But what about relational data? Quite often we forget that products incubated in the Apache Foundation can also serve a purpose for “standard” relational databases as well. But how? Well, let’s introduce Oracle GoldenGate and Oracle Data Integrator for Big Data. GoldenGate can extract relational data in real time and produce Kafka messages, ensuring relational data is a part of the enterprise data bus. These messages can then be ingested via ODI through a Spark Streaming process, integrating with additional data sources, such as other relational tables, flat files, etc, as needed. Finally, the output can be sent to multiple locations: on through to a data warehouse for analytical reporting, back to Kafka for additional targets to consume, or any number of targets. Attendees will walk away with a framework on which they can build their data streaming projects, combining relational data with big data and using a common, structured approach via the Oracle Data Integration product stack. Presented at BIWA Summit 2017.

The Fast Path to Building Operational Applications with Spark

Realtime Analytical Query Processing and Predictive Model Building on High Di...

Spark Summit

Spark SQL and MLlib are optimized for running feature extraction and machine learning algorithms on row based columnar datasets through full scan but do not provide constructs for column indexing and time series analysis. For dealing with document datasets with timestamps where the features are represented as a variable number of columns in each document and use-cases demand searching over columns and time to retrieve documents to generate learning models in real time, a close integration between Spark and Lucene was needed. We introduced LuceneDAO at Spark Summit Europe 2016 to build distributed Lucene shards from a data frame, but the time series attributes were not part of the data model. In this talk we present our extension to LuceneDAO to maintain time stamps with the document-term view for search and allow time filters. Lucene shards maintain the time-aware document-term view for search and vector space representation for machine learning pipelines. We used Spark as our distributed query processing engine where each query is represented as a boolean combination over terms with filters on time. LuceneDAO is used to load the shards to Spark executors and power sub-second distributed document retrieval for the queries. Our synchronous API uses Spark-as-a-Service to power analytical queries while our asynchronous API uses Kafka, Spark Streaming and HBase to power time series prediction algorithms. In this talk we will demonstrate LuceneDAO write and read performance on millions of documents with 1M+ terms and configurable time stamp aggregate columns. We will demonstrate the latency of APIs on a suite of queries generated from terms. Key takeaways from the talk will be a thorough understanding of how to make Lucene-powered time-aware search a first class citizen in Spark to build interactive analytical query processing and time series prediction algorithms.
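The query model described above, a boolean combination over terms with a filter on time, can be illustrated with a small in-memory sketch. This is plain Python, not LuceneDAO, Lucene or Spark; the class `TimeAwareIndex`, its method names and the sample documents are all hypothetical and exist only to show the shape of a time-aware document-term view.

```python
from collections import defaultdict


class TimeAwareIndex:
    """Toy inverted index: term -> doc ids, plus one timestamp per document."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> {doc_id, ...}
        self.timestamps = {}               # doc_id -> timestamp (any comparable value)

    def add(self, doc_id, terms, timestamp):
        self.timestamps[doc_id] = timestamp
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, all_of, none_of=(), start=None, end=None):
        """AND over `all_of`, NOT over `none_of`, then an inclusive time filter."""
        if not all_of:
            raise ValueError("all_of needs at least one term")
        hits = set.intersection(*(self.postings.get(t, set()) for t in all_of))
        for term in none_of:
            hits -= self.postings.get(term, set())
        return sorted(
            d for d in hits
            if (start is None or self.timestamps[d] >= start)
            and (end is None or self.timestamps[d] <= end)
        )


index = TimeAwareIndex()
index.add("doc1", ["spark", "lucene"], timestamp=100)
index.add("doc2", ["spark", "kafka"], timestamp=200)
index.add("doc3", ["spark", "lucene", "hbase"], timestamp=300)

recent_lucene = index.search(all_of=["spark", "lucene"], start=150)
no_hbase = index.search(all_of=["spark"], none_of=["hbase"], end=250)
```

In a sharded setting each executor would hold one such index and the driver would merge the per-shard results; that distribution step is outside this sketch.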

JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing

Luis Gonzalez

Blr hadoop meetup

Suneet Grover

Getting started with Azure Event Hubs and Stream Analytics services

Vladimir Bychkov

The total amount of data in the world almost doubles every 2 years. Storing data for offline processing is no longer a viable business model. In the past few years, new technologies for real-time data processing have emerged. Microsoft Azure offers a comprehensive set of tools to ingest and process data in motion. In this presentation we will learn how to collect data from devices, how to process data in real time using Azure Stream Analytics jobs, and how to produce and handle actionable insights.

Multi-Datacenter Kafka - Strata San Jose 2017

Gwen (Chen) Shapira

London Apache Kafka Meetup (Jan 2017)

Landoop Ltd

Landoop presenting how to simplify your ETL process using Kafka Connect for (E) and (L). Introducing KCQL - the Kafka Connect Query Language & how it can simplify fast-data (ingress & egress) pipelines. How KCQL can be used to set up Kafka Connectors for popular in-memory and analytical systems and live demos with HazelCast, Redis and InfluxDB. How to get started with a fast-data docker kafka development environment. Enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.

Introduction to Cloud Foundry #JJUG

Toshiaki Maki

Makefiles in 2020 — Why they still matter

Simon Brüggen

Viewers also liked

Redis for Security Data : SecurityScorecard JVM Redis Usage

Project Management_crynmif

How Cloud Foundry is CI'd

Postgres Open 2014 - A Performance Characterization of Postgres on Different ...

Transforming Culture at Bloomberg

Continuous Delivery for Microservice Architectures with Concourse & Cloud Fou...

Auto-scaled Concourse CI on AWS w/o BOSH

佑介九岡

PostgresOpen 2013 A Comparison of PostgreSQL Encryption Options

Yace 3.0

Atul Ashar

Continuous Integration with Cloud Foundry Concourse and Docker on OpenPOWER

Indrajit Poddar

Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Paris Carbone

Building the Ideal Stack for Machine Learning

Streaming with Oracle Data Integration

Michael Rainey

The Fast Path to Building Operational Applications with Spark

Realtime Analytical Query Processing and Predictive Model Building on High Di...

Spark Summit

JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing

Luis Gonzalez

Blr hadoop meetup

Suneet Grover

Getting started with Azure Event Hubs and Stream Analytics services

Vladimir Bychkov

Multi-Datacenter Kafka - Strata San Jose 2017

Gwen (Chen) Shapira

London Apache Kafka Meetup (Jan 2017)

Landoop Ltd

Similar to Pivotal CF and Continuous Delivery

Introduction to Cloud Foundry #JJUG

Toshiaki Maki

Makefiles in 2020 — Why they still matter

Simon Brüggen

CI/CD with OCP

Dmitry Kartsev

Kubernetes to improve business scalability and processes (Cloud & DevOps Worl...

Michele Orsi

Concourse x Spinnaker #concourse_tokyo

Toshiaki Maki

Git as version control for Analytics project

Nag Arvind Gudiseva

Practical guide for front-end development for django devs

Davidson Fellipe

Docker for Software Development

inside-BigData.com

Chicago Docker Meetup Presentation - Mediafly

Mediafly

Ryan Jarvinen Open Shift Talk @ Postgres Open 2013

PostgresOpen

Using the Mobile Data service on IBM Bluemix with an AngularJS web app

Gretchen Moore

Introduction to Terraform and Google Cloud Platform

Pradeep Bhadani

Gitlab, GitOps & ArgoCD

Haggai Philip Zagury

Modularity - The future, building, packaging

Langdon White

Continuous Delivery com Docker, OpenShift e Jenkins

Bruno Padilha

Oleksandr Yefremov Continuously delivering mobile project

Аліна Шепшелей

Kubernetes and the 12 factor cloud apps

Ana-Maria Mihalceanu

Governance and Risk in Cloud Computing Model

John Sanders

Sage 2 19_v5_busby

Ben Busby

IBM Bluemix Hackathon Accelerator

gjuljo

More from Timothy Spann

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI https://www.meetup.com/unstructured-data-meetup-new-york/ This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Building Real-Time Pipelines With FLaNK Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making. Apache NiFi Apache Kafka Apache Flink Apache Iceberg LLM Generative AI Slack Postgresql

Generative AI on Enterprise Cloud with NiFi and Milvus

Gen AI on Enterprise Cloud Apache NiFi Milvus Apache Kafka Apache Flink Cloudera Machine Learning Cloudera DataFlow https://medium.com/@tspann/building-a-milvus-connector-for-nifi-34372cb3c7fa https://www.meetup.com/futureofdata-princeton/events/300737266/ https://lu.ma/q7pcfyjn?source=post_page-----34372cb3c7fa--------------------------------&tk=TTyakY If you're interested in working with Generative AI on the cloud, this virtual workshop is for you. Tim Spann from Cloudera and Yujian Tang from Zilliz will cover how you can implement your own GenAI workflows on the cloud at enterprise scale. 9:00 - 9:05: Intro 9:05 - 9:15: What is Milvus 9:15 - 9:25: Cloudera Development Platform 9:25 - 10:00: Demo Location https://www.youtube.com/watch?v=IfWIzKsoHnA https://github.com/tspannhw/SpeakerProfile https://www.linkedin.com/in/yujiantang/

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024

Real-Time AI Streaming - AI Max Princeton

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines https://www.youtube.com/watch?v=Yeua8NlzQ3Y https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming Adding Generative AI to Real-Time Streaming Pipelines Abstract Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results. Summary Tim Spann: My talk is adding generative AI to real time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, NiFi, Flink, Python, Iceberg. All the slides, all the code and GitHub are out there. LLM, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The amount of models and projects and software that are available is massive. NiFi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into. So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from Slack. And then I'm going to clean it up, add a couple fields and push that out to Slack. Every model is a little bit of different tweaking. NiFi acts as a whole website. And as you see here, it can be get, post, put, whatever you want. We send that response back to Flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly. Transcript This transcript was autogenerated. To make changes, submit a PR. Hi, Tim Spann here. My talk is adding generative AI to real time streaming pipelines, and we're here for the large language model conference at Conf42, which is always a nice one, great place to be. 
I'm going to discuss a couple of different open source technologies that work together to enable you to build real time pipelines using large language models. So we'll touch on Kafka, NiFi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data machine learning, streaming IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter, cover streaming ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff, as I mentioned, do a bunch of different meetups. They are not just in the east coast of the US, they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides, all the code and GitHub. Everything you need is out there. Let's get into the talk. LLM, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it

2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...

28March2024-Codeless-Generative-AI-Pipelines

28March2024-Codeless-Generative-AI-Pipelines https://www.meetup.com/futureofdata-princeton/events/299440871/ https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/ ******Note***** The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend. ----------------------- We're excited to invite you to join us in-person, for a Real-Time Analytics exploration! Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field! Agenda: 05:30-06:00: Pizza and friends 06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi 06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders 07:20-07:30 QNA Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications! Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot! 
------- Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.

TCFPro24 Building Real-Time Generative AI Pipelines

https://princetonacm.acm.org/tcfpro/ 18th Annual IEEE IT Professional Conference (ITPC) Armstrong Hall at The College of New Jersey Friday, March 15th, 2024 | 10:00 AM to 5:00 PM IT Professional Conference at Trenton Computer Festival IEEE Information Technology Professional Conference on Friday, March 15th, 2024 TCFPro24 Building Real-Time Generative AI Pipelines Building Real-Time Generative AI Pipelines In this talk, Tim will delve into the exciting realm of building real-time generative AI pipelines with streaming capabilities. The discussion will revolve around the integration of cutting-edge technologies to create dynamic and responsive systems that harness the power of generative algorithms. From leveraging streaming data sources to implementing advanced machine learning models, the presentation will explore the key components necessary for constructing a robust real-time generative AI pipeline. Practical insights, use cases, and best practices will be shared, offering a comprehensive guide for developers and data scientists aspiring to design and implement dynamic AI systems in a streaming environment. Tim will show a live demo showing we can use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated with Apache NiFi, Apache Kafka and Python. We will use RAG against Chroma and Pinecone vector data stores, Hugging Face and WatsonX.AI LLM, and add additional context with NiFi lookups of stocks, weather and other data streams in real-time. Timothy Spann Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. 
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

2024 Build Generative AI for Non-Profits

2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...

2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipelines https://www.meetup.com/futureofdata-newyork/events/298660453/ Unlocking Financial Data with Real-Time Pipelines (Flink Analytics on Stocks with SQL ) By Timothy Spann Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence. Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. I will be utilizing NiFi 2.0 with Python and Vector Databases. Timothy Spann Principal Developer Advocate, Cloudera Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. 
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. https://twitter.com/PaaSDev https://www.linkedin.com/in/timothyspann/ https://medium.com/@tspann https://github.com/tspannhw/FLiPStackWeekly/

Conf42-Python-Building Apache NiFi 2.0 Python Processors

Conf42-Python-Building Apache NiFi 2.0 Python Processors https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors Building Apache NiFi 2.0 Python Processors Abstract Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM. Summary Tim Spann: I'm going to be talking today about building Apache NiFi 2.0 Python processors. One of the main purposes of supporting Python in the streaming tool Apache NiFi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera. You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 3.10 and again JDK 21 on your machine. You got to be smart about how you use these models. There are a ton of Python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own. When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested in writing your own Python code for Apache NiFi, definitely reach out and thank.

Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...

Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg with Stock Data and LLM Abstract In this talk, we’ll discuss how to use Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg to process and analyze stock data. We demonstrated the ingestion, processing, and analysis of stock data. Additionally, we illustrated how to use an LLM to generate predictions from the analyzed data. Karin Wolok Developer Relations, Dev Marketing, and Community Programming @ Project Elevate Karin Wolok's LinkedIn account Karin Wolok's twitter account Tim Spann Principal Developer Advocate @ Cloudera Tim Spann's LinkedIn account Tim Spann's twitter account https://www.conf42.com/Python_2024_Karin_Wolok_Tim_Spann_nifi__kafka_risingwave_iceberg_llm

2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines

2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines https://www.aicamp.ai/event/eventdetails/W2024022214 apache nifi llm generative ai gen ai ml dl machine learning apache kafka apache flink postgresql python AI Meetup (NYC): GenAI, LLMs, ML and Data Feb 22, 05:30 PM EST Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers Agenda: * 5:30pm~6:00pm: Checkin, Food/drink and networking * 6:00pm~6:10pm: Welcome/community update * 6:10pm~8:30pm: Tech talks * 8:30pm: Q&A, Open discussion Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs Speaker: Zain Hasan (Weaviate LinkedIn) Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can see, hear, read, and feel data(!), to perform cross-modal search (searching audio with images, videos with text etc.) and multimodal retrieval augmented generation (MM-RAG) at the billion-object scale with the help of open source vector databases. I will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real-time can enable users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production. Tech Talk: Codeless Generative AI Pipelines Speaker: Timothy Spann (Cloudera LinkedIn) Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task. 
Speakers/Topics: Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics Sponsors: We are actively seeking sponsors to support our community, whether by offering venue spaces, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ developers worldwide. Venue: Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036 Room Name: Central Park West 6501 Community on Slack/Discord - Event chat: chat and connect with speakers and attendees - Sharing blogs, events, job openings, projects collaborations Join Slack (search and join the #newyork channel) | Join Discord

DBA Fundamentals Group: Continuous SQL with Kafka and Flink

DBA Fundamentals Group: Continuous SQL with Kafka and Flink 20-Feb-2024 In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data. We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this. Tim Spann Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
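
The abstract above describes joining Kafka topics and emitting new events as they arrive. As a rough illustration of that idea only, here is a toy in-memory sketch in plain Python. It does not use Kafka or Flink, and it is not taken from the talk: the topic names `orders` and `customers`, the field names, and the `continuous_join` helper are all hypothetical. In Flink the same logic would normally be written as a SQL join over tables backed by Kafka topics.

```python
from typing import Dict, Iterable, Iterator, Tuple

Record = Dict[str, object]


def continuous_join(
    events: Iterable[Tuple[str, Record]], key: str
) -> Iterator[Record]:
    """Join two event streams on `key`, emitting results as events arrive.

    `events` stands in for messages read from two hypothetical topics,
    "orders" and "customers"; any other topic name raises KeyError.
    """
    # Latest record seen per key, kept separately for each topic.
    latest: Dict[str, Dict[object, Record]] = {"orders": {}, "customers": {}}
    for topic, record in events:
        latest[topic][record[key]] = record
        other = "customers" if topic == "orders" else "orders"
        match = latest[other].get(record[key])
        if match is not None:
            # Both sides are now known for this key: emit a joined event.
            if topic == "orders":
                order, customer = record, match
            else:
                order, customer = match, record
            yield {**customer, **order}


# Interleaved arrivals, as they might appear across two topics.
stream = [
    ("customers", {"customer_id": 1, "name": "Ada"}),
    ("orders", {"customer_id": 1, "amount": 30}),
    ("orders", {"customer_id": 2, "amount": 5}),
    ("customers", {"customer_id": 2, "name": "Lin"}),
]
joined = list(continuous_join(stream, key="customer_id"))
```

The sketch keeps unbounded state and joins only against the latest record per key. A real streaming engine adds windowing, state expiry, and fault tolerance, none of which is modeled here.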

NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...

OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines

OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines Unlocking Financial Data with Real-Time Pipelines Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence. Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. Key Points to be Covered: Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing. Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data. Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. 
Integration of Kafka with financial data sources and consumers. Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources. Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions. Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies. Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability an

Building Real-Time Travel Alerts

Building Real-time Travel Alerts In this session, we will walk through how to build a complete streaming application to send alerts based on travel advisories from public data. We will also join in other data sources of relevance and push out alerts. We will show you how to build this streaming application with Apache NiFi, Apache Kafka, and Apache Flink and show you when/why/how, and what to build to maximize performance, productivity, and ease of development. Let's get streaming. Apache Flink Apache Kafka Apache NiFi FLaNK Stack Tim Spann Big Data Conference Europe 2023

JConWorld_ Continuous SQL with Kafka and Flink

JConWorld: Continuous SQL with Kafka and Flink In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas and publishing data. We will then cover consuming Kafka data, joining Kafka topics and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips and examples of how to do this. Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber

More from Timothy Spann (20)

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Generative AI on Enterprise Cloud with NiFi and Milvus

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024

Real-Time AI Streaming - AI Max Princeton

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines

2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...

28March2024-Codeless-Generative-AI-Pipelines

TCFPro24 Building Real-Time Generative AI Pipelines

2024 Build Generative AI for Non-Profits

2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...

Conf42-Python-Building Apache NiFi 2.0 Python Processors

Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...

2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines

DBA Fundamentals Group: Continuous SQL with Kafka and Flink

NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...

OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines

Building Real-Time Travel Alerts

JConWorld_ Continuous SQL with Kafka and Flink

Recently uploaded

Neuro-symbolic is not enough, we need neuro-*semantic*

Frank van Harmelen

Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”. All of this is illustrated with link prediction over knowledge graphs, but the argument is general.

Designing Great Products: The Power of Design and Leadership by Chief Designe...

JMeter webinar - integration with InfluxDB and Grafana

RTTS

Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application. In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics. Length: 30 minutes Session Overview ------------------------------------------- During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana: - What out-of-the-box solutions are available for real-time monitoring JMeter tests? - What are the benefits of integrating InfluxDB and Grafana into the load testing stack? - Which features are provided by Grafana? - Demonstration of InfluxDB and Grafana using a practice web application To view the webinar recording, go to: https://www.rttsweb.com/jmeter-integration-webinar

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...

Ramesh Iyer

In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.

Leading Change strategies and insights for effective change management pdf 1.pdf

OnBoard

Securing your Kubernetes cluster_ a step-by-step guide to success !

KatiaHIMEUR1

Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster. However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks. In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...

Thierry Lestable

Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf

91mobiles

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf

FIDO Alliance

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

Sri Ambati

Monitoring Java Application Security with JDK Tools and JFR Events

Ana-Maria Mihalceanu

The Art of the Pitch: WordPress Relationships and Sales

Laura Byrne

Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes? All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.

Knowledge engineering: from people to machines and back

Elena Simperl

AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...

State of ICS and IoT Cyber Threat Landscape Report 2024 preview

Prayukth K V

The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development. The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
- State of global ICS asset and network exposure
- Sectoral targets and attacks as well as the cost of ransom
- Global APT activity, AI usage, actor and tactic profiles, and implications
- Rise in volumes of AI-powered cyberattacks
- Major cyber events in 2024
- Malware and malicious payload trends
- Cyberattack types and targets
- Vulnerability exploit attempts on CVEs
- Attacks on counties – USA
- Expansion of bot farms – how, where, and why
- In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
- Why are attacks on smart factories rising?
- Cyber risk predictions
- Axis of attacks – Europe
- Systemic attacks in the Middle East
Download the full report from here: https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024

Tobias Schneck

As AI technology pushes into IT, I asked myself, as an “infrastructure container Kubernetes guy”, how this fancy AI technology can be managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could the two technologies bring to each other? Let me take these questions and give you a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to make it work on our own infrastructure from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...

James Anderson

Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management. The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM). Speakers: Bob Boule Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle. Gopinath Rebala Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. 
Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...