Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo


Apache NiFi - Cloudera DataFlow
Apache Flink - Cloudera Streaming Analytics
Apache Kafka - Streaming
LLM / WatsonX.AI Generative AI, granite and llama2 models
Slack to Slack apps


November 2, 2023 | 9:00 AM – 6:00 PM
INTEGRATING AI INTO REAL-TIME DATA PIPELINES


© 2023 Cloudera, Inc. All rights reserved.
Data Relevance for Real-Time Applications
Streaming data, data at rest, change data capture
Any DATA
Real-time Processing
• Analyze data in motion
• Continuous monitoring
• Trends and anomalies
Data lakehouse, data products
Any BUSINESS EVENT
Continuous Results
• No-Code UI
• Author once, publish anywhere
• Analytics lifecycle management for dev/ops
Any DATA ANALYST
AI models, event-driven apps, analytics apps
Any DATA CONSUMER

INGEST, PREPARE, PUBLISH
DATA SOURCES
Internal Users (After Sales)
External Systems
ENTERPRISE LAKEHOUSE CAPABILITY VIEW
INGESTION, MESSAGE HUB, STORAGE, BATCH, MANAGEMENT, STREAM, CONSUMPTION
Closed Loop Systems
SQL Stream Builder, Machine Learning, Data Visualization, Workload Manager, watsonx.data

Cloudera’s Data in Motion Services
Cloudera Offers Two Core Data-In-Motion Services: DataFlow & Stream Processing
DATAFLOW: Powered by Apache NiFi, it enables developers to connect to any data source anywhere with any structure, process it, and deliver to any destination using a low-code authoring experience.
CLOUDERA SDX: Secure, monitor, and govern your streaming workloads with the same tooling using Apache Ranger & Apache Atlas.
STREAM PROCESSING: Powered by Apache Flink and Kafka, it provides a complete, enterprise-grade stream management and stateful processing solution. With support for industry-standard interfaces like SQL, developers, data analysts, and data scientists can easily build a wide variety of hybrid real-time applications.
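To make "stateful processing" concrete, here is a minimal sketch in plain Python. It is not Flink or Kafka code; the event shape `(timestamp, key)`, the function name `tumbling_window_counts`, and the sample data are all assumptions made for illustration. It shows the kind of per-key count over fixed, non-overlapping time windows that a streaming SQL query with a tumbling window would typically express.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp in seconds, key).
Event = Tuple[int, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per key within fixed, non-overlapping time windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        # The running count per (window, key) is the "state" being kept.
        counts[(window_start, key)] += 1
    return dict(counts)


# Illustrative data only.
sample_events = [(0, "sensor-a"), (15, "sensor-a"), (59, "sensor-b"), (61, "sensor-a")]
window_counts = tumbling_window_counts(sample_events, window_seconds=60)
```

A real stream processor would also have to handle late or out-of-order events and persist this state for fault tolerance, which the sketch leaves out.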

Simplified Streaming Pipelines
Connect to any data source anywhere, process and deliver to any destination
Ingest, Process, Distribute
Active, Passive
Route, Filter, Enrich, Transform
Data born in the cloud
Data born outside the cloud
Any destination
Connectors
Gateway
Endpoint
Connect & Pull
Send
Connectors
Deliver
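The Route, Filter, Enrich, and Transform steps named on this slide can be sketched as a chain of small Python functions. This is an illustration only, not NiFi code: the record fields (`device`, `value`, `site`), the lookup table, the threshold of 100.0, and the destination names `alerts` and `archive` are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]


def filter_valid(records: Iterable[Record]) -> Iterator[Record]:
    """Filter: drop records that carry no value."""
    for record in records:
        if record.get("value") is not None:
            yield record


def enrich(records: Iterable[Record], lookup: Dict[str, str]) -> Iterator[Record]:
    """Enrich: attach a site name looked up by device id."""
    for record in records:
        yield {**record, "site": lookup.get(str(record["device"]), "unknown")}


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Transform: convert the value from text to a float."""
    for record in records:
        yield {**record, "value": float(str(record["value"]))}


def route(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Route: send each record to a destination based on its content."""
    destinations: Dict[str, List[Record]] = {"alerts": [], "archive": []}
    for record in records:
        name = "alerts" if float(str(record["value"])) > 100.0 else "archive"
        destinations[name].append(record)
    return destinations


# Illustrative input only.
raw_records: List[Dict[str, Optional[str]]] = [
    {"device": "d1", "value": "101.5"},
    {"device": "d2", "value": None},
    {"device": "d3", "value": "20"},
]
site_lookup = {"d1": "nyc"}
routed = route(transform(enrich(filter_valid(raw_records), site_lookup)))
```

Each stage is a generator, so records flow through one at a time rather than being collected between steps, which loosely mirrors how a flow-based tool passes data from one processing step to the next.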

LLM USE CASE
Vector DB
AI Model
Unstructured file types
Data in Motion on Cloudera Data Platform (CDP)
Capture, process & distribute any data, anywhere
Other enterprise data
Open Data Lakehouse
Materialized Views
Structured Sources
Applications/APIs
Streams
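One way to read this slide is that an incoming event is matched against a vector store for context and then turned into a prompt for an AI model. The sketch below shows that shape in plain Python under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the document list stands in for a vector database, and `model` is any callable supplied by the caller. No real LLM or vector database API is used or implied.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_match(question: str, documents: List[str]) -> str:
    """Return the stored document most similar to the question."""
    query = embed(question)
    return max(documents, key=lambda doc: cosine(query, embed(doc)))


def build_prompt(question: str, context: str) -> str:
    """Combine retrieved context and the incoming question into one prompt."""
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


def answer_event(
    question: str, documents: List[str], model: Callable[[str], str]
) -> str:
    """Retrieve context, build the prompt, and pass it to the supplied model."""
    return model(build_prompt(question, top_match(question, documents)))


# Illustrative documents only.
documents = [
    "NiFi routes and transforms data flows",
    "Kafka stores streams of events in topics",
]
best = top_match("where does kafka keep events", documents)
```

Passing the model in as a callable keeps the retrieval and prompt-building steps testable without a network call; in a pipeline, that callable would wrap whichever hosted or local model is in use.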

Apache NiFi in a few numbers
A very active project with a dynamic community & comparison with ACEU 2019
2800+ members on the Slack channel (535+ - 4 years ago)
475+ contributors on GitHub across the repositories (260+ - 4 years ago)
65 committers in the Apache NiFi community (45 - 4 years ago)
Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming soon (NiFi 1.10 - 4 years ago)
14M+ Docker pulls of the Apache NiFi image (1M+ - 4 years ago)



More from Timothy Spann

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...

Timothy Spann

06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...

Timothy Spann

DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK

Timothy Spann

Building Real-Time Pipelines With FLaNK Timothy Spann, Principal Developer Advocate, Streaming - Cloudera Future of Data meetup, startup grind, AI Camp The combination of Apache Flink, Apache NiFi, and Apache Kafka for building real-time data processing pipelines is extremely powerful, as demonstrated by this case study using the FLaNK-MTA project. The project leverages these technologies to process and analyze real-time data from the New York City Metropolitan Transportation Authority (MTA). FLaNK-MTA demonstrates how to efficiently collect, transform, and analyze high-volume data streams, enabling timely insights and decision-making. Apache NiFi Apache Kafka Apache Flink Apache Iceberg LLM Generative AI Slack Postgresql

Generative AI on Enterprise Cloud with NiFi and Milvus

Timothy Spann

Gen AI on Enterprise Cloud Apache NiFi Milvus Apache Kafka Apache Flink Cloudera Machine Learning Cloudera DataFlow https://medium.com/@tspann/building-a-milvus-connector-for-nifi-34372cb3c7fa https://www.meetup.com/futureofdata-princeton/events/300737266/ https://lu.ma/q7pcfyjn?source=post_page-----34372cb3c7fa--------------------------------&tk=TTyakY If you're interested in working with Generative AI on the cloud, this virtual workshop is for you. Tim Spann from Cloudera and Yujian Tang from Zilliz will cover how you can implement your own GenAI workflows on the cloud at enterprise scale. 9:00 - 9:05: Intro 9:05 - 9:15: What is Milvus 9:15 - 9:25: Cloudera Development Platform 9:25 - 10:00: Demo Location https://www.youtube.com/watch?v=IfWIzKsoHnA https://github.com/tspannhw/SpeakerProfile https://www.linkedin.com/in/yujiantang/

April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024

Timothy Spann

Real-Time AI Streaming - AI Max Princeton

Timothy Spann

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines

Timothy Spann

Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines https://www.youtube.com/watch?v=Yeua8NlzQ3Y https://www.conf42.com/Large_Language_Models_LLMs_2024_Tim_Spann_generative_ai_streaming Adding Generative AI to Real-Time Streaming Pipelines Abstract Let’s build streaming pipelines that convert streaming events into prompts, call LLMs, and process the results. Summary Tim Spann: My talk is adding generative AI to real time streaming pipelines. I'm going to discuss a couple of different open source technologies. We'll touch on Kafka, NiFi, Flink, Python, Iceberg. All the slides, all the code and GitHub are out there. LLM, if you didn't know, is rapidly evolving. There's a lot of different ways to interact with models. That enrichment, transformation, processing really needs tools. The amount of models and projects and software that are available is massive. NiFi supports hundreds of different inputs and can convert them on the fly. Great way to distribute your data quickly to whoever needs it without duplication, without tight coupling. Fun to find new things to integrate into. So what we can do is, well, I want to get a meetup chat going. I have a processor here that just listens for events as they come from Slack. And then I'm going to clean it up, add a couple fields and push that out to Slack. Every model is a little bit of different tweaking. NiFi acts as a whole website. And as you see here, it can be GET, POST, PUT, whatever you want. We send that response back to Flink and it shows up here. Thank you for attending this talk. I'm going to be speaking at some other events very shortly. Transcript This transcript was autogenerated. To make changes, submit a PR. Hi, Tim Spann here. My talk is adding generative AI to real time streaming pipelines, and we're here for the large language model conference at Conf42, which is always a nice one, great place to be. 
I'm going to discuss a couple of different open source technologies that work together to enable you to build real time pipelines using large language models. So we'll touch on Kafka, NiFi, Flink, Python, Iceberg, and I'll show you a little bit of each one in the demos. I've been working with data, machine learning, streaming, IoT, some other things for a number of years, and you could contact me at any of these places, whether Twitter or whatever it's called, some different blogs, or in person at my meetups and at different conferences around the world. I do a weekly newsletter, cover streaming ML, a lot of LLM, open source, Python, Java, all kinds of fun stuff, as I mentioned, do a bunch of different meetups. They are not just in the east coast of the US, they are available virtually live, and I also put them on YouTube, and if you need them somewhere else, let me know. We publish all the slides, all the code on GitHub. Everything you need is out there. Let's get into the talk. LLM, if you didn't know, is rapidly evolving. While you're typing down the things that you use, it

2024 XTREMEJ_ Building Real-time Pipelines with FLaNK_ A Case Study with Tra...

Timothy Spann

28March2024-Codeless-Generative-AI-Pipelines

Timothy Spann

28March2024-Codeless-Generative-AI-Pipelines https://www.meetup.com/futureofdata-princeton/events/299440871/ https://www.meetup.com/real-time-analytics-meetup-ny/events/299290822/ ******Note***** The event is seat-limited, therefore please complete your registration here. Only people completing the form will be able to attend. ----------------------- We're excited to invite you to join us in-person, for a Real-Time Analytics exploration! Join us for an evening of insights, networking as we delve into the OSS technologies shaping the field! Agenda: 05:30-06:00: Pizza and friends 06:00- 06:40: Codeless GenAI Pipelines with Flink, Kafka, NiFi 06:40- 07:20 Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders 07:20-07:30 QNA Codeless GenAI Pipelines with Flink, Kafka, NiFi | Tim Spann, Cloudera Explore the power of real-time streaming with GenAI using Apache NiFi. Learn how NiFi simplifies data engineering workflows, allowing you to focus on creativity over technical complexities. I'll guide you through practical examples, showcasing NiFi's automation impact from ingestion to delivery. Whether you're a seasoned data engineer or new to GenAI, this talk offers valuable insights into optimizing workflows. Join us to unlock the potential of real-time streaming and witness how NiFi makes data engineering a breeze for GenAI applications! Real-Time Analytics in the Corporate World: How Apache Pinot® Powers Industry Leaders | Viktor Gamov, StarTree Explore how industry leaders like LinkedIn, Uber Eats, and Stripe are mastering real-time data with Viktor as your guide. Discover how Apache Pinot transforms data into actionable insights instantly. Viktor will showcase Pinot's features, including the Star-Tree Index, and explain why it's a game-changer in data strategy. This session is for everyone, from data geeks to business gurus, eager to uncover the future of tech. Join us and be wowed by the power of real-time analytics with Apache Pinot! 
------- Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more.

TCFPro24 Building Real-Time Generative AI Pipelines

Timothy Spann

https://princetonacm.acm.org/tcfpro/ 18th Annual IEEE IT Professional Conference (ITPC) Armstrong Hall at The College of New Jersey Friday, March 15th, 2024 | 10:00 AM to 5:00 PM IT Professional Conference at Trenton Computer Festival IEEE Information Technology Professional Conference on Friday, March 15th, 2024 TCFPro24 Building Real-Time Generative AI Pipelines Building Real-Time Generative AI Pipelines In this talk, Tim will delve into the exciting realm of building real-time generative AI pipelines with streaming capabilities. The discussion will revolve around the integration of cutting-edge technologies to create dynamic and responsive systems that harness the power of generative algorithms. From leveraging streaming data sources to implementing advanced machine learning models, the presentation will explore the key components necessary for constructing a robust real-time generative AI pipeline. Practical insights, use cases, and best practices will be shared, offering a comprehensive guide for developers and data scientists aspiring to design and implement dynamic AI systems in a streaming environment. Tim will show a live demo showing we can use Apache NiFi to provide a live chat between a person in Slack and several LLM models all orchestrated with Apache NiFi, Apache Kafka and Python. We will use RAG against Chroma and Pinecone vector data stores, Hugging Face and WatsonX.AI LLM, and add additional context with NiFi lookups of stocks, weather and other data streams in real-time. Timothy Spann Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. 
Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

2024 Build Generative AI for Non-Profits

Timothy Spann

Conf42-Python-Building Apache NiFi 2.0 Python Processors

Timothy Spann

Conf42-Python-Building Apache NiFi 2.0 Python Processors https://www.conf42.com/Python_2024_Tim_Spann_apache_nifi_2_processors Building Apache NiFi 2.0 Python Processors Abstract Let’s enhance real-time streaming pipelines with smart Python code. Adding code for vector databases and LLM. Summary Tim Spann: I'm going to be talking today about building Apache NiFi 2.0 Python processors. One of the main purposes of supporting Python in the streaming tool Apache NiFi is to interface with new machine learning and AI and Gen AI. He says Python is a real game changer for Cloudera. You're just going to add some metadata around it. It's a great way to pass a file along without changing it too substantially. We really need you to have Python 3.10 and again JDK 21 on your machine. You got to be smart about how you use these models. There are a ton of Python processors available. You can use them in multiple ways. We're still in the early world of Python processors, so now's the time to start putting yours out there. Love to see a lot of people write their own. When we are parsing documents here, again, this is the Python one I'm picking PDF. Lots of different things you could do. If you're interested in writing your own Python code for Apache NiFi, definitely reach out, and thank you.

Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg...

Timothy Spann

Conf42Python -Using Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg with Stock Data and LLM Abstract In this talk, we’ll discuss how to use Apache NiFi, Apache Kafka, RisingWave, and Apache Iceberg to process and analyze stock data. We demonstrate the ingestion, processing, and analysis of stock data. Additionally, we illustrate how to use an LLM to generate predictions from the analyzed data. Karin Wolok, Developer Relations, Dev Marketing, and Community Programming @ Project Elevate. Tim Spann, Principal Developer Advocate @ Cloudera. https://www.conf42.com/Python_2024_Karin_Wolok_Tim_Spann_nifi__kafka_risingwave_iceberg_llm

2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines

Timothy Spann

2024 Feb AI Meetup NYC GenAI_LLMs_ML_Data Codeless Generative AI Pipelines https://www.aicamp.ai/event/eventdetails/W2024022214 apache nifi llm generative ai gen ai ml dl machine learning apache kafka apache flink postgresql python AI Meetup (NYC): GenAI, LLMs, ML and Data Feb 22, 05:30 PM EST Welcome to the monthly in-person AI meetup in New York City, in collaboration with Microsoft. Join us for deep dive tech talks on AI, GenAI, LLMs and machine learning, food/drink, networking with speakers and fellow developers Agenda: * 5:30pm~6:00pm: Checkin, Food/drink and networking * 6:00pm~6:10pm: Welcome/community update * 6:10pm~8:30pm: Tech talks * 8:30pm: Q&A, Open discussion Tech Talk: Searching and Reasoning Over Multimedia Data with Vector Databases and LMMs Speaker: Zain Hasan (Weaviate LinkedIn) Abstract: In this talk, Zain Hasan will discuss how we can use open-source multimodal embedding models in conjunction with large generative multimodal models that can see, hear, read, and feel data(!), to perform cross-modal search (searching audio with images, videos with text etc.) and multimodal retrieval augmented generation (MM-RAG) at the billion-object scale with the help of open source vector databases. I will also demonstrate, with live code demos, how being able to perform this cross-modal retrieval in real-time can enable users to use LLMs that can reason over their enterprise multimodal data. This talk will revolve around how we can scale the usage of multimodal embedding and generative models in production. Tech Talk: Codeless Generative AI Pipelines Speaker: Timothy Spann (Cloudera LinkedIn) Abstract: Join us for an insightful talk on leveraging the power of real-time streaming tools, specifically Apache NiFi, to revolutionize GenAI data engineering. In this session, we’ll explore how the integration of Apache NiFi can automate the entire process of prompt building, making it a seamless and efficient task. 
Speakers/Topics: Stay tuned as we are updating speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics Sponsors: We are actively seeking sponsors to support our community. Whether it is by offering venue spaces, providing food/drink, or cash sponsorship. Sponsors will have the chance to speak at the meetups, receive prominent recognition, and gain exposure to our extensive membership base of 20,000+ local or 300K+ developers worldwide. Venue: Microsoft NYC - Times Square, 11 Times Square, New York, NY 10036 Room Name: Central Park West 6501 Community on Slack/Discord - Event chat: chat and connect with speakers and attendees - Sharing blogs, events, job openings, projects collaborations Join Slack (search and join the #newyork channel) | Join Discord

DBA Fundamentals Group: Continuous SQL with Kafka and Flink

Timothy Spann

DBA Fundamentals Group: Continuous SQL with Kafka and Flink 20-Feb-2024 In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data. We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this. Tim Spann Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.

NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...

Timothy Spann

OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines

Timothy Spann

OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines Unlocking Financial Data with Real-Time Pipelines Financial institutions thrive on accurate and timely data to drive critical decision-making processes, risk assessments, and regulatory compliance. However, managing and processing vast amounts of financial data in real-time can be a daunting task. To overcome this challenge, modern data engineering solutions have emerged, combining powerful technologies like Apache Flink, Apache NiFi, Apache Kafka, and Iceberg to create efficient and reliable real-time data pipelines. In this talk, we will explore how this technology stack can unlock the full potential of financial data, enabling organizations to make data-driven decisions swiftly and with confidence. Introduction: Financial institutions operate in a fast-paced environment where real-time access to accurate and reliable data is crucial. Traditional batch processing falls short when it comes to handling rapidly changing financial markets and responding to customer demands promptly. In this talk, we will delve into the power of real-time data pipelines, utilizing the strengths of Apache Flink, Apache NiFi, Apache Kafka, and Iceberg, to unlock the potential of financial data. Key Points to be Covered: Introduction to Real-Time Data Pipelines: a. The limitations of traditional batch processing in the financial domain. b. Understanding the need for real-time data processing. Apache Flink: Powering Real-Time Stream Processing: a. Overview of Apache Flink and its role in real-time stream processing. b. Use cases for Apache Flink in the financial industry. c. How Flink enables fast, scalable, and fault-tolerant processing of streaming financial data. Apache Kafka: Building Resilient Event Streaming Platforms: a. Introduction to Apache Kafka and its role as a distributed streaming platform. b. Kafka's capabilities in handling high-throughput, fault-tolerant, and real-time data streaming. c. 
Integration of Kafka with financial data sources and consumers. Apache NiFi: Data Ingestion and Flow Management: a. Overview of Apache NiFi and its role in data ingestion and flow management. b. Data integration and transformation capabilities of NiFi for financial data. c. Utilizing NiFi to collect and process financial data from diverse sources. Iceberg: Efficient Data Lake Management: a. Understanding Iceberg and its role in managing large-scale data lakes. b. Iceberg's schema evolution and table-level metadata capabilities. c. How Iceberg simplifies data lake management in financial institutions. Real-World Use Cases: a. Real-time fraud detection using Flink, Kafka, and NiFi. b. Portfolio risk analysis with Iceberg and Flink. c. Streamlined regulatory reporting leveraging all four technologies. Best Practices and Considerations: a. Architectural considerations when building real-time financial data pipelines. b. Ensuring data integrity, security, and compliance in real-time pipelines. c. Scalability an

Building Real-Time Travel Alerts

Timothy Spann

Building Real-time Travel Alerts In this session, we will walk through how to build a complete streaming application to send alerts based on travel advisories from public data. We will also join in other data sources of relevance and push out alerts. We will show you how to build this streaming application with Apache NiFi, Apache Kafka, and Apache Flink and show you when/why/how, and what to build to maximize performance, productivity, and ease of development. Let's get streaming. Apache Flink Apache Kafka Apache NiFi FLaNK Stack Tim Spann Big Data Conference Europe 2023

JConWorld_ Continuous SQL with Kafka and Flink

Timothy Spann

JConWorld: Continuous SQL with Kafka and Flink In this talk, I will walk through how someone can set up and run continuous SQL queries against Kafka topics utilizing Apache Flink. We will walk through creating Kafka topics, schemas, and publishing data. We will then cover consuming Kafka data, joining Kafka topics, and inserting new events into Kafka topics as they arrive. This basic overview will show hands-on techniques, tips, and examples of how to do this. Tim Spann is the Principal Developer Advocate for Data in Motion @ Cloudera where he works with Apache Kafka, Apache Flink, Apache NiFi, Apache Iceberg, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://www.youtube.com/channel/UCDIDMDfje6jAvNE8DGkJ3_w?view_as=subscriber

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines

Timothy Spann

[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines https://dssconf.pl/en/#agenda-section Integrating LLM with Streaming Data Pipelines Timothy Spann, Principal Developer Advocate, Cloudera APACHE NIFI, APACHE FLINK, APACHE KAFKA, LLM, HUGGINGFACE, REST, STREAMING LESS In this talk and demo I will walk through how to add LLMs to your streaming pipelines by integration through Apache NiFi. https://github.com/tspannhw/FLaNK-watsonx.ai Cloudera streaming llm generative ai slack to slack

Recently uploaded

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation

Boston Institute of Analytics

Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/

SOCRadar Germany 2024 Threat Landscape Report

SOCRadar

As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape. In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity. 🔑 Key findings include: 🔍 Increased frequency and complexity of cyber threats. 🔍 Escalation of state-sponsored and criminally motivated cyber operations. 🔍 Active dark web exchanges of malicious tools and tactics. Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities. This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.

Internal Study Session Materials_LLM Agents

NABLAS株式会社

Malana- Gimlet Market Analysis (Portfolio 2)

TravisMalana

Best best suvichar in gujarati english meaning of this sentence as Silk road ...

AbhimanyuSinha9

Criminal IP - Threat Hunting Webinar.pdf

Criminal IP

Investigate & Recover / StarCompliance.io / Crypto_Crimes

StarCompliance.io

StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft. Our Services Include: Reporting to Tracking Authorities: We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them. Assistance with Filing Police Reports: We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window. Launching the Refund Process: Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served. At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.

Jpolillo Amazon PPC - Bid Optimization Sample

James Polillo

Ch03-Managing the Object-Oriented Information Systems Project a.pdf

haila53

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...

Subhajit Sahu

Techniques to optimize the pagerank algorithm usually fall in two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before pagerank computation to improve performance. Final ranks of chain nodes can be easily calculated. This could reduce both the iteration time, and the number of iterations. If a graph has no dangling nodes, pagerank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, no. of iterations, and also enable multi-iteration concurrency in pagerank computation. The combination of all of the above methods is the STICD algorithm. [sticd] For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.

FP Growth Algorithm and its Applications

MaleehaSheikh2

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单

nscud

CBU毕业证【微信95270640】《如何办理不列颠海角大学毕业证认证》【办证Q微信95270640】《不列颠海角大学文凭毕业证制作》《CBU学历学位证书哪里买》办理不列颠海角大学学位证书扫描件、办理不列颠海角大学雅思证书！国际留学归国服务中心《如何办不列颠海角大学毕业证认证》《CBU学位证书扫描件哪里买》实体公司，注册经营，行业标杆，精益求精！ 1:1完美还原海外各大学毕业材料上的工艺：水印阴影底纹钢印LOGO烫金烫银LOGO烫金烫银复合重叠。文字图案浮雕激光镭射紫外荧光温感复印防伪。可办理以下真实不列颠海角大学存档留学生信息存档认证： 1不列颠海角大学真实留信网认证（网上可查永久存档无风险百分百成功入库）； 2真实教育部认证（留服）等一切高仿或者真实可查认证服务（暂时不可办理）； 3购买英美真实学籍（不用正常就读直接出学历）； 4英美一年硕士保毕业证项目（保录取学校挂名不用正常就读保毕业）留学本科/硕士毕业证书成绩单制作流程： 1客户提供办理信息：姓名生日专业学位毕业时间等（如信息不确定可以咨询顾问：我们有专业老师帮你查询不列颠海角大学不列颠海角大学本科学位证成绩单）； 2开始安排制作不列颠海角大学毕业证成绩单电子图； 3不列颠海角大学毕业证成绩单电子版做好以后发送给您确认； 4不列颠海角大学毕业证成绩单电子版您确认信息无误之后安排制作成品； 5不列颠海角大学成品做好拍照或者视频给您确认； 6快递给客户（国内顺丰国外DHLUPS等快读邮寄） — — — — — — — — — — — 《文凭顾问Q/微：95270640》这么大这么美的地方赚大钱高楼大厦鳞次栉比大街小巷人潮涌动山娃一路张望一路惊叹他发现城里的桥居然层层叠叠扭来扭去桥下没水却有着水一般的车水马龙山娃惊诧于城里的公交车那么大那么美不用买票乖乖地掷下二枚硬币空调享受还能坐着看电视呢屡经辗转山娃终于跟着父亲到家了山娃没想到父亲城里的家会如此寒碜更没料到父亲的城里竟有如此简陋的鬼地方父亲的家在高楼最底屋最下面很矮很黑是很不显眼的地下室父亲的家安在别人脚底下孰

tapal brand analysis PPT slide for comptetive data

theahmadsaood

一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单

ewymefz

UPenn毕业证【微信95270640】办理宾夕法尼亚大学毕业证原版一模一样、UPenn毕业证制作【Q微信95270640】《宾夕法尼亚大学毕业证购买流程》《UPenn成绩单制作》宾夕法尼亚大学毕业证书UPenn毕业证文凭宾夕法尼亚大学本科毕业证书,学历学位认证如何办理【留学国外学位学历认证、毕业证、成绩单、大学Offer、雅思托福代考、语言证书、学生卡、高仿教育部认证等一切高仿或者真实可查认证服务】代办国外（海外）英国、加拿大、美国、新西兰、澳大利亚、新西兰等国外各大学毕业证、文凭学历证书、成绩单、学历学位认证真实可查。办国外宾夕法尼亚大学宾夕法尼亚大学硕士学位证成绩单教育部学历学位认证留信认证大使馆认证留学回国人员证明修改成绩单信封申请学校offer录取通知书在读证明offer letter。快速办理高仿国外毕业证成绩单： 1宾夕法尼亚大学毕业证+成绩单+留学回国人员证明+教育部学历认证（全套留学回国必备证明材料给父母及亲朋好友一份完美交代）; 2雅思成绩单托福成绩单OFFER在读证明等留学相关材料（申请学校转学甚至是申请工签都可以用到）。 3.毕业证 #成绩单等全套材料从防伪到印刷从水印到钢印烫金高精仿度跟学校原版100%相同。专业服务请勿犹豫联系我！联系人微信号：95270640诚招代理：本公司诚聘当地代理人员如果你有业余时间有兴趣就请联系我们。国外宾夕法尼亚大学宾夕法尼亚大学硕士学位证成绩单办理过程： 1客户提供办理信息：姓名生日专业学位毕业时间等（如信息不确定可以咨询顾问：我们有专业老师帮你查询）； 2开始安排制作毕业证成绩单电子图； 3毕业证成绩单电子版做好以后发送给您确认； 4毕业证成绩单电子版您确认信息无误之后安排制作成品； 5成品做好拍照或者视频给您确认； 6快递给客户（国内顺丰国外DHLUPS等快读邮寄）。我们在哪里父母对我们的爱和思念为我们的生命增加了光彩给予我们自由追求的力量生活的力量我们也不忘感恩正因为这股感恩的线牵着我们使我们在一年的结束时刻义无反顾的踏上了回家的旅途人们常说父母恩最难回报愿我能以当年爸爸妈妈对待小时候的我们那样耐心温柔地对待我将渐渐老去的父母体谅他们以反哺之心奉敬父母以感恩之心孝顺父母哪怕只为父母换洗衣服为父母喂饭送汤按摩酸痛的腰背握着父母的手扶着他们一步一步地慢慢散步.娃

Empowering Data Analytics Ecosystem.pptx

benishzehra469

Show drafts volume_up Empowering the Data Analytics Ecosystem: A Laser Focus on Value The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem: 1. Democratize Access, Not Data: Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse. Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources. 2. Foster Collaboration with Clear Roles: Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities. Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together. 3. Leverage Advanced Analytics Strategically: AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis. Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems. 4. Prioritize Data Quality with Automation: Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues. Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors. 5. Cultivate a Data-Driven Mindset: Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making. Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action. Benefits of a Precise Ecosystem: Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency. 
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights. Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement. Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation. By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...

John Andrews

SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation" Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults Description: Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project. Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...

Subhajit Sahu

Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation

SOCRadar Germany 2024 Threat Landscape Report

社内勉強会資料_LLM Agents　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　.

Malana- Gimlet Market Analysis (Portfolio 2)

一比一原版(UVic毕业证)维多利亚大学毕业证成绩单

一比一原版(BU毕业证)波士顿大学毕业证成绩单

Best best suvichar in gujarati english meaning of this sentence as Silk road ...

Criminal IP - Threat Hunting Webinar.pdf

Investigate & Recover / StarCompliance.io / Crypto_Crimes

Jpolillo Amazon PPC - Bid Optimization Sample

Ch03-Managing the Object-Oriented Information Systems Project a.pdf

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...

一比一原版(NYU毕业证)纽约大学毕业证成绩单

FP Growth Algorithm and its Applications

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单

tapal brand analysis PPT slide for comptetive data

一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单

Empowering Data Analytics Ecosystem.pptx

Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...

Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo

1. November 2, 2023 | 9:00 AM – 6:00 PM INTEGRATING AI INTO REAL-TIME DATA PIPELINES

2. Integrating AI Into Real-Time Data Pipelines

3. © 2023 Cloudera, Inc. All rights reserved. Streaming data Data at rest Change data capture Any DATA Real-time Processing • Analyze data in motion • Continuous monitoring • Trends and anomalies Data Lakehouse Data products Any BUSINESS EVENT Continuous Results • No-Code UI • Author once publish anywhere • Analytics lifecycle management for dev/ops Any DATA ANALYST AI models Event-driven apps Analytics apps Any DATA CONSUMER Data Relevance for Real-Time Applications

4. INGEST PREPARE PUBLISH DATA SOURCES Internal Users (After Sales) External Systems ENTERPRISE LAKEHOUSE CAPABILITY VIEW INGESTION MESSAGE HUB STORAGE BATCH MANAGEMENT STREAM CONSUMPTION Closed Loop Systems SQL Stream Builder Machine Learning Data Visualization Workload Manager watsonx.data

5. Cloudera’s Data in Motion Services. Cloudera offers two core data-in-motion services: DataFlow and Stream Processing. DATAFLOW: Powered by Apache NiFi, it enables developers to connect to any data source anywhere with any structure, process it, and deliver it to any destination using a low-code authoring experience. CLOUDERA SDX: Secure, monitor, and govern your streaming workloads with the same tooling using Apache Ranger and Apache Atlas. STREAM PROCESSING: Powered by Apache Flink and Kafka, it provides a complete, enterprise-grade stream management and stateful processing solution. With support for industry-standard interfaces like SQL, developers, data analysts, and data scientists can easily build a wide variety of hybrid real-time applications.

6. Simplified Streaming Pipelines: Connect to any data source anywhere, process and deliver to any destination. Ingest Process Distribute Active Passive Route Filter Enrich Transform Data born in the cloud Data born outside the cloud Any destination Connectors Gateway Endpoint Connect & Pull Send Connectors Deliver

7. LLM USE CASE Vector DB AI Model Unstructured file types Data in Motion on Cloudera Data Platform (CDP) Capture, process & distribute any data, anywhere Other enterprise data Open Data Lakehouse Materialized Views Structured Sources Applications/APIs Streams

8. Apache NiFi in a few numbers: a very active project with a dynamic community (figures in parentheses are from ACEU 2019, 4 years ago) • 2800+ members on the Slack channel (535+) • 475+ contributors on GitHub across the repositories (260+) • 65 committers in the Apache NiFi community (45) • Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming soon (NiFi 1.10) • 14M+ docker pulls of the Apache NiFi image (1M+)
