Streaming AI Pipelines with
Apache NiFi and Snowflake
Tim Spann, Senior Solutions Engineer
Tim Spann
paasdev.bsky.social
@PaasDev // Blog: datainmotion.dev
Senior Solutions Engineer, Snowflake
NY/NJ/Philly - Cloud Data + AI Meetups
ex-Zilliz, ex-Pivotal, ex-Cloudera, ex-HPE,
ex-StreamNative, ex-EY, ex-Hortonworks.
https://medium.com/@tspann
https://github.com/tspannhw
DATA + AI + Streaming Weekly
This week in Apache NiFi, Apache Polaris,
Apache Flink, Apache Kafka, ML, AI,
Streamlit, Jupyter, Apache Iceberg, Python,
Java, LLM, GenAI, Snowflake, Unstructured
Data and Open Source friends.
https://bit.ly/32dAJft
How Snowflake and Apache
NiFi work with Streaming
Data and AI
Building
Streaming Data +
AI Pipelines
Requires a Team
Example Smart City Architecture
[Architecture diagram. Stages: Data Sources, Data Integration, Data Platform, Data Consumers]
[Diagram labels: Sensors, Transit Data, Weather, Traffic Data, Camera Images, Snowpipe, Raw Data, Modeled Data, Marketplace, Snowflake Cortex AI, Snowsight, AI/ML & Apps]
Data from the Real World
I Can Haz Data?
Apache NiFi
● From laptop to 1,000 nodes
● Ingest, Extract, Split
● Enrich, Transform
● Mature, 10+ years
● Any Data, Any Source
● LLM Calls
● Data Provenance
● Back Pressure
● Guaranteed Delivery
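The "Back Pressure" bullet above can be illustrated with a small sketch. This is not NiFi code or NiFi's implementation; it only shows the general idea with a bounded in-memory queue, assuming a made-up threshold of 3 queued items. The names `offer` and `connection` are hypothetical. When the queue is full the producer is told to wait, and nothing is dropped.

```python
import queue


def offer(connection: queue.Queue, flowfile: dict) -> bool:
    """Try to enqueue an item; return False when the queue is at its threshold."""
    try:
        connection.put_nowait(flowfile)
        return True
    except queue.Full:
        # The upstream step should pause and retry later instead of dropping data.
        return False


# Hypothetical threshold: at most 3 items may wait between two processing steps.
connection = queue.Queue(maxsize=3)

# The producer offers five items; only the first three fit.
accepted = [offer(connection, {"id": i}) for i in range(5)]

# The consumer drains one item, which frees room for the producer again.
drained = connection.get_nowait()
resumed = offer(connection, {"id": 5})
```

In this sketch the two rejected items stay with the producer, which is the point of back pressure: a slow consumer slows the source down rather than losing data.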
Unstructured Data
● Lots of formats
● Text, Documents, PDF
● Images, Videos, Audio
● Email, Slack, Teams
● Logs
● Binary Data Formats
● Zip
● Variants
Semi-Structured Data
● Open Data like OpenAQ (Air Quality Data)
● Location, Time, Sensors
● Apache Avro, Parquet, ORC
● JSON and XML
● Hierarchical Data
● Logs
● Key-Value
https://docs.snowflake.com/en/sql-reference/data-types-semistructured
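To make "Hierarchical Data" and "Key-Value" concrete, here is a minimal sketch that flattens a nested JSON-style record into dotted key-value pairs. The sample record is made up for illustration (it is not real OpenAQ output), and the `flatten` helper is hypothetical rather than part of NiFi or Snowflake.

```python
from typing import Any, Dict


def flatten(record: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single dict keyed by path."""
    flat: Dict[str, Any] = {}
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    elif isinstance(record, list):
        for index, value in enumerate(record):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value: store it under the path built so far.
        flat[prefix] = record
    return flat


# Made-up air quality style record with location, time, and sensor readings.
sample = {
    "location": {"city": "Princeton", "coordinates": {"lat": 40.35, "lon": -74.66}},
    "time": "2025-01-01T00:00:00Z",
    "sensors": [{"parameter": "pm25", "value": 8.1}],
}

flat = flatten(sample)
```

Each leaf ends up under a path such as `location.coordinates.lat` or `sensors[0].value`, which is one simple way to turn a hierarchical record into key-value form.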
Structured Data
● Snowflake Tables
● Snowflake Hybrid Tables
● Apache Iceberg Tables
● Relational Tables
● PostgreSQL Tables
● CSV, TSV
Open LLM Options
● Arctic Instruct
● Arctic-embed-m-v2.0
● Llama-3.3-70b
● Mixtral-8x7b
● Llama-3.1-405b
● Mistral-7b
● Deepseek-r1
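As a rough sketch of how one of these models might be called from a pipeline, the helper below builds a parameterized SQL statement around Snowflake's `SNOWFLAKE.CORTEX.COMPLETE(model, prompt)` function. It only builds the statement and does not connect to Snowflake. The helper name `build_complete_query`, the `%s` placeholder style, and the model identifier `mistral-7b` are assumptions here; check the exact identifiers and regional availability in the Cortex documentation.

```python
from typing import Tuple


def build_complete_query(model: str, prompt: str) -> Tuple[str, Tuple[str, str]]:
    """Return a SQL statement and bind parameters for a Cortex COMPLETE call."""
    if not model.strip() or not prompt.strip():
        raise ValueError("model and prompt must be non-empty")
    # Bind parameters keep the prompt text out of the SQL string itself.
    sql = "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s) AS response"
    return sql, (model, prompt)


# Assumed model identifier; the slide lists the model as "Mistral-7b".
sql, params = build_complete_query(
    "mistral-7b",
    "Summarize this air quality reading in one sentence.",
)
```

With a database cursor that accepts `%s` placeholders, the pair could be passed as `cursor.execute(sql, params)`.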
Streaming AI Pipelines with Apache NiFi and Snowflake 2025
