Creating your own Chat GPT with
Apache Airflow
@tati_alchueyr
Staff Software Engineer - Astronomer
13th July 2023 - AI Camp London Meetup
Turing test
https://marcabraham.com/2022/10/17/what-is-the-turing-test/
ChatGPT
https://chat.openai.com/
https://xkcd.com/329/
inspect(ChatGPT)
● Artificial intelligence chatbot
● Developed by OpenAI
● Proprietary machine learning model
○ Uses an LLM (Large Language Model)
○ GPT == Generative Pre-Trained Transformer
○ Fine-tuned GPT-3.5 (text-davinci-003)
● Over 100 million users
● Dataset size: 570 GB; 175 billion parameters
● Estimated cost to run per month: $3 million
https://www.theguardian.com/technology/2023/feb/02/chatgpt-100-million-users-open-ai-fastest-growing-app
https://indianexpress.com/article/technology/tech-news-technology/chatgpt-interesting-things-to-know-8334991/
https://meetanshi.com/blog/chatgpt-statistics/
help(LLM)
A Large Language Model is a type
of AI algorithm trained on huge
amounts of text data that can
understand and generate text
help(LLM)
An LLM can be characterized by four parameters:
● Size of the training dataset
● Cost of training
● Size of the model
● Performance after training
timeline(LLM)
https://samim.io/p/2023-04-30-evolutionary-tree-of-llms/
Proprietary LLM limitations
● Data Privacy and Security
● Dependency and Customisation
● Cost and Scalability
● Access and Availability
Open-source LLM alternatives
● LLaMA (Meta)
● Alpaca (Stanford)
● Vicuna (Berkeley, Carnegie Mellon, Stanford)
● Dolly (Databricks)
● Open Assistant (individuals)
● h2oGPT (h2o)
https://bdtechtalks.com/2023/04/17/open-source-chatgpt-alternatives/
h2oGPT about
● Open-source (Apache 2.0) generative AI
● Empowers users to create their own language models
● https://gpt.h2o.ai/
● https://github.com/h2oai/h2ogpt
● https://www.youtube.com/watch?v=Coj72EzmX20&t=757s
https://bdtechtalks.com/2023/04/17/open-source-chatgpt-alternatives/
Apache Airflow
Apache Airflow is an open-source
platform for developing,
scheduling, and monitoring
batch-oriented workflows.
help(airflow)
usage(airflow)
https://github.com/apache/airflow
https://pypistats.org/packages/apache-airflow
airflow.__author__
example(workflow)
airflow.concepts
airflow providers packages
https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html
● apache-airflow-providers-airbyte
● apache-airflow-providers-alibaba
● apache-airflow-providers-amazon
● apache-airflow-providers-apache-beam
● apache-airflow-providers-apache-cassandra
● apache-airflow-providers-apache-drill
● apache-airflow-providers-apache-druid
● apache-airflow-providers-apache-flink
● apache-airflow-providers-apache-hdfs
● apache-airflow-providers-apache-hive
● apache-airflow-providers-apache-impala
● apache-airflow-providers-apache-kafka
● apache-airflow-providers-apache-kylin
● apache-airflow-providers-apache-livy
● apache-airflow-providers-apache-pig
● apache-airflow-providers-apache-pinot
● apache-airflow-providers-apache-spark
● apache-airflow-providers-apache-sqoop
● apache-airflow-providers-apprise
● apache-airflow-providers-arangodb
● apache-airflow-providers-asana
● apache-airflow-providers-atlassian-jira
● apache-airflow-providers-celery
● apache-airflow-providers-cloudant
● apache-airflow-providers-cncf-kubernetes
● apache-airflow-providers-common-sql
● apache-airflow-providers-databricks
● apache-airflow-providers-datadog
● apache-airflow-providers-dbt-cloud
● apache-airflow-providers-dingding
● apache-airflow-providers-discord
● apache-airflow-providers-docker
● apache-airflow-providers-elasticsearch
● apache-airflow-providers-exasol
● apache-airflow-providers-facebook
● apache-airflow-providers-ftp
● apache-airflow-providers-github
● apache-airflow-providers-google
● apache-airflow-providers-grpc
● apache-airflow-providers-hashicorp
● apache-airflow-providers-http
● apache-airflow-providers-imap
● apache-airflow-providers-influxdb
● apache-airflow-providers-jdbc
● apache-airflow-providers-jenkins
● apache-airflow-providers-microsoft-azure
● apache-airflow-providers-microsoft-mssql
● apache-airflow-providers-microsoft-psrp
● apache-airflow-providers-microsoft-winrm
● apache-airflow-providers-mongo
● apache-airflow-providers-mysql
● apache-airflow-providers-neo4j
● apache-airflow-providers-odbc
● apache-airflow-providers-openfaas
● apache-airflow-providers-openlineage
● apache-airflow-providers-opsgenie
● apache-airflow-providers-oracle
● apache-airflow-providers-pagerduty
● apache-airflow-providers-papermill
● apache-airflow-providers-plexus
● apache-airflow-providers-postgres
● apache-airflow-providers-presto
● apache-airflow-providers-qubole
● apache-airflow-providers-redis
● apache-airflow-providers-salesforce
● apache-airflow-providers-samba
● apache-airflow-providers-segment
● apache-airflow-providers-sendgrid
● apache-airflow-providers-sftp
● apache-airflow-providers-singularity
● apache-airflow-providers-slack
● apache-airflow-providers-smtp
● apache-airflow-providers-snowflake
● apache-airflow-providers-sqlite
● apache-airflow-providers-ssh
● apache-airflow-providers-tableau
● apache-airflow-providers-tabular
● apache-airflow-providers-telegram
● apache-airflow-providers-trino
● apache-airflow-providers-vertica
● apache-airflow-providers-zendesk
airflow example DAG
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def _train_model():
    # Placeholder: the training logic goes here
    pass


with DAG(
    "train_models",
    start_date=datetime(2023, 7, 4),
    schedule="@daily",
) as dag:
    train_model = PythonOperator(
        task_id="train_model",
        python_callable=_train_model,
    )
airflow example DAG
from datetime import datetime
from random import randint

from airflow import DAG
from airflow.operators.python import PythonOperator


def _evaluate_model():
    # Stand-in for a real evaluation: returns a random score
    return randint(1, 10)


def _choose_best(ti):
    tasks = ["evaluate_model_a", "evaluate_model_b"]
    # Pull the score each evaluation task returned (stored as an XCom)
    accuracies = [ti.xcom_pull(task_ids=task_id) for task_id in tasks]
    best_accuracy = max(accuracies)
    for model, model_accuracy in zip(tasks, accuracies):
        if model_accuracy == best_accuracy:
            return model


with DAG(
    "evaluate_models",
    start_date=datetime(2023, 7, 4),
    schedule="@daily",
) as dag:
    evaluate_model_a = PythonOperator(
        task_id="evaluate_model_a",
        python_callable=_evaluate_model,
    )
    evaluate_model_b = PythonOperator(
        task_id="evaluate_model_b",
        python_callable=_evaluate_model,
    )
    choose_best_model = PythonOperator(
        task_id="choose_best_model",
        python_callable=_choose_best,
    )

    # Both evaluations must finish before the best model is chosen
    [evaluate_model_a, evaluate_model_b] >> choose_best_model
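The selection step above can be exercised without a running Airflow instance. The following is a minimal sketch, assuming a hypothetical FakeTaskInstance class (not part of Airflow) that emulates only the xcom_pull call the callable relies on. The scores 7 and 9 are made-up illustration values.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow task instance (not an Airflow class).

    Only the xcom_pull(task_ids=...) call used by the selection logic is emulated.
    """

    def __init__(self, xcoms):
        self._xcoms = xcoms  # maps task_id -> value that task returned

    def xcom_pull(self, task_ids):
        return self._xcoms[task_ids]


def choose_best(ti, tasks=("evaluate_model_a", "evaluate_model_b")):
    """Return the task_id with the highest pulled score (first one wins ties)."""
    accuracies = [ti.xcom_pull(task_ids=task_id) for task_id in tasks]
    best_accuracy = max(accuracies)
    for model, model_accuracy in zip(tasks, accuracies):
        if model_accuracy == best_accuracy:
            return model


# Made-up scores, standing in for what the two evaluation tasks would return
fake_ti = FakeTaskInstance({"evaluate_model_a": 7, "evaluate_model_b": 9})
best_model = choose_best(fake_ti)
print(best_model)  # evaluate_model_b
```

Keeping the callable free of anything but the task instance makes this kind of check cheap to run in a unit test before the DAG is deployed.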
airflow example of pipelines
Building an AI Chat Bot
with Airflow
Airflow to build an LLM Chat Bot
● Open-source and cloud-agnostic: you are not locked in!
● Same orchestration tool for ELT/ETL and ML
● Automate the steps of a model pipeline, using Airflow to:
○ Monitor the status and duration of tasks over time
○ Retry on failures
○ Send notifications (email, slack, others) to the team
● Dynamically trigger tasks using different hyperparameters
● Dynamically select models based on their scores
● Trigger model pipelines based on dataset changes
● Smoothly run tasks in VMs, containers or Kubernetes
Use the KubernetesPodOperator
● Create tasks which are run in Kubernetes pods
● Use node_affinity to run the job on the node pool
with the desired memory/CPU/GPU
● Use k8s.V1VolumeMount to efficiently mount volumes (e.g.
NFS) to access large models from different Pods (evaluate,
serve)
https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
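The two settings named above correspond to fragments of the Kubernetes pod spec. The sketch below builds them as plain dictionaries so their shape is visible. The field names follow the Kubernetes pod spec, while the label key "nodepool", the pool name "gpu-pool", the volume name "models-nfs" and the mount path are all hypothetical. With the Airflow provider the same structure would typically be expressed through the Kubernetes Python client models (such as k8s.V1Affinity and k8s.V1VolumeMount) and passed to the operator, which this sketch does not attempt.

```python
def node_affinity_for_pool(label_key, pool_name):
    """Build an `affinity` fragment requiring nodes labelled label_key=pool_name."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": label_key, "operator": "In", "values": [pool_name]}
                        ]
                    }
                ]
            }
        }
    }


def volume_mount(name, mount_path, read_only=True):
    """Build a `volumeMounts` entry for a volume declared elsewhere in the pod spec."""
    return {"name": name, "mountPath": mount_path, "readOnly": read_only}


# Hypothetical label, pool, volume name and path, for illustration only
affinity = node_affinity_for_pool("nodepool", "gpu-pool")
models_mount = volume_mount("models-nfs", "/data/models")
print(affinity)
print(models_mount)
```

Mounting the model volume read-only in the evaluate and serve pods is one way to let several pods share the same large files without copying them.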
Use Dataset-aware scheduling
● Schedule a DAG to run when tasks (from other DAGs) that update a dataset complete successfully
from airflow.datasets import Dataset

# Producer and consumer must reference the same dataset URI
source_data = Dataset("s3://dataset-bucket/source-data.parquet")

with DAG("ingest_dataset", ...):
    MyOperator(
        # this task updates source-data.parquet
        outlets=[source_data],
        ...,
    )

with DAG(
    "train_model",
    # this DAG runs when source-data.parquet is updated (by DAG "ingest_dataset")
    schedule=[source_data],
    ...,
):
    ...
https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html
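The producer/consumer relationship can be pictured with a toy model in plain Python. This is only a sketch of the idea, not how Airflow implements it: ToyDatasetScheduler and its methods are hypothetical names, and it covers only the case of a DAG scheduled on a single dataset. The point it illustrates is that the consumer is triggered only when the outlet URI matches the URI it was scheduled on.

```python
from collections import defaultdict


class ToyDatasetScheduler:
    """Toy model of dataset-triggered scheduling (hypothetical, not Airflow code)."""

    def __init__(self):
        # dataset URI -> ids of the DAGs scheduled on that dataset
        self._consumers = defaultdict(list)

    def register_consumer(self, dag_id, dataset_uri):
        self._consumers[dataset_uri].append(dag_id)

    def on_task_success(self, outlet_uris):
        """Return the DAG ids to trigger after a task updated the given outlets."""
        triggered = []
        for uri in outlet_uris:
            for dag_id in self._consumers.get(uri, []):
                if dag_id not in triggered:
                    triggered.append(dag_id)
        return triggered


scheduler = ToyDatasetScheduler()
scheduler.register_consumer("train_model", "s3://dataset-bucket/source-data.parquet")

# Same URI as the consumer was registered with: the consumer DAG is triggered
matching = scheduler.on_task_success(["s3://dataset-bucket/source-data.parquet"])
# Different URI: nothing is triggered
mismatched = scheduler.on_task_success(["s3://dataset-bucket/source_data.csv"])
print(matching)    # ['train_model']
print(mismatched)  # []
```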
Use Dynamic Task Mapping
● Create a variable number of tasks at runtime based upon the
data created by the previous task
● Can be useful in several situations, including choosing the most
adequate model
● Supports map/reduce
https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/dynamic-task-mapping.html
Dynamic Task Mapping
from __future__ import annotations

from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="example_dynamic_task_mapping",
    start_date=datetime(2022, 3, 4),
) as dag:

    @task
    def evaluate_model(model_path):
        (...)
        return evaluation_metrics

    @task
    def chose_model(metrics_by_model):
        (...)
        return chosen_one

    # Map: one evaluate_model task instance is created per model path
    models_metrics = evaluate_model.expand(
        model_path=["/data/model1", "/data/model2", "/data/model3"]
    )
    # Reduce: a single task receives the metrics of all mapped tasks
    chose_model(models_metrics)
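The data flow of the mapped DAG above can be mirrored in plain Python, which may help when reasoning about what each task receives. This is a sketch under stated assumptions: the scores are made-up values in a hypothetical lookup table, and a list comprehension stands in for the mapped task instances that Airflow would run separately and connect through XCom.

```python
# Made-up evaluation scores, for illustration only
HYPOTHETICAL_SCORES = {
    "/data/model1": 0.71,
    "/data/model2": 0.84,
    "/data/model3": 0.78,
}


def evaluate_model(model_path):
    """Map step: produce the metrics for one model."""
    return {"model_path": model_path, "score": HYPOTHETICAL_SCORES[model_path]}


def choose_model(metrics_by_model):
    """Reduce step: receive the metrics of every model and pick the best one."""
    best = max(metrics_by_model, key=lambda metrics: metrics["score"])
    return best["model_path"]


model_paths = ["/data/model1", "/data/model2", "/data/model3"]
# One call per input, like one mapped task instance per model path
models_metrics = [evaluate_model(path) for path in model_paths]
chosen = choose_model(models_metrics)
print(chosen)  # /data/model2
```

Because the list of model paths is ordinary data, it could equally come from an upstream task, which is what lets the number of mapped tasks vary at runtime.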
Apache Airflow
Community
Apache Airflow Community
https://airflow.apache.org/community/
https://github.com/apache/airflow
https://www.meetup.com/london-apache-airflow-meetup/
https://www.astronomer.io/
@tati_alchueyr
tatiana.alchueyr@astronomer.io
Thank you!

More Related Content

What's hot

ChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for BusinessChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for BusinessDion Hinchcliffe
 
Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafkaconfluent
 
Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?
Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?
Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?PECB
 
[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI Service[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI ServiceNaoki (Neo) SATO
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...ssuser4edc93
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDARaymond Tay
 
Oracle Database Appliance X5-2 アップデート内容のご紹介
Oracle Database Appliance X5-2 アップデート内容のご紹介Oracle Database Appliance X5-2 アップデート内容のご紹介
Oracle Database Appliance X5-2 アップデート内容のご紹介オラクルエンジニア通信
 
Generative AI and law.pptx
Generative AI and law.pptxGenerative AI and law.pptx
Generative AI and law.pptxChris Marsden
 
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshareAllen (Xiaozhong) Wang
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AIBenjaminlapid1
 
Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)
Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)
Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)Takeshi Fukuhara
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMsSylvainGugger
 
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...Jelastic Multi-Cloud PaaS
 
Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...
Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...
Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...NTT DATA Technology & Innovation
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Mineaki Motohashi
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
An Introduction to Generative AI
An Introduction  to Generative AIAn Introduction  to Generative AI
An Introduction to Generative AICori Faklaris
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdfQualcomm Research
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN FrameworkKeymate.AI
 

What's hot (20)

ChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for BusinessChatGPT OpenAI Primer for Business
ChatGPT OpenAI Primer for Business
 
CHATGPT.pptx
CHATGPT.pptxCHATGPT.pptx
CHATGPT.pptx
 
Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafka
 
Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?
Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?
Impact of Generative AI in Cybersecurity - How can ISO/IEC 27032 help?
 
[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI Service[Machine Learning 15minutes! #61] Azure OpenAI Service
[Machine Learning 15minutes! #61] Azure OpenAI Service
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Oracle Database Appliance X5-2 アップデート内容のご紹介
Oracle Database Appliance X5-2 アップデート内容のご紹介Oracle Database Appliance X5-2 アップデート内容のご紹介
Oracle Database Appliance X5-2 アップデート内容のご紹介
 
Generative AI and law.pptx
Generative AI and law.pptxGenerative AI and law.pptx
Generative AI and law.pptx
 
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
 
The current state of generative AI
The current state of generative AIThe current state of generative AI
The current state of generative AI
 
Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)
Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)
Part 2: Data & AI 基盤 (製造リファレンス・アーキテクチャ勉強会)
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
Kubernetes and Nested Containers: Enhanced 3 Ps (Performance, Price and Provi...
 
Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...
Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...
Apache Sparkの基本と最新バージョン3.2のアップデート(Open Source Conference 2021 Online/Fukuoka ...
 
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
Snowflake Architecture and Performance(db tech showcase Tokyo 2018)
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
 
An Introduction to Generative AI
An Introduction  to Generative AIAn Introduction  to Generative AI
An Introduction to Generative AI
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
LanGCHAIN Framework
LanGCHAIN FrameworkLanGCHAIN Framework
LanGCHAIN Framework
 

Similar to Integrating ChatGPT with Apache Airflow

Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...OpenShift Origin
 
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltStack
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Databricks
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSDerrick Qin
 
Yaetos Tech Overview
Yaetos Tech OverviewYaetos Tech Overview
Yaetos Tech Overviewprevota
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflowmutt_data
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on KubernetesJoerg Henning
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTKai Zhao
 
OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...
OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...
OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...OpenShift Origin
 
OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce
OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce
OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce Diane Mueller
 
How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014Puppet
 
BaseX user-group-talk XML Prague 2013
BaseX user-group-talk XML Prague 2013BaseX user-group-talk XML Prague 2013
BaseX user-group-talk XML Prague 2013Andy Bunce
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheusBob Cotton
 
Leveraging open source for large scale analytics
Leveraging open source for large scale analyticsLeveraging open source for large scale analytics
Leveraging open source for large scale analyticsSouth West Data Meetup
 
Fluent 2018: When third parties stop being polite... and start getting real
Fluent 2018: When third parties stop being polite... and start getting realFluent 2018: When third parties stop being polite... and start getting real
Fluent 2018: When third parties stop being polite... and start getting realAkamai Developers & Admins
 
When Third Parties Stop Being Polite... and Start Getting Real
When Third Parties Stop Being Polite... and Start Getting RealWhen Third Parties Stop Being Polite... and Start Getting Real
When Third Parties Stop Being Polite... and Start Getting RealNicholas Jansma
 
Improving Operations Efficiency with Puppet
Improving Operations Efficiency with PuppetImproving Operations Efficiency with Puppet
Improving Operations Efficiency with PuppetNicolas Brousse
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209mffiedler
 

Similar to Integrating ChatGPT with Apache Airflow (20)

Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
Extending OpenShift Origin: Build Your Own Cartridge with Bill DeCoste of Red...
 
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
 
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
Serverless Machine Learning on Modern Hardware Using Apache Spark with Patric...
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Yaetos Tech Overview
Yaetos Tech OverviewYaetos Tech Overview
Yaetos Tech Overview
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Elasticsearch on Kubernetes
Elasticsearch on KubernetesElasticsearch on Kubernetes
Elasticsearch on Kubernetes
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
 
OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...
OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...
OpenShift Origin Community Day (Boston) Extending OpenShift Origin: Build You...
 
OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce
OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce
OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jhon Honce
 
How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014How to Puppetize Google Cloud Platform - PuppetConf 2014
How to Puppetize Google Cloud Platform - PuppetConf 2014
 
BaseX user-group-talk XML Prague 2013
BaseX user-group-talk XML Prague 2013BaseX user-group-talk XML Prague 2013
BaseX user-group-talk XML Prague 2013
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
Leveraging open source for large scale analytics
Leveraging open source for large scale analyticsLeveraging open source for large scale analytics
Leveraging open source for large scale analytics
 
Fluent 2018: When third parties stop being polite... and start getting real
Fluent 2018: When third parties stop being polite... and start getting realFluent 2018: When third parties stop being polite... and start getting real
Fluent 2018: When third parties stop being polite... and start getting real
 
When Third Parties Stop Being Polite... and Start Getting Real
When Third Parties Stop Being Polite... and Start Getting RealWhen Third Parties Stop Being Polite... and Start Getting Real
When Third Parties Stop Being Polite... and Start Getting Real
 
Improving Operations Efficiency with Puppet
Improving Operations Efficiency with PuppetImproving Operations Efficiency with Puppet
Improving Operations Efficiency with Puppet
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 

More from Tatiana Al-Chueyr

Contributing to Apache Airflow
Contributing to Apache AirflowContributing to Apache Airflow
Contributing to Apache AirflowTatiana Al-Chueyr
 
From an idea to production: building a recommender for BBC Sounds
From an idea to production: building a recommender for BBC SoundsFrom an idea to production: building a recommender for BBC Sounds
From an idea to production: building a recommender for BBC SoundsTatiana Al-Chueyr
 
Precomputing recommendations with Apache Beam
Precomputing recommendations with Apache BeamPrecomputing recommendations with Apache Beam
Precomputing recommendations with Apache BeamTatiana Al-Chueyr
 
Scaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache BeamScaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache BeamTatiana Al-Chueyr
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow ObstructionsTatiana Al-Chueyr
 
Scaling machine learning workflows with Apache Beam
Scaling machine learning workflows with Apache BeamScaling machine learning workflows with Apache Beam
Scaling machine learning workflows with Apache BeamTatiana Al-Chueyr
 
Responsible machine learning at the BBC
Responsible machine learning at the BBCResponsible machine learning at the BBC
Responsible machine learning at the BBCTatiana Al-Chueyr
 
Powering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonPowering machine learning workflows with Apache Airflow and Python
Powering machine learning workflows with Apache Airflow and PythonTatiana Al-Chueyr
 
Responsible Machine Learning at the BBC
Responsible Machine Learning at the BBCResponsible Machine Learning at the BBC
Responsible Machine Learning at the BBCTatiana Al-Chueyr
 
PyConUK 2018 - Journey from HTTP to gRPC
PyConUK 2018 - Journey from HTTP to gRPCPyConUK 2018 - Journey from HTTP to gRPC
PyConUK 2018 - Journey from HTTP to gRPCTatiana Al-Chueyr
 
PythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummiesPythonBrasil[8] - CPython for dummies
PythonBrasil[8] - CPython for dummiesTatiana Al-Chueyr
 
QCon SP - recommended for you
QCon SP - recommended for youQCon SP - recommended for you
QCon SP - recommended for youTatiana Al-Chueyr
 
PyConUK 2016 - Writing English Right
PyConUK 2016  - Writing English RightPyConUK 2016  - Writing English Right
PyConUK 2016 - Writing English RightTatiana Al-Chueyr
 
InVesalius: 3D medical imaging software
InVesalius: 3D medical imaging softwareInVesalius: 3D medical imaging software
InVesalius: 3D medical imaging softwareTatiana Al-Chueyr
 
Automatic English text correction
Automatic English text correctionAutomatic English text correction
Automatic English text correctionTatiana Al-Chueyr
 
Python packaging and dependency resolution
Python packaging and dependency resolutionPython packaging and dependency resolution
Python packaging and dependency resolutionTatiana Al-Chueyr
 
Rio info 2013 - Linked Data at Globo.com
Rio info 2013 - Linked Data at Globo.comRio info 2013 - Linked Data at Globo.com
Rio info 2013 - Linked Data at Globo.comTatiana Al-Chueyr
 

Integrating ChatGPT with Apache Airflow

  • 1. Creating your own Chat GPT with Apache Airflow @tati_alchueyr Staff Software Engineer - Astronomer 13th July 2023 - AI Camp London Meetup
  • 9. inspect(ChatGPT) ● Artificial intelligence chatbot ● Developed by OpenAI ● Proprietary machine learning model ○ Uses an LLM (Large Language Model) ○ GPT == Generative Pre-trained Transformer ○ Fine-tuned GPT-3.5 (text-davinci-003) ● User base of over 100 million ● Dataset size: 570 GB; 175 billion parameters ● Estimated cost to run per month: $3 million https://www.theguardian.com/technology/2023/feb/02/chatgpt-100-million-users-open-ai-fastest-growing-app https://indianexpress.com/article/technology/tech-news-technology/chatgpt-interesting-things-to-know-8334991/ https://meetanshi.com/blog/chatgpt-statistics/
  • 10. help(LLM) A Large Language Model is a type of AI algorithm trained on huge amounts of text data that can understand and generate text
  • 11. help(LLM) An LLM can be characterized by four parameters: ● Size of the training dataset ● Cost of training ● Size of the model ● Performance after training
  • 13. Proprietary LLM limitations ● Data Privacy and Security ● Dependency and Customisation ● Cost and Scalability ● Access and Availability
  • 14. Open-source LLM alternatives ● LLaMA (Meta) ● Alpaca (Stanford) ● Vicuna (Berkeley, Carnegie Mellon, Stanford) ● Dolly (Databricks) ● Open Assistant (individuals) ● h2oGPT (h2o) https://bdtechtalks.com/2023/04/17/open-source-chatgpt-alternatives/
  • 15. h2oGPT about ● Open-source (Apache 2.0) generative AI ● Empowers users to create their own language models ● https://gpt.h2o.ai/ ● https://github.com/h2oai/h2ogpt ● https://www.youtube.com/watch?v=Coj72EzmX20&t=757s https://bdtechtalks.com/2023/04/17/open-source-chatgpt-alternatives/
  • 18. help(airflow) Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows.
  • 25. airflow providers packages https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html ● apache-airflow-providers-airbyte ● apache-airflow-providers-alibaba ● apache-airflow-providers-amazon ● apache-airflow-providers-apache-beam ● apache-airflow-providers-apache-cassandra ● apache-airflow-providers-apache-drill ● apache-airflow-providers-apache-druid ● apache-airflow-providers-apache-flink ● apache-airflow-providers-apache-hdfs ● apache-airflow-providers-apache-hive ● apache-airflow-providers-apache-impala ● apache-airflow-providers-apache-kafka ● apache-airflow-providers-apache-kylin ● apache-airflow-providers-apache-livy ● apache-airflow-providers-apache-pig ● apache-airflow-providers-apache-pinot ● apache-airflow-providers-apache-spark ● apache-airflow-providers-apache-sqoop ● apache-airflow-providers-apprise ● apache-airflow-providers-arangodb ● apache-airflow-providers-asana ● apache-airflow-providers-atlassian-jira ● apache-airflow-providers-celery ● apache-airflow-providers-cloudant ● apache-airflow-providers-cncf-kubernetes ● apache-airflow-providers-common-sql ● apache-airflow-providers-databricks ● apache-airflow-providers-datadog ● apache-airflow-providers-dbt-cloud ● apache-airflow-providers-dingding ● apache-airflow-providers-discord ● apache-airflow-providers-docker ● apache-airflow-providers-elasticsearch ● apache-airflow-providers-exasol ● apache-airflow-providers-facebook ● apache-airflow-providers-ftp ● apache-airflow-providers-github ● apache-airflow-providers-google ● apache-airflow-providers-grpc ● apache-airflow-providers-hashicorp
  • 26. airflow providers packages https://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html ● apache-airflow-providers-http ● apache-airflow-providers-imap ● apache-airflow-providers-influxdb ● apache-airflow-providers-jdbc ● apache-airflow-providers-jenkins ● apache-airflow-providers-microsoft-azure ● apache-airflow-providers-microsoft-mssql ● apache-airflow-providers-microsoft-psrp ● apache-airflow-providers-microsoft-winrm ● apache-airflow-providers-mongo ● apache-airflow-providers-mysql ● apache-airflow-providers-neo4j ● apache-airflow-providers-odbc ● apache-airflow-providers-openfaas ● apache-airflow-providers-openlineage ● apache-airflow-providers-opsgenie ● apache-airflow-providers-oracle ● apache-airflow-providers-pagerduty ● apache-airflow-providers-papermill ● apache-airflow-providers-plexus ● apache-airflow-providers-postgres ● apache-airflow-providers-presto ● apache-airflow-providers-qubole ● apache-airflow-providers-redis ● apache-airflow-providers-salesforce ● apache-airflow-providers-samba ● apache-airflow-providers-segment ● apache-airflow-providers-sendgrid ● apache-airflow-providers-sftp ● apache-airflow-providers-singularity ● apache-airflow-providers-slack ● apache-airflow-providers-smtp ● apache-airflow-providers-snowflake ● apache-airflow-providers-sqlite ● apache-airflow-providers-ssh ● apache-airflow-providers-tableau ● apache-airflow-providers-tabular ● apache-airflow-providers-telegram ● apache-airflow-providers-trino ● apache-airflow-providers-vertica ● apache-airflow-providers-zendesk
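  • 27. airflow example DAG

      from datetime import datetime

      from airflow import DAG
      from airflow.operators.python import PythonOperator


      def _train_model():
          # Placeholder: the training logic goes here.
          pass


      with DAG(
          "train_models",
          start_date=datetime(2023, 7, 4),
          schedule="@daily",
      ) as dag:
          # The callable has its own name so the task variable does not shadow it.
          train_model = PythonOperator(
              task_id="train_model",
              python_callable=_train_model,
          )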
  • 28. airflow example DAG

      from datetime import datetime
      from random import randint

      from airflow import DAG
      from airflow.operators.python import PythonOperator


      def _evaluate_model():
          # Stand-in for a real evaluation: returns a random "accuracy".
          return randint(1, 10)


      def _choose_best(ti):
          tasks = ["evaluate_model_a", "evaluate_model_b"]
          # Pull the value returned by each evaluation task from XCom.
          accuracies = [ti.xcom_pull(task_ids=task_id) for task_id in tasks]
          best_accuracy = max(accuracies)
          for model, model_accuracy in zip(tasks, accuracies):
              if model_accuracy == best_accuracy:
                  return model


      with DAG(
          "evaluate_models",
          start_date=datetime(2023, 7, 4),
          schedule="@daily",
      ) as dag:
          evaluate_model_a = PythonOperator(
              task_id="evaluate_model_a",
              python_callable=_evaluate_model,
          )
          evaluate_model_b = PythonOperator(
              task_id="evaluate_model_b",
              python_callable=_evaluate_model,
          )
          choose_best_model = PythonOperator(
              task_id="choose_best_model",
              python_callable=_choose_best,
          )

          # Both evaluations must finish before the best model is chosen.
          [evaluate_model_a, evaluate_model_b] >> choose_best_model
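The selection step in this DAG is plain Python, so it can be checked without a running scheduler. The sketch below is one way to do that, not part of the original slides: `choose_best` is a hypothetical helper that takes a dict of scores instead of reading them from XCom, and the score values are made up.

```python
def choose_best(accuracy_by_task):
    """Return the id of the first task whose accuracy equals the maximum.

    Hypothetical helper: it mirrors the loop in the slide's selection
    callable, but receives the scores directly instead of pulling XComs.
    """
    best_accuracy = max(accuracy_by_task.values())
    for task_id, accuracy in accuracy_by_task.items():
        if accuracy == best_accuracy:
            return task_id


# Made-up scores standing in for the values returned by the evaluation tasks.
scores = {"evaluate_model_a": 7, "evaluate_model_b": 9}
winner = choose_best(scores)

# On a tie, the task listed first wins, because the loop stops at the first match.
tie_winner = choose_best({"evaluate_model_a": 5, "evaluate_model_b": 5})
```

Keeping the comparison in a small function like this makes the tie-breaking rule explicit and lets it be unit-tested separately from the DAG.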
  • 30. airflow example of pipelines
  • 31. airflow example of pipelines
  • 32. airflow example of pipelines
  • 33. Building an AI Chat Bot with Airflow
  • 34. Airflow to build an LLM Chat Bot ● Open-source and cloud-agnostic: you are not locked in! ● Same orchestration tool for ELT/ETL and ML ● Automate the steps of a model pipeline, using Airflow to: ○ Monitor the status and duration of tasks over time ○ Retry on failures ○ Send notifications (email, Slack, others) to the team ● Dynamically trigger tasks using different hyperparameters ● Dynamically select models based on their scores ● Trigger model pipelines based on dataset changes ● Smoothly run tasks in VMs, containers or Kubernetes
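Retries and failure notifications are usually configured through task arguments shared across a DAG. A minimal sketch follows, assuming the standard Airflow task arguments `retries`, `retry_delay`, `retry_exponential_backoff`, `email`, `email_on_failure` and `email_on_retry`; the values and the address are illustrative assumptions, not taken from the talk.

```python
from datetime import timedelta

# Illustrative values only: the keys are standard Airflow task arguments,
# but the numbers and the e-mail address are assumptions.
default_args = {
    "retries": 3,                          # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),   # wait between attempts
    "retry_exponential_backoff": True,     # lengthen the wait after each attempt
    "email": ["ml-team@example.com"],      # hypothetical recipient
    "email_on_failure": True,              # notify once retries are exhausted
    "email_on_retry": False,               # stay quiet on intermediate retries
}

# Intended use (requires Airflow, so it is left as a comment here):
# with DAG("train_models", default_args=default_args, ...) as dag:
#     ...
```

Passing the dict as `default_args` applies the same policy to every task in the DAG, and individual tasks can still override any key.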
  • 35. Use the KubernetesPodOperator ● Create tasks that run in Kubernetes pods ● Use node_affinity to schedule a job on the node pool with the desired memory/CPU/GPU ● Use k8s.V1VolumeMount to efficiently mount volumes (e.g. NFS) so that different Pods (evaluate, serve) can access large models https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
  • 36. Use Dataset-aware scheduling ● Schedule DAGs to run when tasks (from other DAGs) complete successfully

      from airflow import DAG
      from airflow.datasets import Dataset

      with DAG("ingest_dataset", ...):
          MyOperator(
              # this task updates source-data.parquet
              outlets=[Dataset("s3://dataset-bucket/source-data.parquet")],
              ...,
          )

      with DAG(
          "train_model",
          # this DAG runs when source-data.parquet is updated (by DAG "ingest_dataset")
          schedule=[Dataset("s3://dataset-bucket/source-data.parquet")],
          ...,
      ):
          ...

    https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/datasets.html
  • 37. Use Dynamic Task Mapping ● Create a variable number of tasks at runtime based upon the data created by the previous task ● Can be useful in several situations, including choosing the most suitable model ● Supports map/reduce https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/dynamic-task-mapping.html
  • 38. Dynamic Task Mapping

      from __future__ import annotations

      from datetime import datetime

      from airflow import DAG
      from airflow.decorators import task

      with DAG(
          dag_id="example_dynamic_task_mapping",
          start_date=datetime(2022, 3, 4),
      ) as dag:

          @task
          def evaluate_model(model_path):
              (...)
              return evaluation_metrics

          @task
          def choose_model(metrics_by_model):
              (...)
              return chosen_one

          # Map: one evaluate_model task instance is created per model path.
          models_metrics = evaluate_model.expand(
              model_path=["/data/model1", "/data/model2", "/data/model3"]
          )
          # Reduce: a single task receives the metrics of all mapped instances.
          choose_model(models_metrics)
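The map/reduce shape of this DAG can be tried out in plain Python before wiring it into Airflow. The sketch below is a rough analogue under stated assumptions: `FAKE_SCORES` is hypothetical data standing in for the elided evaluation logic, the metrics format (a dict with an `accuracy` key) is an assumption, and the list comprehension plays the role that `.expand()` plays at runtime.

```python
# Hypothetical accuracy scores standing in for a real evaluation step.
FAKE_SCORES = {
    "/data/model1": 0.71,
    "/data/model2": 0.84,
    "/data/model3": 0.78,
}


def evaluate_model(model_path):
    # Assumed metrics format: one dict per model.
    return {"model_path": model_path, "accuracy": FAKE_SCORES[model_path]}


def choose_model(metrics_by_model):
    # Reduce step: pick the model path with the highest accuracy.
    best = max(metrics_by_model, key=lambda metrics: metrics["accuracy"])
    return best["model_path"]


model_paths = ["/data/model1", "/data/model2", "/data/model3"]

# Map step: Airflow would create one task instance per path at runtime.
models_metrics = [evaluate_model(path) for path in model_paths]

# Reduce step: a single downstream call sees every mapped result.
chosen_one = choose_model(models_metrics)
```

In Airflow the mapped task instances can run in parallel on separate workers, which the sequential list comprehension here does not capture.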