Version 1.0
Prefect
In Data Engineer's Lunch #44, we will discuss Prefect and how
it compares to Airflow when scheduling tasks, alongside a demo
from the Prefect documentation.
Josh Barnes
Jr. Engineer @ Anant
Prefect
● Workflow Management System
○ Retries, Logging, Dynamic Mapping, Caching, Failure Notifications
● Free to start: 10,000 tasks/month
● Prefect takes your code and transforms it into a robust, distributed pipeline.
○ Continue using existing tools, languages, infrastructure, and scripts.
○ Prefect builds a fully introspectable and customizable DAG definition for every
workflow, but users are never required to interact with it if they don't want to. Instead,
Python is the API. Users define functions and call them as they would in any script, and
Prefect does the work to figure out the workflow structure.
● Prefect has more unit tests and greater test coverage than any other workflow engine,
including the entire Airflow platform.
● UI to visualize every aspect of your workflow and even run off-schedule tasks with altered
variables.
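
The "Python is the API" idea above can be illustrated with a toy, standard-library-only sketch. It does not use Prefect and is not how Prefect is implemented; the names Node, task, edges, and run are hypothetical and exist only for this illustration. Calling a decorated function records a dependency instead of doing work, so the DAG falls out of ordinary function calls.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(eq=False)
class Node:
    """One deferred call: the function plus the arguments it was called with."""
    fn: Callable[..., Any]
    upstream: tuple = ()

    @property
    def name(self) -> str:
        return self.fn.__name__


def task(fn):
    """Hypothetical decorator: calling the function builds a Node, not a result."""
    def build(*args):
        return Node(fn, args)
    build.__name__ = fn.__name__
    return build


def edges(node):
    """Walk the recorded calls and list (upstream, downstream) dependency pairs."""
    found = []
    for up in node.upstream:
        if isinstance(up, Node):
            found.extend(edges(up))
            found.append((up.name, node.name))
    return found


def run(node, cache=None):
    """Execute upstream nodes first, then this node; each node runs once."""
    cache = {} if cache is None else cache
    if id(node) not in cache:
        args = [run(a, cache) if isinstance(a, Node) else a for a in node.upstream]
        cache[id(node)] = node.fn(*args)
    return cache[id(node)]


@task
def extract():
    return [1, 2, 3]


@task
def transform(rows):
    return [r * 10 for r in rows]


@task
def load(rows):
    return sum(rows)


# Written like a plain script; nothing executes until run() is called.
pipeline = load(transform(extract()))
print(edges(pipeline))  # [('extract', 'transform'), ('transform', 'load')]
print(run(pipeline))    # 60
```

The user only writes and calls functions; the dependency structure is a by-product that can be inspected before anything runs.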
Workflows and Tasks
● Workflows (“Flows”) represent the dependency structure between tasks, but do not
perform logic themselves.
● Prefect tasks are functions that have rules about when they should run.
○ Process data, Orchestrate external systems, call out to other environments or
languages.
○ Decorate a Python function with @task to create one.
○ Receives metadata about upstream dependencies prior to running.
○ Code-agnostic, with no restrictions on inputs or outputs.
● Both Tasks and Workflows produce State to reflect the behavior at any time.
● Designed to run at any time with concurrency.
● https://docs.prefect.io/core/task_library/overview.html#task-library-in-action
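
As a rough illustration of tasks that "have rules about when they should run" and that produce State, here is a toy sketch using only the standard library. It is not Prefect's API: State, all_successful, any_failed, and run_task are hypothetical names chosen for this example. Each task receives the states of its upstream tasks, a trigger rule decides whether it runs, and the outcome is reported as a state.

```python
from enum import Enum


class State(Enum):
    SUCCESS = "Success"
    FAILED = "Failed"
    TRIGGER_FAILED = "TriggerFailed"


def all_successful(upstream_states):
    """Rule: run only if every upstream task succeeded."""
    return all(s is State.SUCCESS for s in upstream_states)


def any_failed(upstream_states):
    """Rule: run only if at least one upstream task failed."""
    return any(s is State.FAILED for s in upstream_states)


def run_task(fn, upstream_states, trigger=all_successful):
    """Check the trigger against upstream states, then run fn and report a state."""
    if not trigger(upstream_states):
        return State.TRIGGER_FAILED, None
    try:
        return State.SUCCESS, fn()
    except Exception:
        return State.FAILED, None


def extract():
    raise RuntimeError("source unavailable")


def transform():
    return "transformed"


def alert():
    return "alert sent"


extract_state, _ = run_task(extract, [])
transform_state, _ = run_task(transform, [extract_state])
alert_state, alert_result = run_task(alert, [extract_state], trigger=any_failed)

print(extract_state.value)    # Failed
print(transform_state.value)  # TriggerFailed
print(alert_state.value, alert_result)  # Success alert sent
```

The failure of extract stops transform from running, while the alert task runs precisely because something upstream failed.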
Where Prefect Improves upon Airflow
● DAGs which need to be run off-schedule or with no schedule at all
● DAGs that run concurrently with the same start time
● DAGs with complicated branching logic
● DAGs with many fast tasks
● DAGs which rely on the exchange of data
● Parametrized DAGs
● Dynamic DAGs
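
Several items in the list above (parametrized runs, dynamic fan-out, many fast tasks, and data passed between tasks) can be pictured with a small standard-library sketch. This is an illustration of the pattern only, not Prefect or Airflow code; list_partitions, process, summarize, and run_flow are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def list_partitions(n):
    """Upstream task: the amount of downstream work is only known at run time."""
    return list(range(n))


def process(partition):
    """A small, fast task applied once per partition."""
    return partition * partition


def summarize(results):
    """Downstream task that consumes the data returned by the mapped tasks."""
    return sum(results)


def run_flow(n_partitions, max_workers=4):
    """Parametrized run: the parameter decides how many mapped tasks exist."""
    partitions = list_partitions(n_partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan out one process() call per partition and run them concurrently.
        results = list(pool.map(process, partitions))
    return summarize(results)


print(run_flow(4))  # 14
print(run_flow(0))  # 0
```

The same flow definition handles any number of partitions, and results move between tasks as ordinary return values.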
Flow Diagram
Demo
● https://github.com/PrefectHQ/prefect
● Simple Flow Deployed to Prefect Cloud
Resources
● https://docs.prefect.io/core/
Strategy: Scalable Fast Data
Architecture: Cassandra, Spark, Kafka
Engineering: Node, Python, JVM, CLR
Operations: Cloud, Container
Rescue: Downtime!! I need help.
www.anant.us | solutions@anant.us | (855) 262-6826
3 Washington Circle, NW | Suite 301 | Washington, DC 20037
