Apache Airflow is a platform that enterprises use to schedule and monitor workflows running on their infrastructure, providing a high level of observability to users and sysadmins. In this series, our goal is to show how to deploy Apache Airflow on a Kubernetes cluster, to look at the options for making it secure, and to make it production-ready. In our first blog post, we demonstrated how to build the Kubernetes resources required to deploy and run Apache Airflow on a Kubernetes cluster. In this second part, we will demonstrate how to make Airflow on Kubernetes ready to run workflows (Directed Acyclic Graphs, or DAGs) using the Kubernetes Executor, and we will also show how to monitor the developed workflows using the Apache Airflow webserver UI.

In the previous blog post, we configured Airflow to use the Kubernetes Executor when running task instances from a DAG by setting the executor environment variable of the Airflow Pod to 'KubernetesExecutor'. This, however, is not enough for Airflow to leverage the Kubernetes Executor when there is a big demand for 'power'. The Kubernetes server needs to know how to dynamically provide more worker Pods when a DAG produces many task instances, so we need to configure the Kubernetes Executor with a custom Pod template file, which the Kubernetes server will use when creating worker Pods. According to the Apache Airflow documentation, this can be done by either:

- setting the value of the environment variable 'AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE', or
- setting the Pod template file path in the Airflow configuration file, in the 'kubernetes' section.

In our case, we took the second approach. The Airflow configuration file is located on the AIRFLOW_HOME path, usually at /opt/airflow/airflow.cfg, as shown in the picture below. There, we specified the path to pod-creator.yaml, our custom Pod template file, also visible below. Kubernetes will use that file to spawn worker Pods whenever the Airflow workflow is triggered.

The Pod template file is similar to the Airflow deployment file from the previous blog post (airflow-deployment.yaml), and its content is shown below. When defining this YAML file, it is important to set the container name to the value 'base' and to define which image will run inside that container (this is explained in the official Airflow documentation). To do so, we set the 'name' keyword of the entry in the 'containers' list to the value 'base', and the 'image' keyword to the value 'apache/airflow:2.3.4'. In our case, the image containing Apache Airflow version 2.3.4 will be downloaded from the internet and run inside the worker Pod. The rest of the specification relates to the Airflow executor and the persistent volumes, including environment variables such as AIRFLOW__KUBERNETES__DELETE_WORKER_PODS and AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE, which control whether worker Pods are cleaned up after they finish or fail. With this Pod template file, our Airflow worker Pod will run an Apache Airflow version 2.3.4 container using the LocalExecutor. This is enough for the worker Pod to run tasks delegated from the Airflow workflow, which we will develop and describe in the following sections.

Now that our Apache Airflow environment is ready, we can start creating Apache Airflow workflows. Every Airflow workflow consists of tasks (regular Python functions or Airflow operators) that the scheduler executes in a defined, cycle-free order, which is why Airflow workflows are also called DAGs (Directed Acyclic Graphs).
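For reference, a Pod template file along the lines described above might look roughly like the following. This is a minimal sketch, not the author's actual pod-creator.yaml (which appears only as an image in the original post): the container name 'base' and the 'apache/airflow:2.3.4' image follow the text, while the volume names, the PVC name, and the values of the delete-worker-pods variables are assumptions for illustration.

```yaml
# pod-creator.yaml -- illustrative sketch, not the original file
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
spec:
  containers:
    - name: base                     # Airflow requires the worker container to be named 'base'
      image: apache/airflow:2.3.4    # image pulled from the internet, as described in the text
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor       # each worker Pod runs its task with the LocalExecutor
        - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS
          value: "True"              # assumed value
        - name: AIRFLOW__KUBERNETES__DELETE_WORKER_PODS_ON_FAILURE
          value: "False"             # assumed value
      volumeMounts:
        - name: airflow-dags         # hypothetical volume holding the DAG files
          mountPath: /opt/airflow/dags
  restartPolicy: Never
  volumes:
    - name: airflow-dags
      persistentVolumeClaim:
        claimName: airflow-dags-pvc  # hypothetical PVC name
```

With the second approach from the text, Airflow is pointed at such a file through the `pod_template_file` option in the `[kubernetes]` section of airflow.cfg.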