Running Airflow in Docker


Prerequisites

1. Prepare the installation environment

1.1 Create the working directory

$ mkdir airflow
$ cd airflow
$ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml'
$ mkdir ./dags ./logs ./plugins
$ chmod -R 777 .
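
The directory setup above can be wrapped in a small helper so that it is safe to run more than once. This is only a sketch: the function name prepare_airflow_dirs and its argument are hypothetical, and it opens up only the three directories that get mounted into the containers rather than the whole project directory, on the assumption that those are the only paths the containers need to write to.

```shell
# Hypothetical helper: create the mounted directories and open their permissions.
prepare_airflow_dirs() {
    base="${1:-.}"    # project directory, defaults to the current one
    mkdir -p "$base/dags" "$base/logs" "$base/plugins" || return 1
    # Same wide-open mode as in this guide; fine for a local test machine,
    # too permissive for a shared host.
    chmod -R 777 "$base/dags" "$base/logs" "$base/plugins"
}
```

Calling prepare_airflow_dirs with no argument inside the airflow directory should leave dags, logs and plugins in place with mode 777.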

Do not create the .env file described in the official documentation. With that file in place, initialization failed in this setup (see the FAQ below), which suggests a problem in the deployment scripts.
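
If a .env file already exists from following the official instructions, it can be moved aside instead of deleted, so its contents stay available for comparison. A minimal sketch; the function name disable_env_file and the backup name .env.bak are hypothetical choices, not something the official documentation prescribes.

```shell
# Hypothetical helper: move an existing .env out of the way instead of deleting it.
disable_env_file() {
    dir="${1:-.}"    # project directory, defaults to the current one
    if [ -f "$dir/.env" ]; then
        mv "$dir/.env" "$dir/.env.bak"
        echo "moved $dir/.env to $dir/.env.bak"
    else
        echo "no .env file in $dir"
    fi
}
```

docker-compose only picks up a file named exactly .env from the project directory, so the renamed copy should no longer affect the deployment.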

1.2 Initialize Airflow

$ docker-compose up airflow-init

2. Start Airflow

$ docker-compose up

3. View Airflow

Open http://localhost:8080 in a browser.
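
The webserver may need some time after docker-compose up before it starts answering. The sketch below retries a probe command until it succeeds. The wait_for helper and its parameters are hypothetical; the usage line assumes that curl is installed and that the webserver answers on its /health endpoint on port 8080.

```shell
# Hypothetical helper: run a probe command until it succeeds or attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0    # probe succeeded
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1            # gave up after all attempts
}
```

For example, to poll every 2 seconds for up to 30 attempts:

$ wait_for 30 2 curl --fail --silent http://localhost:8080/health && echo "webserver is up"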

FAQ

ModuleNotFoundError: No module named 'airflow'

Initializing Airflow fails with ModuleNotFoundError: No module named 'airflow'.

# docker-compose up airflow-init
Removing airflow_airflow-init_1
airflow_postgres_1 is up-to-date
airflow_redis_1 is up-to-date
Recreating cebf6e7abfc9_airflow_airflow-init_1 ... done
Attaching to airflow_airflow-init_1
airflow-init_1       | BACKEND=postgresql+psycopg2
airflow-init_1       | DB_HOST=postgres
airflow-init_1       | DB_PORT=5432
airflow-init_1       |
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1       |     from airflow.__main__ import main
airflow-init_1       | ModuleNotFoundError: No module named 'airflow'
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1       |     from airflow.__main__ import main
airflow-init_1       | ModuleNotFoundError: No module named 'airflow'
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1       |     from airflow.__main__ import main
airflow-init_1       | ModuleNotFoundError: No module named 'airflow'
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1       |     from airflow.__main__ import main
airflow-init_1       | ModuleNotFoundError: No module named 'airflow'
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/home/airflow/.local/bin/airflow", line 5, in <module>
airflow-init_1       |     from airflow.__main__ import main
airflow-init_1       | ModuleNotFoundError: No module named 'airflow'
airflow_airflow-init_1 exited with code 1

After deleting the .env file described in the official documentation, initialization fails with a different error:
Permission denied: '/opt/airflow/logs/scheduler'.

# rm -f .env
# docker-compose up airflow-init
airflow_redis_1 is up-to-date
airflow_postgres_1 is up-to-date
Recreating airflow_airflow-init_1 ... done
Attaching to airflow_airflow-init_1
airflow-init_1       | BACKEND=postgresql+psycopg2
airflow-init_1       | DB_HOST=postgres
airflow-init_1       | DB_PORT=5432
airflow-init_1       |
airflow-init_1       | Unable to load the config, contains a configuration error.
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/usr/local/lib/python3.6/pathlib.py", line 1248, in mkdir
airflow-init_1       |     self._accessor.mkdir(self, mode)
airflow-init_1       |   File "/usr/local/lib/python3.6/pathlib.py", line 387, in wrapped
airflow-init_1       |     return strfunc(str(pathobj), *args)
airflow-init_1       | FileNotFoundError: [Errno 2] No such file or directory: '/opt/airflow/logs/scheduler/2021-03-20'
airflow-init_1       |
airflow-init_1       | During handling of the above exception, another exception occurred:
airflow-init_1       |
airflow-init_1       | Traceback (most recent call last):
airflow-init_1       |   File "/usr/local/lib/python3.6/logging/config.py", line 565, in configure
airflow-init_1       |     handler = self.configure_handler(handlers[name])
airflow-init_1       |   File "/usr/local/lib/python3.6/logging/config.py", line 738, in configure_handler
airflow-init_1       |     result = factory(**kwargs)
airflow-init_1       |   File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 46, in __init__
airflow-init_1       |     Path(self._get_log_directory()).mkdir(parents=True, exist_ok=True)
airflow-init_1       |   File "/usr/local/lib/python3.6/pathlib.py", line 1252, in mkdir
airflow-init_1       |     self.parent.mkdir(parents=True, exist_ok=True)
airflow-init_1       |   File "/usr/local/lib/python3.6/pathlib.py", line 1248, in mkdir
airflow-init_1       |     self._accessor.mkdir(self, mode)
airflow-init_1       |   File "/usr/local/lib/python3.6/pathlib.py", line 387, in wrapped
airflow-init_1       |     return strfunc(str(pathobj), *args)
airflow-init_1       | PermissionError: [Errno 13] Permission denied: '/opt/airflow/logs/scheduler'

This is a permission problem: the process in the container cannot create directories under the mounted logs directory. Setting the permissions of the project directory to 777 solves it.

# chmod 777 airflow/ -R
# docker-compose up airflow-init
......
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.0.1
airflow_airflow-init_1 exited with code 0
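
To confirm whether the mounted directories are writable by any user, before or after applying the fix, a check along the following lines can be used. This is a rough sketch: the function name is_world_writable is hypothetical, and it relies on the -maxdepth option of find, which both GNU and BSD find provide.

```shell
# Hypothetical helper: succeed if the given directory is writable by "others".
is_world_writable() {
    # -perm -0002 matches when the write bit for "others" is set.
    [ -n "$(find "$1" -maxdepth 0 -type d -perm -0002 2>/dev/null)" ]
}

# Report on the three directories mounted into the containers.
for d in dags logs plugins; do
    if is_world_writable "$d"; then
        echo "$d: world-writable"
    else
        echo "$d: not world-writable (or missing)"
    fi
done
```

Run from inside the airflow directory, any directory reported as not world-writable is a candidate for the Permission denied error shown above.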
