Installation of the Celery Task Queue¶
Some tasks of CATMAID (e.g. cropping) are done in the background. These are managed by the task queue Celery. The job of telling Celery about a new task to process is done by a message broker; Celery supports several different brokers. By default, a simple Python module is used that stores the messaging information in the Django database. Alternatively, one may use a message broker like RabbitMQ, which can be configured in the settings.py file.
It is not a problem for Django/CATMAID if Celery is not running. As long as the message broker is available, CATMAID will accept tasks (e.g. a cropping job); they will be executed once Celery is running again.
This first section guides you through the setup of Celery and the simplest message broker provided by Kombu, which uses Django’s database to store messages. Afterwards, an alternative broker, RabbitMQ, is described.
Prerequisites¶
The setup of Django needs to be completed before configuring Celery.
Installation¶
The configuration of Celery and the message broker happens in the settings.py file. The example file contains a configuration that should be ready to go. A quick run through the relevant settings follows.
To be able to use Celery, it needs to be imported and initialized first. This is done by these lines:
import djcelery
djcelery.setup_loader()
To specify how many concurrent tasks Celery should execute, you can use the following variable:
CELERYD_CONCURRENCY = 1
By default this is the number of available CPUs; the above line sets it to one.
Next, Celery and Django need some information about the message broker in use. Since we use kombu’s Django transport here, the following settings are needed to get this specific broker to work:
INSTALLED_APPS += ("kombu.transport.django",)
BROKER_URL = 'django://'
This will make kombu use the Django database for its messages. While this is fine for a small and simple setup, it is recommended to use a different message broker for larger setups. There is more information on the limitations of and alternatives to this approach in Celery’s documentation.
To initialize Celery, call the migrate command of your manage.py (from within the virtualenv):
python manage.py migrate
This will create some tables for Celery and django-kombu in the Django database. You should then be able to run the Celery daemon (also from within the virtualenv):
python manage.py celeryd -l info
The Celery daemon should be integrated into your system so that it is started automatically. There is an init script available in the Celery code base that can be used here. Also, make sure that the Celery daemon process has permission to write to the temporary directory (TMP_DIR).
Message Brokers¶
It is the so-called message broker that accepts tasks and tells Celery to execute them. There are several brokers around, and this section uses RabbitMQ as an alternative to the simple Django-based one used above. RabbitMQ is very fast and reliable and can be configured so that it is manageable through Django’s admin interface.
First, the RabbitMQ server has to be installed:
sudo apt-get install rabbitmq-server
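To actually make CATMAID use RabbitMQ instead of the Django database transport, the broker settings in the settings.py file need to point at it. A minimal sketch, assuming RabbitMQ runs on the same host with its default guest account (adjust host, user and password to your setup):
# Use RabbitMQ as message broker instead of the Django database
BROKER_URL = 'amqp://guest:guest@localhost:5672//'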
This should start it automatically. RabbitMQ comes with a plugin infrastructure, and one particularly useful plugin adds support for management commands. With it, one can get information on Celery workers through the broker from within Django’s admin interface. To enable it, call:
sudo /usr/lib/rabbitmq/lib/rabbitmq-server-3.2.3/sbin/rabbitmq-plugins enable rabbitmq_management
After enabling or disabling plugins, RabbitMQ has to be restarted:
sudo service rabbitmq-server restart
To display a list of all available plugins and whether they are enabled, call:
sudo /usr/lib/rabbitmq/lib/rabbitmq-server-3.2.3/sbin/rabbitmq-plugins list
This also enables a web interface, which will be available on port 15672. The default user and password combination is guest/guest.
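For anything beyond a local test setup, it is advisable not to rely on the guest account, but to create a dedicated RabbitMQ user and virtual host for CATMAID. A sketch with example names (catmaid user, catmaid vhost, placeholder password):
sudo rabbitmqctl add_user catmaid mypassword
sudo rabbitmqctl add_vhost catmaid
sudo rabbitmqctl set_permissions -p catmaid catmaid ".*" ".*" ".*"
The BROKER_URL in settings.py would then change accordingly, e.g. to 'amqp://catmaid:mypassword@localhost:5672/catmaid'.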
To collect worker events, one has to start celeryd with the -E argument, e.g.:
python manage.py celeryd -l info -E
And to retrieve event snapshots from all workers, start celerycam:
python manage.py celerycam
All tasks will then be manageable from within Django’s admin interface.
Periodic Tasks¶
The Celery infrastructure can also be used to execute tasks periodically. For example, the clean-up of cropped stacks might be wanted every night. This can be realized without changing any source code; only a little Python code needs to be added to two files. First, the settings.py file needs to be extended to let Celery workers import a tasks file:
# Disable automatic clean-up of the cropping tool
CROP_AUTO_CLEAN = False
# Let Celery workers import our tasks module
CELERY_IMPORTS = ("tasks", )
The code above also disables the automatic cleaning which is done on every download request for a cropped stack.
Next we need to create a new file tasks.py in the folder where the settings.py file resides. The name “tasks” is used by convention, but is in fact arbitrary. If it is changed, the CELERY_IMPORTS variable needs to be adjusted, too. This file contains the task definitions:
from celery.schedules import crontab
from celery.task import periodic_task

from catmaid.control.cropping import cleanup as cropping_cleanup

# Define a periodic task that runs every day at midnight and noon.
# It removes all cropped stacks that are older than 12 hours.
@periodic_task(run_every=crontab(minute=0, hour="0,12"))
def cleanup_cropped_stacks():
    twelve_hours = 43200  # seconds
    cropping_cleanup(twelve_hours)
    return "Cleaned cropped stacks directory"
One can also use datetime.timedelta to specify when and how often the task should be run.
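For illustration, a sketch of an interval-based variant of the task above (the function name is arbitrary):
from datetime import timedelta
from celery.task import periodic_task

from catmaid.control.cropping import cleanup as cropping_cleanup

# Alternative: run the clean-up every twelve hours, counted from worker start
@periodic_task(run_every=timedelta(hours=12))
def cleanup_cropped_stacks_interval():
    cropping_cleanup(43200)  # remove cropped stacks older than 12 hours
    return "Cleaned cropped stacks directory"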
Besides defining such a task, the Celery process needs to be run in so-called “beat” mode:
python manage.py celeryd -B -l info
This mode requires that celeryd can write to the project directory. By default, it will create a file called celerybeat-schedule there. To adjust this file name and path, have a look at the Celery manual.
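For example, the location can presumably be changed with a setting along these lines in the settings.py file (the exact setting name depends on your Celery version, so check the manual):
# Store the beat schedule in a directory the Celery user can write to
CELERYBEAT_SCHEDULE_FILENAME = "/path/to/writable/dir/celerybeat-schedule"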
Again, an init script for automatic starting is available in the Celery code base.
Celery Daemon¶
It is not very convenient to have to run Celery manually all the time. After all, a server reboot wouldn’t bring it up again. Therefore it is desirable to have Celery run as an automatically started daemon.
If you don’t care whether Celery is automatically started after booting, you can also run it as a daemon from your terminal. Make sure you have a folder ready where the user running Celery has permission to write. Here we assume that there is a folder run in which log and pid files are created:
python manage.py celeryd --logfile run/celeryd.log --pidfile run/celeryd.pid -l info
Or when using celerybeat as well:
python manage.py celeryd --logfile run/celeryd.log --pidfile run/celeryd.pid -B -l info
Now this could be run in a Screen session, and you can safely disconnect from the server. However, as said before, this won’t survive a server reboot.
Supervisord¶
Supervisord is a process management tool that makes setting up processes very easy. This documentation talks about it in detail elsewhere. A script that can be used with the example provided there would look like this (run-celery.sh in the example):
#!/bin/bash
# Virtualenv location
ENVDIR=/path/to/catmaid/django/env
# Django project directory
DJANGODIR=/path/to/catmaid/django/projects
# Which settings file should Django use
DJANGO_SETTINGS_MODULE=mysite.settings
echo "Starting celery as `whoami`"
# Activate the virtual environment
cd $DJANGODIR
source $ENVDIR/bin/activate
export DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE
export PYTHONPATH=$DJANGODIR:$PYTHONPATH
# Run Celery
exec ./mysite/manage.py celery worker -l info -E
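The script above is then referenced from a program section in the Supervisord configuration. A minimal sketch with example names and paths that need to be adapted to your setup:
[program:catmaid-celery]
command = /path/to/catmaid/django/projects/run-celery.sh
user = www-data
autostart = true
autorestart = true
redirect_stderr = true
stdout_logfile = /var/log/supervisor/catmaid-celery.log
With this in place, Supervisord starts the Celery worker at boot and restarts it if it dies.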
Init¶
Depending on how your operating system manages the boot process, you can use the init scripts provided in the Celery source. A detailed description can be found in the Celery documentation. In short, you need to do the following: First, get the following file:
https://github.com/ask/celery/blob/master/contrib/generic-init.d/celeryd
Copy it to the folder /etc/init.d/ and mark it executable. Then you need to create a default configuration file /etc/default/celeryd (taken from the Celery documentation):
# Name of nodes to start, here we have a single node
CELERYD_NODES="w1"
# or we could have three nodes:
#CELERYD_NODES="w1 w2 w3"
# Where to chdir at start. (CATMAID Django project dir.)
CELERYD_CHDIR="/path/to/CATMAID/django/projects/mysite/"
# Python interpreter from environment. (in CATMAID Django dir)
ENV_PYTHON="/path/to/CATMAID/django/env/bin/python"
# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryd_multi"
# How to call "manage.py celeryctl"
CELERYCTL="$ENV_PYTHON $CELERYD_CHDIR/manage.py celeryctl"
# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=300 --concurrency=1"
# Name of the celery config module.
CELERY_CONFIG_MODULE="celeryconfig"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celery/%n.log"
CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Workers should run as an unprivileged user.
CELERYD_USER="celery"
CELERYD_GROUP="celery"
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="settings"
Please adjust the CELERYD_CHDIR variable and the --concurrency parameter to your situation. Also, this configuration expects that an unprivileged user and group with the name celery have been created. If this hasn’t been done already, you can do so as follows:
sudo adduser --system --no-create-home --disabled-login --disabled-password --group celery
Finally, you have to tell the system about the new init script:
sudo update-rc.d celeryd defaults
Now you (and the system while booting up) should be able to start celery:
sudo service celeryd start
Note that the celery user needs to have read and write access to the temporary directory of CATMAID. The cropping tool, for example, will save its cropped sub-stacks there.
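If the directory is not writable for that user yet, permissions have to be adjusted, for example (assuming TMP_DIR points to /path/to/catmaid/tmp and no other process needs to own it):
sudo chown -R celery:celery /path/to/catmaid/tmp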
If you want to have periodic tasks managed by a celerybeat daemon, a few more steps are needed. First, you need to get another init script. The Celery repository provides one as well:
https://github.com/ask/celery/blob/master/contrib/generic-init.d/celerybeat
Again, this needs to be moved to the folder /etc/init.d/ and marked executable. Finally, tell the operating system about it:
sudo update-rc.d celerybeat defaults
Next, append the following lines to your Celery configuration file /etc/default/celeryd:
# Where to chdir at start.
CELERYBEAT_CHDIR="$CELERYD_CHDIR"
# Path to celerybeat
CELERYBEAT="$ENV_PYTHON $CELERYD_CHDIR/manage.py celerybeat"
# Extra arguments to celerybeat
CELERYBEAT_OPTS="--schedule=/var/run/celerybeat-schedule"
CELERYBEAT_LOG_FILE="/var/log/celery/celerybeat.log"
CELERYBEAT_PID_FILE="/var/run/celery/celerybeat.pid"
# Celery beat should run as an unprivileged user
CELERYBEAT_USER="celery"
CELERYBEAT_GROUP="celery"
A “beating” Celery can now be started additionally:
sudo service celerybeat start
With these settings periodic tasks get executed after a reboot as well.