Django DB Queue

In the most recent edition of the excellent Django newsletter, this blog post from Manos Pitsidianakis caught my eye. It demonstrates how to effectively ‘roll your own’ asynchronous job queue within Django, using the database, without the need to install something like Celery.

Speaking of which, Celery is obviously the foremost solution for asynchronous task processing in Python web apps, but, truth be told, it can be a bit of a pain to get up and running. There are several things to do: pick a broker and a result backend, hack around to get hot reloading, add monitoring, and then, given all this complexity, you are probably dockerizing it too.

I often find myself working on projects and thinking, ‘I really need Celery now, but I’ll put it off because introducing it is so daunting’. Dramatiq seems to occupy ‘second place’ behind Celery, and whilst it's designed with a focus on greater simplicity, it still requires Redis or RabbitMQ to act as a message broker.

I had been wondering if there was a solution which could achieve roughly what Celery offers, without any of the associated set-up and dependencies, and I eventually stumbled across the Django DB Queue package. A quick review of the docs suggested it offers the lowest possible friction for running background tasks in a Django project.

The aforementioned blog post reminded me that I had been meaning to give it a try for a while, so here is my record of taking it for a spin, for the first time...


Initial Test

Glossing over the setup: I pip install the package, update my new Django project's settings and then perform a migration (we are using the database as the queue here, after all). If you are unsure about any of this, the initial steps are in the project’s README.
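For completeness, those glossed-over steps amount to roughly the following (paraphrasing the README; check it for anything version-specific):

```shell
pip install django-dbq

# settings.py: register the app so its Job model is picked up
#   INSTALLED_APPS = [
#       ...
#       "django_dbq",
#   ]

# create the Job table -- the database is the queue, after all
python manage.py migrate
```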

After creating an app, I then add an app level jobs.py file, which contains functions intended to be run in the background. This first example is incredibly similar to the example provided by the developers of Django DB Queue. 

 

# jobs.py

import time

def first_task(job):
    print("This is the job starting...")
    time.sleep(5)
    print("This is the job ending...")

I then go back to my settings file and define the following configuration:

# settings.py

...

JOBS = {
    "first_job": {
        "tasks": ["initial.jobs.first_task"],
    },
}

...

The convention in DDBQ is that a 'job' can contain multiple ‘tasks'. In this trivial example, my ‘first_task’ function is the only task, so ‘first_job’ is a one-task job. You’ll see from the settings file that the ‘tasks’ value is a list, so it could easily be expanded to contain several. Worth noting here that 'initial' is the name of my app, so 'initial.jobs.first_task' is the dotted path to the function.
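As an aside, a dotted path like 'initial.jobs.first_task' is just a string that gets resolved to a callable at run time. This isn't DDBQ's actual code, but the mechanism can be sketched with the stdlib, here resolving a stdlib function in place of a task:

```python
import importlib

def resolve_task(dotted_path):
    """Split 'package.module.func' into module path and attribute, then import."""
    module_path, _, func_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

# Resolve a stdlib function the same way a queue might resolve a task path
dumps = resolve_task("json.dumps")
print(dumps({"job": "first_job"}))  # prints {"job": "first_job"}
```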

If required, the definition of the jobs and their tasks can be further enriched with creation and failure ‘callbacks’, so further code can be executed, depending on the state of the given tasks. See the DDBQ docs for more information. 
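For illustration, wiring hooks into the earlier job definition would look something like the following; the dotted hook paths here are hypothetical, so check the DDBQ README for the exact keys and hook signatures:

```python
# settings.py (sketch; the 'initial.hooks.*' paths are hypothetical)
JOBS = {
    "first_job": {
        "tasks": ["initial.jobs.first_task"],
        # run when the Job instance is created
        "creation_hook": "initial.hooks.job_created",
        # run if a task in the job raises an exception
        "failure_hook": "initial.hooks.job_failed",
    },
}
```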

In order to create a job and send it to the queue (which is the application’s DB), you just use the familiar Django model ‘create’ method, passing in the job name as defined in the settings. Anyone familiar with Django’s ORM will appreciate how easy this is. Here is an example of me doing so, in the views.py file of this test application.

# views.py

from django.http import JsonResponse
from django_dbq.models import Job


def initial_test(request):
    Job.objects.create(name="first_job")
    resp = {'Ping':'Pong'}
    return JsonResponse(resp)

Next, I just open another shell and use a management command to start a ‘worker’ process, which will run our jobs.

$ python manage.py worker
Starting job worker for queue "default" with rate limit 1/s

If I then start the Django development server and use a browser to hit the endpoint I’ve hooked up to the above view, it returns {"Ping": "Pong"}, but looking at the terminal where the worker process is running, I see:

This is the job starting...
This is the job ending...

Great, so my initial test worked! It has a nice, intuitive API and it really is that easy to get simple background tasks up and running, using the DB as the queue.


Discography Portfolio App

The next thing I like to do, after running a little arbitrary test of a new package, is to try and build something more meaningful, just to get more of a feel for how I might use it ‘IRL'.

My main hobby is playing in bands, so creating web applications for musical artists is a conundrum never far from my mind. All my friends tend to make weird and noisy music, (to little commercial acclaim), so whilst they are undoubtedly committed artists, by and large they have other jobs alongside their music. They might only release, say, one record a year and perform a handful of shows, so updating any website becomes not only a bit of a chore, (as you are doing it less than a dozen times a year), but also a big duplication of effort. There are multiple platforms for concert listings, social media announcements, archiving releases, and so on, that musicians are simply expected to have and keep up to date, which means lots of crossposting!

For a while now I’ve had an idea for a band website which just collates information from these external platforms, (OK, I'll admit it's not the most original idea, but bear with me). There could be background jobs which call the external APIs, check for changes and then update a database if required.

I am going to use DDBQ to build an integration with Discogs, the internet's primary source of truth for recorded audio!

Inspecting the 'GET Artist' endpoint of the Discogs API for my band Circuit Breaker, I get the following response body:

{
   "name":"Circuit Breaker (7)",
   "id":3552343,
   "resource_url":"https://api.discogs.com/artists/3552343",
   "uri":"https://www.discogs.com/artist/3552343-Circuit-Breaker-7",
   "releases_url":"https://api.discogs.com/artists/3552343/releases",
   "images":[
      {
         "type":"primary",
         "uri":"",
         "resource_url":"",
         "uri150":"",
         "width":500,
         "height":296
      },
      {
         "type":"secondary",
         "uri":"",
         "resource_url":"",
         "uri150":"",
         "width":600,
         "height":521
      }
   ],
   "profile":"Circuit Breaker are an Industrial band, consisting of London based brothers Peter and Edward Simpson. ",
   "urls":[
      "http://circuitbreakerband.tumblr.com/",
      "https://www.facebook.com/pages/Circuit-Breaker/143351282344677#",
      "https://soundcloud.com/circuitbreakerindustrial"
   ],
   "members":[
      {
         "id":4013401,
         "name":"Edward Simpson",
         "resource_url":"https://api.discogs.com/artists/4013401",
         "active":true
      },
      {
         "id":4013402,
         "name":"Peter Simpson (8)",
         "resource_url":"https://api.discogs.com/artists/4013402",
         "active":true
      },
      {
         "id":5792890,
         "name":"Els (7)",
         "resource_url":"https://api.discogs.com/artists/5792890",
         "active":true
      }
   ],
   "data_quality":"Needs Vote"
}

You can see that the 'releases_url' in the JSON response gives us the endpoint for inspecting all the recordings we have released over the years:

{
   "pagination":{
      "page":1,
      "pages":1,
      "per_page":50,
      "items":7,
      "urls":{
         
      }
   },
   "releases":[
      {
         "id":5099743,
         "status":"Accepted",
         "type":"release",
         "format":"Cass, Ltd, C30",
         "label":"Tombed Visions",
         "title":"Grid",
         "resource_url":"https://api.discogs.com/releases/5099743",
         "role":"Main",
         "artist":"Circuit Breaker (7)",
         "year":2013,
         "thumb":"",
         "stats":{
            "community":{
               "in_wantlist":8,
               "in_collection":13
            }
         }
      },
      {
         "id":19910101,
         "status":"Accepted",
         "type":"release",
         "format":"CDr, Album",
         "label":"Not On Label (Circuit Breaker Self-Released)",
         "title":"Cairn",
         "resource_url":"https://api.discogs.com/releases/19910101",
         "role":"Main",
         "artist":"Circuit Breaker (7)",
         "year":2013,
         "thumb":"",
         "stats":{
            "community":{
               "in_wantlist":2,
               "in_collection":0
            }
         }
      },
      {
         "id":6109393,
         "status":"Accepted",
         "type":"release",
         "format":"12\"",
         "label":"Tombed Visions",
         "title":"TV12 ",
         "resource_url":"https://api.discogs.com/releases/6109393",
         "role":"Main",
         "artist":"Circuit Breaker (7)",
         "year":2014,
         "thumb":"",
         "stats":{
            "community":{
               "in_wantlist":9,
               "in_collection":25
            }
         }
      },
      {
         "id":884186,
         "title":"My Descent Into Capital",
         "type":"master",
         "main_release":7354287,
         "artist":"Circuit Breaker (7)",
         "role":"Main",
         "resource_url":"https://api.discogs.com/masters/884186",
         "year":2015,
         "thumb":"",
         "stats":{
            "community":{
               "in_wantlist":22,
               "in_collection":74
            }
         }
      },
      ...
   ]
}
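Before wiring any of this into Django, it's worth sanity-checking how to pluck out the fields we care about. A quick dependency-free sketch, run against a trimmed-down copy of the releases payload above:

```python
import json

# A trimmed-down copy of the releases payload shown above
raw = """
{
  "releases": [
    {"id": 5099743, "title": "Grid", "year": 2013,
     "resource_url": "https://api.discogs.com/releases/5099743"},
    {"id": 6109393, "title": "TV12 ", "year": 2014,
     "resource_url": "https://api.discogs.com/releases/6109393"}
  ]
}
"""

data = json.loads(raw)
# Keep only the fields our Release model will need
releases = [
    {
        "external_id": r["id"],
        "title": r["title"].strip(),  # Discogs titles can carry stray whitespace
        "year": r["year"],
        "link": r["resource_url"],
    }
    for r in data["releases"]
]
print(releases[0]["title"])  # prints Grid
```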

So you have an 'artist' resource, and that artist can have 'releases'. It will be easy enough to define models based on this relationship to persist in our database.

# models.py

from django.db import models

class Artist(models.Model):
    name = models.CharField(max_length=200)
    biog = models.CharField(max_length=800)
    external_id = models.IntegerField()

class Release(models.Model):
    title = models.CharField(max_length=200)
    year = models.IntegerField()
    link = models.URLField()
    external_id = models.IntegerField()
    artist_external_id = models.ForeignKey(Artist, on_delete=models.CASCADE)

Next in the app level jobs.py file, I define a job to go and fetch the artist information from Discogs:

# jobs.py

import requests
from django.conf import settings
from django.core.exceptions import ObjectDoesNotExist
from .models import Artist, Release

def fetch_artist_details(job):
    id = settings.DISCOGS_ARTIST_ID
    req = requests.get(f"https://api.discogs.com/artists/{id}")
    artist_details = req.json()
    # create the artist in the DB if it doesn't already exist
    try:
        Artist.objects.get(external_id=id)
    except ObjectDoesNotExist:
        Artist.objects.create(
            name=artist_details['name'],
            biog=artist_details['profile'],
            external_id=id,
        )
    # pass the releases url to the workspace
    job.workspace['release_uri'] = artist_details['releases_url']
    print('fetch artist done!')

I have saved Circuit Breaker's unique identifier in the Discogs system in the settings.py file, which you can see we obtain via the django.conf module’s settings object, and obviously I've pip installed requests in order to make the HTTP call to the Discogs system. If this artist doesn't exist in our DB, we create it; either way, we then pass the aforementioned release_uri into the job's workspace. A workspace is a dictionary which can be accessed by all tasks in a job; think of it as a job-wide store for metadata you may want to share across tasks. The next task I will create will fetch the artist's releases from Discogs, so we are passing the URL ‘discovered’ by the first task to the second:

# jobs.py
...

def fetch_artist_releases(job):
    id = settings.DISCOGS_ARTIST_ID
    artist = Artist.objects.get(external_id=id)
    release_uri = job.workspace['release_uri']
    req = requests.get(release_uri)
    release_details = req.json()
    release_list = release_details['releases']
    # only create releases we haven't seen before
    for release in release_list:
        try:
            Release.objects.get(external_id=release['id'])
        except ObjectDoesNotExist:
            Release.objects.create(
                title=release['title'],
                year=release['year'],
                link=release['resource_url'],
                external_id=release['id'],
                artist_external_id=artist,
            )
    print('fetch releases done!')
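These two tasks only communicate through the job's workspace. To see that mechanism in isolation, here is a minimal, dependency-free stand-in (FakeJob is my own toy class, not DDBQ's real Job model):

```python
class FakeJob:
    """Toy stand-in for DDBQ's Job: just carries a shared workspace dict."""
    def __init__(self):
        self.workspace = {}

def task_one(job):
    # the first task 'discovers' a value and stashes it in the workspace
    job.workspace["release_uri"] = "https://api.discogs.com/artists/3552343/releases"

def task_two(job):
    # the second task reads what the first left behind
    return job.workspace["release_uri"]

job = FakeJob()
task_one(job)
print(task_two(job))  # prints the URI written by task_one
```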

The fetch_artist_releases task repeats the pattern: we only create release objects in our DB if they don’t already exist.
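Incidentally, Django's built-in get_or_create would collapse that try/except into a single call. The idempotency itself is easy to demonstrate without Django at all; here is a sketch using a plain dict as a stand-in for the Release table:

```python
table = {}  # external_id -> row, standing in for the Release table

def create_if_missing(release):
    """Create the row only if its external_id is unseen; return True if created."""
    if release["id"] in table:
        return False
    table[release["id"]] = {"title": release["title"], "year": release["year"]}
    return True

release = {"id": 5099743, "title": "Grid", "year": 2013}
print(create_if_missing(release))  # True: the first run creates the row
print(create_if_missing(release))  # False: the second run is a no-op
```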

Why bother persisting this information at the DB level at all? If Discogs is already storing it, why not just build an app which fetches resources from their servers each time the page is loaded? Well, I’ve created web applications like this in the past and there is no getting away from it: the overhead of calling an external API, waiting for its results and then marshalling a response back to the user is often very slow. Sometimes an app can take that hit, sometimes not.

Given that I’m expecting the information at the Discogs end to change very, very rarely, for this use-case it makes more sense to store it at my end. From there, I could replicate the data into some kind of read-only DB for improved performance, or into an in-memory datastore, cache, etc…

Here is the new Discogs job as defined in my settings, and how I then hook it up to two different views:

# settings.py

...

JOBS = {
    "populate_db": {
        "tasks": [
            "discogs.jobs.fetch_artist_details",
            "discogs.jobs.fetch_artist_releases"
            ],
    },
}

...
# views.py
...

def populate_db(request):
    if request.method == "POST":
        Job.objects.create(name="populate_db")
        return TemplateResponse(request, template='populate.html',context={'job':'dispatched'})
    else:
        return TemplateResponse(request, template='populate.html')

This view is for the ‘admin’ page of the ‘CMS’ (if I can even call it that). Now, rather than filling out endless forms, the site admin can hit a button to get our worker process to fetch all the relevant info from Discogs. The pattern will be familiar to anyone who has implemented function-based views in a Python web app: if it’s a POST request (i.e. the admin has hit that button), we create an instance of the job; otherwise we just serve the plain HTML template.

The only other view I have actually displays the release information to an end user:

# views.py 
...

def display_discogs(request):
    id = settings.DISCOGS_ARTIST_ID
    artist  = Artist.objects.get(external_id=id)
    releases = Release.objects.filter(artist_external_id=artist)
    return TemplateResponse(request, template='discogs.html', context={'artist':artist,'releases':releases})

You can see this just gets what is stored in the DB, and pushes it into a template.

<!-- discogs.html -->

<h1>{{ artist.name }}</h1>

<table>
  <thead>
    <tr>
      <th scope="col">Releases</th>
    </tr>
  </thead>
  <tbody>
  {% for release in releases %}
    <tr>
      <td>{{ release.title }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>

Et Voila!

Possibly the worst looking band discography page ever? But one which used a background job to fetch all the info from an external server. You can compare it to our actual entry in Discogs.


Obviously, if this eventually becomes a pattern I employ in a 'real' project, I would build it out to a far greater degree. For starters, I would give it an actual frontend to make it look nice, and with authenticated calls to the Discogs API I could get URLs to the images hosted there, and more… One obvious downside is that having an admin user intermittently hit a button is something that should really be automated to run at specific intervals. Celery has Celery Beat, whilst Dramatiq recommends using third-party packages to achieve scheduling. Maybe my next post could be to see if I can get recurring tasks running at regular intervals?
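If I do try that, the heart of a naive scheduler is just a loop that enqueues and then sleeps. A rough, dependency-free sketch; in a real management command the enqueue callable would be something like `lambda: Job.objects.create(name="populate_db")`, and a production setup would more likely lean on cron or a proper scheduler:

```python
import time

def run_every(interval_seconds, enqueue, iterations):
    """Naive scheduler: call enqueue, then sleep, a fixed number of times."""
    for _ in range(iterations):
        enqueue()
        time.sleep(interval_seconds)

# Using a plain list as a stand-in for the job queue
queued = []
run_every(0.01, lambda: queued.append("populate_db"), iterations=3)
print(len(queued))  # prints 3
```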

I enjoyed building this little POC with Django DB Queue. It has a really nice API: easy to understand, and it does exactly what you want and nothing more. I will definitely be looking to use it in a future project.

Django DB Queue is authored and maintained by the DabApps agency. You can check out their GitHub page, as well as listen to an intriguing chat with their founder Jamie Matthews on the Django Chat podcast.
