A Tour Of Tornado (Part 2)

As mentioned in my previous post, Tornado is not like most other Python web frameworks. The dominant paradigm in the Python web world is WSGI, a specification which allows different web servers to communicate with different Python applications. Incoming HTTP requests are chaperoned from the web server/reverse proxy to the WSGI-compliant Python app, and responses are returned by the same route, via a WSGI server such as Gunicorn or uWSGI.

Tornado doesn't conform to the WSGI standard. It comes with its own production-quality server, right out of the box. So far, so good, but a WSGI server like Gunicorn runs multiple worker processes, allowing it to serve many incoming, overlapping requests. How do we get the same for a Tornado app? Although Tornado leaves you a lot of freedom in making this decision, the general consensus seems to be to run several instances of the same process and load-balance incoming requests across them.

In this post, I’m going to explore a simple way to achieve this.

Part 1: Get the Tornado app up and running on a remote server

In the repo, I've added a new 'HomeHandler' and routed it to '/', so there's an endpoint which serves some JSON for us to easily debug.

# app.py

...

class HomeHandler(tornado.web.RequestHandler):
    def get(self):
        self.write({'foo': 'bar'})

...
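For context, here's a sketch of what the whole of app.py might look like once the handler is added. The handler and route come from the snippet above; make_app, main, and the port option are my assumptions about how the rest of the (elided) file fits together, based on the repo's described behaviour:

```python
# Hypothetical full app.py: only HomeHandler and its route are from the
# post; the surrounding scaffolding is an assumption.
import tornado.ioloop
import tornado.options
import tornado.web
from tornado.options import define, options

define("port", default=8000, type=int)


class HomeHandler(tornado.web.RequestHandler):
    def get(self):
        # Tornado serialises a dict to JSON and sets the Content-Type header.
        self.write({'foo': 'bar'})


def make_app():
    return tornado.web.Application([(r"/", HomeHandler)])


def main():
    tornado.options.parse_command_line()  # reads --port from the command line
    make_app().listen(options.port)
    tornado.ioloop.IOLoop.current().start()  # blocks, serving requests

# In the real file, main() would be called under an
# `if __name__ == "__main__":` guard.
```

With that in place, `python3 app.py` followed by a curl to the chosen port should return the JSON body.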

Next, I spin up an Ubuntu (20.10) VM from my favourite IaaS provider and SSH in. It's worth noting that Python and Git handily come pre-installed on this operating system. The first step is to create a directory for the web application. I'll do so under /var/www/, as one convention dictates.

$ sudo mkdir /var/www/tornado

Then I cd into that directory and pull my repository from version control.

$ git clone https://github.com/pj-simpson/basic-tornado.git

In this instance I am going to use systemd to run the Tornado app. systemd is, amongst other things, a service manager, which allows us to easily stop and start programs as background (a.k.a. daemon) processes. These processes are defined as 'units' in a '.service' file. The systemd documentation gives this good definition:

A unit configuration file whose name ends in ".service" encodes information about a process controlled and supervised by systemd.

Create one of these files like so, using the Nano text editor (which also comes with Ubuntu):

$ sudo nano /etc/systemd/system/tornado.service
[Unit]
Description=Test Tornado App

[Service]
Type=simple
Restart=always
ExecStart=/usr/bin/python3 /var/www/tornado/basic-tornado/app.py

[Install]
WantedBy=multi-user.target

You'll notice that the file has an INI-like syntax. Description is a human-friendly description of the unit, whilst the WantedBy declaration means the service will be started at boot (once it has been enabled). The [Service] block is the most important part here: ExecStart defines the command which gets run when the service is started.

Save this file and then close it (in nano that's 'ctrl+O' to write, then 'ctrl+X' to exit). Then we can run commands to interact with systemd directly; systemctl is the command-line interface for systemd.

$ systemctl daemon-reload

This reloads the systemd manager configuration, picking up all the unit files (including our new one). We can now start our service like so:

$ systemctl start tornado.service

Test to see that it's running locally:

$ curl http://127.0.0.1:8001/
{"foo": "bar"}

Good! So we can daemonise our Tornado app and manage it via systemd. The next step is to get a number of these up and running, each on a different port.

$ systemctl stop tornado.service

Part 2: Run multiple instances

systemd has the concept of a template unit file. The naming convention is {service_name}@{argument}.service, allowing a variable to be passed after the '@' and before '.service'. This was designed specifically for running multiple instances of the same service, and it suits our use-case perfectly: we want to run the same process several times with only one detail, the port, varying between instances.

$ sudo nano /etc/systemd/system/tornado@.service
[Unit]
Description=Test Tornado App
PartOf=tornado.target

[Service]
Type=simple
Restart=always
ExecStart=/usr/bin/python3 /var/www/tornado/basic-tornado/app.py --port=%i

[Install]
WantedBy=multi-user.target

%i is the variable passed in when the service is started. The Tornado app parses the --port argument given to it and will listen on whichever port is passed (falling back on a default of 8000).

# app.py

...

from tornado.options import define, options

define("port", default=8000, type=int)

...

tornado.options.parse_command_line()  # parses --port from the command line
app.listen(options.port)

So, for example, we could run our Tornado app on port 8002 of localhost like so:

$ systemctl start tornado@8002.service

We could repeat this for however many other ports we want to use, but for multiple related processes it would be far more efficient to define them all in a single place.
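As a quick aside, a shell loop would also take care of the repetition. This version just prints the systemctl commands (the unit name matches the template above), so you can eyeball them before piping the output into sh as root:

```shell
# Print the systemctl command for each port given as an argument;
# pipe the output to sh to actually execute the commands.
start_units() {
    for port in "$@"; do
        echo "systemctl start tornado@${port}.service"
    done
}

start_units 8001 8002 8003 8004
```

On the server that would be `start_units 8001 8002 8003 8004 | sh`, but the target-file approach below is tidier for processes that belong together.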

The other type of unit file needed to achieve this is a target file. Target files group units together (which explains the 'PartOf=' directive in the template unit file).

$ sudo nano /etc/systemd/system/tornado.target
[Unit]
Description=Multiple Tornado Processes
Requires=tornado@8001.service tornado@8002.service tornado@8003.service tornado@8004.service 

[Install]
WantedBy=multi-user.target

You can see from the Requires directive that we are going to run 4 instances of our Tornado service (defined in the template), on ports 8001-8004. They can now all be started with a single command:

$ systemctl start tornado.target

We can use curl to ensure that we are serving the app from each port correctly.

$ curl http://127.0.0.1:8001
{"foo": "bar"}
$ curl http://127.0.0.1:8002
{"foo": "bar"}
$ curl http://127.0.0.1:8003
{"foo": "bar"}
$ curl http://127.0.0.1:8004
{"foo": "bar"}

It is also worth noting that with systemctl we can still manage the individual processes, independently of the target file.

Here is the result of inspecting an individual daemon: 

$ systemctl status tornado@8003.service

● tornado@8003.service - Test Tornado App
     Loaded: loaded (/etc/systemd/system/tornado@.service; disabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-10-04 18:29:44 UTC; 2min 22s ago
   Main PID: 17467 (python3)
      Tasks: 1 (limit: 1119)
     Memory: 19.8M
        CPU: 288ms
     CGroup: /system.slice/system-tornado.slice/tornado@8003.service
             └─17467 /usr/bin/python3 /var/www/tornado/basic-tornado/app.py --port=8003

Oct 04 18:29:44 peter-testing systemd[1]: Started Test Tornado App.
Oct 04 18:30:44 peter-testing python3[17467]: [I 221004 18:30:44 web:2239] 200 GET / (127.0.0.1) 0.82ms

Part 3: Load balance & reverse proxy

Now that there are 4 instances of our Tornado app running on the server, we need a single point of connection through which the outside world can reach it, with those HTTP requests then distributed across the different Tornado processes.

A reverse proxy seems the most appropriate tool here: it's a type of web server which sits in front of other servers and forwards external requests on to the appropriate backend. However, we also need to distribute the requests across the 4 instances, as it's no good sending a request to an app which is busy at the time.

In this instance I'm going to use the Apache web server, as it is incredibly popular and flexible (according to Wikipedia, it's the most popular).

$ sudo apt install apache2

We then need to enable some modules which give us extra functionality.

mod_proxy, along with mod_proxy_http, allows us to use the web server as a reverse proxy for HTTP requests.

mod_proxy_balancer and mod_lbmethod_byrequests give us load-balancing capabilities, distributing requests according to a request-counting algorithm.

$ sudo a2enmod proxy_balancer
$ sudo a2enmod lbmethod_byrequests
$ sudo a2enmod proxy
$ sudo a2enmod proxy_http

We can now edit the Apache configuration file to define the behaviour of our load balancer/reverse proxy.

$ sudo nano /etc/apache2/sites-available/000-default.conf

The contents of the file should be as follows. This isn't an Apache tutorial, so I won't go over it in detail, but suffice it to say the XML-like syntax speaks for itself:

<VirtualHost *:80>


        <Proxy balancer://tornadocluster>
                BalancerMember http://127.0.0.1:8001
                BalancerMember http://127.0.0.1:8002
                BalancerMember http://127.0.0.1:8003
                BalancerMember http://127.0.0.1:8004
        </Proxy>

        ProxyPreserveHost On

        ProxyPass / balancer://tornadocluster/
        ProxyPassReverse / balancer://tornadocluster/

        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined

</VirtualHost>

The service will need restarting: 

$ sudo systemctl restart apache2

I have also updated my handler, in order to better demonstrate the load balancing in action:

# app.py 

... 

class HomeHandler(tornado.web.RequestHandler):
    def get(self):
        self.write({"You have been load balanced to the following port": options.port})

...

$ curl http://mydomain.com
{"You have been load balanced to the following port":8001}
$ curl http://mydomain.com
{"You have been load balanced to the following port":8002}
$ curl http://mydomain.com
{"You have been load balanced to the following port":8003}
$ curl http://mydomain.com
{"You have been load balanced to the following port":8004}

A final good tip is the journalctl command, which lets us inspect the logs written for a particular service. This is handy for debugging (the '-u' flag filters by unit):

$ journalctl -u tornado@8001.service

...

Oct 04 20:26:30 peter-testing python3[18833]: [I 221004 20:26:30 web:2239] 200 GET / (127.0.0.1) 1.09ms
Oct 04 20:26:59 peter-testing python3[18833]: [E 221004 20:26:59 web:1789] Uncaught exception GET /domains/peter (127.0.0.1)
Oct 04 20:26:59 peter-testing python3[18833]:     HTTPServerRequest(protocol='http', host='178.62.27.86', method='GET', uri='/domains/peter', version='HTTP/1.1'

...

And there you have it! A brutally simple implementation of a load balancer and multiple services. It's quite nice interfacing from a web server directly to a Python app, without having to faff around with any WSGI stuff.

Addendum: Roll your own load balancer

I had a go at making my own HTTP load balancer, using Tornado itself!
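I'll write that up properly another time, but the core idea is small enough to sketch here. This is a hypothetical round-robin proxy, not the Apache setup above: ProxyHandler, the backend list, and the listening port are all illustrative assumptions.

```python
# A toy round-robin HTTP load balancer built on Tornado itself.
# Backend addresses and the listen port are illustrative assumptions.
import itertools

import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

# itertools.cycle gives us an endless round-robin over the backends.
BACKENDS = itertools.cycle([
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
    "http://127.0.0.1:8003",
    "http://127.0.0.1:8004",
])


class ProxyHandler(tornado.web.RequestHandler):
    async def get(self):
        # Take the next backend in round-robin order and forward the request.
        backend = next(BACKENDS)
        response = await AsyncHTTPClient().fetch(
            backend + self.request.uri, raise_error=False
        )
        self.set_status(response.code)
        self.write(response.body or b"")


def main():
    app = tornado.web.Application([(r"/.*", ProxyHandler)])
    app.listen(8080)
    tornado.ioloop.IOLoop.current().start()  # blocks, serving requests

# In a real file, call main() under an `if __name__ == "__main__":` guard.
```

It lacks everything a production balancer needs (health checks, other HTTP methods, header forwarding), but it shows how little code the round-robin idea actually takes.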
