Private PyPI

As part of my company’s ‘Hackathon’, some colleagues and I mashed together a Python client library for our REST API. As it was for internal use only, we wanted to make it installable as a package, but not to put it on PyPI. The answer is obviously to run your own version of PyPI!

There is an amazing project called pypiserver, which is a Bottle web app and is compatible with ‘pip install’. There are plenty of tutorials out there on how to get pypiserver up and running, but none did it quite how I wanted. Bottle is a WSGI framework, so we’d expect to run it under something like Gunicorn; however, the docs don’t go into much detail about that, and the tutorials I could find relied on Bottle’s own built-in server, which isn’t production quality.

Let's do a quick speed run. On an Ubuntu VM, I make a virtualenv:

$ python3 -m venv venv

Activate it:

$ source venv/bin/activate

Install pypiserver and then Gunicorn:

$ pip install pypiserver
$ pip install gunicorn

And lastly I make the packages directory, which is where pypiserver will store the uploaded package files (tarballs and wheels):

$ mkdir packages

I then start the Gunicorn server with 4 worker processes. The 'root' keyword argument is the absolute path to the aforementioned directory (I’m on Oracle Cloud here, ubuntu is the user, and I’m putting all of this in a directory called pps):

$ gunicorn -w4 'pypiserver:app(root="/home/ubuntu/pps/packages")'
[2023-01-04 20:24:30 +0000] [56475] [INFO] Starting gunicorn 20.1.0
[2023-01-04 20:24:30 +0000] [56475] [INFO] Listening at: http://127.0.0.1:8000 (56475)
[2023-01-04 20:24:30 +0000] [56475] [INFO] Using worker: sync
[2023-01-04 20:24:30 +0000] [56476] [INFO] Booting worker with pid: 56476

In a second SSH session, I curl the loopback address on the right port to check it’s working as expected:

$ curl http://127.0.0.1:8000/
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Welcome to pypiserver!</title>
  </head>
  <body>
    <h1>
      Welcome to pypiserver!
    </h1>
    <p>
      This is a PyPI compatible package index serving 0 packages.
    </p>
    <p>
      To use this server with <code>pip</code>, run the following command:
      <pre>
        <code>pip install --index-url http://127.0.0.1:8000/simple/ PACKAGE [PACKAGE2...]</code>
      </pre>
    </p>
    <p>
      To use this server with <code>easy_install</code>, run the following command:
      <pre>
        <code>easy_install --index-url http://127.0.0.1:8000/simple/ PACKAGE [PACKAGE2...]</code>
      </pre>
    </p>
    <p>
      The complete list of all packages can be found <a href="/packages/">here</a> or via the <a href="/simple/">simple</a> index.
    </p>
    <p>
      This instance is running version 1.5.1 of the <a href="https://pypi.org/project/pypiserver/">pypiserver</a> software.
    </p>
  </body>
</html>
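If you'd rather script that check than eyeball the HTML, something like the following would do. It is only a minimal sketch using the standard library: the function name index_is_up is my own invention, and it simply assumes the welcome page contains the word "pypiserver", as it does in the output above.

```python
import urllib.request


def index_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if base_url answers with HTTP 200 and looks like pypiserver."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            # Assumption: the welcome page mentions "pypiserver" somewhere.
            return resp.status == 200 and "pypiserver" in body
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so any connection problem is simply reported as "not up".
        return False
```

Calling index_is_up("http://127.0.0.1:8000/") on the VM should return True for as long as Gunicorn is running.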

That’s all working fine! The next step is to make it reachable from the outside world, and for that we’ll reach for Nginx.
 


NGINX


Nginx is going to be playing the role of a reverse proxy here, forwarding on requests from the public internet to the Gunicorn service.
 

$ sudo apt update
$ sudo apt install nginx
$ sudo ufw allow 'Nginx HTTP'

I point my domain name at the IP address of my Linux box and can see that we get the Nginx default page.


We can change the default Nginx config to point at the local Gunicorn server:
 

$ sudo nano /etc/nginx/sites-available/default
server {
        listen 80;
        server_name www.peterprivatepypi.xyz;
        location / {
                proxy_pass http://127.0.0.1:8000/;
        }
}



That's all very well and good, but the next step is to ensure that the website gets served securely with HTTPS. 

 



LET'S ENCRYPT & CERTBOT

Certbot is a tool which lets sysadmins automate obtaining and renewing ‘Let’s Encrypt’ certificates, so that sites can be served over HTTPS.

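I am going to rename the default config file to the name of my domain and then create a symlink from the sites-available folder to the sites-enabled folder. This is but one convention for running Nginx, and it ultimately allows multiple sites to be served off the same machine. (On a stock Ubuntu install, sites-enabled may already contain a symlink named default that points at the old file name; the rename leaves it dangling, so it can be removed.)

$ sudo mv /etc/nginx/sites-available/default /etc/nginx/sites-available/peterprivatepypi.xyz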
$ sudo ln -s /etc/nginx/sites-available/peterprivatepypi.xyz /etc/nginx/sites-enabled

I install certbot as well as the python3-certbot-nginx package (hopefully it’s obvious why that’s useful):

$ sudo apt-get install certbot
$ sudo apt-get install python3-certbot-nginx

The certbot command (with some extra options passed to it) goes off and does the hard work of getting the correct certificates for the domains I control:

$ sudo certbot --nginx -d peterprivatepypi.xyz -d www.peterprivatepypi.xyz

Subjective I know, but there is some lovely output: 
 

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Requesting a certificate for peterprivatepypi.xyz and www.peterprivatepypi.xyz

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/peterprivatepypi.xyz/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/peterprivatepypi.xyz/privkey.pem
This certificate expires on 2023-04-05.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.

Deploying certificate
Successfully deployed certificate for peterprivatepypi.xyz to /etc/nginx/sites-enabled/peterprivatepypi.xyz
Successfully deployed certificate for www.peterprivatepypi.xyz to /etc/nginx/sites-enabled/peterprivatepypi.xyz
Congratulations! You have successfully enabled HTTPS on https://peterprivatepypi.xyz and https://www.peterprivatepypi.xyz

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le

If we inspect the Nginx config file, we can see that Certbot has written to it: 

server {
	server_name peterprivatepypi.xyz  www.peterprivatepypi.xyz;
	location / {
		proxy_pass http://127.0.0.1:8000/;
	}

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/peterprivatepypi.xyz/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/peterprivatepypi.xyz/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot


}

server {
    if ($host = www.peterprivatepypi.xyz) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


    if ($host = peterprivatepypi.xyz) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


	listen 80 default;
	listen [::]:80 default;
	server_name peterprivatepypi.xyz  www.peterprivatepypi.xyz;
    return 404; # managed by Certbot


}

peterprivatepypi is now served over HTTPS, which is essential for a repository from which people will be downloading software packages! Nginx just needs a restart to pick up the changes:

$ sudo systemctl restart nginx
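
A quick way to confirm that the HTTP-to-HTTPS redirect Certbot wrote behaves as intended is to request the plain-HTTP URL without following redirects. The snippet below is a rough sketch using only the standard library; http_redirect is a hypothetical helper name of mine, not part of Certbot or Nginx.

```python
import http.client
from urllib.parse import urlsplit


def http_redirect(url: str, timeout: float = 5.0):
    """Request a plain-HTTP URL once and return (status, Location header).

    http.client never follows redirects, so a 301 is reported as-is.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Given the "return 301 https://$host$request_uri;" lines in the config above, http_redirect("http://peterprivatepypi.xyz/simple/") should come back with a 301 status and a Location beginning with https://.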


 



 



TIDYING UP
 

The last remaining things that I'd like to do here are: make it so that a password is required to upload packages to my repository; ensure that uploaded packages can be overwritten (for some reason this isn’t the default); and then run pypiserver, with all this config, as a systemd service.

The pypiserver docs recommend configuring ‘Apache-like’ authentication. As far as I can tell, that means an Apache-style htpasswd file, so we install ‘apache2-utils’ (a package of Apache utility programs) in order to get ‘htpasswd’, a utility which stores basic authentication credentials in a special file. pypiserver also needs the ‘passlib’ library in order to read that file.

$ sudo apt install apache2-utils
$ pip install passlib

The following command creates the file with the username/password combo (the password is entered at a prompt):

$ htpasswd -sc htpasswd.txt me@example.com
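
For the curious: -c creates the file and -s tells htpasswd to store a SHA-1 hash of the password. The sketch below reproduces what such an entry should look like, namely the username, a colon, the literal tag {SHA}, and the Base64-encoded SHA-1 digest. The helper name sha1_htpasswd_line is purely illustrative and is not part of htpasswd, passlib or pypiserver.

```python
import base64
import hashlib


def sha1_htpasswd_line(username: str, password: str) -> str:
    """Build an htpasswd entry in the {SHA} scheme used by `htpasswd -s`."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    # The doubled braces produce a literal "{SHA}" inside the f-string.
    return f"{username}:{{SHA}}{encoded}"
```

Note that this scheme is unsalted, so treat the snippet as a way of understanding the file rather than as a recommendation for how to hash passwords.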

The Gunicorn process running pypiserver now needs to be told where to find the usernames and passwords, so the startup command is now:

$ gunicorn -w4 'pypiserver:app(root="/home/ubuntu/pps/packages", passwords="/home/ubuntu/pps/htpasswd.txt")'

'passwords' is a path to wherever you created the htpasswd.txt file! 

I can use twine to try and upload some nonsense package to idiot-check that the authentication does actually kick in:

$ twine upload --repository-url https://peterprivatepypi.xyz/ dist/*
Uploading distributions to https://peterprivatepypi.xyz/
Enter your username: 
WARNING  Your username is empty. Did you enter it correctly?                            
WARNING  See https://twine.readthedocs.io/#entering-credentials for more information.   
Enter your password: 
WARNING  Your password is empty. Did you enter it correctly?                            
WARNING  See https://twine.readthedocs.io/#entering-credentials for more information.   
Uploading dummy-0.1.0-py3-none-any.whl
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 kB • 00:00 • ?
WARNING  Error during upload. Retry with the --verbose option for more details.         
ERROR    HTTPError: 401 Unauthorized from https://peterprivatepypi.xyz/                 
         Unauthorized
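
As far as I understand it, the credentials twine prompts for travel to the server in a standard HTTP Basic ‘Authorization’ header, which is one more reason to serve the index over HTTPS. Here is a small sketch of how that header is built; basic_auth_header is a made-up helper for illustration, and twine does the equivalent for you.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic 'Authorization' header."""
    raw = f"{username}:{password}".encode("utf-8")
    # Base64 is an encoding, not encryption: anyone who sees the header
    # can recover the password, hence the need for TLS.
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

The server decodes this value and checks the password against the matching line in htpasswd.txt; with empty or wrong credentials it answers 401, as seen above.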


Whilst we are on the subject of uploading, the default config of pypiserver doesn’t allow an existing package file to be overwritten. I want a fresh upload of my package to replace the old one, so that’s an extra argument for Gunicorn:

$ gunicorn -w4 'pypiserver:app(root="/home/ubuntu/pps/packages", passwords="/home/ubuntu/pps/htpasswd.txt", overwrite="1")'
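
That app string is getting long and fiddly to quote. One option is to move the settings into a Gunicorn config file, which is plain Python. This is only a sketch and not something taken from the pypiserver docs: bind, workers and wsgi_app are Gunicorn settings (wsgi_app needs Gunicorn 20.1.0 or newer, which matches the version in the log output earlier), while the file name and location are my own choice.

```python
# gunicorn.conf.py
# Assumed location: /home/ubuntu/pps/gunicorn.conf.py (my choice, not required).

# Listen on loopback only; Nginx is the public-facing side.
bind = "127.0.0.1:8000"

# Same as -w4 on the command line.
workers = 4

# The application string exactly as used in this post, kept in one place.
wsgi_app = (
    'pypiserver:app(root="/home/ubuntu/pps/packages", '
    'passwords="/home/ubuntu/pps/htpasswd.txt", overwrite="1")'
)
```

With that file in place, the startup command would shrink to: gunicorn -c /home/ubuntu/pps/gunicorn.conf.py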

Now seems like a good time to finally make running pypiserver a system service. I've written another blog post that goes into a bit more detail about systemd and Python, so read that if you like.

Create the 'unit' file: 

$ sudo nano /etc/systemd/system/pypi.service
[Unit]
Description=Gunicorn process for Pypi server

[Service]
Type=simple
Restart=always
ExecStart=/home/ubuntu/pps/venv/bin/gunicorn -w4 'pypiserver:app(root="/home/ubuntu/pps/packages", passwords="/home/ubuntu/pps/htpasswd.txt", overwrite="1")'

Reload the config: 

$ sudo systemctl daemon-reload

Start the service: 

$ sudo systemctl start pypi.service

...and there you have it: we have a secure instance of pypiserver running under Gunicorn and Nginx, which can be easily controlled and monitored through the systemd service manager. For the purposes of my project this worked really well: I was able to push up new versions of my Python package, and my colleagues were able to pip install it and use the package in their Python programs!
 

$ systemctl status pypi.service
● pypi.service - Gunicorn process for Pypi server
     Loaded: loaded (/etc/systemd/system/pypi.service; static)
     Active: active (running) since Thu 2023-01-26 23:20:54 UTC; 4 weeks 1 day ago
   Main PID: 311440 (gunicorn)
      Tasks: 5 (limit: 1076)
     Memory: 72.4M
        CPU: 7min 47.687s
     CGroup: /system.slice/pypi.service
             ├─311440 /home/ubuntu/pps/venv/bin/python3 /home/ubuntu/pps/venv/bin/gunicorn -w4 "pypiserver:app(root=\"/home/ubuntu/pps/packages\", passwords=\"/home/ubuntu/pps/htpasswd.txt\",overwrite=\"1\")"
             ├─311442 /home/ubuntu/pps/venv/bin/python3 /home/ubuntu/pps/venv/bin/gunicorn -w4 "pypiserver:app(root=\"/home/ubuntu/pps/packages\", passwords=\"/home/ubuntu/pps/htpasswd.txt\",overwrite=\"1\")"
             ├─311443 /home/ubuntu/pps/venv/bin/python3 /home/ubuntu/pps/venv/bin/gunicorn -w4 "pypiserver:app(root=\"/home/ubuntu/pps/packages\", passwords=\"/home/ubuntu/pps/htpasswd.txt\",overwrite=\"1\")"
             ├─311444 /home/ubuntu/pps/venv/bin/python3 /home/ubuntu/pps/venv/bin/gunicorn -w4 "pypiserver:app(root=\"/home/ubuntu/pps/packages\", passwords=\"/home/ubuntu/pps/htpasswd.txt\",overwrite=\"1\")"
             └─311445 /home/ubuntu/pps/venv/bin/python3 /home/ubuntu/pps/venv/bin/gunicorn -w4 "pypiserver:app(root=\"/home/ubuntu/pps/packages\", passwords=\"/home/ubuntu/pps/htpasswd.txt\",overwrite=\"1\")"

TIDBIT:

As you may have inferred, PyPI itself isn’t pypiserver, but actually a Pyramid web app called Warehouse, whose source code is publicly available.
