The Startup CTO’s Guide to Ops (3 of 3): A Minimal Production and Deployment Setup

This is part 3 of a 3-part series on operations setups for early-stage startups. Previously, part 1 discussed guiding principles and requirements, and part 2 toured 3rd party services and open-source tools.

In this last post of our series, we’ll take a deep dive into a case study of production hosting and deployment.

Note that your goals and tools are undoubtedly different from ours, so this should not be read as a prescriptive “here’s what you should do…” guide. Our approach is very much at the scrappy, cheap, and DIY end of the spectrum. That being said, we will do the following:

  • Keep costs as low as possible
  • Have a fast website that handles our target load
  • Make deployments painless
  • Ensure internal and external monitoring
  • Track business metrics

Outline

Here’s what I’ll be covering in this post:

Production setup

Deployment and versioning

Conclusion: Focus on what matters

Production setup

I’ll walk through our production setup first, then I’ll discuss our deployment and versioning process.

Our application stack

  • Language: Python 2.7
  • Web framework: Pyramid (WSGI)
  • JavaScript: our site isn’t front-end heavy, but we use a fair bit of jQuery, jQuery DataTables, and Chart.js
  • Web server: Waitress (Pyramid default, works fine)
  • OS: Ubuntu Linux 16.04
  • Database: PostgreSQL 9.5 (extensions: PostGIS, foreign data wrappers for SQLite and CSV)
  • Load balancer/web server: Nginx

SSD

Unless you have so much data it would be prohibitively expensive, use SSD storage to speed up your database and any host needing significant disk IO.

Codero hosting

In the previous post I explained why it would be reasonable for an early-stage website startup to run on a single dedicated leased server.

We lease one quad Xeon (8-core) server with 12 GB RAM and SSD storage for about $130/mo. We picked Codero as our hosting provider because they had good reviews and prices. They also offer cloud VMs in addition to dedicated hosts, which gives us more scaling options. We’ve been happy so far!

Nginx

We run staging and production versions of our site. All traffic is served over HTTPS: Nginx terminates SSL and proxies requests to our web server processes, which run locally. For now, we run two processes each for prod and staging, and each process has its own thread pool.

The Nginx config for prod-ssl looks like:

upstream myservice_prod {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    server_name mydomain.com www.mydomain.com;
    listen 443 ssl;
    ssl on;
    ssl_certificate /etc/nginx/ssl/bundle.crt;
    ssl_certificate_key /etc/nginx/ssl/star_mydomain_com.key;

    location / {
        proxy_pass http://myservice_prod;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

The proxy headers aren’t strictly required, but they forward the original client details on to our Pyramid server so we can log where requests come from.
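
As a minimal sketch (not our exact code; the subscriber name is hypothetical), here is how those forwarded values can be read in Pyramid for request logging:

import logging

from pyramid.events import NewRequest

log = logging.getLogger(__name__)

def log_client_ip(event):
    """Log the real client address for each incoming request."""
    request = event.request
    # X-Real-IP is set by the Nginx config above; fall back to the
    # address WSGI sees if the header is missing (e.g. local testing).
    client_ip = request.headers.get('X-Real-IP', request.client_addr)
    log.info('%s %s from %s', request.method, request.path, client_ip)

# In main(): config.add_subscriber(log_client_ip, NewRequest)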

Separately, an Nginx rule redirects HTTP to HTTPS:

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name _;
    return 301 https://$host$request_uri;
}

File locations and users

Production is deployed to /var/myservice-prod, and staging to /var/myservice-staging. We create a Linux user (e.g. “myservice”) to own these directories, and also to run the website processes.

Each environment maintains its own isolated Python package dependencies. During initial setup, we create a Python virtualenv in each environment, which lives in a venv subdirectory. Among other things, the virtualenv provides a venv/bin directory with an environment-specific venv/bin/python and venv/bin/pserve (for running the server).

Secrets are managed outside of the deployment process and source control. I manually create a small file, /var/myservice-prod/venv/configs/prod-secrets.ini, with permissions set so only the “myservice” user can read it (chmod 600). The secrets file looks like:

[app:main]
use = egg:myservice
sqlalchemy.url = postgresql+psycopg2://user:pass@localhost/db_name
stripe.api_key = sk_live_stuff
stripe.public_key = pk_live_stuff
...and so on...

The application’s configuration files (staging.ini and production.ini) are checked into source control and bundled into the deployment package. These configs inherit from the local prod-secrets.ini file:

[app:main]
use = config:prod-secrets.ini
...
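
At runtime, PasteDeploy merges the secrets file and the checked-in config into a single settings dictionary. As a rough sketch (the view name here is hypothetical), any of those keys can then be read from the registry:

# Inside any Pyramid view (or anything with access to the request):
def charge_customer(request):
    # Settings from prod-secrets.ini and production.ini are merged
    # into one dictionary when the app starts.
    stripe_key = request.registry.settings['stripe.api_key']
    # ...use stripe_key to call the payment API...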

Web server

We run our web server using the /venv/bin/pserve Pyramid command. This starts a WSGI server using a PasteDeploy .ini configuration file. The web server options are configured as:

[server:main]
use = egg:waitress#main
host = %(http_host)s
port = %(http_port)s
url_scheme = https
threads = 8

The “%(http_host)s” and “%(http_port)s” lines allow for variable substitution from command-line arguments to pserve. The url_scheme = https line tells Pyramid that URLs should be written as https (because our upstream proxy handles SSL for us).

Putting these pieces together, we could fire up a web server bound to localhost on port 8000 by running:

sudo -u myservice /var/myservice-prod/venv/bin/pserve /var/myservice-prod/venv/configs/production.ini http_host=localhost http_port=8000

Systemd

Of course, we won’t actually be starting our server from the command line because we need a keep-alive mechanism, a way to manage logging (and logfile rotation), etc. We’ll use systemd to do all those things.

We create the systemd unit /etc/systemd/system/myserviceprod.service:

[Unit]
Description=Our Amazing Service, Production
[Service]
ExecStart=/var/myservice-prod/venv/bin/pserve /var/myservice-prod/venv/configs/production.ini http_host=localhost http_port=8000
WorkingDirectory=/var/myservice-prod
User=myservice
Group=myservice
Restart=always
[Install]
WantedBy=multi-user.target

Then we tell systemd to pick up this new file as:

$ sudo systemctl daemon-reload

Now we can start and stop our service as:

$ sudo systemctl start myserviceprod
$ sudo systemctl stop myserviceprod

(The astute reader will notice that this example only creates a server on one port; a fancier setup with multiple ports using a systemd template is recommended, but this is left as an exercise for the reader).

Logging

Under this systemd setup, our logs are managed by journald. As we discussed earlier, the Python app also ships a copy of log entries to Graylog.
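
The wiring on the Python side is just an extra logging handler. A minimal sketch, assuming the graypy package is used to speak GELF to Graylog (the hostname and port here are placeholders, and newer graypy releases name the handler GELFUDPHandler):

import logging

import graypy  # assumption: the GELF handler library we'd use

log = logging.getLogger('myservice')
log.setLevel(logging.INFO)

# Ship a copy of every log record to the Graylog server over UDP.
log.addHandler(graypy.GELFHandler('graylog.example.internal', 12201))

log.info('myservice starting up')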

Deployment and Versioning

Deployment systems can be complicated. You don’t need to boil the ocean and get a perfect system in place on day 1, but you should invest the time to have something. We started with a simple shell script, but as we grow we’ll migrate to better tools.

Package and Version Requirements

I expect deployments to be integrated with a packaging and source control system such that:

  • Packages have dependencies: a package should specify dependencies which will be automatically installed as part of deployment.
  • Configurations are treated like code: config files should be managed by the deployment system, either by bundling them as part of an application package (the approach we’ll be taking) or as their own deployable unit.
  • Deployments are versioned: each deployable release candidate is marked with a version like “1.0.32”. This version is visible to the application (it “knows” that it is 1.0.32), and also the version is used as a git tag so we have a clean historical record.
  • Pre-package hooks are supported: sometimes your build has preparation tasks, like minifying and combining CSS and JS.

Application version

Maybe I’m a Python n00b, but I couldn’t find a great way to make Python “aware” of the latest git tag. I settled on the approach of modifying a file version.py in tandem with setting a git tag. version.py is a single line like:

__version__ = '1.0.32'

In our main application’s __init__.py main() function, we pluck the version from this file:

from version import __version__
# ...
def main(global_config, **settings):
    # ...
    config.registry['version'] = __version__
    # ...

Downstream in our application views and templates, the version is then available as request.registry['version']. We use the application version in a few ways (a short sketch follows the list):

  • The application reports what version it is running, which makes it easy to check what-is-running-where.
  • Many cache keys include the version, so on deployment we bust the cache.
  • We append “...?v=<version>” to static web asset URLs. This forces clients to pick up the latest version after a deployment.
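
Here is a short sketch of the last two points (these helper functions are illustrative, not our actual code):

from myservice.version import __version__

def versioned_cache_key(key):
    # Prefixing cache keys with the release version means a deployment
    # naturally invalidates everything cached by the previous release.
    return 'v%s:%s' % (__version__, key)

def versioned_static_url(path):
    # e.g. /static/app.css -> /static/app.css?v=1.0.32, which forces
    # browsers to re-fetch assets after a deployment.
    return '%s?v=%s' % (path, __version__)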

Bump version

Before building a package, we run a script locally to bump the application version.

  • Get the current git tag, increment it, and set a new tag
  • Update our Python version.py file with the new version
  • Push these changes to origin

I’ll skip our full script here (it’s not pretty and is rather specific to our setup), but here are the two useful git commands it uses (a rough sketch of the overall flow follows below):

  • Get the current tag: git describe --tags --abbrev=0
  • Get the toplevel directory for your git repo: git rev-parse --show-toplevel
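
For the curious, here is a rough sketch of what such a bump script can look like (this is not our actual script; it assumes simple MAJOR.MINOR.PATCH tags and the version.py location described above):

#!/usr/bin/env python
"""Sketch of a version-bump script: new git tag + updated version.py."""
import os
import subprocess

def git(*args):
    return subprocess.check_output(['git'] + list(args)).strip()

# Current tag, e.g. "1.0.32", bumped to "1.0.33"
current = git('describe', '--tags', '--abbrev=0')
major, minor, patch = current.split('.')
new_version = '%s.%s.%d' % (major, minor, int(patch) + 1)

# Rewrite version.py inside the package
toplevel = git('rev-parse', '--show-toplevel')
with open(os.path.join(toplevel, 'myservice', 'version.py'), 'w') as f:
    f.write("__version__ = '%s'\n" % new_version)

# Commit, tag, and push so the git tag and version.py stay in lockstep
subprocess.check_call(['git', 'commit', '-am', 'Bump version to ' + new_version])
subprocess.check_call(['git', 'tag', new_version])
subprocess.check_call(['git', 'push', 'origin', 'HEAD', '--tags'])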

Build package

We deploy our application as a Python package. pip does the heavy lifting of bundling and installing the application and managing its dependencies.

We build our application as a source distribution (sdist), which produces a deployable file like dist/myservice-1.0.32.tar.gz.

# Pre-processing: minify our JS and CSS
./minify.sh
# Create source distribution
python setup.py sdist

The file setup.py manages the package configuration:

  • There is a list of Python package dependencies. If we add packages or change versions, pip will install the new packages as part of deployment.
  • The package version comes from our versioning scheme (version.py).
  • Our config files are included in the build.

from setuptools import setup

# Load the version (which is updated by our bump version script)
from myservice.version import __version__

requires = [
    # List all required python packages here. If you
    # add a new package, the deployment process will
    # pick it up.
    'pyramid',
    'pyramid_jinja2',
    # ...etc...
]

# Add a few things to the setup() method
setup(
    # ...boilerplate stuff...
    # Install the dependencies listed above
    install_requires=requires,
    # Use our versioning scheme
    version=__version__,
    # Copy .ini files to venv/configs
    data_files=[
        ('configs', ['staging.ini']),
        ('configs', ['production.ini']),
    ],
)

Deployment

We wrote a simple shell script to deploy the latest package to production or staging:

ops/deploy.sh [username] [--staging or --production]

The essential parts of this script are:

# Get the filename of the most recent package
distfile=$(ls -t myservice/dist | head -n 1)

# Copy the bundle to our remote host
scp myservice/dist/$distfile $user@$host:/tmp

# Remotely:
#  1) pip install the new bundle
#  2) Restart the service
#  3) Print the status (sanity check)
if [[ $2 == '--production' ]]
then
    ssh -t $user@$host "sudo -u myservice /var/myservice-prod/venv/bin/pip install /tmp/$distfile --no-cache-dir; sudo systemctl restart myserviceprod; sleep 1; sudo systemctl status myserviceprod | cat;"
fi

Note that a slightly more advanced script would do rolling deployments, allow us to specify a past version, etc.

In Conclusion: Focus on What Matters

At the start of this series, I set out to help startup CTOs think about “what is a minimal decent starting point?” I gave my base requirements, then stepped through a minimal production setup to meet these goals. The essential features we check off are:

  • Our setup is cheap: we spend about $140/month, plus a few annual fees.
  • The site runs well: in the course of a few weeks, we’ve steadily grown revenue and traffic, and easily managed the load when we appeared on Hacker News. Uptime has been 99.99%, and the production box has plenty of capacity.
  • Deployments are easy: we run a command line script which reliably works. Even though it’s rather basic, our system manages package dependencies, application versioning, configuration files, and SCM tags.
  • We have monitoring: there are external status checks and log file monitoring.
  • There are extensive metrics: we use Google Analytics and GrayLog dashboards to get real-time insights about our product.

Our setup has known issues:

  • We are not built for scale: but we could scale as needed; I’d start by moving the web servers to two or more virtual machines. The database has years of headroom, and honestly, I’d be much more inclined to throw more RAM and SSD at the database than move to a distributed system, because a single relational database keeps our code and system so much simpler.
  • We have single points of failure: but even in a catastrophe we could build a replacement within 2 hours. At our current size the business impact would be tolerable.
  • Our deployment scripts are simplistic: we don’t have clearly defined roles, host configuration management, or a package repository. As we grow, the next steps will be to set up a local Python package repository and to move our deployment management to Ansible.

All things considered, our system has been reliable and easy to maintain. With this simple setup we’re making decent money while spending very little. And we’re not painted into any corners because we know our limitations and how we’d grow.

I’ll leave you with this: as long as your service works, nobody on the outside cares whether the internals are magic fairy dust or a rickety assemblage of duct tape and baling wire. Your time is limited, so plan an ops setup to match the expected size and growth of your business.

Consulting CTO open to projects. I’m a serial entrepreneur, software engineer, and leader at early- and mid-stage companies. https://www.chuckgroom.com
