Runbook

Operational procedures for the blog.

Prerequisites

  • Git
  • A web browser
  • Hugo 0.152.2 extended (or Docker)
  • The ability to cope with the responsibility of a zero-user system

Deploy

  1. Push to main.
  2. Wait 45 seconds.
  3. There is no step 3.

The pipeline runs automatically. Lint, build, test, deploy — in that order. If any stage fails, the deploy does not happen. Check the Actions tab for status.
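The gating described above can be sketched as a plain shell loop. This is not the actual workflow file. The stage_* functions are hypothetical placeholders (only `hugo --minify` is a real Hugo command), and the sketch only illustrates the fixed order and the stop-on-first-failure behavior.

```shell
#!/bin/sh
# Hypothetical stage placeholders; replace with the real commands.
stage_lint()   { echo "lint: placeholder"; }
stage_build()  { hugo --minify; }
stage_test()   { echo "test: placeholder"; }
stage_deploy() { echo "deploy: placeholder"; }

# Run the stages in order and stop at the first failure,
# so a failing lint, build, or test stage means nothing is deployed.
run_pipeline() {
  for stage in lint build test deploy; do
    echo "==> $stage"
    if ! "stage_$stage"; then
      echo "pipeline stopped: $stage failed, nothing deployed" >&2
      return 1
    fi
  done
}
```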

Rollback

  1. Run git revert HEAD.
  2. Push to main.
  3. Wait 45 seconds.

There is no “rollback button.” The site is always built from the latest commit on main. To undo a deploy, undo the commit.
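A minimal sketch of the rollback as one shell function, assuming the remote is named `origin` and you run it from a clean checkout of main. `--no-edit` keeps `git revert` from opening an editor for the commit message; the function name is hypothetical.

```shell
# Undo the most recent commit with a new "Revert ..." commit,
# then push so the pipeline rebuilds the site from the reverted state.
rollback_last_deploy() {
  git revert --no-edit HEAD || return 1
  git push origin HEAD:main
}
```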

For catastrophic situations:

  1. Run git log --oneline -10 to find the last good commit.
  2. Run git revert --no-commit {good_commit}..HEAD.
  3. Review the diff, commit, and push.
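The catastrophic procedure can be sketched as below, assuming linear history (`git revert` refuses merge commits unless given `-m`) and a clean working tree. The range `good..HEAD` selects every commit after the good one, and `git revert` undoes them newest first. The name `rollback_to` is hypothetical, and the function deliberately stops before committing so that reviewing the diff stays a manual step.

```shell
# Stage the inverse of every commit in good..HEAD without committing,
# then print a summary of the combined change for review.
rollback_to() {
  good=$1
  git revert --no-commit "$good..HEAD" || return 1
  git diff --cached --stat
}
```

For example, with a placeholder hash: run `rollback_to abc1234`, inspect `git diff --cached`, then `git commit -m "Roll back to abc1234"` and `git push origin main`.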

Scale

GitHub Pages scales automatically. There is nothing to do.

If the blog somehow receives more traffic than GitHub’s CDN can handle, the correct response is to celebrate, not to provision infrastructure.

Monitor

  • Health check: curl -s https://marianholly.github.io/blog/healthz/
  • Build status: Check the green checkmark on the latest commit
  • Metrics: Visit /metrics (decorative, but correctly formatted)
  • Status page: Visit /status (always green)

If the health check returns anything other than “OK”, or returns nothing at all, first rule out a broken deploy (see Respond to Incidents). If the latest deploy is fine, GitHub Pages is down, and there is nothing you can do except wait.
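A slightly stricter version of the health check, as a sketch. It assumes the healthz page body is the bare text `OK`. `-f` makes curl fail on HTTP errors such as a 404 instead of printing the error page, `-S` keeps the error message visible despite `-s`, and `--max-time 10` stops a hung request. `HEALTHZ_URL`, `is_ok_body`, and `healthcheck` are hypothetical names.

```shell
HEALTHZ_URL=${HEALTHZ_URL:-https://marianholly.github.io/blog/healthz/}

# True only if the body is "OK" once all whitespace (e.g. a trailing newline) is removed.
is_ok_body() {
  [ "$(printf '%s' "$1" | tr -d '[:space:]')" = "OK" ]
}

# Print UP or DOWN and return non-zero when the site is not healthy.
healthcheck() {
  body=$(curl -fsS --max-time 10 "$HEALTHZ_URL") || { echo "DOWN: request failed"; return 1; }
  if is_ok_body "$body"; then
    echo "UP"
  else
    echo "DOWN: unexpected body"
    return 1
  fi
}
```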

Respond to Incidents

  1. Determine if the site is actually down or if you just have bad WiFi.
  2. If the site is down, check GitHub Status.
  3. If GitHub is fine, check recent commits for broken templates.
  4. Write a post-mortem. Be blameless. The author is the only person to blame.

See /incidents for examples of the expected post-mortem format.
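Steps 2 and 3 can be partly scripted. This sketch assumes GitHub Status is hosted on Atlassian Statuspage, whose `/api/v2/status.json` endpoint reports an overall `status.indicator` (`none` when there is no known incident), and that templates live in Hugo's conventional `layouts/` directory. The function names are hypothetical, and the sed-based parsing is a jq-free shortcut that only looks for the `indicator` field.

```shell
# Extract the value of the "indicator" field from Statuspage JSON on stdin.
status_indicator() {
  sed -n 's/.*"indicator"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Step 2: ask GitHub Status for its overall indicator.
github_status() {
  curl -fsS --max-time 10 https://www.githubstatus.com/api/v2/status.json | status_indicator
}

# Step 3: list recent commits that touched templates.
recent_template_changes() {
  git log --oneline -10 -- layouts/
}
```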

On-Call Rotation

Week          Engineer
Every week    Marian Holly

There is no escalation path. There is no secondary on-call. If the primary is unavailable, the blog continues to serve cached static files from the CDN until someone pushes a fix.

Disaster Recovery

RTO (Recovery Time Objective): However long it takes GitHub Actions to run (~45 seconds).

RPO (Recovery Point Objective): The last git push. All content is version-controlled. There is nothing to lose that is not already in the repository.

Procedure:

  1. Clone the repository.
  2. Push to main.
  3. The site rebuilds from source.
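A sketch of the recovery procedure with one assumption made explicit: the pipeline is triggered by a push, and pushing an unchanged clone sends nothing, so the function adds an empty commit (`git commit --allow-empty`) to force a rebuild. The repository URL is passed in rather than hard-coded, and `rebuild_from_source` is a hypothetical name.

```shell
# Runs in a subshell (note the parentheses) so the cd does not leak to the caller.
rebuild_from_source() (
  repo_url=$1
  workdir=$2
  git clone --quiet --branch main "$repo_url" "$workdir" || exit 1
  cd "$workdir" || exit 1
  # An empty commit gives the push something to send, which triggers the pipeline.
  git commit --quiet --allow-empty -m "Rebuild site from source" || exit 1
  git push --quiet origin main
)
```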

If GitHub is permanently destroyed, the local clone is the backup. If the local machine is also destroyed, the blog was not that important.

Secrets Management

There are no secrets. The site is static HTML. There are no API keys, no database credentials, no environment variables with sensitive values.

The GPG signing key is on the author’s machine. It signs commits. It is not part of the deployment pipeline.

CAUTION

Do not attempt to SSH into GitHub Pages. There is no server.