v1.0.0
Shipyard

A self-hosted internal deployment platform — think Vercel, running on your own Kubernetes cluster. Supports Next.js, Node.js, Spring Boot, PHP, and static sites with preview environments, one-click rollbacks, and a unified developer CLI. Projects can be organized into groups (e.g., "frontend", "backend") and tagged with labels for filtering and cost attribution.

| Component | Technology |
| --- | --- |
| Control plane API | Fastify 4, TypeScript, Node 22 |
| Dashboard | React 19, TanStack Query + Router, Vite |
| CLI | ship (Node.js, distributed via npm) |
| Database | PostgreSQL 16, Drizzle ORM |
| Queue | BullMQ v5, Redis 7 |
| Container registry | Google Artifact Registry (prod) / local registry (dev) |
| Static storage | MinIO (S3-compatible) |
| Routing | Traefik v3 with IngressRoute CRDs |
| Runtime | Kubernetes, one namespace per project |

Architecture

Every deployment flows through three async stages: build → deploy → live. The build and deploy stages each run in their own BullMQ worker process. A third worker, cleanup, runs cron jobs for maintenance.

Developer
  git push / ship deploy
        │
        ▼
Webhook / API (Fastify · /api/v1)
        │  creates deployment record, enqueues build job
        ▼
build-queue (BullMQ · max 3 concurrent)
        │  git clone → detect buildpack → install → build
        │    → push image to Artifact Registry (containers)
        │    → upload dist/ to MinIO (static sites)
        ▼
deploy-queue (BullMQ · max 2 retries)
        │  decrypt env vars → apply k8s Deployment + Service
        │    → apply Traefik IngressRoute → wait for rollout
        ▼
proj-<slug> namespace (Kubernetes)
        │  Deployment · Service · ConfigMap · Secret · IngressRoute
        ▼
Traefik routes traffic → pod

cleanup-queue (cron · hourly/6hr)
        │  preview-ttl → image-gc → stale-deploy-check

Queue Details

| Queue | Concurrency | Retries | Purpose |
| --- | --- | --- | --- |
| build-queue | 3 | 0 | Clone, detect buildpack, install deps, build, push image |
| deploy-queue | 2 | 2 | Decrypt env vars, apply k8s resources, wait for rollout |
| cleanup-queue | 1 | 0 | Cron jobs: preview TTL, image GC, stale deployment cleanup |

Cleanup Routines

| Routine | Schedule | Description |
| --- | --- | --- |
| preview-ttl | Hourly | Cancels preview deployments older than 7 days (PREVIEW_MAX_AGE_DAYS) and tears down k8s resources |
| image-gc | Every 6 hours | Retains only 10 builds (IMAGES_PER_BRANCH) per project/branch, deletes older build records |
| stale-deploy-check | Hourly | Force-fails deployments stuck: queued >30min, building >70min, deploying >15min |
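
The stale-deploy-check thresholds can be read as a small predicate. The following is a minimal sketch, not the worker's actual code: the InFlightDeployment shape and the statusChangedAt field are hypothetical, and only the three minute thresholds come from the table above.

```typescript
type InFlightStatus = "queued" | "building" | "deploying";

// Thresholds in minutes, taken from the stale-deploy-check row above.
const STALE_AFTER_MINUTES: Record<InFlightStatus, number> = {
  queued: 30,
  building: 70,
  deploying: 15,
};

// Hypothetical record shape; the real schema is not shown in this document.
interface InFlightDeployment {
  id: string;
  status: InFlightStatus;
  statusChangedAt: Date;
}

// True when the deployment has sat in its current status longer than allowed.
function isStale(d: InFlightDeployment, now: Date = new Date()): boolean {
  const ageMinutes = (now.getTime() - d.statusChangedAt.getTime()) / 60_000;
  return ageMinutes > STALE_AFTER_MINUTES[d.status];
}
```

A cron routine built on this would select in-flight deployments, filter them with isStale(), and mark the matches as failed.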

Namespaces

| Namespace | Contents |
| --- | --- |
| shipyard-system | API, dashboard, all three queue workers |
| shipyard-infra | Postgres, Redis, MinIO, Traefik |
| proj-<slug> | Production deployment for each project |
| proj-<slug>-preview | PR preview deployments, isolated by NetworkPolicy |

URL Scheme

| Pattern | Purpose |
| --- | --- |
| <project>.apps.shipyard.wake.co.ke | Production alias (updated on each prod deploy) |
| <project>-<sha7>.deploy.shipyard.wake.co.ke | Permanent per-deployment link |
| <project>-pr-<n>.deploy.shipyard.wake.co.ke | PR preview environments |
| <custom-domain> | After DNS TXT verification + cert issuance |
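
As an illustration of the hostname patterns above, the helpers below derive each URL form. The function names are hypothetical, and treating <sha7> as the first seven characters of the commit SHA is an assumption.

```typescript
const DEPLOY_DOMAIN = "deploy.shipyard.wake.co.ke";
const APPS_DOMAIN = "apps.shipyard.wake.co.ke";

// Production alias: <project>.apps.shipyard.wake.co.ke
function productionHost(project: string): string {
  return `${project}.${APPS_DOMAIN}`;
}

// Permanent per-deployment link: <project>-<sha7>.deploy.shipyard.wake.co.ke
// Assumption: <sha7> is the first seven characters of the commit SHA.
function deploymentHost(project: string, commitSha: string): string {
  return `${project}-${commitSha.slice(0, 7)}.${DEPLOY_DOMAIN}`;
}

// PR preview: <project>-pr-<n>.deploy.shipyard.wake.co.ke
function previewHost(project: string, prNumber: number): string {
  return `${project}-pr-${prNumber}.${DEPLOY_DOMAIN}`;
}
```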

Secrets model

Two-tier encryption: project env vars are AES-256-GCM encrypted with a per-project DEK, itself encrypted with a platform-wide KEK stored only in the cluster Secret. The raw KEK never appears in logs, API responses, or application code paths beyond the single encrypt/decrypt call site. Git source OAuth tokens are encrypted at rest with the same KEK.
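
The two tiers can be sketched with Node's built-in crypto module. This is an illustrative sketch under stated assumptions, not the contents of crypto.ts: the Sealed shape, the seal/unseal names, and the 96-bit IV are choices made for the example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical container for one encrypted value: IV, auth tag, ciphertext.
interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// Encrypt with AES-256-GCM under a 32-byte key, using a fresh 96-bit IV.
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// Decrypt and authenticate; throws if the key is wrong or the data was altered.
function unseal(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

// Tier 1: the platform KEK wraps a per-project DEK.
const kek = randomBytes(32); // stand-in for the key held in the cluster Secret
const dek = randomBytes(32);
const wrappedDek = seal(kek, dek);

// Tier 2: the DEK encrypts the env var value.
const sealedValue = seal(dek, Buffer.from("postgres://user:pass@db/app", "utf8"));

// Reading back: unwrap the DEK with the KEK, then decrypt the value.
const recovered = unseal(unseal(kek, wrappedDek), sealedValue).toString("utf8");
console.log(recovered === "postgres://user:pass@db/app");
```

One consequence of this layout is that only the wrapped DEKs depend on the KEK, so a KEK rotation would not need to touch the encrypted values themselves.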

Authentication

JWT access tokens (15-minute expiry) + opaque refresh tokens (30-day expiry, rotated on each use). API tokens for CI/CD are prefixed ship_ and stored as SHA-256 hashes. Both clients (dashboard and CLI) auto-refresh silently on 401 before surfacing an error.
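
A minimal sketch of the API token scheme, assuming 32 random bytes rendered as hex after the ship_ prefix (the real token length and encoding are not stated here). Function names are hypothetical.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

const TOKEN_PREFIX = "ship_";

// SHA-256 of the full token string, hex-encoded; this is what gets stored.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Issue a token. The plaintext is shown to the user once; only the hash is kept.
function createApiToken(): { token: string; hash: string } {
  const token = TOKEN_PREFIX + randomBytes(32).toString("hex");
  return { token, hash: hashToken(token) };
}

// Check a presented token against a stored hash using a constant-time compare.
function verifyApiToken(presented: string, storedHash: string): boolean {
  if (!presented.startsWith(TOKEN_PREFIX)) return false;
  const a = Buffer.from(hashToken(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```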

Local Development

Requirements

  • Node 22 LTS (.nvmrc pins the version)
  • pnpm ≥ 9.x
  • Docker Desktop

Start the full stack

# 1. Install all dependencies
pnpm install

# 2. Start Postgres, Redis, MinIO, and local registry
docker-compose -f docker-compose.dev.yml up -d

# 3. Run database migrations
pnpm --filter=@shipyard/db migrate

# 4. Seed local data
pnpm tsx scripts/seed.ts

# 5. Start the API (port 3001)
pnpm --filter=@shipyard/api dev

# 6. Start the dashboard (port 5173)
pnpm --filter=@shipyard/dashboard dev

Common commands

# Build all packages
pnpm turbo build

# Type-check everything
pnpm turbo typecheck

# Lint everything
pnpm turbo lint

# Run all tests
pnpm turbo test

# Run tests for one package
pnpm turbo test --filter=./packages/buildpack

# Generate a new migration
pnpm --filter=@shipyard/db migrate:generate

Environment flags

| Variable | Default | Effect |
| --- | --- | --- |
| K8S_ENABLED | false | Apply real Kubernetes resources on deploy |
| CONTAINER_BUILD_ENABLED | false | Run real docker build + push to registry |
| STATIC_STORAGE_ENABLED | false | Upload static assets to MinIO |
| CACHE_ENABLED | false | Restore/save Docker layer cache and npm cache |

All four default to false so the workers run in sim mode locally — deployments complete without needing a real cluster.

Completed Sprints

Each sprint is a focused feature slice shipped as a single commit.

Sprint 1 — Auth, RBAC, and Project CRUD

Foundation: user registration and login with bcrypt + JWT, team creation, role-based access control (owner / admin / developer / viewer), and full project CRUD. All routes protected from day one.

Key files: apps/api/src/routes/auth.ts, apps/api/src/lib/rbac.ts

Sprint 2 — Deployment CRUD, Webhooks, BullMQ Wiring

Deployment lifecycle (queued → building → deploying → live, with failed and cancelled as terminal states), BullMQ queue setup for the build and deploy queues, and a stub webhook handler for GitHub and GitLab.
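
The lifecycle can be modelled as a small transition table. This is a sketch under assumptions: the source lists the statuses but not the allowed transitions, so the table below assumes any in-flight state may fail or be cancelled, and it treats live as final (rollback and superseded deployments are out of scope here).

```typescript
type DeploymentStatus =
  | "queued"
  | "building"
  | "deploying"
  | "live"
  | "failed"
  | "cancelled";

// Assumed transition table; not taken from the actual API code.
const TRANSITIONS: Record<DeploymentStatus, readonly DeploymentStatus[]> = {
  queued: ["building", "failed", "cancelled"],
  building: ["deploying", "failed", "cancelled"],
  deploying: ["live", "failed", "cancelled"],
  live: [],
  failed: [],
  cancelled: [],
};

// Guard to call before persisting a status change.
function canTransition(from: DeploymentStatus, to: DeploymentStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```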

Sprint 3 — Build Runner Sim, Deploy Worker, SSE Log Streaming

Build worker that clones repos and streams output line-by-line. Deploy worker that applies k8s resources. Real-time log streaming via Server-Sent Events over Redis pub/sub. CLI --follow flag with long-poll fallback.

Sprint 4 — Encrypted Env Vars and API Tokens

AES-256-GCM env var encryption with per-project DEK + platform KEK. API tokens with ship_ prefix stored as SHA-256 hashes. Reveal endpoint (admin-only, audit-logged). CI/CD token authentication flow.

Key files: apps/api/src/lib/crypto.ts, apps/api/src/routes/env-vars.ts

Sprint 5 — React Dashboard

React 19 SPA with TanStack Query and TanStack Router. Project list, deployment table with live polling, real-time build log viewer, env var management with masked values, domain management panel.

Sprint 6 — ship CLI

Full ship CLI: login, deploy --follow, logs --follow, status, ps, rollback, env list/set/unset. Auto-detects project from git remote. Streams build logs over SSE.

Key files: apps/cli/src/commands/

Sprint 7 — PR Preview Environments

Each open PR gets its own isolated deployment at <project>-pr-<n>.deploy.*. Separate k8s namespace (proj-<slug>-preview), scoped env vars, and proper status tracking. Preview scope separated from production scope in all deployment queries.

Sprint 8 — Cleanup Queue Worker

BullMQ cron worker with three routines: preview-ttl (cancel previews older than 7 days, hourly), image-gc (prune old build records beyond last 10 per branch, every 6 hours), log-archive (flag logs older than 90 days, nightly).

Sprint 9 — Custom Domain Management

Add custom domains to projects with DNS TXT verification. Traefik IngressRoute is updated to serve the domain once verified. Cert issuance via Let's Encrypt ACME handled by Traefik automatically.

Sprint 10 — Real Kubernetes Deploy Worker

Full k8s resource lifecycle using @kubernetes/client-node: namespace, ConfigMap, Secret (with base64-encoded env vars), ClusterIP Service, Deployment with readiness probe, and Traefik IngressRoute CRD. 3-minute rollout timeout with automatic rollback to previous image on failure.

Key files: apps/api/src/lib/k8s.ts, apps/api/src/workers/deploy.ts

Sprint 11 — Team Member Management

Invite members by email, change roles, remove members. Guards against demoting the last owner. Full UI in Team Settings page.

Sprint 13 — Git Sources and API Token UI

Git source management (GitHub / GitLab installation IDs + webhook secrets). API token creation and revocation UI for CI/CD. Webhook secret shown exactly once at creation.

Sprint 14 — Observability

Prometheus metrics: deployments_total, builds_total, build_duration_seconds. Paginated build log browsing endpoint (GET /deployments/:id/logs?after=<seq>). Metrics endpoint at /metrics.

Sprints 15–16 — shipyard.json Config and Production Helm Chart

A shipyard.json (or .shiprc) at the repo root overrides buildpack detection entirely. Full production Helm chart with ClusterRole/ClusterRoleBinding, ConfigMap, Secret, all four Deployments, Service, ServiceAccount, and Traefik IngressRoutes.

Key files: infra/helm/

Sprint 17 — E2E Test Fixtures and Runner

End-to-end test runner (scripts/e2e/run.ts) that logs in, creates temp git repos from fixtures, triggers deployments, and polls for live status. Fixture projects for: static HTML, Node.js app, Next.js, Spring Boot (Maven), and PHP (legacy).

Key files: scripts/e2e/

Sprint 19 — Static Site MinIO Upload and Serving

Static and static-node (Vite/Next.js export) builds upload dist/ to MinIO bucket shipyard-static-<project-id>. Deploy worker spins up an nginx:alpine pod that reverse-proxies to MinIO with SPA fallback. _current pointer updated atomically on each deploy.

Key files: apps/api/src/lib/storage.ts

Dockerfile Templates + Real Container Build Pipeline

Multi-stage Dockerfile templates for every buildpack type: Next.js (3-stage), Node, Spring Boot Maven and Gradle, PHP-FPM + NGINX via BuildKit heredocs. Build worker runs real docker build + docker push to Harbor/Artifact Registry, authenticating with docker login --password-stdin.

Key files: packages/buildpack/src/dockerfile.ts

Build Caching

Docker layer cache via BuildKit --cache-from / --cache-to type=registry referencing a :cache tag per project. npm dependency cache stored as a tarball in MinIO bucket shipyard-buildcache, keyed by project + buildpack type. Restore before install, save after — transparent to build scripts.

Key files: apps/api/src/lib/cache.ts

Sprint 20 — One-Click Rollback

Restore any past deployment as the live version in under 10 seconds — it's a pointer update, not a new build. API validates the target has a build artifact, updates the MinIO _current pointer (static) or patches the k8s Deployment image (containers), then swaps statuses atomically. Dashboard shows inline "Restore sha? Yes / No" confirmation.

Key files: apps/api/src/routes/deployments.ts, apps/dashboard/src/pages/ProjectPage.tsx

Sprint 21 — Audit Log

audit_events table persists security events to Postgres. Six instrumented actions: env var reveal, member role change, member removal, API token creation, API token revocation, domain verification. GET /teams/:teamId/audit-log endpoint (admin-only). Dashboard "Audit log" tab in Team Settings.

Key files: apps/api/src/lib/audit.ts, packages/db/migrations/0004_audit_events.sql

Sprint 22 — Webhook Auto-Deploy + Preview Teardown

GitHub and GitLab webhook handlers fully wired: push events trigger production or branch deploys; PR/MR open/reopen/synchronize creates preview environments; PR/MR close/merge tears them down. On PR close: cancels DB records, drains the deploy queue, and deletes all k8s resources (Deployment, Service, ConfigMaps, Secret, IngressRoute). On synchronize: cancels any queued build for the same PR before starting a new one.

Key files: apps/api/src/routes/webhooks.ts, apps/api/src/lib/k8s.ts

Production Hardening — Token Refresh + Credential Encryption

Dashboard and CLI both auto-refresh JWT access tokens on 401 before surfacing errors. Dashboard deduplicates concurrent refresh attempts via a singleton promise. CLI persists the rotated token pair back to ~/.shipyard/config.json. Git source OAuth/PAT tokens now encrypted at rest with the platform KEK using AES-256-GCM.

Key files: apps/dashboard/src/api/client.ts, apps/cli/src/client.ts, apps/api/src/lib/crypto.ts

Bug fix — Deploy Always Used main Branch

The triggerDeploymentSchema used by the manual "Deploy now" action defaulted ref to 'main'. Because the schema filled in that default before the API's body.ref ?? project.defaultBranch fallback ran, body.ref was never nullish and the project's configured branch was silently ignored. Fixed by making ref optional in the schema so the nullish coalescing correctly falls through to project.defaultBranch.
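
The interaction is easy to reproduce in isolation. The sketch below uses plain functions as stand-ins for the schema (parseWithDefault and parseOptional are hypothetical names, and the real schema library is not shown); only the body.ref ?? project.defaultBranch fallback is taken from the description above.

```typescript
interface TriggerBody {
  ref?: string;
}

interface Project {
  defaultBranch: string;
}

// Before the fix: the schema layer fills in "main" when ref is absent.
function parseWithDefault(input: TriggerBody): { ref: string } {
  return { ref: input.ref ?? "main" };
}

// After the fix: ref stays optional, so "absent" survives parsing.
function parseOptional(input: TriggerBody): TriggerBody {
  return { ref: input.ref };
}

// The API-side fallback described above.
function resolveRef(body: TriggerBody, project: Project): string {
  return body.ref ?? project.defaultBranch;
}

const project: Project = { defaultBranch: "develop" };

console.log(resolveRef(parseWithDefault({}), project)); // "main": default branch ignored
console.log(resolveRef(parseOptional({}), project)); // "develop": fallback reached
```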

Key files: packages/shared/src/schemas.ts

Bitbucket Support + Private Repo Cloning (v1.1.0)

Added Bitbucket Cloud as a third git provider alongside GitHub and GitLab. POST /api/v1/webhooks/bitbucket handles repo:push, pullrequest:created/updated/fulfilled/rejected events with HMAC-SHA256 (X-Hub-Signature) verification. Pull-request teardown and stale-build cancellation follow the same path as GitHub/GitLab.
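
Signature verification of this kind usually looks like the sketch below. It assumes the header value has the form sha256=<hex digest> and that the HMAC is computed over the raw request body; the function name is hypothetical and this is not the handler's actual code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an X-Hub-Signature header against the raw request body.
// Assumption: the header looks like "sha256=<hex digest>".
function verifySignature(
  rawBody: string | Buffer,
  header: string | undefined,
  secret: string,
): boolean {
  const prefix = "sha256=";
  if (!header || !header.startsWith(prefix)) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(header.slice(prefix.length), "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The body must be the exact bytes received. Re-serialising parsed JSON before hashing would generally produce a different digest.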

Private repo cloning is now supported for all three providers. The build worker receives a cloneToken (decrypted from the git source, stored ephemerally in Redis — never in the DB) and injects credentials into the clone URL using provider-specific prefixes: x-access-token (GitHub), oauth2 (GitLab), x-token-auth (Bitbucket). The token is never written to build logs. Migration 0005_add_bitbucket_provider.sql extends the git_provider enum.
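
A sketch of the credential injection, using the provider-specific usernames listed above. The function names are hypothetical, and the redaction helper is an assumption about how a URL could be kept out of logs, not a description of the build worker.

```typescript
type GitProvider = "github" | "gitlab" | "bitbucket";

// Username each provider expects alongside a token, per the paragraph above.
const CLONE_USERNAME: Record<GitProvider, string> = {
  github: "x-access-token",
  gitlab: "oauth2",
  bitbucket: "x-token-auth",
};

// Build an HTTPS clone URL carrying the token as the password component.
function authenticatedCloneUrl(
  provider: GitProvider,
  repoUrl: string,
  cloneToken: string,
): string {
  const url = new URL(repoUrl);
  url.username = CLONE_USERNAME[provider];
  url.password = cloneToken;
  return url.toString();
}

// Mask the password before the URL is written anywhere, such as a build log.
function redactCloneUrl(cloneUrl: string): string {
  const url = new URL(cloneUrl);
  if (url.password) url.password = "***";
  return url.toString();
}
```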

Key files: apps/api/src/routes/webhooks.ts, apps/api/src/workers/build.ts, packages/db/migrations/0005_add_bitbucket_provider.sql

Upcoming Sprints

Prioritised backlog — each is a self-contained sprint.

Sprint 23 · Done
Cleanup Worker k8s Integration
The preview-ttl cron now calls teardownPreviewResources() for each age-expired preview after cancelling its DB record, deleting the Deployment, Service, ConfigMaps, Secret, and IngressRoute. Matches the same teardown path used by the webhook PR-close handler.
· Done
Bitbucket Support + Private Repo Cloning
Bitbucket Cloud webhook handler, HMAC-SHA256 signature verification, PR preview environments. Private repo cloning for all three providers via decrypted per-team token injected into the clone URL. DB migration extends the git_provider enum. Shipped as v1.1.0.
· Done
GitHub Actions CI/CD Pipeline
Push to main automatically typechecks, builds a linux/amd64 image on the runner, pushes to Artifact Registry, runs pending migrations via a schema_migrations tracker, and Helm-upgrades the cluster. Auth uses Workload Identity Federation — no SA keys. Total run time ~3m30s.
Sprint 24 · Done
Git Provider Deployment Status
Commit status posted to GitHub, GitLab, and Bitbucket at every lifecycle transition: queued → pending/"Build queued", building → pending/"Building…", live → success, failed/cancelled → failure. teamId threaded through all callers (manual deploy + all 6 webhook paths). Best-effort — never throws.
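
The lifecycle-to-status mapping above can be sketched as a pure function plus a best-effort wrapper. The descriptions for live, failed, and cancelled are invented for the example, deploying is assumed to post nothing because it is not listed, and each provider's own state names (which differ) are left out.

```typescript
type DeploymentStatus =
  | "queued"
  | "building"
  | "deploying"
  | "live"
  | "failed"
  | "cancelled";

type CommitState = "pending" | "success" | "failure";

interface CommitStatus {
  state: CommitState;
  description: string;
}

// Map a lifecycle transition to a generic commit status, or null to skip.
function toCommitStatus(status: DeploymentStatus): CommitStatus | null {
  switch (status) {
    case "queued":
      return { state: "pending", description: "Build queued" };
    case "building":
      return { state: "pending", description: "Building…" };
    case "deploying":
      return null; // assumption: no status is posted for this transition
    case "live":
      return { state: "success", description: "Deployment live" }; // wording assumed
    case "failed":
    case "cancelled":
      return { state: "failure", description: `Deployment ${status}` }; // wording assumed
  }
}

// Best-effort: a failed post is logged and swallowed, never rethrown.
async function postBestEffort(post: () => Promise<void>): Promise<void> {
  try {
    await post();
  } catch (err) {
    console.warn("commit status post failed:", err);
  }
}
```
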
Sprint 25 · Done
Project Deletion
DELETE /projects/:id (owner-only): drains in-flight BullMQ jobs, tears down proj-<slug> and proj-<slug>-preview k8s namespaces (best-effort), empties and deletes the MinIO static bucket (best-effort), then DB cascade-deletes all child records. Dashboard confirmation panel was already in place.
Sprint 26 · Done
Harbor Image GC
New lib/registry.ts implements deleteRegistryImage() using the Docker Registry HTTP API v2 — resolves the tag to a digest via HEAD /v2/{name}/manifests/{tag}, then DELETE /v2/{name}/manifests/{digest}. Works with both Harbor and Artifact Registry. The image-gc cron fetches each build's imageRef and calls it (best-effort) before pruning the DB record. Skips sim and static-only builds automatically.
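
The tag-to-digest-to-delete sequence can be sketched as follows. This is not the contents of lib/registry.ts: the signature, the injected FetchLike parameter, and the authHeader argument are assumptions made so the example is self-contained, and real registries may require a token exchange that is omitted here.

```typescript
// Minimal subset of the fetch API used below, so a stub can be injected.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> },
) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}>;

// Resolve a tag to its digest with HEAD, then DELETE the manifest by digest.
// Returns false instead of throwing when either step is refused.
async function deleteRegistryImage(
  registry: string,
  name: string,
  tag: string,
  authHeader: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  const base = `https://${registry}/v2/${name}/manifests`;
  const headers = {
    Authorization: authHeader,
    Accept:
      "application/vnd.docker.distribution.manifest.v2+json, " +
      "application/vnd.oci.image.manifest.v1+json",
  };

  const head = await fetchImpl(`${base}/${tag}`, { method: "HEAD", headers });
  if (!head.ok) return false;

  // The registry reports the manifest digest in this response header.
  const digest = head.headers.get("Docker-Content-Digest");
  if (!digest) return false;

  const del = await fetchImpl(`${base}/${digest}`, { method: "DELETE", headers });
  return del.ok;
}
```

In Node 18+ the global fetch can be passed as fetchImpl. Deleting by digest rather than by tag is what the Registry API expects, which is why the HEAD step comes first.
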
Sprint 27 · Done
Graceful Worker Shutdown
process.once('SIGTERM', …) added to all three workers. Build worker tracks the active deployment ID; if the 25s grace period expires before the build finishes, it marks the deployment failed and removes the workdir before exiting. Deploy and cleanup workers follow the same close-or-timeout pattern. A hard-exit timer is .unref()'d so it doesn't block the event loop if the worker drains cleanly.
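
The close-or-timeout pattern can be sketched as below. The Closable interface, the onTimeout hook, and the exit codes are hypothetical; only the SIGTERM handler, the 25s grace period, and the unref'd timer come from the description above.

```typescript
// Hypothetical minimal worker surface: anything with an async close().
interface Closable {
  close(): Promise<void>;
}

const GRACE_PERIOD_MS = 25_000;

// Resolve "drained" if close() finishes within the grace period, else "timeout".
function closeOrTimeout(
  worker: Closable,
  graceMs: number,
): Promise<"drained" | "timeout"> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), graceMs);
    // unref() so the pending timer alone never keeps the process alive.
    timer.unref();
    worker.close().then(
      () => {
        clearTimeout(timer);
        resolve("drained");
      },
      () => {
        // A failed close is treated like a timeout so cleanup still runs.
        clearTimeout(timer);
        resolve("timeout");
      },
    );
  });
}

// Wire the pattern to SIGTERM. onTimeout is where a build worker could mark
// the active deployment failed and remove its workdir.
function installShutdownHandler(
  worker: Closable,
  onTimeout: () => Promise<void>,
): void {
  process.once("SIGTERM", async () => {
    const outcome = await closeOrTimeout(worker, GRACE_PERIOD_MS);
    if (outcome === "timeout") await onTimeout();
    process.exit(outcome === "drained" ? 0 : 1);
  });
}
```
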
Sprint 28 · Done
HPA — Horizontal Pod Autoscaler
Optional "scaling": {"minReplicas": 1, "maxReplicas": 5, "targetCPU": 60} in shipyard.json creates an autoscaling/v2 HPA targeting the project Deployment on CPU utilisation. Build worker extracts and validates the config; deploy worker calls applyHpa() when present or deleteHpa() when removed. HPA is also deleted during preview teardown.
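
A sketch of how the scaling block could be validated and turned into an autoscaling/v2 manifest. The function names, the validation bounds, and the choice to name the HPA after its Deployment are assumptions; the manifest fields follow the Kubernetes autoscaling/v2 schema.

```typescript
interface ScalingConfig {
  minReplicas: number;
  maxReplicas: number;
  targetCPU: number;
}

function isInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v);
}

// Validate the "scaling" object from shipyard.json. Bounds are assumed.
function validateScaling(input: unknown): ScalingConfig {
  if (typeof input !== "object" || input === null) {
    throw new Error("scaling must be an object");
  }
  const { minReplicas, maxReplicas, targetCPU } = input as Record<string, unknown>;
  if (!isInt(minReplicas) || minReplicas < 1) {
    throw new Error("minReplicas must be an integer >= 1");
  }
  if (!isInt(maxReplicas) || maxReplicas < minReplicas) {
    throw new Error("maxReplicas must be an integer >= minReplicas");
  }
  if (!isInt(targetCPU) || targetCPU < 1) {
    throw new Error("targetCPU must be an integer >= 1");
  }
  return { minReplicas, maxReplicas, targetCPU };
}

// Build an autoscaling/v2 HPA that scales the Deployment on CPU utilisation.
function buildHpaManifest(
  namespace: string,
  deploymentName: string,
  scaling: ScalingConfig,
) {
  return {
    apiVersion: "autoscaling/v2",
    kind: "HorizontalPodAutoscaler",
    metadata: { name: deploymentName, namespace },
    spec: {
      scaleTargetRef: { apiVersion: "apps/v1", kind: "Deployment", name: deploymentName },
      minReplicas: scaling.minReplicas,
      maxReplicas: scaling.maxReplicas,
      metrics: [
        {
          type: "Resource",
          resource: {
            name: "cpu",
            target: { type: "Utilization", averageUtilization: scaling.targetCPU },
          },
        },
      ],
    },
  };
}
```
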
· Done
Metrics Gaps — deployments_active + rollbacks_total
Added two missing Prometheus metrics from TRD U-14: shipyard_deployments_active (Gauge, polled from DB every 15s alongside queue depth, grouped by project slug) and shipyard_rollbacks_total (Counter with project + reason labels). Incremented in the deploy worker on auto-rollback and in the manual rollback API route.
· Done
Log Archive Routine
Replaced the runLogArchive() stub in the cleanup worker with a real implementation: for each deployment older than LOG_RETENTION_DAYS (90d), fetches its build_logs rows, gzips them, uploads to MinIO as shipyard-logs/<deployment-id>.log.gz, then deletes the DB rows to keep Postgres lean. Skipped gracefully when STATIC_STORAGE_ENABLED is false (local dev).
U-18 · Done
Notifications — Slack + Git PR Comments
New lib/notify.ts: notifySlack() reads SHIPYARD_SLACK_WEBHOOK from encrypted project env vars and posts a Slack message. New postPrComment() in lib/git-status.ts posts comments to GitHub PRs, GitLab MRs, and Bitbucket PRs. Wired into both workers: build failure → Slack + PR comment; production deploy success → Slack; preview deploy success → PR comment with URL; auto-rollback → Slack critical. All best-effort.
Sprint 29 · Upcoming
Runtime Adapter Abstraction (packages/runtime)
Introduce a RuntimeAdapter interface in a new packages/runtime package covering deploy(), teardown(), rollback(), launchBuild(), and getLogs(). Move all existing @kubernetes/client-node logic in the deploy and build workers into a KubernetesRuntime class that implements the interface. Active adapter selected via RUNTIME=kubernetes|docker env var. All business logic above the adapter — queue workers, API, buildpack detection, storage, dashboard, CLI — remains unchanged. Prerequisite for all subsequent deployment-target sprints. Estimated effort: 1 week.
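
One possible shape for the planned interface is sketched below. Only the five method names and the RUNTIME=kubernetes|docker selector come from the description above; the parameter and return types, the DeploySpec shape, the selectRuntime helper, and the kubernetes default are all assumptions.

```typescript
// Hypothetical input for a deploy call.
interface DeploySpec {
  projectSlug: string;
  image: string;
  env: Record<string, string>;
}

// The five operations named in the sprint description; signatures are assumed.
interface RuntimeAdapter {
  deploy(spec: DeploySpec): Promise<void>;
  teardown(projectSlug: string): Promise<void>;
  rollback(projectSlug: string, image: string): Promise<void>;
  launchBuild(deploymentId: string): Promise<void>;
  getLogs(deploymentId: string): Promise<string[]>;
}

type RuntimeName = "kubernetes" | "docker";

// Pick the active adapter from the RUNTIME env var.
// Assumption: an unset RUNTIME falls back to "kubernetes".
function selectRuntime(
  env: Record<string, string | undefined>,
  adapters: Partial<Record<RuntimeName, () => RuntimeAdapter>>,
): RuntimeAdapter {
  const name = env.RUNTIME ?? "kubernetes";
  if (name !== "kubernetes" && name !== "docker") {
    throw new Error(`Unknown RUNTIME: ${name}`);
  }
  const factory = adapters[name];
  if (!factory) {
    throw new Error(`RUNTIME=${name} is not available in this build`);
  }
  return factory();
}
```

Workers would call selectRuntime(process.env, { kubernetes: () => new KubernetesRuntime() }) once at startup and depend only on the interface afterwards.
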
Sprint 30 · Upcoming
Multi-Cloud k8s Targets — DigitalOcean & AWS
No application code changes required — KubernetesRuntime works identically on any conformant k8s cluster. Deliverables: (1) values.do.yaml targeting DOKS + DO Container Registry + DO Spaces; (2) values.aws.yaml targeting EKS + ECR + S3; (3) a parameterised CI/CD workflow that branches only on auth steps (doctl vs aws vs gcloud) based on a CLOUD_TARGET secret. Storage adapter already supports S3-compatible endpoints via env vars — config change only. Production deployment guides for both providers added to docs. Estimated effort: 1 week.
Sprint 31 · Upcoming
Docker Runtime — k8s-free Deployment Target
Implement DockerRuntime using the Docker socket (Dockerode) as a second RuntimeAdapter. Containers replace k8s Deployments; named Docker networks replace namespaces; Traefik switches to its Docker label provider in place of IngressRoute CRDs. Build jobs launch as short-lived containers instead of k8s Jobs. Rollback swaps the running container image. Isolation is enforced via Docker networks rather than NetworkPolicy — functional but less strict. HPA is not supported in this mode. Enables running the full Shipyard platform on a single VPS with no k8s dependency. Depends on Sprint 29. Estimated effort: 3–5 weeks.

Production Deployment Guide

Deploying Shipyard itself to GKE on Google Cloud Platform. Target domain: shipyard.wake.co.ke.

Use GKE Standard, not Autopilot. The build worker needs to mount the host Docker socket (/var/run/docker.sock) to run docker build. Autopilot blocks hostPath volumes.

Step 1 — GCP Setup

Install gcloud CLI

brew install --cask google-cloud-sdk
gcloud init   # opens browser, log in, select project

Enable required APIs

gcloud services enable container.googleapis.com artifactregistry.googleapis.com compute.googleapis.com cloudbuild.googleapis.com --project=shipyard-254

Run each gcloud command on a single line. Backslash line continuations break silently in zsh when pasted into the terminal.

Step 2 — GKE Cluster

Create a Standard (not Autopilot) cluster in africa-south1 (Johannesburg — lowest latency for Kenya). The --scopes=cloud-platform flag is critical: without it, nodes get only devstorage.read_only scope and cannot pull from Artifact Registry.

gcloud container clusters create shipyard-prod --project=shipyard-254 --region=africa-south1 --release-channel=stable --cluster-version=latest --num-nodes=2 --machine-type=e2-standard-4 --enable-autoscaling --min-nodes=2 --max-nodes=6 --workload-pool=shipyard-254.svc.id.goog --scopes=cloud-platform

Do not use GKE Autopilot. Autopilot blocks hostPath volumes and privileged containers, which the build worker requires for Docker socket access. Always use a Standard cluster.

Connect kubectl

gcloud components install gke-gcloud-auth-plugin
gcloud container clusters get-credentials shipyard-prod --region=africa-south1 --project=shipyard-254
kubectl get nodes   # should show 6 nodes (2 per zone × 3 zones)

Create namespaces

kubectl create namespace shipyard-system
kubectl create namespace shipyard-infra

Step 3 — Artifact Registry

gcloud artifacts repositories create shipyard --repository-format=docker --location=africa-south1 --project=shipyard-254 --description="Shipyard platform images"

# Authenticate Docker
gcloud auth configure-docker africa-south1-docker.pkg.dev

Image URL: africa-south1-docker.pkg.dev/shipyard-254/shipyard/platform:<tag>

Step 4 — Build and Push the Platform Image

The same image runs all four processes (API + 3 workers). The command: field of each k8s Deployment selects which process starts.

GKE nodes are linux/amd64. If you are building on an Apple Silicon Mac, a plain docker build produces an arm64 image that GKE will refuse with "no match for platform in manifest". Use Google Cloud Build (recommended) or docker buildx --platform linux/amd64 to produce the correct architecture.

Option A — Google Cloud Build (recommended)

Builds on Google infrastructure, natively linux/amd64, pushes straight to Artifact Registry. Enable the API once, then use this for all future builds:

gcloud services enable cloudbuild.googleapis.com --project=shipyard-254
gcloud builds submit --tag africa-south1-docker.pkg.dev/shipyard-254/shipyard/platform:1.0.0 --project=shipyard-254 .

Option B — Local cross-platform build

docker buildx build --platform linux/amd64 -t africa-south1-docker.pkg.dev/shipyard-254/shipyard/platform:1.0.0 --push .

Dockerfile (repo root)

FROM node:22-alpine AS builder
RUN corepack enable pnpm
WORKDIR /app
COPY . .
RUN pnpm install --frozen-lockfile
RUN pnpm turbo build

FROM node:22-alpine
RUN apk add --no-cache git docker-cli
RUN corepack enable pnpm
WORKDIR /app
COPY --from=builder /app/package.json /app/pnpm-lock.yaml /app/pnpm-workspace.yaml /app/turbo.json ./
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/packages ./packages
COPY --from=builder /app/apps/api ./apps/api
COPY --from=builder /app/apps/cli ./apps/cli
RUN pnpm install --frozen-lockfile --prod --ignore-scripts
CMD ["node", "apps/api/dist/index.js"]

Patch the build worker for Docker socket access

Edit infra/helm/templates/deployment-worker-build.yaml and add under volumeMounts and volumes:

          volumeMounts:
            - name: build-workspace
              mountPath: {{ .Values.workers.build.workdir }}
            - name: docker-socket
              mountPath: /var/run/docker.sock
      volumes:
        - name: build-workspace
          emptyDir: {}
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
            type: Socket

Step 5 — Install Infrastructure Dependencies

Add Helm repos

helm repo add traefik https://traefik.github.io/charts
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add minio https://charts.min.io/
helm repo update

Traefik v3

Uses a values file (infra/helm/traefik-values.yaml) to avoid shell line-continuation issues. Do not add ports.websecure.tls — Traefik v3 chart schema rejects that key; TLS on websecure is enabled by default.

Do not use TLS-ALPN-01 or HTTP-01 ACME challenges on GKE africa-south1. Let's Encrypt's validation servers cannot reach this region — both challenge types time out with "Timeout during connect (likely firewall problem)" even though the firewall is open. Use DNS-01 via Cloudflare API instead. This completely bypasses connectivity and works regardless of region.

Create the Cloudflare API token

In the Cloudflare dashboard → My Profile → API Tokens → Create Token → Custom token:

  • Permissions: Zone → DNS → Edit
  • Zone Resources: Include → Specific zone → wake.co.ke

Store the token as a k8s secret before installing Traefik:

kubectl create secret generic traefik-cloudflare-token \
  --from-literal=CF_DNS_API_TOKEN=<your-token> \
  -n shipyard-infra

# infra/helm/traefik-values.yaml
providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: true
  kubernetesIngress:
    enabled: true

certificatesResolvers:
  letsencrypt:
    acme:
      email: alberto@zaoshinani.com
      storage: /data/acme.json
      dnsChallenge:
        provider: cloudflare
        resolvers:
          - "1.1.1.1:53"
          - "8.8.8.8:53"

env:
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: traefik-cloudflare-token
        key: CF_DNS_API_TOKEN

persistence:
  enabled: true
  size: 128Mi

podSecurityContext:
  fsGroup: 65532
  fsGroupChangePolicy: "OnRootMismatch"

helm upgrade --install traefik traefik/traefik --namespace shipyard-infra -f infra/helm/traefik-values.yaml

acme.json permissions: Traefik requires exactly 600 on acme.json. Kubernetes's default fsGroup behaviour resets files to 660 on every pod restart, causing Traefik to refuse to start. The fsGroupChangePolicy: "OnRootMismatch" option prevents this by skipping the recursive chmod when the volume root is already owned by the fsGroup. The recommended initContainer fix (runAsUser: 0) is blocked by GKE's non-root pod security policy.

PostgreSQL 16

helm upgrade --install postgres bitnami/postgresql --namespace shipyard-infra --set auth.username=shipyard --set auth.password=f2de63dbdda2af7c90c813999a32605e --set auth.database=shipyard

Redis 7

helm upgrade --install redis bitnami/redis --namespace shipyard-infra --set auth.password=78d00185c6a72eb62f715daba56ae34f --set architecture=standalone

MinIO

Must set explicit resource limits — the default chart memory request is 16 GiB which will not schedule on standard e2-standard-4 nodes.

helm upgrade --install minio minio/minio --namespace shipyard-infra --set rootUser=shipyard --set rootPassword=77fdc7516052eb7d883b747b65e49b73 --set mode=standalone --set persistence.size=20Gi --set resources.requests.memory=512Mi --set resources.requests.cpu=250m --set resources.limits.memory=2Gi --set resources.limits.cpu=1

Step 6 — Secrets

Back up ENCRYPTION_KEK before continuing. If it is ever lost, all project environment variables become permanently unreadable. Store it in a password manager or secrets vault, not just in this file.

The secrets generated for this deployment:

| Variable | Value |
| --- | --- |
| POSTGRES_PASSWORD | f2de63dbdda2af7c90c813999a32605e |
| REDIS_PASSWORD | 78d00185c6a72eb62f715daba56ae34f |
| MINIO_PASSWORD | 77fdc7516052eb7d883b747b65e49b73 |
| JWT_SECRET | 97dc32a1cb5fa01e157f44b330dcf9a1a9410fd8a67fde0eb218b8e069718d0a |
| ENCRYPTION_KEK | 3eeb918a1c26ea8b86d741602b914ba6c6dfaa827518a34248d3e98b84bde38e |

Step 7 — Deploy Shipyard via Helm

Run database migrations

Migrations are SQL files in packages/db/migrations/. Pipe them directly into the postgres pod — no migration job image required:

cat packages/db/migrations/0000_past_joseph.sql \
    packages/db/migrations/0001_true_landau.sql \
    packages/db/migrations/0002_milky_jamie_braddock.sql \
    packages/db/migrations/0003_open_nehzno.sql \
    packages/db/migrations/0004_audit_events.sql \
    packages/db/migrations/0005_add_bitbucket_provider.sql \
  | sed 's/--> statement-breakpoint/;/g' \
  | kubectl exec -i -n shipyard-infra postgres-postgresql-0 \
    -- env PGPASSWORD=f2de63dbdda2af7c90c813999a32605e psql -U shipyard -d shipyard

Create the platform secret

Secrets are stored in a separate k8s Secret that Helm references via existingSecret — never committed to git:

kubectl create secret generic shipyard-secrets --namespace shipyard-system --from-literal=jwtSecret="97dc32a1cb5fa01e157f44b330dcf9a1a9410fd8a67fde0eb218b8e069718d0a" --from-literal=encryptionKek="3eeb918a1c26ea8b86d741602b914ba6c6dfaa827518a34248d3e98b84bde38e" --from-literal=databaseUrl="postgres://shipyard:f2de63dbdda2af7c90c813999a32605e@postgres-postgresql.shipyard-infra.svc.cluster.local:5432/shipyard" --from-literal=redisUrl="redis://:78d00185c6a72eb62f715daba56ae34f@redis-master.shipyard-infra.svc.cluster.local:6379"

Production values file

# infra/helm/values.production.yaml
image:
  repository: africa-south1-docker.pkg.dev/shipyard-254/shipyard/platform
  tag: "1.1.0"
  pullPolicy: IfNotPresent

namespace:
  create: false
  name: shipyard-system

api:
  replicaCount: 1
  extraEnv:
    - name: MINIO_ENDPOINT
      value: "http://minio.shipyard-infra.svc.cluster.local:9000"
    - name: MINIO_ACCESS_KEY
      value: "shipyard"
    - name: MINIO_SECRET_KEY
      value: "77fdc7516052eb7d883b747b65e49b73"
    - name: MINIO_INTERNAL_URL
      value: "http://minio.shipyard-infra.svc.cluster.local:9000"
    - name: STATIC_STORAGE_ENABLED
      value: "true"
    - name: CONTAINER_BUILD_ENABLED
      value: "false"
    - name: HARBOR_REGISTRY
      value: "africa-south1-docker.pkg.dev"
    - name: HARBOR_PROJECT
      value: "shipyard-254/shipyard"

config:
  platformDomain: deploy.shipyard.wake.co.ke
  internalDomain: apps.shipyard.wake.co.ke
  corsOrigin: https://shipyard.wake.co.ke
  k8sEnabled: "true"
  logLevel: info

secrets:
  existingSecret: "shipyard-secrets"

ingress:
  enabled: true
  certResolver: letsencrypt
  entryPoints:
    - websecure
  api:
    hostname: api.shipyard.wake.co.ke
  dashboard:
    hostname: shipyard.wake.co.ke

Helm install

helm upgrade --install shipyard ./infra/helm --namespace shipyard-system -f infra/helm/values.production.yaml --wait --timeout=3m

Fix image pull — GKE node pool scope limitation

GKE Standard node pools created without --scopes=cloud-platform get only the devstorage.read_only OAuth scope. This scope only covers the old Container Registry (gcr.io), not Artifact Registry (pkg.dev). IAM permissions alone cannot override this OAuth scope limit. Pods will fail with 403 Forbidden on image pull even with roles/artifactregistry.reader granted.

The permanent fix is to recreate the node pool with --scopes=cloud-platform. The immediate workaround is an image pull secret using a GCP OAuth token.

Create the pull secret (run in GCP Cloud Shell to use your user credentials — gcloud auth print-access-token returns a token with full Artifact Registry access):

# Run in GCP Cloud Shell — single line, no wrapping
kubectl create secret docker-registry artifact-registry-key --docker-server=africa-south1-docker.pkg.dev --docker-username=oauth2accesstoken --docker-password="$(gcloud auth print-access-token)" --namespace=shipyard-system

Patch all deployments to use it:

for d in shipyard-api shipyard-worker-build shipyard-worker-cleanup shipyard-worker-deploy; do kubectl patch deployment $d -n shipyard-system -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"artifact-registry-key"}]}}}}'; done

OAuth tokens expire after ~1 hour. To renew: delete the secret and recreate it. For a permanent solution, recreate the node pool with --scopes=cloud-platform and remove the image pull secret patches.

Verify

kubectl get pods -n shipyard-system

Step 8 — DNS

Get Traefik's external IP:

kubectl get svc traefik -n shipyard-infra

In your DNS provider (for wake.co.ke), add these records pointing to that IP:

| Type | Name | Value |
| --- | --- | --- |
| A | shipyard.wake.co.ke | Traefik external IP |
| A | api.shipyard.wake.co.ke | Traefik external IP |
| A | grafana.shipyard.wake.co.ke | Traefik external IP (Grafana monitoring dashboard) |
| A (wildcard) | *.deploy.shipyard.wake.co.ke | Traefik external IP |
| A (wildcard) | *.apps.shipyard.wake.co.ke | Traefik external IP |

The two wildcard records are what make per-deployment preview URLs and production aliases work automatically without any per-project DNS change.

Step 9 — First Run

Register the first user

curl -X POST https://api.shipyard.wake.co.ke/api/v1/auth/register \
  -H 'Content-Type: application/json' \
  -d '{"name":"Your Name","email":"you@example.com","password":"your-password"}'

Log in with the CLI

The CLI stores tokens in ~/.shipyard/config.json. If the interactive prompt doesn't work (e.g. in scripts), write the file directly after calling the register endpoint above.

ship login --api-url https://api.shipyard.wake.co.ke

Create your first project and deploy

# From inside a git repo
ship deploy --follow

Configure GitHub webhooks

In your GitHub repo → Settings → Webhooks → Add webhook:

  • Payload URL: https://api.shipyard.wake.co.ke/api/v1/webhooks/github
  • Content type: application/json
  • Secret: the webhookSecret returned when you created the git source
  • Events: Pushes + Pull requests

Step 10 — CI/CD Pipeline

Every push to main automatically builds, migrates, and deploys the platform. Run time is ~3m30s. Auth uses Workload Identity Federation — no service account keys are stored anywhere.

This step only needs to be done once on a fresh GCP project. Once the pipeline is wired, deployments are fully hands-off.

How the pipeline works

Step | What happens
Typecheck | pnpm turbo typecheck; fails fast before any build if types are broken
Authenticate | GitHub OIDC token → Workload Identity Federation → impersonate shipyard-cicd SA. No JSON key ever created or stored.
Build image | docker build runs on the GitHub Actions runner (linux/amd64, correct for GKE). Image tagged with the git SHA and pushed to Artifact Registry.
Refresh pull secret | Regenerates the artifact-registry-key k8s Secret using a fresh GCP OAuth token. Needed because the node pool uses devstorage.read_only scope (see Operational Notes).
Migrate | Idempotent migration runner: creates a schema_migrations tracking table on first run, seeds it with already-applied migrations, then applies only new *.sql files in order.
Helm upgrade | Detects and recovers any stuck pending-* Helm state, then upgrades with --set image.tag=<sha>. Image tag in values.production.yaml is overridden by the git SHA.
Verify rollout | Waits for all four Deployments to report ready before the job completes.

One-time GCP setup

Run these once on the GCP project. They are already done for shipyard-254.

1. Create the CI/CD service account

gcloud iam service-accounts create shipyard-cicd --display-name="Shipyard CI/CD" --project=shipyard-254

2. Create the Workload Identity Pool and GitHub provider

gcloud iam workload-identity-pools create github-actions --location=global --display-name="GitHub Actions" --project=shipyard-254

gcloud iam workload-identity-pools providers create-oidc github --location=global --workload-identity-pool=github-actions --issuer-uri="https://token.actions.githubusercontent.com" --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository,attribute.ref=assertion.ref" --attribute-condition="assertion.repository=='Alisao/shipyard'" --project=shipyard-254

The attribute-condition restricts the provider to a single repository: tokens issued to forks or to any other repository cannot be exchanged against this pool.

3. Bind the pool to the SA and grant IAM roles

# Allow GitHub Actions to impersonate the SA
gcloud iam service-accounts add-iam-policy-binding shipyard-cicd@shipyard-254.iam.gserviceaccount.com --role=roles/iam.workloadIdentityUser --member="principalSet://iam.googleapis.com/projects/346003678070/locations/global/workloadIdentityPools/github-actions/attribute.repository/Alisao/shipyard" --project=shipyard-254

# GCP roles: Artifact Registry writer, GKE access, token exchange
for role in roles/artifactregistry.writer roles/container.developer roles/iam.serviceAccountTokenCreator; do
  gcloud projects add-iam-policy-binding shipyard-254 --member="serviceAccount:shipyard-cicd@shipyard-254.iam.gserviceaccount.com" --role="$role" --condition=None
done

4. Grant Kubernetes cluster-admin to the SA

The Helm chart manages ClusterRoles and ClusterRoleBindings, so the CI/CD SA needs cluster-admin within Kubernetes (separate from GCP IAM):

kubectl create clusterrolebinding shipyard-cicd-cluster-admin --clusterrole=cluster-admin --user=shipyard-cicd@shipyard-254.iam.gserviceaccount.com

5. Add the GitHub repository secret

In GitHub → Alisao/shipyard → Settings → Secrets and variables → Actions, add:

Secret name | Value
PG_PASSWORD | The Postgres password from Step 6

Or via the CLI: gh secret set PG_PASSWORD --repo Alisao/shipyard --body "<value>"

Workflow file

The full pipeline is at .github/workflows/deploy.yml. Key variables at the top of the file — update these if the project ID, region, cluster name, or repository changes:

env:
  PROJECT: shipyard-254
  REGION: africa-south1
  REGISTRY: africa-south1-docker.pkg.dev
  IMAGE: africa-south1-docker.pkg.dev/shipyard-254/shipyard/platform
  CLUSTER: shipyard-prod
  WIF_PROVIDER: projects/346003678070/locations/global/workloadIdentityPools/github-actions/providers/github
  SA: shipyard-cicd@shipyard-254.iam.gserviceaccount.com

Migration tracking

The runner maintains a schema_migrations table in the shipyard database. On the very first pipeline run against a pre-existing database, it seeds the table with all already-applied migration filenames (detected by checking whether the users table exists). From that point on, only files not yet in the table are applied. Adding a new migration is as simple as dropping a new *.sql file in packages/db/migrations/ — the next push picks it up automatically.
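
The tracking logic described above can be sketched as two small functions. This is an illustration under stated assumptions, not the runner's actual code: the helper names are hypothetical, filenames are assumed to carry zero-padded numeric prefixes so that lexicographic order equals apply order, and the seed step is simplified to "every *.sql file currently on disk".

```typescript
// Hypothetical helpers: the names and the zero-padded filename scheme are
// assumptions for illustration, not taken from the Shipyard runner.

// First run against a pre-existing database (detected via the users table):
// every migration file already on disk is recorded as applied.
function seedApplied(files: string[], usersTableExists: boolean): Set<string> {
  return new Set(usersTableExists ? files.filter((f) => f.endsWith(".sql")) : []);
}

// Only *.sql files that are not yet tracked are returned, in filename order.
function pendingMigrations(files: string[], applied: Set<string>): string[] {
  return files.filter((f) => f.endsWith(".sql") && !applied.has(f)).sort();
}

const onDisk = ["0002_teams.sql", "0001_init.sql", "README.md"];
const tracked = seedApplied(onDisk, true);

// A later push adds one new file; only that file is pending.
console.log(pendingMigrations([...onDisk, "0003_api_tokens.sql"], tracked));
```

Keeping the comparison on filenames means renaming an applied migration would make it look new, so applied files are best treated as immutable.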

Known constraints

Cloud Build source upload is blocked. The org policy prevents the CI/CD SA from accessing the shipyard-254_cloudbuild GCS bucket. The pipeline therefore builds the image directly on the GitHub Actions runner (ubuntu-latest, which is linux/amd64) instead of using gcloud builds submit. Do not attempt to restore Cloud Build without first granting roles/storage.objectAdmin on that bucket.

Artifact Registry pull secret is refreshed on every deploy. The GKE node pool uses devstorage.read_only OAuth scope, which does not cover Artifact Registry. The pipeline deletes and recreates artifact-registry-key in shipyard-system on each run using a fresh token from gcloud auth print-access-token. The permanent fix is to recreate the node pool with --scopes=cloud-platform.

CLI Reference

Install: npm install -g @shipyard/cli. All commands other than ship login require an authenticated session, so run ship login first.

Command | Description
ship login | Authenticate and save tokens to ~/.shipyard/config.json
ship init | Generate shipyard.json manifest by detecting project type
ship push | Register or update project with Shipyard (reads shipyard.json)
ship deploy | Trigger a deployment for the current project (auto-detected from git remote)
ship deploy --project <slug> --follow | Deploy and stream build logs; exit 0/1 based on result
ship logs <deploymentId> | Show build logs for a deployment
ship logs <deploymentId> --follow | Stream build logs live via long-polling
ship status --project <slug> | Show latest deployment status and URL
ship ps | List all projects and their live deployment status
ship rollback <deploymentId> | Promote a past deployment back to live (<10s)
ship env list --project <slug> | List env vars (values masked)
ship env set KEY=value --project <slug> | Set an env var (production scope by default)
ship env set KEY=value --target <target> | Set env var with target: production, preview, or all
ship env unset KEY --project <slug> | Remove an env var

Environment Variable Targets

Env vars can be scoped to production (default), preview, or all (applies to both environments). Preview-scoped vars are only injected into PR/MR preview deployments.
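
The scoping rule above can be sketched as a filter over stored variables. The types and the function name varsFor are hypothetical. The sketch also assumes that when the same key is stored more than once, the later entry wins, which the source does not specify.

```typescript
type Target = "production" | "preview" | "all";
type Environment = "production" | "preview";

interface EnvVar {
  key: string;
  value: string;
  target: Target;
}

// Hypothetical helper: selects the variables injected into one environment.
// A variable applies when its target is "all" or matches the environment.
function varsFor(environment: Environment, vars: EnvVar[]): Record<string, string> {
  const selected: Record<string, string> = {};
  for (const v of vars) {
    if (v.target === "all" || v.target === environment) {
      selected[v.key] = v.value; // assumption: later entries override earlier ones
    }
  }
  return selected;
}

const stored: EnvVar[] = [
  { key: "API_URL", value: "https://api.example.com", target: "production" },
  { key: "API_URL", value: "https://staging.example.com", target: "preview" },
  { key: "LOG_LEVEL", value: "info", target: "all" },
];

console.log(varsFor("preview", stored));
```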

API Reference

Base URL: https://api.shipyard.wake.co.ke/api/v1. Full OpenAPI docs at /api/docs. All endpoints except /auth/* and /webhooks/* require Authorization: Bearer <token>.

Auth

Method | Path | Description
POST | /auth/register | Create account + personal team
POST | /auth/login | Returns access token (15m) + refresh token (30d)
POST | /auth/refresh | Rotate refresh token, issue new access token
POST | /auth/logout | Invalidate refresh token

Deployments

Method | Path | Description
GET | /projects/:id/deployments | List deployments (last 50)
POST | /projects/:id/deployments | Trigger manual deploy
GET | /deployments/:id | Get deployment detail
POST | /deployments/:id/cancel | Cancel queued or building deployment
POST | /deployments/:id/rollback | Restore this deployment as live
GET | /deployments/:id/logs | Paginated logs (?after=<seq>&limit=500)
GET | /deployments/:id/logs/stream | SSE live log stream
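
A client can walk the paginated logs endpoint by feeding the last sequence number it saw back in as after. The sketch below is an assumption-heavy illustration: the response is assumed to be a JSON array of objects with seq and message fields, after is assumed to mean "strictly greater than", and sequence numbers are assumed to start above 0. None of that is confirmed by the table above; the OpenAPI docs at /api/docs are the authority.

```typescript
// Assumed response item shape; check /api/docs for the real schema.
interface LogLine {
  seq: number;
  message: string;
}

type FetchPage = (after: number, limit: number) => Promise<LogLine[]>;

// Collects every log line by requesting pages until a short (or empty) page arrives.
async function fetchAllLogs(fetchPage: FetchPage, limit = 500): Promise<LogLine[]> {
  const all: LogLine[] = [];
  let after = 0;
  for (;;) {
    const page = await fetchPage(after, limit);
    all.push(...page);
    const last = page[page.length - 1];
    if (last === undefined || page.length < limit) return all;
    after = last.seq; // resume after the last line already received
  }
}

// HTTP-backed page fetcher using the documented query parameters.
function httpFetchPage(baseUrl: string, deploymentId: string, token: string): FetchPage {
  return async (after, limit) => {
    const url = `${baseUrl}/deployments/${deploymentId}/logs?after=${after}&limit=${limit}`;
    const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } });
    if (!res.ok) throw new Error(`log request failed: ${res.status}`);
    return (await res.json()) as LogLine[];
  };
}
```

Passing the page fetcher in as a parameter keeps the paging loop testable without a network, and the same loop works for a ship logs style client or a CI script.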

Webhooks

Method | Path | Description
POST | /webhooks/github | GitHub push + pull_request events (HMAC-SHA256 via X-Hub-Signature-256)
POST | /webhooks/gitlab | GitLab Push Hook + Merge Request Hook (token via X-Gitlab-Token)
POST | /webhooks/bitbucket | Bitbucket repo:push + pullrequest:* events (HMAC-SHA256 via X-Hub-Signature)
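
The signature checks named in the table can be sketched with Node's crypto module. This is a generic illustration of HMAC-SHA256 webhook verification, not Shipyard's handler: the function names are hypothetical, and the check must run against the raw request body rather than re-serialized JSON. GitHub sends the digest as sha256=<hex> in X-Hub-Signature-256; the Bitbucket header is assumed here to use the same format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Constant-time string comparison; timingSafeEqual requires equal lengths.
function safeEqual(a: string, b: string): boolean {
  const left = Buffer.from(a);
  const right = Buffer.from(b);
  return left.length === right.length && timingSafeEqual(left, right);
}

// Verifies a "sha256=<hex>" signature header against the raw request body.
function verifyHmacSignature(rawBody: string, secret: string, header: string | undefined): boolean {
  if (!header || !header.startsWith("sha256=")) return false;
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  return safeEqual(header, expected);
}

// GitLab sends the shared secret itself in X-Gitlab-Token, so it is compared directly.
function verifyGitlabToken(header: string | undefined, secret: string): boolean {
  return header !== undefined && safeEqual(header, secret);
}

const body = '{"ref":"refs/heads/main"}';
const signature = "sha256=" + createHmac("sha256", "webhook-secret").update(body).digest("hex");
console.log(verifyHmacSignature(body, "webhook-secret", signature));
```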

Teams & Members

Method | Path | Min Role
GET | /teams | viewer
GET | /teams/:id/members | viewer
POST | /teams/:id/members | admin
PATCH | /teams/:id/members/:userId | admin
DELETE | /teams/:id/members/:userId | admin
GET | /teams/:id/audit-log | admin

API Tokens

API tokens for CI/CD authentication. Tokens are prefixed ship_ and stored as SHA-256 hashes. The raw value is shown only once at creation.

Method | Path | Description
GET | /teams/:id/api-tokens | List tokens (hash excluded)
POST | /teams/:id/api-tokens | Create token; returns raw ship_ value once
DELETE | /teams/:id/api-tokens/:tokenId | Revoke token

Tokens can be team-scoped (access to all team projects) or project-scoped (single project only). Set projectId when creating for project scope.
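
The "store only the hash" scheme described above can be sketched as follows. Only the ship_ prefix and the SHA-256 hashing come from this reference; the 32 random bytes, the hex encoding, and the function names are assumptions for illustration.

```typescript
import { createHash, randomBytes } from "node:crypto";

// SHA-256 hex digest of the raw token; this is the only value persisted.
function hashToken(raw: string): string {
  return createHash("sha256").update(raw).digest("hex");
}

// Hypothetical generator: token length and encoding are assumptions.
function createApiToken(): { raw: string; hash: string } {
  const raw = "ship_" + randomBytes(32).toString("hex");
  return { raw, hash: hashToken(raw) };
}

// On each request, hash the presented token and compare with the stored hash.
function matchesStoredHash(presented: string, storedHash: string): boolean {
  return presented.startsWith("ship_") && hashToken(presented) === storedHash;
}

const issued = createApiToken();
console.log(issued.raw.slice(0, 5), matchesStoredHash(issued.raw, issued.hash));
```

Since the raw value cannot be recovered from the hash, a lost token can only be revoked and reissued.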

Git Sources & OAuth

Method | Path | Description
POST | /oauth/github/initiate | Start GitHub OAuth flow
GET | /oauth/github/callback | OAuth callback (redirects to dashboard)
POST | /oauth/bitbucket/initiate | Start Bitbucket OAuth flow
GET | /oauth/bitbucket/callback | OAuth callback
GET | /teams/:id/git-sources | List connected git sources
GET | /git-sources/:id/repos | List repositories from connected source
GET | /projects/:id/branches | List branches for project's repo

DNS Providers

Cloudflare API token management for automatic DNS record creation.

Method | Path | Description
GET | /teams/:id/dns-providers | List configured providers
POST | /teams/:id/dns-providers | Configure Cloudflare provider (upsert)
DELETE | /teams/:id/dns-providers/:id | Remove provider

RBAC Permissions

Four role levels control access to team and project resources.

Role | Permissions
owner | Full access including team deletion, member role changes, DNS provider management
admin | Manage git sources, API tokens, DNS providers, invite members, view audit log
developer | Create projects, trigger deployments, manage env vars, add domains
viewer | View projects, deployments, and logs; cannot modify anything
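
A minimum-role check over these four levels can be sketched as a rank comparison. This assumes the roles are strictly hierarchical (each level includes everything below it), which the "Min Role" column in the Teams & Members table suggests but this reference does not state outright. The names ROLE_RANK and hasMinRole are hypothetical.

```typescript
// Higher rank means more privileges; assumes a strict hierarchy.
const ROLE_RANK = { viewer: 0, developer: 1, admin: 2, owner: 3 } as const;
type Role = keyof typeof ROLE_RANK;

// True when the caller's role is at least the role an endpoint requires.
function hasMinRole(actual: Role, required: Role): boolean {
  return ROLE_RANK[actual] >= ROLE_RANK[required];
}

// Example: GET /teams/:id/audit-log requires admin.
console.log(hasMinRole("developer", "admin"));
console.log(hasMinRole("owner", "admin"));
```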

Permission Matrix

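Action | viewer | developer | admin | owner
View projects/deployments | ✓ | ✓ | ✓ | ✓
View logs | ✓ | ✓ | ✓ | ✓
Create project | ✗ | ✓ | ✓ | ✓
Trigger deployment | ✗ | ✓ | ✓ | ✓
Manage env vars | ✗ | ✓ | ✓ | ✓
Add custom domains | ✗ | ✓ | ✓ | ✓
Reveal env var values | ✗ | ✗ | ✓ | ✓
Manage git sources | ✗ | ✗ | ✓ | ✓
Manage API tokens | ✗ | ✗ | ✓ | ✓
Manage DNS providers | ✗ | ✗ | ✓ | ✓
Invite/change members | ✗ | ✗ | ✓ | ✓
View audit log | ✗ | ✗ | ✓ | ✓
Delete team | ✗ | ✗ | ✗ | ✓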

Commit Status Reporting

Build and deployment status is reported back to GitHub and Bitbucket as commit status checks. Developers see ✓ or ✗ directly in PRs without visiting the dashboard.

Status Flow

Deployment Status | GitHub Check | Bitbucket Build Status
queued | pending: "Build queued" | INPROGRESS: "Queued"
building | pending: "Building" | INPROGRESS: "Building"
live | success: "Deploy live" | SUCCESSFUL: "Live"
failed / cancelled | failure: "Deploy failed" | FAILED: "Failed"

Implementation

Status updates are sent via provider APIs:

  • GitHub: POST /repos/{owner}/{repo}/statuses/{sha}
  • Bitbucket: POST /2.0/repositories/{fullName}/commit/{hash}/statuses/build

Status reporting requires a connected git source with a valid OAuth token. Statuses are updated in real time as deployments progress through the queue.
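
The Status Flow table can be expressed as a single mapping function. The states and texts below are copied from that table. The field name description on the Bitbucket side and the function name toCommitStatus are assumptions; the sketch covers only the mapping, not the HTTP calls.

```typescript
type DeploymentStatus = "queued" | "building" | "live" | "failed" | "cancelled";

interface CommitStatus {
  github: { state: "pending" | "success" | "failure"; description: string };
  bitbucket: { state: "INPROGRESS" | "SUCCESSFUL" | "FAILED"; description: string };
}

// Maps a Shipyard deployment status to the values reported to each provider.
function toCommitStatus(status: DeploymentStatus): CommitStatus {
  switch (status) {
    case "queued":
      return {
        github: { state: "pending", description: "Build queued" },
        bitbucket: { state: "INPROGRESS", description: "Queued" },
      };
    case "building":
      return {
        github: { state: "pending", description: "Building" },
        bitbucket: { state: "INPROGRESS", description: "Building" },
      };
    case "live":
      return {
        github: { state: "success", description: "Deploy live" },
        bitbucket: { state: "SUCCESSFUL", description: "Live" },
      };
    case "failed":
    case "cancelled":
      return {
        github: { state: "failure", description: "Deploy failed" },
        bitbucket: { state: "FAILED", description: "Failed" },
      };
  }
}

console.log(toCommitStatus("live"));
```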

DNS Providers & Custom Domains

Teams can configure Cloudflare API tokens for automatic DNS management. When configured, adding a custom domain automatically creates the required DNS records.

Setup

In Team Settings → DNS Providers, add a Cloudflare API token with these permissions:

  • Zone: DNS → Edit
  • Zone Resources: Include → Specific zone → your domain

Automatic DNS Management

When a DNS provider is configured:

  1. Adding a domain auto-creates the TXT verification record (_shipyard-verify.<domain>)
  2. Upon verification, an A record pointing to PLATFORM_EXTERNAL_IP is created
  3. Domain status flows: pending → provisioning → active
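
The three steps above can be sketched as plain data and a forward-only status transition. The record shapes, the verification token input, and all function names are hypothetical. The point at which each transition fires is also an assumption, since the list gives only the order of states. The IP is from a documentation-only range.

```typescript
type DomainStatus = "pending" | "provisioning" | "active";

interface DnsRecord {
  type: "TXT" | "A";
  name: string;
  content: string;
}

// TXT record that proves ownership; the token value is a hypothetical input.
function verificationRecord(domain: string, token: string): DnsRecord {
  return { type: "TXT", name: `_shipyard-verify.${domain}`, content: token };
}

// A record pointing the domain at the cluster ingress (PLATFORM_EXTERNAL_IP).
function ingressRecord(domain: string, platformExternalIp: string): DnsRecord {
  return { type: "A", name: domain, content: platformExternalIp };
}

// Status only moves forward: pending -> provisioning -> active.
function nextStatus(status: DomainStatus): DomainStatus {
  return status === "pending" ? "provisioning" : "active";
}

console.log(verificationRecord("app.example.com", "token-123"));
console.log(ingressRecord("app.example.com", "203.0.113.10"));
console.log(nextStatus("pending"));
```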

Required Environment Variables

Variable | Description
PLATFORM_EXTERNAL_IP | Cluster ingress IP for A records
GITHUB_CLIENT_ID | GitHub OAuth app client ID
GITHUB_CLIENT_SECRET | GitHub OAuth app secret
BITBUCKET_CLIENT_KEY | Bitbucket OAuth consumer key
BITBUCKET_CLIENT_SECRET | Bitbucket OAuth consumer secret

Audit Events

Security-relevant actions are recorded in the audit_events table with immutable timestamps. Accessible to team admins and owners via Team Settings.

Recorded Actions

Action | Description
api_token.created | New API token created
api_token.revoked | API token deleted
domain.verified | Custom domain passed DNS verification
domain.deleted | Custom domain removed
member.invited | New member invited to team
member.role_changed | Member role modified
member.removed | Member removed from team
env_var.revealed | Admin/owner viewed decrypted env var value

API

GET /teams/:id/audit-log — Returns paginated audit events (admin/owner only).

Buildpack Detection

Detection runs on every build. Rules are evaluated in priority order — first match wins. A shipyard.json at the repo root overrides detection entirely.

Priority | Signal | Type | Strategy
0 | shipyard.json with type | explicit | Use config
1 | package.json + next.config.* | nextjs | npm run build → container
2 | package.json + (vite.config.* or react-scripts) | static-node | Build → upload dist/ to MinIO
3 | package.json only | node | Containerise, npm start
4 | pom.xml | spring-boot-maven | mvn package → JAR → JRE container
5 | build.gradle | spring-boot-gradle | ./gradlew bootJar → JAR → JRE container
6 | composer.json + *.php | php-composer | composer install → PHP-FPM + NGINX; NGINX root auto-detected (public/ if present, repo root otherwise)
7 | *.php only | php-legacy | Copy → PHP-FPM + NGINX (root at /var/www/html)
8 | index.html only | static | Upload entire dir to MinIO
9 | Dockerfile present | dockerfile | docker build as-is
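
The first-match-wins evaluation in the table can be sketched as an ordered chain of checks. This is an illustration, not the detector's source: the RepoSnapshot shape and function name are hypothetical, react-scripts is assumed to be looked up among package.json dependencies, and signal files other than *.php are assumed to sit at the repo root.

```typescript
// Hypothetical input: a flat view of the repository at build time.
interface RepoSnapshot {
  files: string[]; // paths relative to the repo root
  manifestType?: string; // "type" from shipyard.json, if present
  dependencies?: string[]; // dependency names from package.json
}

// Returns the detected project type, or null when no rule matches.
function detectBuildpack(repo: RepoSnapshot): string | null {
  const rootFiles = repo.files.filter((f) => !f.includes("/"));
  const has = (name: string) => rootFiles.includes(name);
  const hasPrefix = (prefix: string) => rootFiles.some((f) => f.startsWith(prefix));
  const hasPhp = repo.files.some((f) => f.endsWith(".php"));
  const deps = repo.dependencies ?? [];

  if (repo.manifestType) return repo.manifestType; // priority 0: explicit config
  if (has("package.json")) {
    if (hasPrefix("next.config.")) return "nextjs"; // priority 1
    if (hasPrefix("vite.config.") || deps.includes("react-scripts")) return "static-node"; // 2
    return "node"; // priority 3
  }
  if (has("pom.xml")) return "spring-boot-maven"; // priority 4
  if (has("build.gradle")) return "spring-boot-gradle"; // priority 5
  if (hasPhp) return has("composer.json") ? "php-composer" : "php-legacy"; // 6, 7
  if (has("index.html")) return "static"; // priority 8
  if (has("Dockerfile")) return "dockerfile"; // priority 9
  return null;
}

console.log(detectBuildpack({ files: ["package.json", "next.config.mjs", "Dockerfile"] }));
```

Because Dockerfile is the last rule, a repository that ships both package.json and a Dockerfile is detected as a Node project under this ordering; an explicit type in shipyard.json is the way to force a different choice.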

shipyard.json override

{
  "type": "nextjs",
  "port": 3000,
  "healthCheckPath": "/api/health",
  "buildCommand": "npm run build:prod",
  "outputDirectory": ".next"
}

PHP webRoot

For php-composer projects, Shipyard auto-detects the NGINX document root:

  • public/ directory present → NGINX serves from /var/www/html/public (Laravel / Symfony convention)
  • No public/ directory → NGINX serves from /var/www/html (flat structure)

Override explicitly in shipyard.json or via project buildConfig:

{
  "type": "php-composer",
  "webRoot": "web"
}
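
The document-root rule for php-composer projects can be sketched as a small resolver. It assumes an explicit webRoot is interpreted relative to /var/www/html, which the override example suggests but does not state. The function name is hypothetical.

```typescript
const CONTAINER_ROOT = "/var/www/html";

// Hypothetical resolver: an explicit webRoot wins, then public/, then the repo root.
function resolveWebRoot(files: string[], webRoot?: string): string {
  if (webRoot) {
    const trimmed = webRoot.replace(/^\/+|\/+$/g, ""); // drop leading/trailing slashes
    return `${CONTAINER_ROOT}/${trimmed}`;
  }
  const hasPublicDir = files.some((f) => f.startsWith("public/"));
  return hasPublicDir ? `${CONTAINER_ROOT}/public` : CONTAINER_ROOT;
}

console.log(resolveWebRoot(["composer.json", "public/index.php"]));
console.log(resolveWebRoot(["composer.json", "index.php"]));
console.log(resolveWebRoot(["composer.json", "web/index.php"], "web"));
```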

Operational Notes

🔑

ENCRYPTION_KEK is irreplaceable. All project env vars are encrypted with it. Losing it means they are permanently unreadable. Rotating it requires decrypting every valueEncrypted row in the database first. Back it up in a separate secrets manager.

Topic | Detail
Database backups | Back up Postgres before every migration. The audit_events and env_vars tables are the most sensitive.
Access tokens | 15-minute expiry. Both clients auto-refresh. Refresh tokens are valid for 30 days; after expiry users must re-run ship login.
Preview cleanup | The cron worker cancels previews older than 7 days and deletes all k8s resources (Deployment, Service, ConfigMaps, Secret, IngressRoute) via teardownPreviewResources().
Harbor image GC | The DB-side GC runs every 6 hours. Actual Harbor/Artifact Registry image deletion is scheduled for Sprint 26; storage grows until then.
Build worker node | The build worker mounts the host Docker socket. Schedule it on a dedicated node pool with a taint/toleration to isolate untrusted build code from production workloads.
Metrics & monitoring | Prometheus metrics at GET /metrics (requires Authorization: Bearer <METRICS_TOKEN>). kube-prometheus-stack is deployed in the monitoring namespace and scrapes every 30s. Alerts (API down, high error rate, pod crash-looping, queue backlog, high deploy failure rate) fire to isabokea@gmail.com via AlertManager. Grafana dashboard at grafana.shipyard.wake.co.ke (login: admin + GRAFANA_PASSWORD secret).
Audit log | Security events in audit_events table. API at GET /teams/:id/audit-log. Forward to a SIEM via a cron export if needed.
Cloudflare API token | Stored as k8s secret traefik-cloudflare-token in shipyard-infra. Cloudflare API tokens do not expire by default. If the token is ever rotated: kubectl delete secret traefik-cloudflare-token -n shipyard-infra, recreate it, then kubectl rollout restart deployment/traefik -n shipyard-infra.
Dashboard API proxy | The dashboard SPA uses relative /api/v1 paths. The nginx ConfigMap in infra/helm/templates/deployment-dashboard.yaml proxies /api/ to the API ClusterIP service internally. If you ever change the API service name or port, update that proxy rule and redeploy the dashboard.
Artifact Registry pull secret | The artifact-registry-key secret in shipyard-system uses a GCP OAuth token that expires in ~1 hour. Renew it from Cloud Shell: kubectl delete secret artifact-registry-key -n shipyard-system && kubectl create secret docker-registry artifact-registry-key --docker-server=africa-south1-docker.pkg.dev --docker-username=oauth2accesstoken --docker-password="$(gcloud auth print-access-token)" --namespace=shipyard-system. For a permanent fix, recreate the node pool with --scopes=cloud-platform.
Bitbucket webhooks | In the Bitbucket repository → Settings → Webhooks → Add webhook: set Payload URL to https://api.shipyard.wake.co.ke/api/v1/webhooks/bitbucket, add the webhookSecret shown at git source creation as the Secret. Enable Repository → Push and all Pull Request triggers. The token field when adding a Bitbucket git source should be a Repository Access Token (or Workspace Access Token) with repository:read scope; this is used to clone private repos.
Private repo cloning | All three providers support private repos via the PAT/token stored on the git source. Tokens are decrypted at enqueue time and passed to the build job as cloneToken in Redis (ephemeral). They are never written to the DB in plaintext or emitted in build logs. GitHub uses x-access-token:<token>, GitLab uses oauth2:<token>, Bitbucket uses x-token-auth:<token> as the URL credential prefix.
CI/CD pipeline | Every push to main on Alisao/shipyard triggers the Deploy workflow. Monitor at github.com/Alisao/shipyard/actions. If a run gets stuck mid-upgrade, the Helm release may end up in pending-upgrade state; the pipeline auto-recovers on the next push. Manual recovery: helm rollback shipyard 0 -n shipyard-system.
Adding a new migration | Drop a new *.sql file in packages/db/migrations/. The CI/CD pipeline applies it automatically on the next push via the schema_migrations tracker. Never run ALTER TYPE ... ADD VALUE without IF NOT EXISTS: enum additions are not transactional in Postgres and cannot be rolled back.
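
The credential prefixes listed under "Private repo cloning" can be sketched as a URL builder. The prefixes come from that note; the function names, the example repository URL, and the redaction helper are assumptions added for illustration.

```typescript
type Provider = "github" | "gitlab" | "bitbucket";

// Username part of the clone URL credential, per provider.
const CREDENTIAL_USER: Record<Provider, string> = {
  github: "x-access-token",
  gitlab: "oauth2",
  bitbucket: "x-token-auth",
};

// Builds an HTTPS clone URL carrying the token as the URL password.
function authenticatedCloneUrl(provider: Provider, repoUrl: string, token: string): string {
  const url = new URL(repoUrl);
  url.username = CREDENTIAL_USER[provider];
  url.password = token;
  return url.toString();
}

// Hypothetical helper: strips the token before a URL is written to any log.
function redactCloneUrl(cloneUrl: string): string {
  const url = new URL(cloneUrl);
  if (url.password) url.password = "REDACTED";
  return url.toString();
}

const cloneUrl = authenticatedCloneUrl("github", "https://github.com/Alisao/shipyard.git", "placeholder-token");
console.log(redactCloneUrl(cloneUrl));
```

Redacting before logging is one way to uphold the "never emitted in build logs" guarantee, since git error messages can echo the remote URL.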