docs(reference-ansible): add docs/ tree and document repo, playbooks, Makefile

Addresses the WKS PoC review (Notion 2026-05-26). All docs in English.
- README: purpose, docs table of contents, annotated repo tree
- docs/getting_started.md: prerequisites (WKS account, OIDC, SSH, VPN) + first deploy
- docs/ansible.md: playbook table, "Running Ansible", service parameters, cheatsheet
- docs/secrets.md: canonical Bao login (moved out of README) + demo defaults
- docs/operations.md: full Makefile reference
- docs/inventories.md: repo layout, topology, standard folder structure, walkthrough
- docs/testing.md: static checks, inventory resolution, smoke test / dry run
- remove ARCHITECTURE.md (architecture docs live externally)

Also includes the gymburgdorf inventory build-out (bookstack, homarr,
opnform, send) and scripts/bao-seed.sh. site.yml keeps a third traefik
play (traefik_servers minus the vagrant _dmz/_backend split) so the demo
inventories still configure their reverse proxy after the rebase onto main.
Simon Bärlocher 2026-05-27 18:08:52 +02:00
parent c67e9aac43
commit 2ba0c07cd3
24 changed files with 1541 additions and 525 deletions

docs/ansible.md Normal file

@@ -0,0 +1,209 @@
<!-- markdownlint-disable MD013 MD060 MD051 -->
# Playbooks & Parameters
[← Documentation index](README.md)
Central reference: which plays [playbooks/site.yml](../playbooks/site.yml)
runs, which service parameters are relevant per role, and where they are
located in the inventory. Example used throughout: `demo-gymburgdorf`.
## Playbook `site.yml`
The only playbook is [playbooks/site.yml](../playbooks/site.yml). It
consists of a sequence of plays, each applying one role from
`digitalboard.core` to a host group. All plays run with
`become: yes`. Plays whose group has no members in an inventory run as a
**no-op**.
| # | Play / role | `hosts:` | Target in `demo-gymburgdorf`? |
| --- | --- | --- | --- |
| 1 | `digitalboard.core.base` | `all_servers` | ✅ all 4 hosts |
| 2 | `digitalboard.core.traefik` | `traefik_servers_backend` | — no-op (vagrant-only group) |
| 3 | `digitalboard.core.traefik` | `traefik_servers_dmz` | — no-op (vagrant-only group) |
| 4 | `digitalboard.core.traefik` | `traefik_servers:!traefik_servers_dmz:!traefik_servers_backend` | ✅ all 4 (dmz on `reverseproxy`, otherwise backend) |
| 5 | `digitalboard.core.httpbin` | `httpbin_servers` | — no-op |
| 6 | `digitalboard.core.389ds` | `ds389_servers` | — no-op |
| 7 | `digitalboard.core.keycloak` | `keycloak_servers` | — no-op |
| 8 | `digitalboard.core.garage` | `garage_servers` | ✅ `storage` |
| 9 | `digitalboard.core.collabora` | `collabora_servers` | ✅ `application` |
| 10 | `digitalboard.core.authentik` | `authentik_servers` | ✅ `application` |
| 11 | `digitalboard.core.authentik_outpost_ldap` | `authentik_outpost_ldap_servers` | ✅ `application` |
| 12 | `digitalboard.core.nextcloud` | `nextcloud_servers` | ✅ `application` |
| 13 | `digitalboard.core.drawio` | `drawio_servers` | ✅ `application` |
| 14 | `digitalboard.core.send` | `send_servers` | ✅ `application` |
| 15 | `digitalboard.core.opnform` | `opnform_servers` | ✅ `application` |
| 16 | `digitalboard.core.homarr` | `homarr_servers` | ✅ `application` |
| 17 | `digitalboard.core.bookstack` | `bookstack_servers` | ✅ `application` |
| 18 | `digitalboard.core.opencloud` | `opencloud_servers` | — no-op (no group) |
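
To make the table concrete, rows 1 and 4 might correspond to plays roughly like the sketch below. The real `playbooks/site.yml` is not reproduced in this document, so the play names and the `roles:` layout are assumptions; the host patterns and role names are taken from the table.

```yaml
# Illustrative sketch only, not a copy of playbooks/site.yml.
# Play names are hypothetical; host patterns and role names come from the table.
- name: Base configuration (row 1)
  hosts: all_servers
  become: true  # equivalent to `become: yes`
  roles:
    - role: digitalboard.core.base

- name: Traefik for single-group inventories (row 4)
  # ":!" excludes a group, so members of the vagrant-only groups are skipped.
  hosts: "traefik_servers:!traefik_servers_dmz:!traefik_servers_backend"
  become: true
  roles:
    - role: digitalboard.core.traefik
```

The pattern is quoted defensively, since both `:` and `!` carry meaning in YAML.
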
> **Three traefik plays, two topologies.** `vagrant` splits the reverse
> proxy into `traefik_servers_backend` + `traefik_servers_dmz` (plays 2
> and 3). The demo inventories (e.g. `demo-gymburgdorf`) instead group
> all hosts under `traefik_servers` and select dmz/backend per host via
> `traefik_mode`. Play 4's `:!…` exclusions subtract the two vagrant
> groups from `traefik_servers`, so the play targets exactly the demo
> hosts and stays a no-op for the vagrant split. Each topology thus
> triggers only the traefik play(s) that fit it, and no host runs traefik
> twice.
>
> Which plays take effect for a tenant is controlled **solely through
> group membership** in `hosts.yml`. A play targets a host as soon as
> the host is listed in the `<service>_servers` group; the matching
> `host_vars/<host>/<service>.yml` must exist to supply the service's
> parameters.
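
As an illustration of group-driven selection, a trimmed-down `hosts.yml` could look like the following. This is a sketch, not the real inventory: only three of the four hosts are shown (the names `reverseproxy`, `application`, and `storage` appear elsewhere in this document), only a few service groups are listed, and connection variables are omitted.

```yaml
# inventories/demo-gymburgdorf/hosts.yml (illustrative sketch, not the real file)
all:
  children:
    all_servers:
      hosts:
        reverseproxy:
        application:
        storage:
    traefik_servers:        # single group; dmz/backend is chosen via traefik_mode
      hosts:
        reverseproxy:
        application:
        storage:
    garage_servers:         # play 8 targets `storage`
      hosts:
        storage:
    nextcloud_servers:      # play 12 targets `application`
      hosts:
        application:
    # No keycloak_servers group is defined, so play 7 matches no hosts.
```

Listing `application` under a further `<service>_servers` group would make the corresponding play target that host on the next run.
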
## Running Ansible
**Prerequisites:** the collections are installed (`make install`) and you
are logged in to OpenBao in the **same shell** (`VAULT_TOKEN` set). Without
a token, the `community.hashi_vault` lookups fail. Login procedure:
[secrets.md § OpenBao login](secrets.md#openbao-login).
Initial setup step by step: [getting_started.md](getting_started.md).
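
A missing token otherwise only surfaces when the first secret lookup runs. One way to fail earlier is a small pre-flight play such as the sketch below. It is a hypothetical helper, not part of this repository, and it assumes the token is passed through the `VAULT_TOKEN` environment variable as described above.

```yaml
# preflight.yml (hypothetical file name; not part of the repo)
- name: Pre-flight check for the OpenBao token
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Fail early when VAULT_TOKEN is not exported in this shell
      ansible.builtin.assert:
        that:
          # The env lookup yields an empty string when the variable is unset.
          - lookup('ansible.builtin.env', 'VAULT_TOKEN') | length > 0
        fail_msg: "VAULT_TOKEN is not set; log in to OpenBao in this shell first."
        success_msg: "VAULT_TOKEN is present."
```

The check only confirms that a token is exported, not that OpenBao still accepts it.
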
### Via Makefile (recommended)
```bash
make ping_demo # Smoke test (ping) against all demo inventories
make deploy_site_demo_gymburgdorf # single demo site
make deploy_site_demo # all three demo sites in sequence
```
The Make targets encapsulate the full `ansible-playbook` invocation
including `--diff` and the macOS fork env var. All targets:
[operations.md § Makefile reference](operations.md#makefile-reference).
### Direct `ansible-playbook` invocation
When you need flags that the targets do not set:
```bash
# Full deploy of an inventory
ansible-playbook playbooks/site.yml \
  -i inventories/demo-gymburgdorf/hosts.yml --diff

# Only one host (e.g. just the application machine)
ansible-playbook playbooks/site.yml \
  -i inventories/demo-gymburgdorf/hosts.yml --limit application

# Dry run without changes
ansible-playbook playbooks/site.yml \
  -i inventories/demo-gymburgdorf/hosts.yml --check --diff
```
> Because service selection runs through the groups in `hosts.yml` (not
> through tags), `--limit <host>` is the usual way to narrow down a
> deploy. `--check` is only of limited value with the Docker Compose-based
> roles — some tasks report "changed" because they only learn the actual
> container state at runtime.
Deploy flow and play order: [operations.md § Deploy](operations.md#deploy).
## Where parameters belong
| Variable group | File in `demo-gymburgdorf/` | Why |
| --- | --- | --- |
| `vault_addr`, `vault_mount` | `group_vars/all/vault.yml` | Bao endpoint applies site-wide |
| `docker_registry_mirrors` | `group_vars/all/docker.yml` | Pulls from mirror on all hosts |
| `traefik_acme_*`, `traefik_use_ssl`, `traefik_cert_mode`, `traefik_log_level` | `group_vars/traefik_servers/traefik.yml` | applies to all Traefik instances (dmz + backend) |
| `traefik_mode: backend` | `group_vars/backend_servers/traefik.yml` | default for app + storage |
| `traefik_mode: dmz`, `traefik_dmz_exposed_services` | `host_vars/reverseproxy/traefik.yml` | host-specific override, only meaningful there |
| `nextcloud_*`, `authentik_*`, `collabora_*`, `drawio_*`, `send_*`, `opnform_*`, `homarr_*`, `bookstack_*` | `host_vars/application/<service>.yml` | service runs on `application` |
| `garage_*` | `host_vars/storage/garage.yml` | service runs on `storage` |
| Secrets (passwords, tokens, keys) | inline var with `lookup('community.hashi_vault.hashi_vault', …)` | single source of truth via Bao, see [secrets.md](secrets.md) |
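
The last row of the table can be sketched as follows: site-wide Bao settings in `group_vars/all/vault.yml`, and a service variable that resolves its value through the lookup. The endpoint, mount name, secret path, and key are placeholders chosen for illustration, and how the real inventories combine `vault_mount` with the path is an assumption; see [secrets.md](secrets.md) for the actual layout.

```yaml
# group_vars/all/vault.yml (placeholder values)
vault_addr: "https://bao.example.org:8200"
vault_mount: "secret"
---
# host_vars/application/nextcloud.yml (secret path and key are hypothetical)
# The token is not passed explicitly; the plugin reads it from VAULT_TOKEN.
nextcloud_admin_password: "{{ lookup('community.hashi_vault.hashi_vault', vault_mount ~ '/data/demo-gymburgdorf/nextcloud:admin_password', url=vault_addr) }}"
```

Keeping the endpoint in `group_vars/all/` means a tenant can be pointed at a different Bao instance by changing one file.
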
## Service parameters in detail
Complete variable lists are in the `defaults/main.yml` of the respective
role in `digitalboard.core`. Below are the parameters maintained in the
demo inventories per service — as guidance on which fields a new tenant
typically needs to set.
### traefik
| Variable | Example / default | Purpose |
| --- | --- | --- |
| `traefik_mode` | `dmz` \| `backend` | Provider mode: `dmz` = file provider (public-facing, no Docker socket), `backend` = docker provider (auto-discovery via container labels) |
| `traefik_cert_mode` | `acme` \| `selfsigned` | Certificate source |
| `traefik_use_ssl` | `true` | Enable TLS |
| `traefik_ssl_email` | `hostmaster@digitalboard.ch` | ACME contact |
| `traefik_log_level` | `DEBUG` (role default `INFO`) | Log verbosity; lower it for production |
| `traefik_network` | `proxy` | Docker network for backend mode |
| `traefik_acme_dns_zone` | `demo-gymb._acme.digitalboard.ch` | RFC2136 update zone |
| `traefik_acme_dns_nameserver` | from Bao / `172.16.9.169` (DMZ override) | TSIG update target |
| `traefik_acme_tsig_algorithm` / `_key` / `_secret` | `hmac-sha256` / Bao | TSIG signature |
| `traefik_acme_tcp_only` | `true` | force DNS lookups over TCP/53 |
| `traefik_acme_disable_ans_checks` | `true` (DMZ only) | skip NS propagation poll |
| `traefik_dmz_exposed_services` | list (DMZ) | which backends the DMZ Traefik routes |
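
Spread across the three locations named in "Where parameters belong", the traefik settings might be laid out as in this sketch. The values are the examples from the table above; the element schema of `traefik_dmz_exposed_services` is not documented here, so it is left empty rather than guessed, and the TSIG variables (Bao lookups) are omitted.

```yaml
# group_vars/traefik_servers/traefik.yml (all Traefik instances)
traefik_use_ssl: true
traefik_cert_mode: acme
traefik_log_level: DEBUG  # role default is INFO
traefik_acme_dns_zone: demo-gymb._acme.digitalboard.ch
traefik_acme_tcp_only: true
---
# group_vars/backend_servers/traefik.yml (default for app + storage)
traefik_mode: backend
---
# host_vars/reverseproxy/traefik.yml (host-specific override)
traefik_mode: dmz
traefik_acme_dns_nameserver: 172.16.9.169
traefik_acme_disable_ans_checks: true
traefik_dmz_exposed_services: []  # entry schema: see the role's defaults/main.yml
```

Because host variables take precedence over group variables, `reverseproxy` ends up in `dmz` mode even if it also inherits `traefik_mode: backend` from a group.
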
### authentik (IdP — OIDC + LDAP outpost backend)
| Variable | Purpose |
| --- | --- |
| `authentik_domains` | public FQDNs (`auth.gymb.souveredu.ch`) |
| `authentik_host_rewrite_domains` | internal `*.int.*` names for LAN server-to-server |
| `authentik_secret_key`, `authentik_postgres_password` | Bao lookup |
| `authentik_ldap_apps`, `authentik_ldap_outpost` | LDAP app + outpost definition (base_dn, token) |
| `authentik_proxy_apps` | ForwardAuth apps (slug, external/internal_host, allowed_groups) |
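
A minimal sketch of how these variables could fit together is shown below. The key names inside `authentik_proxy_apps` follow the fields listed in the table (slug, external/internal host, allowed groups), but the exact schema, the app, the group name, and all `example.org` hosts are assumptions; the role's `defaults/main.yml` is authoritative.

```yaml
# host_vars/application/authentik.yml (sketch; list schemas are assumptions)
authentik_domains:
  - auth.gymb.souveredu.ch
authentik_host_rewrite_domains:
  - auth.int.example.org          # placeholder for the internal *.int.* name
authentik_proxy_apps:
  - slug: drawio                  # hypothetical ForwardAuth app
    external_host: https://drawio.example.org
    internal_host: https://drawio.int.example.org
    allowed_groups:
      - teachers                  # hypothetical group name
```
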
### authentik_outpost_ldap
| Variable | Purpose |
| --- | --- |
| `authentik_outpost_ldap_host` | internal Authentik host (`https://auth.int.…`) |
| `authentik_outpost_ldap_token` | outpost token (Bao, identical to `authentik.ldap_outpost_token`) |
### nextcloud
| Variable | Purpose |
| --- | --- |
| `nextcloud_image` | image tag (pin to patched version) |
| `nextcloud_domains` | first entry = canonical public FQDN, further entries = internal `*.int.*` names |
| `nextcloud_admin_user` / `_password`, `nextcloud_postgres_password` | admin + DB (Bao) |
| `nextcloud_use_s3_storage`, `nextcloud_s3_*` | S3 primary storage via Garage (key/secret via `garage_credentials` lookup) |
| `nextcloud_enable_collabora`, `nextcloud_collabora_domain` / `_public_domain` | WOPI integration |
| `nextcloud_enable_drawio`, `nextcloud_drawio_url` | Draw.io integration |
| `nextcloud_oidc_providers` | OIDC login via Authentik (discovery_url, client_id/secret) |
| `nextcloud_ldap_enabled`, `nextcloud_ldap_config` | LDAP backend against Authentik outpost |
| `nextcloud_apps_to_install` | app list (groupfolders, richdocuments, spreed, user_ldap, …) |
| `nextcloud_allow_local_remote_servers`, `nextcloud_extra_hosts` | LAN-only routing for server-to-server calls |
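
For orientation, a reduced Nextcloud variable file might look like the sketch below. Domain names, the OIDC application slug, the client ID, and the secret path are placeholders, and the key names inside `nextcloud_oidc_providers` are inferred from the fields named in the table rather than taken from the role.

```yaml
# host_vars/application/nextcloud.yml (sketch with placeholder values)
nextcloud_domains:
  - cloud.example.org            # first entry: canonical public FQDN
  - cloud.int.example.org        # further entries: internal names
nextcloud_use_s3_storage: true
nextcloud_enable_collabora: true
nextcloud_enable_drawio: true
nextcloud_ldap_enabled: true
nextcloud_oidc_providers:        # key names are assumptions
  - discovery_url: https://auth.gymb.souveredu.ch/application/o/nextcloud/.well-known/openid-configuration
    client_id: nextcloud
    client_secret: "{{ lookup('community.hashi_vault.hashi_vault', vault_mount ~ '/data/demo-gymburgdorf/nextcloud:oidc_client_secret', url=vault_addr) }}"
```
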
### collabora
| Variable | Purpose |
| --- | --- |
| `collabora_domains` | public + internal FQDN |
| `collabora_allowed_domains`, `collabora_frame_ancestors` | allowed WOPI hosts / iframe embedding |
### drawio
| Variable | Purpose |
| --- | --- |
| `drawio_domain`, `drawio_extra_domains` | public + internal FQDN |
| `drawio_authentik_forward_auth`, `_url` | access protection via Authentik ForwardAuth |
### garage (S3 object store)
| Variable | Purpose |
| --- | --- |
| `garage_s3_domains` | first entry = public S3 FQDN, further entries = internal `*.int.*` names |
| `garage_webui_domain`, `garage_webui_enabled` | admin WebUI |
| `garage_webui_authentik_forward_auth`, `_url` | WebUI behind Authentik (admins only) |
| `garage_rpc_secret`, `garage_admin_token`, `garage_metrics_token` | Bao lookup |
| `garage_bootstrap_*` | single-node cluster bootstrap (zone, capacity) |
| `garage_s3_keys` | keys + buckets + permissions (e.g. `nextcloud`) |
### send / opnform / homarr / bookstack
Same pattern: `<service>_domain`/`_domains` (+ `*.int.*`),
`<service>_base_url`, admin credentials and app keys via Bao lookup,
plus OIDC integration with Authentik (`<service>_oidc_*`: issuer, client_id,
client_secret, admin_group). For the concrete fields, see the respective
`host_vars/application/<service>.yml`.
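
Applied to `bookstack`, the pattern could expand as in the sketch below. Every variable name here is derived mechanically from the pattern described above and has not been checked against the role, and all values are placeholders; treat it as a reading aid, not as the role's interface.

```yaml
# host_vars/application/bookstack.yml (names inferred from the pattern, unverified)
bookstack_domain: wiki.example.org
bookstack_base_url: https://wiki.example.org
bookstack_oidc_issuer: https://auth.gymb.souveredu.ch/application/o/bookstack/
bookstack_oidc_client_id: bookstack
bookstack_oidc_client_secret: "{{ lookup('community.hashi_vault.hashi_vault', vault_mount ~ '/data/demo-gymburgdorf/bookstack:oidc_client_secret', url=vault_addr) }}"
bookstack_oidc_admin_group: admins  # hypothetical group name
```
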
## Variable cheatsheet
Short form of the location table above — "which variable goes where":
- **Site-wide** → `group_vars/all/` (Bao endpoint, Docker mirror)
- **All Traefik** → `group_vars/traefik_servers/`
- **app + storage** → `group_vars/backend_servers/`
- **Single host** → `host_vars/<host>/<service>.yml`
- **Secrets** → always Bao lookup, never plaintext (see [secrets.md](secrets.md))