docs(reference-ansible): add docs/ tree and document repo, playbooks, Makefile

Addresses the WKS PoC review (Notion 2026-05-26). All docs in English.
- README: purpose, docs table of contents, annotated repo tree
- docs/getting_started.md: prerequisites (WKS account, OIDC, SSH, VPN) + first deploy
- docs/ansible.md: playbook table, "Running Ansible", service parameters, cheatsheet
- docs/secrets.md: canonical Bao login (moved out of README) + demo defaults
- docs/operations.md: full Makefile reference
- docs/inventories.md: repo layout, topology, standard folder structure, walkthrough
- docs/testing.md: static checks, inventory resolution, smoke test / dry run
- remove ARCHITECTURE.md (architecture docs live externally)

Also includes the gymburgdorf inventory build-out (bookstack, homarr,
opnform, send) and scripts/bao-seed.sh. site.yml keeps a third traefik
play (traefik_servers minus the vagrant _dmz/_backend split) so the demo
inventories still configure their reverse proxy after the rebase onto main.
Simon Bärlocher 2026-05-27 18:08:52 +02:00
parent c67e9aac43
commit 2ba0c07cd3
No known key found for this signature in database
GPG key ID: 63DE20495932047A
24 changed files with 1541 additions and 525 deletions

29
docs/README.md Normal file

@ -0,0 +1,29 @@
<!-- markdownlint-disable MD013 -->
# Documentation — `reference-ansible`
Entry point for this repository's in-depth documentation. The
[`demo-gymburgdorf`](../inventories/demo-gymburgdorf/) inventory serves
as a running example throughout.
> **Demo-only.** All role defaults (passwords, tokens, RPC secrets) are
> insecure and intended exclusively for demo setups. See
> [secrets.md § Demo-only defaults](secrets.md#demo-only-defaults--must-be-overridden).
## Table of contents
| Document | Content |
| --- | --- |
| [getting_started.md](getting_started.md) | Prerequisites (access, tools), first deploy step by step |
| [operations.md](operations.md) | Control-node prerequisites, initial setup, smoke test, deploy, Makefile reference, known gaps |
| [secrets.md](secrets.md) | OpenBao login, secret lookup pattern, demo-only defaults, threat boundaries |
| [inventories.md](inventories.md) | Repository layout, roles origin, inventory topology, new-tenant walkthrough |
| [ansible.md](ansible.md) | Playbooks (`site.yml`), per-service parameters, variable cheat sheet |
| [testing.md](testing.md) | Static checks, inventory resolution, smoke test/dry run before the deploy |
## Quick links
- **First time here?** → [getting_started.md](getting_started.md)
- **Create a new tenant** → [inventories.md § Walkthrough](inventories.md#walkthrough-creating-a-new-demo-tenant)
- **Which variable goes where?** → [ansible.md § Variable cheat sheet](ansible.md#variable-cheatsheet)
- **Store a secret in Bao** → [secrets.md § Secret pattern](secrets.md#secret-pattern-bao-lookup)
- **Run a deploy** → [operations.md § Deploy](operations.md#deploy)

209
docs/ansible.md Normal file

@ -0,0 +1,209 @@
<!-- markdownlint-disable MD013 MD060 MD051 -->
# Playbooks & Parameters
[← Documentation index](README.md)
Central reference: which plays [playbooks/site.yml](../playbooks/site.yml)
runs, which service parameters are relevant per role, and where they are
located in the inventory. Example used throughout: `demo-gymburgdorf`.
## Playbook `site.yml`
The only playbook is [playbooks/site.yml](../playbooks/site.yml). It
consists of a sequence of plays, each applying one role from
`digitalboard.core` to a host group. All plays run with
`become: yes`. Plays whose group has no members in an inventory run as a
**no-op**.
| # | Play / role | `hosts:` | Target in `demo-gymburgdorf`? |
| --- | --- | --- | --- |
| 1 | `digitalboard.core.base` | `all_servers` | ✅ all 4 hosts |
| 2 | `digitalboard.core.traefik` | `traefik_servers_backend` | — no-op (vagrant-only group) |
| 3 | `digitalboard.core.traefik` | `traefik_servers_dmz` | — no-op (vagrant-only group) |
| 4 | `digitalboard.core.traefik` | `traefik_servers:!traefik_servers_dmz:!traefik_servers_backend` | ✅ all 4 (dmz on `reverseproxy`, otherwise backend) |
| 5 | `digitalboard.core.httpbin` | `httpbin_servers` | — no-op |
| 6 | `digitalboard.core.389ds` | `ds389_servers` | — no-op |
| 7 | `digitalboard.core.keycloak` | `keycloak_servers` | — no-op |
| 8 | `digitalboard.core.garage` | `garage_servers` | ✅ `storage` |
| 9 | `digitalboard.core.collabora` | `collabora_servers` | ✅ `application` |
| 10 | `digitalboard.core.authentik` | `authentik_servers` | ✅ `application` |
| 11 | `digitalboard.core.authentik_outpost_ldap` | `authentik_outpost_ldap_servers` | ✅ `application` |
| 12 | `digitalboard.core.nextcloud` | `nextcloud_servers` | ✅ `application` |
| 13 | `digitalboard.core.drawio` | `drawio_servers` | ✅ `application` |
| 14 | `digitalboard.core.send` | `send_servers` | ✅ `application` |
| 15 | `digitalboard.core.opnform` | `opnform_servers` | ✅ `application` |
| 16 | `digitalboard.core.homarr` | `homarr_servers` | ✅ `application` |
| 17 | `digitalboard.core.bookstack` | `bookstack_servers` | ✅ `application` |
| 18 | `digitalboard.core.opencloud` | `opencloud_servers` | — no-op (no group) |
> **Three traefik plays, two topologies.** `vagrant` splits the reverse
> proxy into `traefik_servers_dmz` + `traefik_servers_backend` (plays 2
> and 3). The demo inventories (e.g. `demo-gymburgdorf`) instead group
> all hosts under `traefik_servers` and select dmz/backend per host via
> `traefik_mode`; the `:!…` exclusions in play 4's host pattern select
> exactly those hosts and leave the play a no-op for the vagrant split.
> Each topology thus triggers only the traefik play(s) that fit it, so no
> host runs traefik twice.
>
> Which plays take effect for a tenant is controlled **solely through
> group membership** in `hosts.yml`. A service becomes active as soon as
> its `<service>_servers` group contains a host and a matching
> `host_vars/<host>/<service>.yml` exists.
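
To check which plays would actually match hosts for a given inventory before deploying, `ansible-playbook --list-hosts` prints the matched hosts per play without running any task. The sketch below wraps that call; the function name `list_play_targets` is made up for illustration and is not part of the repo. It assumes you run it from the repo root with the collections installed.

```bash
# list_play_targets is a hypothetical helper, not part of the repo.
# It asks ansible-playbook which hosts each play in site.yml matches;
# --list-hosts only prints the matches and executes no tasks.
list_play_targets() {
  inventory="$1"   # e.g. inventories/demo-gymburgdorf/hosts.yml
  ansible-playbook playbooks/site.yml -i "$inventory" --list-hosts
}

# Usage (from the repo root):
#   list_play_targets inventories/demo-gymburgdorf/hosts.yml
```

Plays whose group is empty or missing in the inventory should show up with no hosts, which is the no-op case from the table above.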
## Running Ansible
**Prerequisite:** collections installed (`make install`) and logged in
to OpenBao in the **same shell** (`VAULT_TOKEN` set) — without a token,
the `community.hashi_vault` lookups fail. Login procedure:
[secrets.md § OpenBao login](secrets.md#openbao-login).
Initial setup step by step: [getting_started.md](getting_started.md).
### Via Makefile (recommended)
```bash
make ping_demo # Smoke test (ping) against all demo inventories
make deploy_site_demo_gymburgdorf # single demo site
make deploy_site_demo # all three demo sites in sequence
```
The Make targets wrap the full `ansible-playbook` invocation and set the
macOS fork env var. Only the Gymburgdorf target passes `--diff`. All targets:
[operations.md § Makefile reference](operations.md#makefile-reference).
### Direct `ansible-playbook` invocation
When you need flags that the targets do not set:
```bash
# Full deploy of an inventory
ansible-playbook playbooks/site.yml \
-i inventories/demo-gymburgdorf/hosts.yml --diff
# Only one host (e.g. just the application machine)
ansible-playbook playbooks/site.yml \
-i inventories/demo-gymburgdorf/hosts.yml --limit application
# Dry run without changes
ansible-playbook playbooks/site.yml \
-i inventories/demo-gymburgdorf/hosts.yml --check --diff
```
> Because service selection runs through the groups in `hosts.yml` (not
> through tags), `--limit <host>` is the usual way to narrow down a
> deploy. `--check` is only of limited value with the Docker Compose-based
> roles — some tasks report "changed" because they only learn the actual
> container state at runtime.
Deploy flow and play order: [operations.md § Deploy](operations.md#deploy).
## Where parameters belong
| Variable group | File in `demo-gymburgdorf/` | Why |
| --- | --- | --- |
| `vault_addr`, `vault_mount` | `group_vars/all/vault.yml` | Bao endpoint applies site-wide |
| `docker_registry_mirrors` | `group_vars/all/docker.yml` | Pulls from mirror on all hosts |
| `traefik_acme_*`, `traefik_use_ssl`, `traefik_cert_mode`, `traefik_log_level` | `group_vars/traefik_servers/traefik.yml` | applies to all Traefik instances (dmz + backend) |
| `traefik_mode: backend` | `group_vars/backend_servers/traefik.yml` | default for app + storage |
| `traefik_mode: dmz`, `traefik_dmz_exposed_services` | `host_vars/reverseproxy/traefik.yml` | host-specific override, only meaningful there |
| `nextcloud_*`, `authentik_*`, `collabora_*`, `drawio_*`, `send_*`, `opnform_*`, `homarr_*`, `bookstack_*` | `host_vars/application/<service>.yml` | service runs on `application` |
| `garage_*` | `host_vars/storage/garage.yml` | service runs on `storage` |
| Secrets (passwords, tokens, keys) | inline var with `lookup('community.hashi_vault.hashi_vault', …)` | single source of truth via Bao, see [secrets.md](secrets.md) |
## Service parameters in detail
Complete variable lists are in the `defaults/main.yml` of the respective
role in `digitalboard.core`. Below are the parameters maintained in the
demo inventories per service — as guidance on which fields a new tenant
typically needs to set.
### traefik
| Variable | Example / default | Purpose |
| --- | --- | --- |
| `traefik_mode` | `dmz` \| `backend` | Provider mode: `dmz` = file provider (public-facing, no Docker socket), `backend` = docker provider (auto-discovery via container labels) |
| `traefik_cert_mode` | `acme` \| `selfsigned` | Certificate source |
| `traefik_use_ssl` | `true` | TLS active |
| `traefik_ssl_email` | `hostmaster@digitalboard.ch` | ACME contact |
| `traefik_log_level` | `DEBUG` (role default `INFO`) | reduce for prod |
| `traefik_network` | `proxy` | Docker network for backend mode |
| `traefik_acme_dns_zone` | `demo-gymb._acme.digitalboard.ch` | RFC2136 update zone |
| `traefik_acme_dns_nameserver` | from Bao / `172.16.9.169` (DMZ override) | TSIG update target |
| `traefik_acme_tsig_algorithm` / `_key` / `_secret` | `hmac-sha256` / Bao | TSIG signature |
| `traefik_acme_tcp_only` | `true` | force DNS lookups over TCP/53 |
| `traefik_acme_disable_ans_checks` | `true` (DMZ only) | skip NS propagation poll |
| `traefik_dmz_exposed_services` | list (DMZ) | which backends the DMZ Traefik routes |
### authentik (IdP — OIDC + LDAP outpost backend)
| Variable | Purpose |
| --- | --- |
| `authentik_domains` | public FQDNs (`auth.gymb.souveredu.ch`) |
| `authentik_host_rewrite_domains` | internal `*.int.*` names for LAN server-to-server |
| `authentik_secret_key`, `authentik_postgres_password` | Bao lookup |
| `authentik_ldap_apps`, `authentik_ldap_outpost` | LDAP app + outpost definition (base_dn, token) |
| `authentik_proxy_apps` | ForwardAuth apps (slug, external/internal_host, allowed_groups) |
### authentik_outpost_ldap
| Variable | Purpose |
| --- | --- |
| `authentik_outpost_ldap_host` | internal Authentik host (`https://auth.int.…`) |
| `authentik_outpost_ldap_token` | outpost token (Bao, identical to `authentik.ldap_outpost_token`) |
### nextcloud
| Variable | Purpose |
| --- | --- |
| `nextcloud_image` | image tag (pin to patched version) |
| `nextcloud_domains` | first entry = canonical public FQDN, further `*.int.*` |
| `nextcloud_admin_user` / `_password`, `nextcloud_postgres_password` | admin + DB (Bao) |
| `nextcloud_use_s3_storage`, `nextcloud_s3_*` | S3 primary storage via Garage (key/secret via `garage_credentials` lookup) |
| `nextcloud_enable_collabora`, `nextcloud_collabora_domain` / `_public_domain` | WOPI integration |
| `nextcloud_enable_drawio`, `nextcloud_drawio_url` | Draw.io integration |
| `nextcloud_oidc_providers` | OIDC login via Authentik (discovery_url, client_id/secret) |
| `nextcloud_ldap_enabled`, `nextcloud_ldap_config` | LDAP backend against Authentik outpost |
| `nextcloud_apps_to_install` | app list (groupfolders, richdocuments, spreed, user_ldap, …) |
| `nextcloud_allow_local_remote_servers`, `nextcloud_extra_hosts` | LAN-only routing for server-to-server calls |
### collabora
| Variable | Purpose |
| --- | --- |
| `collabora_domains` | public + internal FQDN |
| `collabora_allowed_domains`, `collabora_frame_ancestors` | allowed WOPI hosts / iframe embedding |
### drawio
| Variable | Purpose |
| --- | --- |
| `drawio_domain`, `drawio_extra_domains` | public + internal FQDN |
| `drawio_authentik_forward_auth`, `_url` | access protection via Authentik ForwardAuth |
### garage (S3 object store)
| Variable | Purpose |
| --- | --- |
| `garage_s3_domains` | first entry = public S3 FQDN, further `*.int.*` |
| `garage_webui_domain`, `garage_webui_enabled` | admin WebUI |
| `garage_webui_authentik_forward_auth`, `_url` | WebUI behind Authentik (admins only) |
| `garage_rpc_secret`, `garage_admin_token`, `garage_metrics_token` | Bao lookup |
| `garage_bootstrap_*` | single-node cluster bootstrap (zone, capacity) |
| `garage_s3_keys` | keys + buckets + permissions (e.g. `nextcloud`) |
### send / opnform / homarr / bookstack
Same pattern: `<service>_domain`/`_domains` (+ `*.int.*`),
`<service>_base_url`, admin credentials and app keys via Bao lookup,
plus OIDC integration with Authentik (`<service>_oidc_*`: issuer, client_id,
client_secret, admin_group). For the concrete fields, see the respective
`host_vars/application/<service>.yml`.
## Variable cheatsheet
Short form of the location table above — "which variable goes where":
- **Site-wide** → `group_vars/all/` (Bao endpoint, Docker mirror)
- **All Traefik** → `group_vars/traefik_servers/`
- **app + storage** → `group_vars/backend_servers/`
- **Single host** → `host_vars/<host>/<service>.yml`
- **Secrets** → always Bao lookup, never plaintext (see [secrets.md](secrets.md))

93
docs/getting_started.md Normal file

@ -0,0 +1,93 @@
<!-- markdownlint-disable MD013 -->
# Getting Started
[← Documentation index](README.md)
From zero to your first deploy. This page walks through prerequisites,
setup, and the first Ansible run. Deeper details are linked along the way.
## Prerequisites
### Access (to be set up out-of-band)
- **WKS account with OIDC access to OpenBao.** The login runs via
`bao login -method=oidc -path=Digitalboard`. Without an authorized
account, authentication fails — and without a token there is no
secret lookup, hence no deploy.
- **Bao policy / mount read.** The account needs **Read** on the
mount of the target inventory (e.g. `demo-gymburgdorf/data/*`). Which
paths an inventory reads is documented in the `host_vars/.../<service>.yml`
(see [secrets.md § Secret pattern](secrets.md#secret-pattern-bao-lookup)).
- **SSH key on the target hosts.** The hosts are provisioned as `root`
(`ansible_user: root`, no bastion/jump host). Your own public key
must be placed out-of-band as `root` on the hosts.
- **Network access (VPN).** `bao.digitalboard.ch` and the host networks
(`172.16.9.0/24` DMZ, `172.16.19.0/24` backend) are not publicly
reachable — access requires VPN/network access into the Digitalboard network.
### Tools on the control node
- `ansible` (Core ≥ 2.15) — `ansible --version` to check
- `bao` CLI ([OpenBao](https://openbao.org/)) — e.g.
`sudo pacman -S openbao python-hvac` (Arch) or via Homebrew
- `python-hvac` (for `community.hashi_vault` lookups)
- On macOS: `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` (set in the
[Makefile](../Makefile)) — without this env var, Ansible forks crash
on the first Bao lookup, because the Objective-C runtime is not
fork-safe.
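
A quick way to confirm the command-line tools are present is to probe `$PATH` for each of them. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not something the repo ships.

```bash
# missing_tools is a hypothetical helper: it prints every command from its
# argument list that cannot be found in $PATH and returns non-zero if at
# least one is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the control node:
#   missing_tools ansible ansible-galaxy bao
```

This only covers executables. `python-hvac` is a Python library, so it has to be checked separately, for example by importing `hvac` with the Python interpreter that Ansible itself uses.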
## 1. Clone the repo and install collections
```bash
git clone https://git.digitalboard.ch/Digitalboard/reference-ansible
cd reference-ansible
make install # community.hashi_vault + digitalboard.core into ./collections/
```
There is **no** `roles/` directory in the repo — all roles come from
the `digitalboard.core` collection. See
[inventories.md § Repo layout](inventories.md#repo-layout-and-role-origin).
## 2. Log in to OpenBao
The login must happen in the **same shell** in which
`ansible-playbook` then runs — details and the `make bao` caveat in
[secrets.md § OpenBao login](secrets.md#openbao-login):
```bash
export BAO_ADDR=https://bao.digitalboard.ch
bao login -method=oidc -path=Digitalboard
export VAULT_TOKEN=$(bao print token)
```
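
Because a missing token only surfaces once the first secret lookup fails, it can help to guard the deploy with an explicit check. The sketch below is an assumption-level helper (`require_bao_token` is a made-up name); it only tests that `VAULT_TOKEN` is non-empty in the current shell, not that the token is still valid.

```bash
# require_bao_token is a hypothetical guard, not part of the repo: it fails
# early when VAULT_TOKEN is empty or unset in the current shell.
require_bao_token() {
  if [ -z "${VAULT_TOKEN:-}" ]; then
    echo "VAULT_TOKEN is not set in this shell; log in to OpenBao first" >&2
    return 1
  fi
}

# Usage: only start the deploy when a token is present.
#   require_bao_token && make deploy_site_demo_gymburgdorf
```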
## 3. Check connectivity (smoke test)
```bash
make ping_demo # ping module against all three demo inventories
```
If a host does not respond, it is usually due to the SSH key
(prerequisite above) or missing network access (VPN).
## 4. Run Ansible (deploy)
```bash
make deploy_site_demo_gymburgdorf # single demo site
```
At its core, `make` only calls `ansible-playbook` — the equivalent
direct invocation:
```bash
ansible-playbook playbooks/site.yml \
-i inventories/demo-gymburgdorf/hosts.yml --diff
```
All variants (direct invocation, `--limit`, check mode) and
the make targets are documented in [ansible.md § Running Ansible](ansible.md#running-ansible).
## Next steps
- **What happens during a deploy?** → [ansible.md § Playbook](ansible.md#playbook-siteyml)
- **Create a new tenant** → [inventories.md § Walkthrough](inventories.md#walkthrough-creating-a-new-demo-tenant)
- **Store secrets in Bao** → [secrets.md](secrets.md)

179
docs/inventories.md Normal file

@ -0,0 +1,179 @@
<!-- markdownlint-disable MD013 MD060 MD051 -->
# Repo layout & inventories
[← Documentation index](README.md)
## Repo layout and role origin
```text
reference-ansible/
├── Makefile # deploy targets, OIDC login, OBJC fork workaround
├── ansible.cfg # collections_path, remote_user=root, hashi_vault auth_method=token
├── requirements.yml # community.hashi_vault + digitalboard.core (Git)
├── playbooks/site.yml # play sequence (see ansible.md)
├── scripts/bao-seed.sh # seed/merge OpenBao secrets per inventory
├── docs/ # this documentation
├── collections/ # ← installed by `make install`, gitignored
│ └── ansible_collections/
│ └── digitalboard/core/
│ └── roles/ # 🔑 THE ROLES LIVE HERE, NOT in the repo root
└── inventories/
├── demo-gymburgdorf/ # reference inventory of this documentation
├── demo-mbazürich/
├── demo-phbern/
└── vagrant/ # local test inventory with its own topology
```
> **Important:** There is **no** `roles/` directory in the repo root. All
> roles come from the `digitalboard.core` collection (see
> [requirements.yml](../requirements.yml)), installed via
> `make install` into `./collections/`. Plays reference them by
> FQCN `digitalboard.core.<role>`.
## Available inventories
| Inventory | Purpose |
| --- | --- |
| [`demo-gymburgdorf/`](../inventories/demo-gymburgdorf/) | Demo tenant — **recommended as the template for new tenants** |
| [`demo-mbazürich/`](../inventories/demo-mbazürich/) | Demo tenant |
| [`demo-phbern/`](../inventories/demo-phbern/) | Demo tenant |
| [`vagrant/`](../inventories/vagrant/) | local test VMs; **incompatible group topology** with the demo inventories |
## Inventory topology (`demo-gymburgdorf`)
```mermaid
flowchart LR
classDef dmz fill:#fee2e2,stroke:#991b1b,color:#000
classDef app fill:#dcfce7,stroke:#166534,color:#000
classDef stor fill:#dbeafe,stroke:#1e40af,color:#000
classDef turn fill:#fef9c3,stroke:#854d0e,color:#000
subgraph ALL["group: all_servers"]
direction LR
subgraph DMZ["DMZ 172.16.9.0/24"]
RP["<b>reverseproxy</b><br/>172.16.9.111<br/>traefik_mode: dmz"]:::dmz
TURN["<b>turn</b><br/>172.16.9.112<br/>(no role in site.yml yet)"]:::turn
end
subgraph BE["Backend 172.16.19.0/24<br/>group: backend_servers"]
APP["<b>application</b><br/>172.16.19.101<br/>traefik_mode: backend<br/>+ authentik, authentik_outpost_ldap,<br/> nextcloud, collabora, drawio, …"]:::app
ST["<b>storage</b><br/>172.16.19.102<br/>traefik_mode: backend<br/>+ garage (S3)"]:::stor
end
end
RP -.HTTPS in, HTTP out.-> APP
RP -.HTTPS in, HTTP out.-> ST
```
**Group memberships (from [hosts.yml](../inventories/demo-gymburgdorf/hosts.yml)):**
| Group | Members | Purpose |
| --- | --- | --- |
| `all_servers` | `reverseproxy`, `application`, `storage`, `turn` | base role for all hosts |
| `traefik_servers` | `children: all_servers` (all 4 hosts) | Traefik everywhere; DMZ/backend via `traefik_mode` |
| `backend_servers` | `application`, `storage` | sets `traefik_mode: backend` via group_var |
| `garage_servers` | `storage` | single-host wrapper for the Garage role |
| `nextcloud_servers`, `collabora_servers`, `drawio_servers`, `authentik_servers`, `authentik_outpost_ldap_servers`, `send_servers`, `opnform_servers`, `homarr_servers`, `bookstack_servers` | only `application` each | single-host wrappers |
> **Difference from the `vagrant` inventory:** `vagrant` structures
> Traefik differently — via the children groups `traefik_servers_dmz` and
> `traefik_servers_backend` instead of via `backend_servers` +
> `host_vars` override. The two topologies are **structurally
> incompatible**; a 1:1 mapping is not possible. For new tenants, therefore,
> take `demo-gymburgdorf` as the template.
## Standard folder structure of an inventory entry
A fully built-out inventory follows this layout (example
`demo-gymburgdorf`). Currently only this inventory is built out;
`demo-mbazürich` and `demo-phbern` so far contain only `hosts.yml`.
```text
inventories/demo-<kunde>/
├── hosts.yml # REQUIRED — hosts, IPs, group topology
├── group_vars/
│ ├── all/
│ │ ├── vault.yml # REQUIRED — vault_addr, vault_mount (Bao)
│ │ ├── ansible.yml # ansible_python_interpreter etc.
│ │ └── docker.yml # docker_registry_mirrors
│ ├── traefik_servers/
│ │ └── traefik.yml # ACME/TSIG, TLS — applies to ALL Traefik instances
│ └── backend_servers/
│ └── traefik.yml # traefik_mode: backend (default for app + storage)
└── host_vars/
├── reverseproxy/
│ └── traefik.yml # traefik_mode: dmz + DMZ-specific ACME overrides
├── application/
│ ├── main.yml # comment only: which services run here
│ ├── traefik.yml # traefik_dmz_exposed_services (what the DMZ routes)
│ └── <service>.yml # one file per service (nextcloud, authentik, …)
└── storage/
├── main.yml # same as above
├── traefik.yml # traefik_extra_hosts + traefik_dmz_exposed_services
└── garage.yml # service vars for garage
```
**Conventions:**
- **`hosts.yml` is the only hard required file.** Vars are
optional — if one is missing, the role defaults from
`digitalboard.core` take effect. A new inventory therefore starts minimally with
only `hosts.yml` (just like `demo-mbazürich`/`demo-phbern`).
- **`group_vars/all/vault.yml`** is effectively required as soon as
Bao lookups are supposed to work — without `vault_mount`/`vault_addr` the
secret lookups fail.
- **One file per service** under `host_vars/<host>/<service>.yml`. The
file name is free (Ansible loads all YAMLs in the directory); by
convention it is named like the role. Which variables belong where:
[ansible.md § Where parameters belong](ansible.md#where-parameters-belong).
- **`main.yml` per host** is pure documentation: a comment indicating which
services run on the host. It defines no variables.
- **`host_vars/<host>/traefik.yml`** declares via
`traefik_dmz_exposed_services` which local services the
DMZ Traefik should make reachable from outside. The DMZ reads this
list via `hostvars[<backend>]` and renders its routers from it. A new
service exposed externally = a new entry here. Mechanics:
[ansible.md § traefik](ansible.md#traefik).
## Walkthrough: Creating a new demo tenant
Recommended template: **`demo-gymburgdorf`** (not `vagrant`, because its
group topology is incompatible).
1. **Copy the inventory:**
```bash
cp -r inventories/demo-gymburgdorf inventories/demo-<kunde>
```
2. **Adjust `hosts.yml`:** IPs, hostnames per host.
3. **`group_vars/all/vault.yml`** — set `vault_mount` to the new
tenant mount (`demo-<kunde>`).
4. **`group_vars/traefik_servers/traefik.yml`** —
point `traefik_acme_dns_zone` and the `acme-tsig` lookup paths to the
new zone / the new Bao path.
5. Go through **`host_vars/application/*.yml`** and **`host_vars/storage/*.yml`**:
FQDNs to the new domain pattern (e.g.
`*.<kunde>.souveredu.ch`), Bao lookup paths to `demo-<kunde>/data/…`.
6. **Prepare OpenBao** (out-of-band, not via Ansible):
- Create a new KV-v2 mount `demo-<kunde>`.
- Write secrets: `acme-tsig`, `authentik`, `nextcloud`,
`garage`, … — conveniently via `make seed_bao_<kunde>` (see
[scripts/bao-seed.sh](../scripts/bao-seed.sh) and
[secrets.md § Demo-only defaults](secrets.md#demo-only-defaults--must-be-overridden)).
- Policy for the deploy token: read on `demo-<kunde>/data/*`.
7. **DNS:** Create the TSIG update zone (`demo-<kunde>._acme.digitalboard.ch`) at
`ns1.digitalboard.ch`, CNAMEs
`_acme-challenge.*.<kunde>.<tld>` pointing there.
8. **Makefile** — add a new target modeled on
`deploy_site_demo_gymburgdorf` and add it to `deploy_site_demo`;
likewise a `seed_bao_<kunde>` target.
9. **Smoke test:** `ansible all -i inventories/demo-<kunde>/hosts.yml -m ping`.
10. **Deploy:** Bao login + `make deploy_site_demo_<kunde>`.
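
Steps 2 to 5 of the walkthrough amount to replacing every reference to the template tenant in the copied inventory. A simple way to see what is still left is to list the files that mention the old name. This is a sketch; `stale_refs` is a hypothetical helper and not part of the repo.

```bash
# stale_refs is a hypothetical helper: it lists every file below the new
# inventory directory that still mentions the template tenant's name.
# grep exits non-zero when nothing matches, i.e. when no references remain.
stale_refs() {
  old_name="$1"   # e.g. demo-gymburgdorf
  new_dir="$2"    # e.g. inventories/demo-<kunde>
  grep -rl -- "$old_name" "$new_dir"
}

# Usage after step 1 of the walkthrough:
#   stale_refs demo-gymburgdorf inventories/demo-<kunde>
#   stale_refs gymb inventories/demo-<kunde>   # short form used in FQDNs and the ACME zone
```

An empty result for both searches suggests that mount, zone, and FQDN references have all been moved to the new tenant. It does not replace the smoke test in step 9.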

136
docs/operations.md Normal file

@ -0,0 +1,136 @@
<!-- markdownlint-disable MD013 -->
# Setup & operations
[← Documentation index](README.md)
## Prerequisites (control node)
- `ansible` (Core ≥ 2.15)
- `bao` CLI ([OpenBao](https://openbao.org/)) — e.g.
`sudo pacman -S openbao python-hvac` (Arch) or via Homebrew
- `python-hvac` (for `community.hashi_vault` lookups)
- On macOS: `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` (set in the
[Makefile](../Makefile); without this env var, Ansible forks crash
on the first `community.hashi_vault` lookup, because the
Objective-C runtime is not fork-safe)
## Initial setup
```bash
git clone <repo>
cd reference-ansible
make install # Galaxy + digitalboard.core into ./collections/
```
`make install` installs `community.hashi_vault` and the
`digitalboard.core` collection (Git, see [requirements.yml](../requirements.yml))
into `./collections/`. There is **no** `roles/` directory in the
repo root — all roles come from the collection, see
[inventories.md § Repo layout](inventories.md#repo-layout-and-role-origin).
## Secrets (OpenBao)
Before every deploy, authenticate to OpenBao in the **same shell**. The
full login flow, the `make bao` caveat, the lookup pattern, and
tenant isolation are documented in **[secrets.md](secrets.md)**.
## Smoke test
```bash
make ping_demo # pings all three demo inventories (ping module)
```
## Deploy
```bash
make deploy_site_demo_gymburgdorf # single demo site
make deploy_site_demo_mbazürich
make deploy_site_demo_phbern
make deploy_site_demo # all three in sequence
```
`--diff` is set only in the Gymburgdorf target, so per-task changes are visible only there.
The play order and which plays run as no-ops:
see [ansible.md § Playbooks](ansible.md#playbook-siteyml).
## Makefile reference
The [Makefile](../Makefile) bundles setup, secret handling, and deploy.
The only variable it accepts on the command line is `DRY_RUN` (for the
`seed_bao_*` targets); everything else is controlled by the chosen target.
### Exported env vars (apply to all targets)
| Variable | Value | Purpose |
| --- | --- | --- |
| `BAO_ADDR` | `https://bao.digitalboard.ch` | OpenBao endpoint for `bao` and `community.hashi_vault` calls |
| `OBJC_DISABLE_INITIALIZE_FORK_SAFETY` | `YES` | macOS fork safety: without this var, Ansible forks crash on the first `hashi_vault` lookup, because the Objective-C runtime is not fork-safe |
> Both are set via `export` at the top of the Makefile and thus
> inherited by every target shell process — regardless of which target runs.
### Setup & secrets
| Target | Effect |
| --- | --- |
| `make install` | `ansible-galaxy collection install -r requirements.yml -p collections` — installs `community.hashi_vault` + `digitalboard.core` into `./collections/` |
| `make bao` | `bao login -method=oidc -path=Digitalboard role=default` + sets `VAULT_TOKEN` via `$(eval …)`. ⚠️ The token only lives **within this single `make` invocation** — see caveat below |
| `make seed_bao_gymburgdorf` | Seed/merge OpenBao secrets for `demo-gymburgdorf` via [scripts/bao-seed.sh](../scripts/bao-seed.sh). Idempotent: existing keys remain, only missing ones are generated |
| `make seed_bao_mbazürich` | same for `demo-mbazürich` |
| `make seed_bao_phbern` | same for `demo-phbern` |
> The `seed_bao_*` targets understand `DRY_RUN=1` — shows the diff without
> writing: `make seed_bao_gymburgdorf DRY_RUN=1`. Requirement:
> `bao`, `jq`, `openssl` in `$PATH` and a valid `VAULT_TOKEN`.
### Smoke test & deploy
| Target | Effect |
| --- | --- |
| `make ping_demo` | `ansible … -m ping` against all three demo inventories in sequence; failures of individual hosts do not abort (`\|\| true`) |
| `make deploy_site_demo_gymburgdorf` | `ansible-playbook playbooks/site.yml -i …/demo-gymburgdorf/hosts.yml --diff` |
| `make deploy_site_demo_mbazürich` | same for `demo-mbazürich`, **without** `--diff` |
| `make deploy_site_demo_phbern` | same for `demo-phbern`, **without** `--diff` |
| `make deploy_site_demo` | calls the three `deploy_site_demo_*` targets in sequence |
> **Inconsistency:** Only the Gymburgdorf target sets `--diff`. For
> `mbazürich` and `phbern` you do not see the task changes — if
> needed, invoke directly with `ansible-playbook … --diff`, see
> [ansible.md § Running Ansible](ansible.md#running-ansible).
### Token caveat (`make bao`)
`make bao` alone is **not** enough for a deploy: each `make` target
runs in its own shell, so the `VAULT_TOKEN` set there only lives
during `make bao` itself and is already gone by the next `make deploy_…`.
Two working approaches:
```bash
# Variant A — log in manually in the active shell (survives multiple make invocations)
export BAO_ADDR=https://bao.digitalboard.ch
bao login -method=oidc -path=Digitalboard
export VAULT_TOKEN=$(bao print token)
make deploy_site_demo_gymburgdorf
# Variant B — chain both as ONE make invocation (token lives for the chain)
make bao deploy_site_demo_gymburgdorf
```
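
The general rule behind this caveat is that a child process cannot change the environment of the shell that started it. The following sketch demonstrates this with a plain child shell instead of `make`; `DEMO_TOKEN` is a made-up variable, and the Makefile's own `$(eval …)` mechanism is not reproduced here.

```bash
# A child process works on a copy of the environment; whatever it exports
# is gone once it exits. DEMO_TOKEN is a made-up variable name.
unset DEMO_TOKEN
sh -c 'export DEMO_TOKEN=s.example; echo "child:  ${DEMO_TOKEN}"'
echo "parent: ${DEMO_TOKEN:-<unset>}"
# prints:
#   child:  s.example
#   parent: <unset>
```

Variant A works because the `export` happens in the interactive shell itself, so every later `make` invocation inherits the token.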
Login details and the secret pattern: [secrets.md](secrets.md#openbao-login).
## Known gaps and trade-offs
- **`opencloud` in `demo-gymburgdorf`:** Play present, but no
`opencloud_servers` group — runs as a no-op. If needed, add a group +
`host_vars`, see [inventories.md](inventories.md#walkthrough-creating-a-new-demo-tenant).
- **`turn` host:** defined in the DMZ, but no STUN/TURN role in
[playbooks/site.yml](../playbooks/site.yml) — provisioned only via `base` +
`traefik`.
- **Idempotency:** Roles are Docker-Compose-based; re-runs can
trigger container restarts when Compose inputs change. No
rollback mechanism — on failure, roll back manually.
- **TLS renewal:** handled internally by Traefik via ACME, no external
renew cron in the repo.
- **CI/testing:** currently not in the repo; smoke test via `make ping_demo`.
- **Logging:** `traefik_log_level: DEBUG` in `demo-gymburgdorf` and
`vagrant` (role default `INFO`); lower it before adapting an inventory for prod.

99
docs/secrets.md Normal file

@ -0,0 +1,99 @@
<!-- markdownlint-disable MD013 MD060 MD051 -->
# Secrets & security
[← Documentation index](README.md)
> This repo is explicitly intended for **demo setups**. All
> default values in the roles are insecure and are overridden in
> `demo-*` inventories via Bao lookups or host_vars.
## OpenBao login
A prerequisite is a WKS account with OIDC access to OpenBao and a
read policy on the inventory mount — see
[getting_started.md § Prerequisites](getting_started.md#prerequisites).
Before each deploy, authenticate in **the same shell** in which
`ansible-playbook` then runs:
```bash
export BAO_ADDR=https://bao.digitalboard.ch
bao login -method=oidc -path=Digitalboard
export VAULT_TOKEN=$(bao print token)
```
> ⚠️ `make bao` alone is **not** enough — every `make` target runs in
> a new shell, and the `VAULT_TOKEN` set there lives only during
> `make bao` itself. Either run the three commands above manually
> or chain `make bao deploy_site_demo_gymburgdorf` as **one** call
> — otherwise the deploy has no token.
## Secret pattern (Bao lookup)
Secrets are never stored in plaintext, but read from
OpenBao at runtime:
```yaml
# host_vars/.../<service>.yml — one lookup per service path,
# individual keys as properties:
_nextcloud: "{{ lookup('community.hashi_vault.hashi_vault',
vault_mount + '/data/nextcloud', url=vault_addr) }}"
nextcloud_admin_password: "{{ _nextcloud.admin_password }}"
nextcloud_postgres_password: "{{ _nextcloud.postgres_password }}"
```
- `vault_mount` and `vault_addr` come from
[group_vars/all/vault.yml](../inventories/demo-gymburgdorf/group_vars/all/vault.yml).
- KV-v2 paths need an explicit `/data/` in the path — Ansible does not
resolve this on its own.
- `vault_mount` is unique per inventory (`demo-gymburgdorf`,
`demo-phbern`, …) → tenant isolation in Bao via mount + policy.
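
To illustrate the `/data/` segment: for a KV-v2 mount, the HTTP API reads a secret at `/v1/<mount>/data/<path>`. The sketch below only builds that URL. `kv2_url` is a hypothetical helper, and the `curl` call in the note assumes `curl` is installed and the token may read the path.

```bash
# Hypothetical helper: build the KV-v2 read URL for a mount and secret path.
# KV-v2 inserts "data/" between the mount and the secret path.
kv2_url() {
  addr="${1%/}"   # strip one trailing slash from the address
  mount="$2"
  path="$3"
  printf '%s/v1/%s/data/%s\n' "$addr" "$mount" "$path"
}
```

For a manual cross-check of what the lookup sees, one could run `curl -sS -H "X-Vault-Token: $VAULT_TOKEN" "$(kv2_url "$BAO_ADDR" demo-gymburgdorf nextcloud)"`.
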

Secrets are seeded idempotently with [scripts/bao-seed.sh](../scripts/bao-seed.sh) (or
`make seed_bao_<customer>`): existing keys are kept and
only missing ones are generated. OIDC client secrets are kept in sync between
`<mount>/data/authentik` and the respective service secret.
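
The "keep existing, generate missing" rule can be sketched in a few lines. This is not the code of `scripts/bao-seed.sh`: `random_hex` and `ensure_secret` are hypothetical names, and reading or writing the value in Bao is deliberately left out.

```bash
# Hypothetical sketch of the seeding rule, not the actual bao-seed.sh logic.

# Print N random bytes as lowercase hex (2*N characters).
random_hex() {
  head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Print the existing value unchanged, or a fresh 64-hex secret if it is empty.
ensure_secret() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    random_hex 32
  fi
}
```

Because an existing value is echoed back untouched, running the step twice yields the same stored secret, which is what makes the seeding idempotent.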
## Demo-only defaults — must be overridden
These defaults in `digitalboard.core` are insecure. In every
**production-grade** deployment they must be overridden via a Bao lookup or host_var:
| Variable | Default | Where to override |
| --- | --- | --- |
| `keycloak_admin_password` | `changeme` | host_vars `keycloak_servers` |
| `keycloak_postgres_password` | `changeme` | same as above |
| `authentik_secret_key` | `changeme-generate-a-random-string` | `host_vars/application/authentik.yml` |
| `authentik_postgres_password` | `changeme` | same as above |
| `nextcloud_admin_password` | `admin` | `host_vars/application/nextcloud.yml` |
| `nextcloud_postgres_password` | `changeme` | same as above |
| `nextcloud_s3_key` / `nextcloud_s3_secret` | `changeme` / `changeme` | same as above |
| `garage_webui_password` | `admin` | `host_vars/storage/garage.yml` |
| `garage_rpc_secret` | `0123…cdef` (64-hex constant) | same as above |
| `garage_admin_token` | identical to `rpc_secret` | same as above |
| `garage_metrics_token` | identical to `rpc_secret` | same as above |
> **Convention:** Every value above **must** have a Bao lookup in
> `demo-*/host_vars/.../...yml` before the
> inventory counts as deployable.
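
The convention can be checked mechanically before a deploy. The sketch below is hypothetical and not part of the repo: it only tests whether each variable from the table has a top-level assignment somewhere below a `host_vars` directory, not whether that assignment is a Bao lookup. Inventories without Keycloak would need the `keycloak_*` entries removed from the list.

```bash
# Hypothetical check: print every variable from the table that has no
# top-level assignment in any file below the given host_vars directory.
DEMO_DEFAULT_VARS="keycloak_admin_password keycloak_postgres_password
authentik_secret_key authentik_postgres_password
nextcloud_admin_password nextcloud_postgres_password
nextcloud_s3_key nextcloud_s3_secret
garage_webui_password garage_rpc_secret garage_admin_token garage_metrics_token"

missing_overrides() {
  dir="$1"
  rc=0
  for var in $DEMO_DEFAULT_VARS; do
    if ! grep -rqs -- "^${var}:" "$dir"; then
      echo "$var"
      rc=1
    fi
  done
  return "$rc"
}
```

`missing_overrides inventories/demo-gymburgdorf/host_vars` prints nothing and returns zero when every listed variable is overridden.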
## Threat boundaries (status: demo)
| Boundary | Status | Note |
| --- | --- | --- |
| DMZ ↔ backend (172.16.9 ↔ 172.16.19) | **plaintext HTTP** | Bearer tokens, OIDC authorization codes and session cookies travel unencrypted. Acceptable for the demo; prod needs mTLS or a WireGuard overlay. |
| Host firewall | **missing** | The `base` role installs no UFW/nftables. Segmentation depends on the hypervisor/VLAN. |
| SSH | `ansible_user: root` | No bastion, no jump host. Key distribution out-of-band. |
| Authentik SPOF | **accepted** | IdP and SP services on the same host (`application`). Authentik outage = login outage including LDAP outpost. No break-glass path. |
| ACME TSIG key | Bao lookup | One TSIG key per demo zone (`acme_update_key_demo_gymb`), zone-isolated. Rotation manual. |
| Backup/DR | **out-of-scope** | Garage `replication_factor: 1`, no Postgres backup job, no Bao snapshot cron. |
## Add for production adaptation
- Host firewall (extend the `base` role or add a dedicated `firewall` role).
- mTLS or WireGuard between DMZ and backend.
- Authentik on a separate host, with a recovery admin token.
- Bao policies per inventory mount (read-only for the deploy token,
write-only for the bootstrap job).
- Backup cron for Postgres + Garage + Bao.
- SSH bastion + key rotation.

93
docs/testing.md Normal file

@ -0,0 +1,93 @@
<!-- markdownlint-disable MD013 -->
# Testing
[← Documentation index](README.md)
> **Status:** This repo contains **no** automated test suite
> and **no** CI pipeline. It is the inventory/configuration
> layer — the testable logic (roles) lives in
> [`digitalboard.core`](https://git.digitalboard.ch/Digitalboard/digitalboard.core).
> Role tests (Molecule or similar) therefore belong in the core repo, not
> here.

What can usefully be tested here are **inventory and playbook errors
before the actual deploy**. There are three levels for that, from fast and
risk-free to a full run against the hosts.
## 1. Static checks (no host access needed)
No `VAULT_TOKEN`, no network access required:
```bash
# YAML syntax of all inventory files
yamllint inventories/
# Ansible best-practice lint over the playbook
ansible-lint playbooks/site.yml
# Playbook syntax + inventory parsing (does not yet resolve Bao lookups)
ansible-playbook playbooks/site.yml \
-i inventories/demo-gymburgdorf/hosts.yml --syntax-check
```
`yamllint` and `ansible-lint` are not preconfigured in the repo —
they run with their defaults. If project-wide rules are desired,
a `.yamllint`/`.ansible-lint` in the repo root would be the place for it
(see [Open items](#open-items)).
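
Since neither linter ships with the repo, a quick availability check avoids a half-run sequence. A minimal sketch; `missing_tools` is a hypothetical helper and the tool list is passed in by the caller.

```bash
# Hypothetical helper: print every tool from the argument list that is not
# on PATH; returns non-zero if at least one is missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For the checks above, `missing_tools yamllint ansible-lint ansible-playbook` lists whatever still needs to be installed.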
## 2. Inspect inventory resolution
Shows the effectively merged variables per host, which is useful for spotting
precedence surprises (group_vars vs. host_vars) before the deploy.
Bao lookups are evaluated here, so `VAULT_TOKEN` is needed (see
[secrets.md](secrets.md#openbao-login)):
```bash
# group/host structure as a tree
ansible-inventory -i inventories/demo-gymburgdorf/hosts.yml --graph
# all vars of a host (merged)
ansible-inventory -i inventories/demo-gymburgdorf/hosts.yml --host application
```
## 3. Smoke test & dry run (against the hosts)
Requires SSH access and `VAULT_TOKEN` — for prerequisites see
[getting_started.md](getting_started.md#prerequisites):
```bash
# reachability of all demo hosts (ping module)
make ping_demo
# dry run: shows what WOULD change, without writing
ansible-playbook playbooks/site.yml \
-i inventories/demo-gymburgdorf/hosts.yml --check --diff
```
> `--check` is of limited value with the Docker-Compose-based roles:
> some tasks report "changed" in check mode because the real
> container state is only known at runtime. It is still useful as a
> plausibility check (does the playbook run through, are the vars
> correct?). More on the invocation variants:
> [ansible.md § Running Ansible](ansible.md#running-ansible).
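
To see which hosts reported changes in a dry run, the `PLAY RECAP` lines can be filtered. This is a sketch under the assumption that the default output format is in use (`host : ok=… changed=… …`); `recap_changed_hosts` is a hypothetical helper that reads the playbook output on stdin.

```bash
# Hypothetical helper: print "host changed-count" for every PLAY RECAP line
# whose changed counter is greater than zero.
recap_changed_hosts() {
  awk '
    /changed=[0-9]+/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^changed=/) {
          split($i, kv, "=")
          if (kv[2] + 0 > 0) print $1, kv[2]
        }
      }
    }
  '
}
```

Piping the `--check --diff` run into `recap_changed_hosts` gives a short list of hosts to review; it does not tell real changes apart from check-mode noise.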
## Recommended pre-deploy workflow
```bash
INV=inventories/demo-gymburgdorf   # inventory to check; adjust as needed
yamllint inventories/ && ansible-lint playbooks/site.yml # 1. static
ansible-playbook playbooks/site.yml -i "$INV/hosts.yml" --syntax-check
make ping_demo # 2. reachability
ansible-playbook playbooks/site.yml -i "$INV/hosts.yml" --check --diff # 3. dry run
# only then: real deploy
```
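The sequence only helps if a failing step actually stops the run. The sketch below wraps that idea in a hypothetical `run_steps` helper, which is not part of the repo's Makefile: each argument is executed as one shell command, and the first failure aborts the rest.

```bash
# Hypothetical helper: run each argument as one shell command, in order,
# and stop at the first one that fails.
run_steps() {
  for step in "$@"; do
    printf '==> %s\n' "$step"
    sh -c "$step" || {
      printf 'FAILED: %s\n' "$step" >&2
      return 1
    }
  done
}
```

For example, `run_steps "yamllint inventories/" "ansible-lint playbooks/site.yml" "make ping_demo"` skips the reachability test if a lint step fails. Exported variables such as `VAULT_TOKEN` are inherited by each step.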
## Open items
- **No CI:** lint/syntax checks run only manually. A Gitea/CI
workflow that runs the static checks from level 1 on every push
would be the next step.
- **No lint configuration:** `yamllint`/`ansible-lint` run with
defaults; project-wide rules (`.yamllint`, `.ansible-lint`) are missing.
- **Role tests:** belong in
[`digitalboard.core`](https://git.digitalboard.ch/Digitalboard/digitalboard.core),
not in this repo.