diff --git a/README.md b/README.md
index 6a5b32e..d49b004 100644
--- a/README.md
+++ b/README.md
@@ -1,35 +1,103 @@
-# Documentation Repository
+
+# Digitalboard Documentation
-This repository contains documentation, guides, and reference material.
+Welcome. This repository is the **central documentation hub** for the
+Digitalboard platform. It collects architecture notes, operational
+runbooks, integration guides, and troubleshooting recipes that span
+multiple repositories, so they have one stable home instead of being
+scattered across READMEs.
-## Available Documentation
+## Platform at a glance
-- **[Contribution guidelines](./contributing/)**
- Documentation and guides related to infrastructure configuration and best practices.
- - [Git](./contributing/git.md)
- Guidelines for contributing using git
+```mermaid
+flowchart LR
+ classDef docs fill:#dbeafe,stroke:#1e40af,color:#000
+ classDef core fill:#dcfce7,stroke:#166534,color:#000
+ classDef ans fill:#fef3c7,stroke:#92400e,color:#000
+ classDef ext fill:#e9d5ff,stroke:#6b21a8,color:#000
-- **[Infrastructure](./infrastructure/)**
- Documentation and guides related to infrastructure configuration and best practices.
- - [ACME](./infrastructure/acme.md)
- Documentation of the ACME concept.
- - [IPV6](./infrastructure/ipv6.md)
- Documentation of the ipv6 concept.
+ User((Operator / Engineer))
-- **[Keycloak](./keycloak/)**
- Documentation and guides related to Keycloak configuration and best practices.
- - [Enforce OTP 2FA for Internal Users](./keycloak/enforce-otp-internal.md)
- Step-by-step instructions for enforcing OTP-based two-factor authentication for internal users, while excluding external Microsoft Entra users.
- - [Integrate MS Entra in Keycloak as IDP](./keycloak/idp-ms-entra.md)
- Step-by-step instructions for integrating MS Entra as identity-provider.
+ subgraph REPOS["Digitalboard repositories"]
+ DOCS["docs<br/>architecture, runbooks,<br/>integration guides<br/>(this repo)"]:::docs
+ CORE["digitalboard.core<br/>Ansible collection<br/>= all roles<br/>(traefik, authentik, nextcloud,<br/>garage, keycloak, …)"]:::core
+ REF["reference-ansible<br/>inventories + playbooks<br/>(demo-gymburgdorf,<br/>demo-phbern, demo-mbazürich,<br/>vagrant)"]:::ans
+ end
-- **[Microsoft Entra](./ms-entra/)**
- Documentation and guides related to Microsft Entra configuration and best practices.
- - [Enterprise App Integration with Keycloak](./ms-entra/enterprise-app-keycloak.md)
- Step-by-step instructions for creating an Enterprise Application in Microsoft Entra (Azure AD) as an identity provider for Keycloak.
+ subgraph PLATFORM["Runtime targets"]
+ BAO["OpenBao<br/>bao.digitalboard.ch<br/>(secrets)"]:::ext
+ DNS["Knot DNS<br/>ns1.digitalboard.ch<br/>(ACME / split-horizon)"]:::ext
+ HOSTS["Tenant VMs<br/>(reverseproxy · application ·<br/>storage · turn)"]:::ext
+ end
-- **[Troubleshooting](./troubleshooting/)**
- Encountered & solved problems.
- - [Nextcloud File Locking](./troubleshooting/nextcloud-file-locking.md)
- Preventing sync conflicts when multiple users edit the same file via the Nextcloud desktop client.
+ User -->|reads| DOCS
+ User -->|runs `make deploy_…`| REF
+ REF -->|requires| CORE
+ REF -.->|hashi_vault lookups| BAO
+ REF -->|ansible-playbook| HOSTS
+ HOSTS -.->|nsupdate TSIG / ACME DNS-01| DNS
+ HOSTS -.->|hashi_vault lookups| BAO
+ DOCS -.documents.-> REF
+ DOCS -.documents.-> CORE
+```
+**The three repos at a glance:**
+
+| Repo | Role | Link |
+|---|---|---|
+| **`docs`** *(here)* | Architecture, integration guides, runbooks, troubleshooting. The "why" and the "how it fits together." | [git.digitalboard.ch/Digitalboard/docs](https://git.digitalboard.ch/Digitalboard/docs) |
+| **`digitalboard.core`** | Ansible collection: every reusable role (Traefik, Authentik, Keycloak, Nextcloud, Garage, …). The "what runs on a host." | [git.digitalboard.ch/Digitalboard/digitalboard.core](https://git.digitalboard.ch/Digitalboard/digitalboard.core) |
+| **`reference-ansible`** | Inventories + playbooks for the demo tenants and the `vagrant` test setup. The "what gets deployed where, with which variables." | [git.digitalboard.ch/Digitalboard/reference-ansible](https://git.digitalboard.ch/Digitalboard/reference-ansible) |
+
+> **Want to deploy something?** Start in
+> [`reference-ansible`](https://git.digitalboard.ch/Digitalboard/reference-ansible);
+> its README covers the Bao login, the `make` targets, and the available
+> playbooks. Come back here for the architectural background
+> ([architecture/](./architecture/)) or for solved problems
+> ([troubleshooting/](./troubleshooting/)).
+
+## Contents
+
+- **[Architecture](./architecture/)**: How the `reference-ansible`
+  deployment is structured, using `demo-gymburgdorf` as the running example.
+  - [Index & glossary](./architecture/README.md)
+  - [Setup and repo layout](./architecture/setup.md): control-node prerequisites, Bao login workflow
+  - [Variables](./architecture/variables.md): Ansible variable hierarchy and cheatsheet
+  - [Topology](./architecture/topology.md): Inventory groups, service layout per host
+  - [DNS and ACME](./architecture/dns.md): Knot zones, TSIG/ACL model, split-horizon FQDNs
+  - [Deploy](./architecture/deploy.md): Play sequence, Traefik DMZ vs. backend modes
+  - [Security](./architecture/security.md): Bao lookup pattern, demo-only defaults, production hardening
+  - [Operations](./architecture/operations.md): New-tenant walkthrough, known gaps
+
+- **[Contributing](./contributing/)**: Conventions for collaborating on this codebase.
+  - [Git](./contributing/git.md): Guidelines for contributing using Git
+
+- **[Infrastructure](./infrastructure/)**: Infrastructure-level concepts that apply across services.
+  - [ACME](./infrastructure/acme.md): Documentation of the ACME concept
+  - [IPv6](./infrastructure/ipv6.md): Documentation of the IPv6 concept
+
+- **[Keycloak](./keycloak/)**: Keycloak configuration and best practices.
+  - [Account Linking](./keycloak/account-linking.md): How to link existing accounts to a federated identity
+  - [Enforce OTP 2FA for Internal Users](./keycloak/enforce-otp-internal.md): OTP-based 2FA for internal users, excluding external MS Entra users
+  - [Integrate MS Entra in Keycloak as IDP](./keycloak/idp-ms-entra.md): MS Entra as identity provider
+
+- **[Microsoft Entra](./ms-entra/)**: Microsoft Entra configuration and best practices.
+  - [Enterprise App Integration with Keycloak](./ms-entra/enterprise-app-keycloak.md): Enterprise App in MS Entra (Azure AD) as IDP for Keycloak
+
+- **[Troubleshooting](./troubleshooting/)**: Encountered & solved problems.
+  - [Nextcloud File Locking](./troubleshooting/nextcloud-file-locking.md): Preventing sync conflicts when multiple users edit the same file via the Nextcloud desktop client
+
+## Where to look
+
+| If you want to… | Go to |
+|---|---|
+| Understand how a tenant is wired up | [architecture/topology.md](./architecture/topology.md) |
+| Set up a new demo tenant | [architecture/operations.md](./architecture/operations.md) |
+| Look up a variable's correct home | [architecture/variables.md](./architecture/variables.md) |
+| Understand why two ACME models coexist | [architecture/dns.md](./architecture/dns.md) |
+| Plug an identity provider into Keycloak | [keycloak/](./keycloak/) |
+| Solve a recurring runtime issue | [troubleshooting/](./troubleshooting/) |
+
+---
+
+Contributions follow the guidelines in [contributing/git.md](./contributing/git.md).
diff --git a/architecture/README.md b/architecture/README.md
new file mode 100644
index 0000000..df6317f
--- /dev/null
+++ b/architecture/README.md
@@ -0,0 +1,44 @@
+
+# Architecture: `reference-ansible`
+
+This documentation describes the architecture of the `reference-ansible`
+repository and uses the inventory `inventories/demo-gymburgdorf/` as a
+running example. It serves both as onboarding documentation for new
+engineers and as a reference when setting up additional demo tenants.
+
+> **Demo-only.** All defaults in the roles (passwords, tokens, RPC
+> secrets) are insecure and intended exclusively for demo setups. See
+> [security.md](security.md).
+
+**Last updated:** 2026-05-26 · **Owner:** @sbaerlocher
+
+## Contents
+
+| Section | File | Topics |
+|---|---|---|
+| Setup and repo layout | [setup.md](setup.md) | Repo layout, role provenance, control-node prerequisites, Bao login workflow |
+| Variables | [variables.md](variables.md) | Ansible variable hierarchy, variable cheatsheet |
+| Topology | [topology.md](topology.md) | Inventory groups, service layout per host, variable placement |
+| DNS and ACME | [dns.md](dns.md) | Knot zones, NS-delegated vs. ACL-isolated ACME models, split-horizon FQDNs, TSIG/ACL |
+| Deploy | [deploy.md](deploy.md) | Play sequence, Traefik DMZ/backend modes |
+| Security | [security.md](security.md) | Bao lookup pattern, demo-only defaults, threat boundaries, production hardening |
+| Operations | [operations.md](operations.md) | New-tenant walkthrough, known gaps and trade-offs |
+
+## Glossary
+
+| Term | Meaning |
+|---|---|
+| **OpenBao** | HashiCorp Vault fork. Single source of truth for secrets. Endpoint: `bao.digitalboard.ch`. |
+| **Authentik** | Identity provider. Issues OIDC for SP services and LDAP via the Outpost. |
+| **Outpost (Authentik)** | Separate Authentik sidecar that emulates LDAP/proxy protocols for legacy apps. Talks to Authentik via RPC + token. |
+| **WOPI** | Web Application Open Platform Interface: the protocol Nextcloud/Opencloud use to hand office documents to Collabora. |
+| **TSIG / RFC2136** | Authenticated DNS updates. Traefik uses TSIG-signed `nsupdate` calls for ACME DNS-01 challenges. |
+| **DNS-01 (ACME)** | ACME challenge type, used here with Let's Encrypt: control of the domain is proven via a TXT record in DNS instead of over HTTP. Required for wildcard certs. |
+| **CNAME bridge** | `_acme-challenge.` points via CNAME into a dedicated update label (`.demo-gymb._acme.digitalboard.ch`), keeping the TSIG key scoped to a narrow sub-tree. See [dns.md](dns.md). |
+| **Knot DNS** | Authoritative DNS server used on `ns1.digitalboard.ch`. Config and zone files live in the separate [`dns-zones`](https://git.digitalboard.ch/Digitalboard/dns-zones) repo. |
+| **DNSSEC** | Zones are signed with Ed25519, NSEC3 (no opt-out), KSK 1y / ZSK 90d rollovers, CDS/CDNSKEY published for automatic DS at the parent. |
+| **Split horizon** | Two FQDN families per service: public `.gymb.souveredu.ch` → DMZ Traefik front-end IP; internal `.int.gymb.souveredu.ch` → the backend host directly. See [dns.md](dns.md). |
+| **File provider / Docker provider** | Traefik configuration sources. The file provider reads static YAML; the Docker provider reads container labels via `/var/run/docker.sock`. |
+| **STUN/TURN** | NAT-traversal protocols for WebRTC (e.g. for Nextcloud Talk). Runs on a separate host (`turn`). |
+| **Garage** | S3-compatible object store (Rust). Backend for Nextcloud/Opencloud. |
+| **FQCN** | Fully Qualified Collection Name, e.g. `digitalboard.core.traefik`. The recommended way to reference collection content since Ansible 2.10. |
diff --git a/architecture/deploy.md b/architecture/deploy.md
new file mode 100644
index 0000000..13f1ab8
--- /dev/null
+++ b/architecture/deploy.md
@@ -0,0 +1,74 @@
+
+# Deploy flow and Traefik modes
+
+← Back to [Architecture index](README.md)
+
+## 6. Deploy flow
+
+Sequence taken from [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml):
+
+```mermaid
+sequenceDiagram
+ participant U as User
+ participant A as ansible-playbook
+ participant V as OpenBao
+ participant H as Hosts
+
+ U->>U: bao login + export VAULT_TOKEN
+ U->>A: make deploy_site_demo_gymburgdorf
+ A->>A: load vars: role defaults → group_vars/all → group_vars/<groups> → host_vars/<host>
+ A->>V: community.hashi_vault lookups<br/>(acme-tsig, service secrets)
+ V-->>A: secret values
+ A->>H: Play 1 - base (all hosts)
+ A->>H: Play 2 - traefik (all hosts: dmz on reverseproxy, backend elsewhere)
+ A->>H: Play 3 - httpbin
+ A->>H: Play 4 - 389ds
+ A->>H: Play 5 - keycloak
+ A->>H: Play 6 - garage (storage)
+ A->>H: Play 7 - collabora (application)
+ A->>H: Play 8 - authentik (application)
+ A->>H: Play 9 - authentik_outpost_ldap (application)
+ A->>H: Play 10 - nextcloud (application)
+ A->>H: Play 11 - drawio (application)
+ A->>H: Play 12 - send
+ A->>H: Play 13 - opnform
+ A->>H: Play 14 - homarr
+ A->>H: Play 15 - bookstack
+ A->>H: Play 16 - opencloud
+```
+
+Plays without matching group members (`httpbin_servers`,
+`ds389_servers`, `keycloak_servers`, `send_servers`,
+`opnform_servers`, `homarr_servers`, `bookstack_servers`,
+`opencloud_servers` in this inventory) run as no-ops.
+
+> **Role-name spelling traps:** the LDAP role is `389ds` (not
+> `ds389`); the forms role is `opnform` (not `openforms`/`openform`).
+> Inventory groups must match the names used in
+> [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml) exactly:
+> `ds389_servers`, `opnform_servers`.
+
+`--diff` is enabled in the `make` target, so per-task changes are visible.
+
+## 7. Traefik modes (DMZ vs Backend)
+
+**`traefik_mode: dmz`** (public-facing reverse proxy on `reverseproxy`):
+
+- **File provider** with `services.yml` for static routing.
+- No Docker socket mounted, no local containers.
+- Routes to `backend_host` addresses on other machines.
+- Backends are declared via `traefik_dmz_exposed_services` (a list in
+ `host_vars/reverseproxy/`). Individual backends can also be
+ selected via `traefik_backend_servers_to_proxy`.
+
+**`traefik_mode: backend`** (`application` and `storage` hosts):
+
+- Mounts `/var/run/docker.sock`.
+- **Docker provider**: auto-discovery via container labels
+ (`traefik.enable=true`).
+- Services are exposed locally; the DMZ Traefik routes external
+ traffic to them in plaintext HTTP (see
+ [security.md](security.md)).
+
+**Both modes** support ACME via RFC2136 DNS challenge or self-signed
+(`traefik_cert_mode: acme | selfsigned`).
diff --git a/architecture/dns.md b/architecture/dns.md
new file mode 100644
index 0000000..226dd5c
--- /dev/null
+++ b/architecture/dns.md
@@ -0,0 +1,123 @@
+
+# DNS topology and ACME zone layout
+
+← Back to [Architecture index](README.md)
+
+Authoritative DNS for everything described in this document runs on
+**`ns1.digitalboard.ch`** (public `193.43.183.169`, DMZ `172.16.9.169`)
+using **Knot DNS**. The zone files and Knot config live in the
+[`dns-zones`](https://git.digitalboard.ch/Digitalboard/dns-zones) repo; this section explains how the
+public service FQDNs, the internal "split-horizon" FQDNs, and the ACME
+challenge sub-trees fit together.
+
+## Authoritative zones on `ns1`
+
+| Zone | Purpose | DNSSEC | Dynamic updates |
+|---|---|---|---|
+| `digitalboard.ch` | Production zone for the platform itself (`auth`, `cloud`, `office`, `bao`, …). | on | none (static zone file) |
+| `_acme.digitalboard.ch` | Parent zone for ACME challenge labels. | on | yes, per-tenant TSIG ACLs (`demo-gymb`, `demo-phbe`, `demo-mbaz`) |
+| `digitalboard._acme.digitalboard.ch` | **Delegated** child zone for `digitalboard.ch` ACME updates only. | off | yes, TSIG `acme_update_key_digitalboard` |
+| `souveredu.ch` | Demo-tenant zone (`gymb`, `phbe`, `mbaz` sub-labels). | on | none (static zone file) |
+| `demo-schulen.ch` | Reserve / unused so far. | on | none |
+
+> **Two different ACME models live here.** This is the most common
+> source of confusion when copying a tenant:
+>
+> - `digitalboard.ch` uses an **NS-delegated child zone**
+> (`digitalboard._acme.digitalboard.ch.` has its own `NS` record in
+> `_acme.digitalboard.ch`). The TSIG key writes into that delegated
+> zone.
+> - The demo tenants (`demo-gymb`, `demo-phbe`, `demo-mbaz`) **share
+> the parent zone** `_acme.digitalboard.ch` and are isolated only
+> by **Knot ACL `update-owner-name`** on the per-tenant sub-tree
+> (`demo-gymb._acme.digitalboard.ch.` and below). There is no NS
+> delegation for them.
+>
+> Both work for the ACME flow; the demo model is cheaper to manage but
+> means tenant isolation depends on Knot ACLs, not zone boundaries.
+
+## Naming pattern for `demo-gymb` (template for new tenants)
+
+```text
+Public, browser-facing:
+ cloud.gymb.souveredu.ch CNAME → rvp.gymb.souveredu.ch (193.43.183.131)
+ auth.gymb.souveredu.ch CNAME → rvp.gymb.souveredu.ch
+ office.gymb.souveredu.ch CNAME → rvp.gymb.souveredu.ch
+ s3.gymb.souveredu.ch CNAME → rvp.gymb.souveredu.ch
+ ...
+
+Internal, server-to-server (split horizon):
+ cloud.int.gymb.souveredu.ch A → 172.16.19.101 (application host)
+ auth.int.gymb.souveredu.ch A → 172.16.19.101
+ office.int.gymb.souveredu.ch A → 172.16.19.101
+ s3.int.gymb.souveredu.ch A → 172.16.19.102 (storage host)
+ ...
+
+Tenant entry IPs:
+ rvp.gymb.souveredu.ch A → 193.43.183.131 (DMZ Traefik public)
+ reverseproxy.int.gymb A → 172.16.9.111 (DMZ Traefik internal)
+
+ACME challenge labels (writeable via TSIG acme_update_key_demo_gymb):
+ _acme-challenge.cloud.gymb CNAME → cloud.demo-gymb._acme.digitalboard.ch
+ _acme-challenge.cloud.int.gymb CNAME → cloud.int.demo-gymb._acme.digitalboard.ch
+ ...
+```
+
+The `.int.` family is what makes Nextcloud → Garage, Nextcloud →
+Authentik (OIDC), Nextcloud → Collabora (WOPI) etc. **bypass the DMZ
+Traefik**: the backend host's local Traefik presents the right cert
+directly, so traffic stays on the backend subnet. Without this,
+server-to-server calls would either ride out through the DMZ and back
+in, or hit a hostname mismatch on the cert.
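The ACME challenge labels in the naming pattern above follow one rule: the service label (including `.int` for the internal family) is prepended to the tenant's update sub-tree. A minimal shell sketch of that rule, assuming a hypothetical helper named `acme_cname_target` that is not part of any Digitalboard repo:

```shell
# Hypothetical helper for illustration: derive the CNAME target of an
# _acme-challenge label from the service label and the tenant key.
acme_cname_target() {
  label="$1"    # e.g. "cloud" or "cloud.int"
  tenant="$2"   # e.g. "demo-gymb"
  printf '%s.%s._acme.digitalboard.ch\n' "$label" "$tenant"
}

acme_cname_target cloud demo-gymb       # public family
acme_cname_target cloud.int demo-gymb   # internal (split-horizon) family
```

Comparing this output with the `_acme-challenge.*` CNAMEs in the zone file is a cheap consistency check when copying a tenant.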
+
+## TSIG / ACL model
+
+```mermaid
+flowchart LR
+ classDef tenant fill:#dcfce7,stroke:#166534,color:#000
+ classDef zone fill:#dbeafe,stroke:#1e40af,color:#000
+ classDef acl fill:#fef3c7,stroke:#92400e,color:#000
+
+ subgraph KNOT["ns1.digitalboard.ch (Knot DNS)"]
+ Z1["_acme.digitalboard.ch<br/>(parent zone)"]:::zone
+ Z2["digitalboard._acme.digitalboard.ch<br/>(NS-delegated child)"]:::zone
+ A1["ACL acme_updates_digitalboard<br/>scope: digitalboard._acme.digitalboard.ch."]:::acl
+ A2["ACL acme_updates_demo_gymb<br/>scope: demo-gymb._acme.digitalboard.ch."]:::acl
+ A3["ACL acme_updates_demo_phbe<br/>scope: demo-phbe._acme.digitalboard.ch."]:::acl
+ A4["ACL acme_updates_demo_mbaz<br/>scope: demo-mbaz._acme.digitalboard.ch."]:::acl
+ end
+
+ DB["digitalboard.ch Traefik<br/>TSIG: acme_update_key_digitalboard"]:::tenant
+ GY["demo-gymb Traefik<br/>TSIG: acme_update_key_demo_gymb"]:::tenant
+ PH["demo-phbe Traefik<br/>TSIG: acme_update_key_demo_phbe"]:::tenant
+ MB["demo-mbaz Traefik<br/>TSIG: acme_update_key_demo_mbaz"]:::tenant
+
+ DB -- nsupdate TXT --> A1
+ GY -- nsupdate TXT --> A2
+ PH -- nsupdate TXT --> A3
+ MB -- nsupdate TXT --> A4
+ A1 -- writes into --> Z2
+ A2 -- writes into --> Z1
+ A3 -- writes into --> Z1
+ A4 -- writes into --> Z1
+```
+
+Each ACL is restricted to **`update-type: TXT`** and
+**`update-owner-match: sub-or-equal`** under the tenant prefix, so a
+leaked tenant key cannot write outside its own ACME sub-tree and cannot
+modify non-TXT records (no A/CNAME/NS hijack).
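The `sub-or-equal` rule can be illustrated with a small shell check. This is only a sketch of the matching logic, not Knot's implementation; the function name `in_acl_scope` is hypothetical and DNS case-insensitivity is ignored:

```shell
# Sketch of sub-or-equal owner matching (illustration only, not Knot code).
# Succeeds when NAME equals SCOPE or lies below it; trailing dots are ignored.
in_acl_scope() {
  name="${1%.}"
  scope="${2%.}"
  case "$name" in
    "$scope" | *."$scope") return 0 ;;
    *) return 1 ;;
  esac
}

tenant_scope="demo-gymb._acme.digitalboard.ch."
for owner in \
  "cloud.demo-gymb._acme.digitalboard.ch." \
  "cloud.demo-phbe._acme.digitalboard.ch." \
  "digitalboard.ch."; do
  if in_acl_scope "$owner" "$tenant_scope"; then
    echo "allowed: $owner"
  else
    echo "refused: $owner"
  fi
done
```

The pattern requires a label boundary (a dot) in front of the scope, so a look-alike such as `xdemo-gymb._acme.digitalboard.ch.` does not match.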
+
+## Traefik variables that bind to this layout
+
+From `inventories/demo-gymburgdorf/group_vars/traefik_servers/traefik.yml`:
+
+| Traefik variable | Value for `demo-gymb` | Bound to |
+|---|---|---|
+| `traefik_acme_dns_provider` | `rfc2136` | Knot dynamic-update endpoint |
+| `traefik_acme_dns_zone` | `demo-gymb._acme.digitalboard.ch` | Per-tenant write scope on `ns1` |
+| `traefik_acme_tsig_key_name` | `acme_update_key_demo_gymb` | Matches `key:` entry in [`knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf) |
+| `traefik_acme_tsig_secret` | Bao lookup | See [security.md](security.md) |
+
+A tenant whose ACME zone does **not** match the Knot ACL
+`update-owner-name` will get `REFUSED` on `nsupdate` and ACME issuance
+will silently retry until the renewal window expires.
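One way to surface a `REFUSED` before a deploy is to send a throwaway TXT update with the tenant key. The sketch below only prints the `nsupdate` input and sends nothing; the function name and the `_probe` label are assumptions made for illustration:

```shell
# Print nsupdate input that adds and then removes a throwaway TXT record
# below the tenant's ACME sub-tree. Nothing is sent to the server here.
acme_probe_script() {
  tenant="$1"                                       # e.g. "demo-gymb"
  owner="_probe.${tenant}._acme.digitalboard.ch."   # "_probe" is a made-up label
  printf 'server ns1.digitalboard.ch\n'
  printf 'zone _acme.digitalboard.ch\n'             # demo tenants share the parent zone
  printf 'update add %s 60 TXT "probe"\n' "$owner"
  printf 'send\n'
  printf 'update delete %s TXT\n' "$owner"
  printf 'send\n'
}

acme_probe_script demo-gymb
```

Piping the output into `nsupdate -k` with a key file for the tenant's TSIG key should return without error when zone, key name, and ACL agree.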
diff --git a/architecture/operations.md b/architecture/operations.md
new file mode 100644
index 0000000..58ce8b9
--- /dev/null
+++ b/architecture/operations.md
@@ -0,0 +1,99 @@
+
+# Operations: new tenants and known gaps
+
+← Back to [Architecture index](README.md)
+
+## 10. Walkthrough: creating a new demo tenant
+
+Recommended template: **`demo-gymburgdorf`** (not `vagrant`, since its
+group topology is incompatible).
+
+1. **Copy the inventory:**
+
+ ```bash
+ cp -r inventories/demo-gymburgdorf inventories/demo-
+ ```
+
+2. **Adjust `hosts.yml`:** IPs and hostnames per host.
+
+3. **`group_vars/all/vault.yml`**: point `vault_mount` at the new
+ tenant mount (`demo-`).
+
+4. **`group_vars/traefik_servers/traefik.yml`**: point
+ `traefik_acme_dns_zone` and the `traefik_acme_tsig_*` lookup paths
+ at the new zone / new Bao path.
+
+5. **`host_vars/application/*.yml`** and
+ **`host_vars/storage/*.yml`**: walk through them and change the FQDNs
+ to the new domain pattern (e.g. `*..souveredu.ch`) and the Bao lookup
+ paths to `demo-/data/…`.
+
+6. **Prepare OpenBao** (out-of-band, not via Ansible):
+ - Create a new KV-v2 mount `demo-`.
+ - Write secrets: `acme-tsig`, `authentik`, `nextcloud`, `garage`, …
+ (see [security.md](security.md) for the mandatory-override list).
+ - Policy for the deploy token: read on `demo-/data/*`.
+
+7. **DNS** (in the [`dns-zones`](https://git.digitalboard.ch/Digitalboard/dns-zones) repo, see
+ [dns.md](dns.md)):
+ - Add `key:` and `acl:` entries for the new tenant in
+ [`knot/knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf), pattern
+ `acme_update_key_demo_` /
+ `acme_updates_demo_` scoped to
+ `demo-._acme.digitalboard.ch.`.
+ - Append the new ACL to the `_acme.digitalboard.ch` zone's `acl:`
+ list. The tenants share the parent zone; there is no NS delegation.
+ - In `zones/souveredu.ch.zone` (or the tenant's public zone) add
+ the public/internal A records (`rvp.`,
+ `reverseproxy.int.`, `application.int.`,
+ `storage.int.`, …), the service CNAMEs to
+ `rvp.`, and the `_acme-challenge.*` CNAMEs into
+ `demo-._acme.digitalboard.ch`. Bump the SOA serial.
+ - `make deploy_ns1` to push.
+
+8. **Makefile**: add a new target modelled on
+ `deploy_site_demo_gymburgdorf` and wire it into
+ `deploy_site_demo`.
+
+9. **Smoke test:**
+ `ansible all -i inventories/demo-/hosts.yml -m ping`.
+
+10. **Deploy:** Bao login + `make deploy_site_demo_`.
+
+## 11. Known gaps and trade-offs
+
+- **Optional services without group bindings in `demo-gymburgdorf`:**
+ `opencloud`, `send`, `opnform`, `homarr`, and `bookstack` are
+ declared as plays in
+ [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml) but have no
+ `_servers` group in the inventory, so those plays run as
+ no-ops. If needed, add the group + `host_vars/application/.yml`
+ as described in [topology.md](topology.md). Mind spelling:
+ `opnform_servers` (not `openform`/`openforms`).
+- **`turn` host:** defined in the DMZ, but no STUN/TURN role in
+ [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml). Currently provisioned only
+ via `base` + `traefik`.
+- **Idempotency:** roles are Docker-Compose-based; re-runs may trigger
+ container restarts when compose inputs change. There is no dedicated
+ rollback mechanism; on failure, roll back manually to the previous
+ state.
+- **TLS renewal:** handled internally by Traefik via ACME. There is no
+ external renewal cron in the repo.
+- **CI / testing:** not present in the repo. Smoke test is
+ `make ping_demo`.
+- **Logs:** Traefik runs with `traefik_log_level: DEBUG` in
+ `demo-gymburgdorf` and `vagrant` (role default is `INFO`); reduce
+ to `INFO` or `WARN` before adapting for production.
+- **TSIG secrets in `knot.conf`:** the `dns-zones` repo currently
+ stores all four ACME TSIG keys in plaintext in
+ [`knot/knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf). The Ansible
+ side reads them from Bao, but the Knot side does not, so anyone with
+ read on the `dns-zones` repo can write TXT records under the
+ matching tenant's ACME sub-tree. For prod, source the Knot keys
+ from a templated config + secret store, or restrict repo access.
+- **Demo tenants share `_acme.digitalboard.ch`:** isolation is by
+ Knot ACL `update-owner-name`, not by zone delegation. A mis-edit
+ of the ACL list could break ACL-based isolation without breaking
+ DNS resolution, so the failure is silent. The production zone
+ (`digitalboard.ch`) uses a properly delegated child zone and is
+ not affected.
diff --git a/architecture/security.md b/architecture/security.md
new file mode 100644
index 0000000..64dc6ec
--- /dev/null
+++ b/architecture/security.md
@@ -0,0 +1,71 @@
+
+# Security and demo-only defaults
+
+← Back to [Architecture index](README.md)
+
+> This repo is explicitly designed for **demo setups**. All default
+> values in the roles are insecure and are overridden in `demo-*`
+> inventories via Bao lookups or host_vars. For production deployments
+> the hardening block further down also applies.
+
+## Secret pattern (Bao lookup)
+
+```yaml
+# group_vars/.../.yml or host_vars/.../.yml
+authentik_secret_key: "{{ lookup('community.hashi_vault.hashi_vault',
+ vault_mount + '/data/authentik:secret_key',
+ url=vault_addr) }}"
+```
+
+- `vault_mount` and `vault_addr` come from
+ [group_vars/all/vault.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/inventories/demo-gymburgdorf/group_vars/all/vault.yml).
+- KV-v2 paths require an explicit `/data/` segment; Ansible does not
+ resolve this automatically.
+- `vault_mount` is unique per inventory (`demo-gymburgdorf`,
+ `demo-phbern`, …), which gives tenant isolation in Bao via mount + policy.
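To make the `/data/` rule concrete, the sketch below assembles the lookup term from its parts. The helper name `kv2_lookup_term` is hypothetical; the mount, secret, and field values are the ones from the example above:

```shell
# Hypothetical helper: build the "MOUNT/data/SECRET:FIELD" term that the
# lookup above expects for a KV-v2 mount.
kv2_lookup_term() {
  kv_mount="$1"    # e.g. "demo-gymburgdorf"
  kv_secret="$2"   # e.g. "authentik"
  kv_field="$3"    # e.g. "secret_key"
  printf '%s/data/%s:%s\n' "$kv_mount" "$kv_secret" "$kv_field"
}

kv2_lookup_term demo-gymburgdorf authentik secret_key
```

Against a live server, and assuming the CLI's `-mount` flag, the same value can be cross-checked with `bao kv get -mount=demo-gymburgdorf -field=secret_key authentik`; there the CLI adds the `/data/` segment itself.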
+
+## Demo-only defaults β override required
+
+These defaults in `digitalboard.core` are insecure. In any
+**production-grade** deployment they must be overridden via Bao lookup
+or host_var:
+
+| Variable | Default | Where to override |
+|---|---|---|
+| `keycloak_admin_password` | `changeme` | host_vars `keycloak_servers` |
+| `keycloak_postgres_password` | `changeme` | same |
+| `authentik_secret_key` | `changeme-generate-a-random-string` | `host_vars/application/authentik.yml` |
+| `authentik_postgres_password` | `changeme` | same |
+| `nextcloud_admin_password` | `admin` | `host_vars/application/nextcloud.yml` |
+| `nextcloud_postgres_password` | `changeme` | same |
+| `nextcloud_s3_key` / `nextcloud_s3_secret` | `changeme` / `changeme` | same |
+| `garage_webui_password` | `admin` | `host_vars/storage/garage.yml` |
+| `garage_rpc_secret` | `0123…cdef` (64-hex constant) | same |
+| `garage_admin_token` | identical to `rpc_secret` | same |
+| `garage_metrics_token` | identical to `rpc_secret` | same |
+
+> **Convention:** every value listed above **must** have a Bao lookup
+> in `demo-*/host_vars/.../...yml` before the inventory is considered
+> deploy-ready.
+
+## Threat boundaries (current demo state)
+
+| Boundary | Status | Notes |
+|---|---|---|
+| DMZ → Backend (172.16.9 → 172.16.19) | **Plaintext HTTP** | Auth bearers, OIDC codes, session cookies travel unencrypted. Fine for demo; for prod use mTLS or a WireGuard overlay. |
+| Host firewall | **missing** | The `base` role does not install UFW/nftables. Segmentation relies on the hypervisor/VLAN. |
+| SSH | `ansible_user: root` | No bastion, no jump host. Key distribution out-of-band. |
+| Authentik SPOF | **accepted** | IdP and SP services share the same host (`application`). An Authentik outage means a login outage including the LDAP outpost. No break-glass path. |
+| ACME TSIG key | Bao lookup (in Ansible), **plaintext in [`knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf)** on `ns1` side | One TSIG key per demo tenant, scoped via Knot ACL `update-owner-name` to the tenant's ACME sub-tree. Rotation is manual and must be done on both sides simultaneously (Bao + `knot.conf` + `knotc zone-reload`). |
+| Backup / DR | **out of scope** | Garage `replication_factor: 1` (default), no Postgres backup job, no Bao snapshot cron. |
+
+## To adapt for production, add
+
+- Host firewall (extend the `base` role or add a dedicated `firewall`
+ role).
+- mTLS or WireGuard between DMZ and backend.
+- Authentik on a separate host with a recovery admin token.
+- Bao policies per inventory mount (read-only for the deploy token,
+ write-only for the bootstrap job).
+- Backup cron for Postgres + Garage + Bao.
+- SSH bastion + key rotation.
diff --git a/architecture/setup.md b/architecture/setup.md
new file mode 100644
index 0000000..0c75d18
--- /dev/null
+++ b/architecture/setup.md
@@ -0,0 +1,68 @@
+
+# Setup and repo layout
+
+← Back to [Architecture index](README.md)
+
+## 1. Repo layout and role provenance
+
+```text
+reference-ansible/
+├── Makefile                 # Deploy targets, OIDC login, OBJC fork workaround
+├── ansible.cfg              # collections_path, remote_user=root, hashi_vault auth_method=token
+├── requirements.yml         # community.hashi_vault + digitalboard.core (Git)
+├── playbooks/site.yml       # Play sequence (16 plays, see deploy.md)
+├── collections/             # installed by `make install`, gitignored
+│   └── ansible_collections/
+│       └── digitalboard/core/
+│           └── roles/       # Roles live HERE, NOT in the repo root
+└── inventories/
+    ├── demo-gymburgdorf/    # Inventory used throughout this document
+    ├── demo-mbazürich/
+    ├── demo-phbern/
+    └── vagrant/             # Local test inventory with its own topology
+```
+
+> **Important:** There is **no** `roles/` directory at the repo root.
+> All roles come from the `digitalboard.core` collection (see
+> [requirements.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/requirements.yml)), installed via `make install`
+> into `./collections/`. Plays reference them by FQCN
+> `digitalboard.core.`.
+
+## 2. Setup and prerequisites
+
+**Tools on the control node:**
+
+- `ansible` (Core ≥ 2.15)
+- `bao` CLI (OpenBao), e.g. `sudo pacman -S openbao python-hvac` (Arch) or Homebrew
+- `python-hvac` (for `community.hashi_vault` lookups)
+- On macOS: `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` (set in the
+ [Makefile](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/Makefile); without it Ansible forks crash on Bao lookups)
+
+**Initial setup:**
+
+```bash
+git clone
+cd reference-ansible
+make install # Galaxy + digitalboard.core into ./collections/
+```
+
+**Before every deploy:** Bao login in the **same shell** that will then
+run `ansible-playbook`:
+
+```bash
+export BAO_ADDR=https://bao.digitalboard.ch
+bao login -method=oidc -path=Digitalboard
+export VAULT_TOKEN=$(bao print token)
+```
+
+> ⚠️ `make bao` on its own is **not enough**: every `make` target spawns
+> a new shell, and the `VAULT_TOKEN` exported in there only lives for
+> the duration of `make bao` itself. Either run the three commands
+> above manually, or invoke `make bao deploy_site_demo_gymburgdorf` as
+> **one** call; otherwise the deploy has no token.
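The underlying shell behaviour is easy to reproduce without Bao. In this sketch `DEMO_TOKEN` is a stand-in variable, and `sh -c` plays the role of the shell that `make` starts for a recipe:

```shell
# A variable exported inside a child shell disappears when that shell exits.
unset DEMO_TOKEN
sh -c 'export DEMO_TOKEN=from-child-shell'
result="${DEMO_TOKEN:-unset}"
echo "after child shell: $result"

# Exported in the current shell, the variable is visible to later children.
export DEMO_TOKEN=from-parent-shell
sh -c 'echo "inside child: $DEMO_TOKEN"'
```

This is why the export has to happen in the shell that later runs `ansible-playbook`, not in a shell that has already exited.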
+
+**Smoke test:**
+
+```bash
+make ping_demo # pings all three demo inventories
+```
diff --git a/architecture/topology.md b/architecture/topology.md
new file mode 100644
index 0000000..c884c4d
--- /dev/null
+++ b/architecture/topology.md
@@ -0,0 +1,110 @@
+
+# Topology: inventory and services
+
+← Back to [Architecture index](README.md)
+
+## 4. Inventory topology (`demo-gymburgdorf`)
+
+```mermaid
+flowchart LR
+ classDef dmz fill:#fee2e2,stroke:#991b1b,color:#000
+ classDef app fill:#dcfce7,stroke:#166534,color:#000
+ classDef stor fill:#dbeafe,stroke:#1e40af,color:#000
+ classDef turn fill:#fef9c3,stroke:#854d0e,color:#000
+
+ subgraph ALL["group: all_servers"]
+ direction LR
+ subgraph DMZ["DMZ 172.16.9.0/24"]
+ RP["reverseproxy<br/>172.16.9.111<br/>traefik_mode: dmz"]:::dmz
+ TURN["turn<br/>172.16.9.112<br/>(no role in site.yml yet)"]:::turn
+ end
+ subgraph BE["Backend 172.16.19.0/24<br/>group: backend_servers"]
+ APP["application<br/>172.16.19.101<br/>traefik_mode: backend<br/>+ authentik, authentik_outpost_ldap,<br/>nextcloud, collabora, drawio"]:::app
+ ST["storage<br/>172.16.19.102<br/>traefik_mode: backend<br/>+ garage (S3)"]:::stor
+ end
+ end
+
+ RP -.HTTPS in, HTTP out.-> APP
+ RP -.HTTPS in, HTTP out.-> ST
+```
+
+**Group memberships (from [hosts.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/inventories/demo-gymburgdorf/hosts.yml)):**
+
+| Group | Members | Purpose |
+|---|---|---|
+| `all_servers` | `reverseproxy`, `application`, `storage`, `turn` | Base role for all hosts |
+| `traefik_servers` | `children: all_servers` (= all 4 hosts) | Traefik everywhere; DMZ/backend via `traefik_mode` |
+| `backend_servers` | `application`, `storage` | Sets `traefik_mode: backend` via group var |
+| `garage_servers` | `storage` | Single-host wrapper for the Garage role |
+| `nextcloud_servers`, `collabora_servers`, `drawio_servers`, `authentik_servers`, `authentik_outpost_ldap_servers` | `application` only | Single-host wrappers |
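+
+The table translates into a YAML inventory roughly like the following.
+This is a sketch reconstructed from the table and the diagram, not a
+copy of the real `hosts.yml`: using `ansible_host` for the addresses is
+an assumption, and the remaining single-host wrapper groups are left
+out for brevity.
+
+```yaml
+# Sketch only: reconstructed from the group table, not copied from hosts.yml.
+all:
+  children:
+    all_servers:
+      hosts:
+        reverseproxy:
+          ansible_host: 172.16.9.111   # assumed variable for the address
+        turn:
+          ansible_host: 172.16.9.112
+        application:
+          ansible_host: 172.16.19.101
+        storage:
+          ansible_host: 172.16.19.102
+    traefik_servers:
+      children:
+        all_servers:                   # Traefik on all four hosts
+    backend_servers:
+      hosts:
+        application:
+        storage:
+    garage_servers:
+      hosts:
+        storage:
+    nextcloud_servers:                 # the other wrappers follow the same shape
+      hosts:
+        application:
+```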
+
+> **Difference vs. the `vagrant` inventory:** `vagrant` structures
+> Traefik differently, via the children groups `traefik_servers_dmz`
+> and `traefik_servers_backend` instead of `backend_servers` +
+> `host_vars` override. The two topologies are **structurally
+> incompatible**; a 1:1 mapping is not possible. See
+> [operations.md](operations.md) for the recommended template.
+
+## 5. Service layout and variable placement
+
+```mermaid
+flowchart TB
+ classDef rp fill:#fee2e2,stroke:#991b1b,color:#000
+ classDef ap fill:#dcfce7,stroke:#166534,color:#000
+ classDef st fill:#dbeafe,stroke:#1e40af,color:#000
+ classDef ext fill:#e9d5ff,stroke:#6b21a8,color:#000
+
+ Internet((Internet))
+ DNS["DNS ns1.digitalboard.ch<br/>RFC2136 TSIG<br/>Zone: demo-gymb._acme.digitalboard.ch<br/>CNAME bridge: _acme-challenge.*.gymb.souveredu.ch"]:::ext
+ BAO["OpenBao<br/>bao.digitalboard.ch<br/>mount: demo-gymburgdorf"]:::ext
+
+ subgraph RP["reverseproxy - traefik dmz"]
+ TRDMZ["traefik (file provider)<br/>group_vars/traefik_servers/traefik.yml<br/>host_vars/reverseproxy/traefik.yml<br/>- traefik_mode: dmz<br/>- traefik_dmz_exposed_services"]:::rp
+ end
+
+ subgraph APP["application - traefik backend"]
+ TRA["traefik (docker provider)<br/>group_vars/backend_servers/traefik.yml"]:::ap
+ AK["authentik (OIDC + LDAP outpost backend)<br/>host_vars/application/authentik.yml"]:::ap
+ AKO["authentik_outpost_ldap<br/>host_vars/application/authentik_outpost_ldap.yml"]:::ap
+ NC["nextcloud<br/>host_vars/application/nextcloud.yml"]:::ap
+ COL["collabora<br/>host_vars/application/collabora.yml"]:::ap
+ DRW["drawio<br/>host_vars/application/drawio.yml"]:::ap
+ end
+
+ subgraph ST["storage - traefik backend"]
+ TRS["traefik (docker provider)"]:::st
+ GAR["garage (S3)<br/>host_vars/storage/garage.yml"]:::st
+ end
+
+ Internet -->|HTTPS :443| TRDMZ
+ TRDMZ -->|HTTP backend| TRA
+ TRDMZ -->|HTTP backend| TRS
+ TRA --> AK & AKO & NC & COL & DRW
+ TRS --> GAR
+
+ NC -. S3 .-> GAR
+ NC -. OIDC .-> AK
+ NC -. WOPI .-> COL
+ NC -. LDAP .-> AKO
+ AKO -. RPC + token .-> AK
+
+ TRDMZ -. ACME DNS-01 TSIG .-> DNS
+ TRDMZ -. hashi_vault acme-tsig .-> BAO
+ AK -. hashi_vault secrets .-> BAO
+ NC -. hashi_vault secrets .-> BAO
+ GAR -. hashi_vault secrets .-> BAO
+```
+
+> **Note:** `opencloud`, `send`, `opnform`, `homarr`, and `bookstack`
+> are defined as plays in [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml)
+> but currently have no matching group in
+> [hosts.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/inventories/demo-gymburgdorf/hosts.yml) for
+> `demo-gymburgdorf`; those plays therefore run as no-ops. If a
+> tenant needs these services, add the corresponding
+> `_servers` group in `hosts.yml` and a
+> `host_vars/application/.yml` (mind the spelling: the
+> forms role is `opnform`, the LDAP role is `389ds`).
+>
+> The `turn` host is in `all_servers` (and therefore in
+> `traefik_servers`) but has **no** service group of its own;
+> currently only the `base` and `traefik` roles run on it.
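+
+Enabling one of these services for a tenant could look like the
+fragment below. It is a hedged sketch: the group name `opnform_servers`
+is an assumption modelled on the existing `nextcloud_servers` style
+wrappers, and it must match whatever `hosts:` pattern the corresponding
+play in `playbooks/site.yml` actually uses.
+
+```yaml
+# Hypothetical addition to inventories/demo-gymburgdorf/hosts.yml.
+# `opnform_servers` is an assumed name; check the play in site.yml first.
+all:
+  children:
+    opnform_servers:
+      hosts:
+        application:
+```
+
+The role's variables would then go into a matching file under
+`host_vars/application/`, as the note above describes.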
diff --git a/architecture/variables.md b/architecture/variables.md
new file mode 100644
index 0000000..66bb648
--- /dev/null
+++ b/architecture/variables.md
@@ -0,0 +1,58 @@
+
+# Variables: hierarchy and cheatsheet
+
+← Back to [Architecture index](README.md)
+
+## 3. Variable hierarchy
+
+Ansible merges variables from multiple sources. Simplified model for
+this repo (see the Ansible docs for the full precedence rules):
+
+```mermaid
+flowchart LR
+ classDef role fill:#fef3c7,stroke:#92400e,color:#000
+ classDef group fill:#dbeafe,stroke:#1e40af,color:#000
+ classDef host fill:#dcfce7,stroke:#166534,color:#000
+ classDef vault fill:#fee2e2,stroke:#991b1b,color:#000
+
+ R["role defaults<br/>(lowest precedence)<br/>collections/.../roles/<r>/defaults/main.yml"]:::role
+ GA["group_vars/all/<br/>vault.yml, docker.yml"]:::group
+ GG["group_vars/<group>/<br/>traefik_servers/, backend_servers/<br/>(parallel groups, merged via<br/>ansible_group_priority)"]:::group
+ HV["host_vars/<host>/<br/>(highest of the three inventory sources)"]:::host
+ BAO["OpenBao<br/>lookup at runtime"]:::vault
+
+ R --> |"<overridden by>"| GA
+ GA --> |"<overridden by>"| GG
+ GG --> |"<overridden by>"| HV
+ HV -.community.hashi_vault.-> BAO
+ GG -.community.hashi_vault.-> BAO
+```
+
+**Key properties:**
+
+- Multiple `group_vars/<group>/` directories are **parallel**, not
+  hierarchically nested. `traefik_servers` and `backend_servers` are
+  merged in order of `ansible_group_priority` (default 1); at equal
+  priority the alphabetically later group name wins on conflict.
+- `host_vars/<host>/` beats any group.
+- `host_vars/reverseproxy/traefik.yml` sets `traefik_mode: dmz`. As a
+  host var it would override the `backend` default from
+  `group_vars/backend_servers/`, but `reverseproxy` is not a member of
+  `backend_servers` in the first place, so that default never applies
+  to it.
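+
+The properties above can be illustrated with the following fragments.
+The two `traefik_mode` values and their file locations come from the
+cheatsheet in this document; the `ansible_group_priority` value of `10`
+is purely hypothetical and only shows where such a setting would live.
+Note that Ansible reads `ansible_group_priority` from the inventory
+source only, not from `group_vars/`.
+
+```yaml
+# group_vars/backend_servers/traefik.yml (applies to application, storage)
+traefik_mode: backend
+---
+# host_vars/reverseproxy/traefik.yml (host level, beats any group value)
+traefik_mode: dmz
+---
+# hosts.yml fragment (hypothetical): make backend_servers win over other
+# groups at the same level instead of relying on alphabetical order.
+all:
+  children:
+    backend_servers:
+      vars:
+        ansible_group_priority: 10   # assumed value, default is 1
+```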
+
+**Bao lookups** are not a precedence layer but **values** inside any
+variable source. See [security.md](security.md) for the pattern.
+
+## 9. Variable cheatsheet
+
+| Variable | Where in `demo-gymburgdorf/` | Why |
+|---|---|---|
+| `vault_addr`, `vault_mount` | `group_vars/all/vault.yml` | Bao endpoint applies site-wide |
+| `docker_registry_mirrors` | `group_vars/all/docker.yml` | Pulls from mirror on all hosts |
+| `traefik_acme_*`, `traefik_use_ssl`, `traefik_cert_mode` | `group_vars/traefik_servers/traefik.yml` | Applies to every Traefik instance (dmz + backend) |
+| `traefik_mode: backend` | `group_vars/backend_servers/traefik.yml` | Default for app + storage |
+| `traefik_mode: dmz` | `host_vars/reverseproxy/traefik.yml` | Host-specific override |
+| `traefik_dmz_exposed_services` | `host_vars/reverseproxy/` | DMZ backend list β only meaningful here |
+| `nextcloud_*`, `authentik_*`, `collabora_*`, `drawio_*` | `host_vars/application/.yml` | Service runs on `application` |
+| `garage_*` | `host_vars/storage/garage.yml` | Service runs on `storage` |
+| Secrets (passwords, tokens, keys) | inline variable using `lookup('community.hashi_vault.hashi_vault', …)` | Single source of truth via Bao |
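+
+A secret defined this way might look like the sketch below. The
+variable name `nextcloud_admin_password`, the secret path, and the key
+name are hypothetical, and the `<mount>/data/<path>` form assumes a
+KV v2 engine; `vault_addr` and `vault_mount` are the variables from
+`group_vars/all/vault.yml`. The token is expected to come from the
+`VAULT_TOKEN` environment variable of the shell running Ansible.
+
+```yaml
+# host_vars/application/nextcloud.yml (sketch, names are illustrative)
+nextcloud_admin_password: >-
+  {{ lookup('community.hashi_vault.hashi_vault',
+            vault_mount ~ '/data/nextcloud:admin_password',
+            url=vault_addr) }}
+```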