diff --git a/README.md b/README.md index d49b004..6a5b32e 100644 --- a/README.md +++ b/README.md @@ -1,103 +1,35 @@ - -# πŸ“š Digitalboard Documentation +# πŸ“š Documentation Repository -Welcome β€” this repository is the **central documentation hub** for the -Digitalboard platform. It collects architecture notes, operational -runbooks, integration guides, and troubleshooting recipes that span -multiple repositories, so they have one stable home instead of being -scattered across READMEs. +This repository contains documentation, guides, and reference material. -## πŸ›οΈ Platform at a glance +## πŸ“– Available Documentation -```mermaid -flowchart LR - classDef docs fill:#dbeafe,stroke:#1e40af,color:#000 - classDef core fill:#dcfce7,stroke:#166534,color:#000 - classDef ans fill:#fef3c7,stroke:#92400e,color:#000 - classDef ext fill:#e9d5ff,stroke:#6b21a8,color:#000 +- **[Contribution guidelines](./contributing/)** + Documentation and guides related to infrastructure configuration and best practices. + - [Git](./contributing/git.md) + Guidelines for contributing using git - User((Operator / Engineer)) +- **[Infrastructure](./infrastructure/)** + Documentation and guides related to infrastructure configuration and best practices. + - [ACME](./infrastructure/acme.md) + Documentation of the ACME concept. + - [IPV6](./infrastructure/ipv6.md) + Documentation of the ipv6 concept. - subgraph REPOS["Digitalboard repositories"] - DOCS["docs
<br/>📖 architecture, runbooks,<br/>integration guides<br/>(this repo)"]:::docs - CORE["digitalboard.core<br/>⚙️ Ansible collection<br/>= all roles<br/>(traefik, authentik, nextcloud,<br/>garage, keycloak, …)"]:::core - REF["reference-ansible<br/>🚀 inventories + playbooks<br/>(demo-gymburgdorf,<br/>demo-phbern, demo-mbazürich,<br/>
vagrant)"]:::ans - end +- **[Keycloak](./keycloak/)** + Documentation and guides related to Keycloak configuration and best practices. + - [Enforce OTP 2FA for Internal Users](./keycloak/enforce-otp-internal.md) + Step-by-step instructions for enforcing OTP-based two-factor authentication for internal users, while excluding external Microsoft Entra users. + - [Integrate MS Entra in Keycloak as IDP](./keycloak/idp-ms-entra.md) + Step-by-step instructions for integrating MS Entra as identity-provider. - subgraph PLATFORM["Runtime targets"] - BAO["OpenBao
<br/>bao.digitalboard.ch<br/>(secrets)"]:::ext - DNS["Knot DNS<br/>ns1.digitalboard.ch<br/>(ACME / split-horizon)"]:::ext - HOSTS["Tenant VMs<br/>(reverseproxy · application ·<br/>
storage Β· turn)"]:::ext - end +- **[Microsoft Entra](./ms-entra/)** + Documentation and guides related to Microsft Entra configuration and best practices. + - [Enterprise App Integration with Keycloak](./ms-entra/enterprise-app-keycloak.md) + Step-by-step instructions for creating an Enterprise Application in Microsoft Entra (Azure AD) as an identity provider for Keycloak. - User -->|reads| DOCS - User -->|runs `make deploy_…`| REF - REF -->|requires| CORE - REF -.->|hashi_vault lookups.-> BAO - REF -->|ansible-playbook| HOSTS - HOSTS -.->|nsupdate TSIG / ACME DNS-01.-> DNS - HOSTS -.->|hashi_vault lookups.-> BAO - DOCS -.documents.-> REF - DOCS -.documents.-> CORE -``` +- **[Troubleshooting](./troubleshooting/)** + Encountered & solved problems. + - [Nextcloud File Locking](./troubleshooting/nextcloud-file-locking.md) + Preventing sync conflicts when multiple users edit the same file via the Nextcloud desktop client. -**The three repos at a glance:** - -| Repo | Role | Link | -|---|---|---| -| **`docs`** *(here)* | Architecture, integration guides, runbooks, troubleshooting. The "why" and the "how it fits together." | [git.digitalboard.ch/Digitalboard/docs](https://git.digitalboard.ch/Digitalboard/docs) | -| **`digitalboard.core`** | Ansible collection β€” every reusable role (Traefik, Authentik, Keycloak, Nextcloud, Garage, …). The "what runs on a host." | [git.digitalboard.ch/Digitalboard/digitalboard.core](https://git.digitalboard.ch/Digitalboard/digitalboard.core) | -| **`reference-ansible`** | Inventories + playbooks for the demo tenants and the `vagrant` test setup. The "what gets deployed where, with which variables." | [git.digitalboard.ch/Digitalboard/reference-ansible](https://git.digitalboard.ch/Digitalboard/reference-ansible) | - -> πŸš€ **Want to deploy something?** Start in -> [`reference-ansible`](https://git.digitalboard.ch/Digitalboard/reference-ansible) β€” -> its README covers the Bao login, the `make` targets, and the available -> playbooks. 
Come back here for the architectural background -> ([architecture/](./architecture/)) or for solved problems -> ([troubleshooting/](./troubleshooting/)). - -## πŸ“– Contents - -- **[Architecture](./architecture/)** β€” How the `reference-ansible` - deployment is structured, using `demo-gymburgdorf` as the running example. - - [Index & glossary](./architecture/README.md) - - [Setup and repo layout](./architecture/setup.md) β€” control-node prerequisites, Bao login workflow - - [Variables](./architecture/variables.md) β€” Ansible variable hierarchy and cheatsheet - - [Topology](./architecture/topology.md) β€” Inventory groups, service layout per host - - [DNS and ACME](./architecture/dns.md) β€” Knot zones, TSIG/ACL model, split-horizon FQDNs - - [Deploy](./architecture/deploy.md) β€” Play sequence, Traefik DMZ vs. backend modes - - [Security](./architecture/security.md) β€” Bao lookup pattern, demo-only defaults, production hardening - - [Operations](./architecture/operations.md) β€” New-tenant walkthrough, known gaps - -- **[Contributing](./contributing/)** β€” Conventions for collaborating on this codebase. - - [Git](./contributing/git.md) β€” Guidelines for contributing using git - -- **[Infrastructure](./infrastructure/)** β€” Infrastructure-level concepts that apply across services. - - [ACME](./infrastructure/acme.md) β€” Documentation of the ACME concept - - [IPv6](./infrastructure/ipv6.md) β€” Documentation of the IPv6 concept - -- **[Keycloak](./keycloak/)** β€” Keycloak configuration and best practices. - - [Account Linking](./keycloak/account-linking.md) β€” How to link existing accounts to a federated identity - - [Enforce OTP 2FA for Internal Users](./keycloak/enforce-otp-internal.md) β€” OTP-based 2FA for internal users, excluding external MS Entra users - - [Integrate MS Entra in Keycloak as IDP](./keycloak/idp-ms-entra.md) β€” MS Entra as identity provider - -- **[Microsoft Entra](./ms-entra/)** β€” Microsoft Entra configuration and best practices. 
- - [Enterprise App Integration with Keycloak](./ms-entra/enterprise-app-keycloak.md) β€” Enterprise App in MS Entra (Azure AD) as IDP for Keycloak - -- **[Troubleshooting](./troubleshooting/)** β€” Encountered & solved problems. - - [Nextcloud File Locking](./troubleshooting/nextcloud-file-locking.md) β€” Preventing sync conflicts when multiple users edit the same file via the Nextcloud desktop client - -## 🧭 Where to look - -| If you want to… | Go to | -|---|---| -| Understand how a tenant is wired up | [architecture/topology.md](./architecture/topology.md) | -| Set up a new demo tenant | [architecture/operations.md](./architecture/operations.md) | -| Look up a variable's correct home | [architecture/variables.md](./architecture/variables.md) | -| Understand why two ACME models coexist | [architecture/dns.md](./architecture/dns.md) | -| Plug an identity provider into Keycloak | [keycloak/](./keycloak/) | -| Solve a recurring runtime issue | [troubleshooting/](./troubleshooting/) | - ---- - -πŸ“ Contributions follow the guidelines in [contributing/git.md](./contributing/git.md). diff --git a/architecture/README.md b/architecture/README.md deleted file mode 100644 index df6317f..0000000 --- a/architecture/README.md +++ /dev/null @@ -1,44 +0,0 @@ - -# Architecture β€” `reference-ansible` - -This documentation describes the architecture of the `reference-ansible` -repository and uses the inventory `inventories/demo-gymburgdorf/` as a -running example. It serves both as onboarding documentation for new -engineers and as a reference when setting up additional demo tenants. - -> **Demo-only.** All defaults in the roles (passwords, tokens, RPC -> secrets) are insecure and intended exclusively for demo setups. See -> [security.md](security.md). 
- -**Last updated:** 2026-05-26 Β· **Owner:** @sbaerlocher - -## Contents - -| Section | File | Topics | -|---|---|---| -| Setup and repo layout | [setup.md](setup.md) | Repo layout, role provenance, control-node prerequisites, Bao login workflow | -| Variables | [variables.md](variables.md) | Ansible variable hierarchy, variable cheatsheet | -| Topology | [topology.md](topology.md) | Inventory groups, service layout per host, variable placement | -| DNS and ACME | [dns.md](dns.md) | Knot zones, NS-delegated vs. ACL-isolated ACME models, split-horizon FQDNs, TSIG/ACL | -| Deploy | [deploy.md](deploy.md) | Play sequence, Traefik DMZ/backend modes | -| Security | [security.md](security.md) | Bao lookup pattern, demo-only defaults, threat boundaries, production hardening | -| Operations | [operations.md](operations.md) | New-tenant walkthrough, known gaps and trade-offs | - -## Glossary - -| Term | Meaning | -|---|---| -| **OpenBao** | HashiCorp Vault fork. Single source of truth for secrets. Endpoint: `bao.digitalboard.ch`. | -| **Authentik** | Identity provider. Issues OIDC for SP services and LDAP via the Outpost. | -| **Outpost (Authentik)** | Separate Authentik sidecar that emulates LDAP/proxy protocols for legacy apps. Talks to Authentik via RPC + token. | -| **WOPI** | Web Application Open Platform Interface β€” protocol used by Nextcloud/Opencloud to hand office documents to Collabora. | -| **TSIG / RFC2136** | Authenticated DNS updates. Traefik uses TSIG-signed `nsupdate` calls for ACME DNS-01 challenges. | -| **DNS-01 (ACME)** | Let's Encrypt challenge type: certificate ownership is proven via a TXT record in DNS instead of HTTP. Required for wildcard certs. | -| **CNAME bridge** | `_acme-challenge.` points via CNAME into a dedicated update label (`.demo-gymb._acme.digitalboard.ch`), keeping the TSIG key scoped to a narrow sub-tree. See [dns.md](dns.md). | -| **Knot DNS** | Authoritative DNS server used on `ns1.digitalboard.ch`. 
Config and zone files live in the separate [`dns-zones`](https://git.digitalboard.ch/Digitalboard/dns-zones) repo. | -| **DNSSEC** | Zones are signed with Ed25519, NSEC3 (no opt-out), KSK 1y / ZSK 90d rollovers, CDS/CDNSKEY published for automatic DS at the parent. | -| **Split horizon** | Two FQDN families per service: public `.gymb.souveredu.ch` β†’ DMZ Traefik front-end IP, internal `.int.gymb.souveredu.ch` β†’ directly the backend host. See [dns.md](dns.md). | -| **File provider / Docker provider** | Traefik configuration sources. The file provider reads static YAML; the Docker provider reads container labels via `/var/run/docker.sock`. | -| **STUN/TURN** | NAT-traversal protocols for WebRTC (e.g. for Nextcloud Talk). Runs on a separate host (`turn`). | -| **Garage** | S3-compatible object store (Rust). Backend for Nextcloud/Opencloud. | -| **FQCN** | Fully Qualified Collection Name, e.g. `digitalboard.core.traefik`. Mandatory in Ansible since 2.10. | diff --git a/architecture/deploy.md b/architecture/deploy.md deleted file mode 100644 index 13f1ab8..0000000 --- a/architecture/deploy.md +++ /dev/null @@ -1,74 +0,0 @@ - -# Deploy flow and Traefik modes - -← Back to [Architecture index](README.md) - -## 6. Deploy flow - -Sequence taken from [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml): - -```mermaid -sequenceDiagram - participant U as User - participant A as ansible-playbook - participant V as OpenBao - participant H as Hosts - - U->>U: bao login + export VAULT_TOKEN - U->>A: make deploy_site_demo_gymburgdorf - A->>A: load vars: role defaults β†’ group_vars/all β†’ group_vars/<groups> β†’ host_vars/<host> - A->>V: community.hashi_vault lookups
(acme-tsig, service secrets) - V-->>A: secret values - A->>H: Play 1 β€” base (all hosts) - A->>H: Play 2 β€” traefik (all hosts: dmz on reverseproxy, backend elsewhere) - A->>H: Play 3 β€” httpbin - A->>H: Play 4 β€” 389ds - A->>H: Play 5 β€” keycloak - A->>H: Play 6 β€” garage (storage) - A->>H: Play 7 β€” collabora (application) - A->>H: Play 8 β€” authentik (application) - A->>H: Play 9 β€” authentik_outpost_ldap (application) - A->>H: Play 10 β€” nextcloud (application) - A->>H: Play 11 β€” drawio (application) - A->>H: Play 12 β€” send - A->>H: Play 13 β€” opnform - A->>H: Play 14 β€” homarr - A->>H: Play 15 β€” bookstack - A->>H: Play 16 β€” opencloud -``` - -Plays without matching group members (`httpbin_servers`, -`ds389_servers`, `keycloak_servers`, `send_servers`, -`opnform_servers`, `homarr_servers`, `bookstack_servers`, -`opencloud_servers` in this inventory) run as no-ops. - -> **Role-name spelling traps:** the LDAP role is `389ds` (not -> `ds389`); the forms role is `opnform` (not `openforms`/`openform`). -> Inventory groups must match the names used in -> [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml) exactly β€” -> `ds389_servers`, `opnform_servers`. - -`--diff` is enabled in the target β†’ per-task changes are visible. - -## 7. Traefik modes (DMZ vs Backend) - -**`traefik_mode: dmz`** β€” public-facing reverse proxy on `reverseproxy`: - -- **File provider** with `services.yml` for static routing. -- No Docker socket mounted, no local containers. -- Routes to `backend_host` addresses on other machines. -- Backends are declared via `traefik_dmz_exposed_services` (a list in - `host_vars/reverseproxy/`). Selective backend selection is also - possible via `traefik_backend_servers_to_proxy`. - -**`traefik_mode: backend`** β€” application/storage: - -- Mounts `/var/run/docker.sock`. -- **Docker provider**: auto-discovery via container labels - (`traefik.enable=true`). 
-- Services are exposed locally; the DMZ Traefik routes external - traffic to them in plaintext HTTP (see - [security.md](security.md)). - -**Both modes** support ACME via RFC2136 DNS challenge or self-signed -(`traefik_cert_mode: acme | selfsigned`). diff --git a/architecture/dns.md b/architecture/dns.md deleted file mode 100644 index 226dd5c..0000000 --- a/architecture/dns.md +++ /dev/null @@ -1,123 +0,0 @@ - -# DNS topology and ACME zone layout - -← Back to [Architecture index](README.md) - -Authoritative DNS for everything described in this document runs on -**`ns1.digitalboard.ch`** (public `193.43.183.169`, DMZ `172.16.9.169`) -using **Knot DNS**. The zone files and Knot config live in the -[`dns-zones`](https://git.digitalboard.ch/Digitalboard/dns-zones) repo; this section explains how the -public service FQDNs, the internal "split-horizon" FQDNs, and the ACME -challenge sub-trees fit together. - -## Authoritative zones on `ns1` - -| Zone | Purpose | DNSSEC | Dynamic updates | -|---|---|---|---| -| `digitalboard.ch` | Production zone for the platform itself (`auth`, `cloud`, `office`, `bao`, …). | on | none (static zone file) | -| `_acme.digitalboard.ch` | Parent zone for ACME challenge labels. | on | yes, per-tenant TSIG ACLs (`demo-gymb`, `demo-phbe`, `demo-mbaz`) | -| `digitalboard._acme.digitalboard.ch` | **Delegated** child zone for `digitalboard.ch` ACME updates only. | off | yes, TSIG `acme_update_key_digitalboard` | -| `souveredu.ch` | Demo-tenant zone (`gymb`, `phbe`, `mbaz` sub-labels). | on | none (static zone file) | -| `demo-schulen.ch` | Reserve / unused so far. | on | none | - -> **Two different ACME models live here.** This is the most common -> source of confusion when copying a tenant: -> -> - `digitalboard.ch` uses a **NS-delegated child zone** -> (`digitalboard._acme.digitalboard.ch.` has its own `NS` record in -> `_acme.digitalboard.ch`). The TSIG key writes into that delegated -> zone. 
-> - The demo tenants (`demo-gymb`, `demo-phbe`, `demo-mbaz`) **share -> the parent zone** `_acme.digitalboard.ch` and are isolated only -> by **Knot ACL `update-owner-name`** on the per-tenant sub-tree -> (`demo-gymb._acme.digitalboard.ch.` and below). There is no NS -> delegation for them. -> -> Both work for the ACME flow; the demo model is cheaper to manage but -> means tenant isolation depends on Knot ACLs, not zone boundaries. - -## Naming pattern for `demo-gymb` (template for new tenants) - -```text -Public, browser-facing: - cloud.gymb.souveredu.ch CNAME β†’ rvp.gymb.souveredu.ch (193.43.183.131) - auth.gymb.souveredu.ch CNAME β†’ rvp.gymb.souveredu.ch - office.gymb.souveredu.ch CNAME β†’ rvp.gymb.souveredu.ch - s3.gymb.souveredu.ch CNAME β†’ rvp.gymb.souveredu.ch - ... - -Internal, server-to-server (split horizon): - cloud.int.gymb.souveredu.ch A β†’ 172.16.19.101 (application host) - auth.int.gymb.souveredu.ch A β†’ 172.16.19.101 - office.int.gymb.souveredu.ch A β†’ 172.16.19.101 - s3.int.gymb.souveredu.ch A β†’ 172.16.19.102 (storage host) - ... - -Tenant entry IPs: - rvp.gymb.souveredu.ch A β†’ 193.43.183.131 (DMZ Traefik public) - reverseproxy.int.gymb A β†’ 172.16.9.111 (DMZ Traefik internal) - -ACME challenge labels (writeable via TSIG acme_update_key_demo_gymb): - _acme-challenge.cloud.gymb CNAME β†’ cloud.demo-gymb._acme.digitalboard.ch - _acme-challenge.cloud.int.gymb CNAME β†’ cloud.int.demo-gymb._acme.digitalboard.ch - ... -``` - -The `.int.` family is what makes Nextcloud β†’ Garage, Nextcloud β†’ -Authentik (OIDC), Nextcloud β†’ Collabora (WOPI) etc. **bypass the DMZ -Traefik**: the backend host's local Traefik presents the right cert -directly, so traffic stays on the backend subnet. Without this, -server-to-server calls would either ride out through the DMZ and back -in, or hit a hostname mismatch on the cert. 
- -## TSIG / ACL model - -```mermaid -flowchart LR - classDef tenant fill:#dcfce7,stroke:#166534,color:#000 - classDef zone fill:#dbeafe,stroke:#1e40af,color:#000 - classDef acl fill:#fef3c7,stroke:#92400e,color:#000 - - subgraph KNOT["ns1.digitalboard.ch (Knot DNS)"] - Z1["_acme.digitalboard.ch
<br/>(parent zone)"]:::zone - Z2["digitalboard._acme.digitalboard.ch<br/>(NS-delegated child)"]:::zone - A1["ACL acme_updates_digitalboard<br/>scope: digitalboard._acme.digitalboard.ch."]:::acl - A2["ACL acme_updates_demo_gymb<br/>scope: demo-gymb._acme.digitalboard.ch."]:::acl - A3["ACL acme_updates_demo_phbe<br/>scope: demo-phbe._acme.digitalboard.ch."]:::acl - A4["ACL acme_updates_demo_mbaz<br/>scope: demo-mbaz._acme.digitalboard.ch."]:::acl - end - - DB["digitalboard.ch Traefik<br/>TSIG: acme_update_key_digitalboard"]:::tenant - GY["demo-gymb Traefik<br/>TSIG: acme_update_key_demo_gymb"]:::tenant - PH["demo-phbe Traefik<br/>TSIG: acme_update_key_demo_phbe"]:::tenant - MB["demo-mbaz Traefik<br/>
TSIG: acme_update_key_demo_mbaz"]:::tenant - - DB -- nsupdate TXT --> A1 - GY -- nsupdate TXT --> A2 - PH -- nsupdate TXT --> A3 - MB -- nsupdate TXT --> A4 - A1 -- writes into --> Z2 - A2 -- writes into --> Z1 - A3 -- writes into --> Z1 - A4 -- writes into --> Z1 -``` - -Each ACL is restricted to **`update-type: TXT`** and -**`update-owner-match: sub-or-equal`** under the tenant prefix, so a -leaked tenant key cannot write outside its own ACME sub-tree and cannot -modify non-TXT records (no A/CNAME/NS hijack). - -## Traefik variables that bind to this layout - -From `inventories/demo-gymburgdorf/group_vars/traefik_servers/traefik.yml`: - -| Traefik variable | Value for `demo-gymb` | Bound to | -|---|---|---| -| `traefik_acme_dns_provider` | `rfc2136` | Knot dynamic-update endpoint | -| `traefik_acme_dns_zone` | `demo-gymb._acme.digitalboard.ch` | Per-tenant write scope on `ns1` | -| `traefik_acme_tsig_key_name` | `acme_update_key_demo_gymb` | Matches `key:` entry in [`knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf) | -| `traefik_acme_tsig_secret` | Bao lookup | See [security.md](security.md) | - -A tenant whose ACME zone does **not** match the Knot ACL -`update-owner-name` will get `REFUSED` on `nsupdate` and ACME issuance -will silently retry until the renewal window expires. diff --git a/architecture/operations.md b/architecture/operations.md deleted file mode 100644 index 58ce8b9..0000000 --- a/architecture/operations.md +++ /dev/null @@ -1,99 +0,0 @@ - -# Operations β€” new tenants and known gaps - -← Back to [Architecture index](README.md) - -## 10. Walkthrough: creating a new demo tenant - -Recommended template: **`demo-gymburgdorf`** (not `vagrant`, since its -group topology is incompatible). - -1. **Copy the inventory:** - - ```bash - cp -r inventories/demo-gymburgdorf inventories/demo- - ``` - -2. **Adjust `hosts.yml`:** IPs and hostnames per host. - -3. 
**`group_vars/all/vault.yml`** β€” point `vault_mount` at the new - tenant mount (`demo-`). - -4. **`group_vars/traefik_servers/traefik.yml`** β€” bend - `traefik_acme_dns_zone` and the `traefik_acme_tsig_*` lookup paths - to the new zone / new Bao path. - -5. **`host_vars/application/*.yml`** and - **`host_vars/storage/*.yml`** β€” walk through them: FQDNs to the new - domain pattern (e.g. `*..souveredu.ch`), Bao lookup paths - to `demo-/data/…`. - -6. **Prepare OpenBao** (out-of-band, not via Ansible): - - Create a new KV-v2 mount `demo-`. - - Write secrets: `acme-tsig`, `authentik`, `nextcloud`, `garage`, … - (see [security.md](security.md) for the mandatory-override list). - - Policy for the deploy token: read on `demo-/data/*`. - -7. **DNS** (in the [`dns-zones`](https://git.digitalboard.ch/Digitalboard/dns-zones) repo, see - [dns.md](dns.md)): - - Add `key:` and `acl:` entries for the new tenant in - [`knot/knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf), pattern - `acme_update_key_demo_` / - `acme_updates_demo_` scoped to - `demo-._acme.digitalboard.ch.`. - - Append the new ACL to the `_acme.digitalboard.ch` zone's `acl:` - list β€” the tenants share the parent zone, no NS delegation. - - In `zones/souveredu.ch.zone` (or the tenant's public zone) add - the public/internal A records (`rvp.`, - `reverseproxy.int.`, `application.int.`, - `storage.int.`, …), the service CNAMEs to - `rvp.`, and the `_acme-challenge.*` CNAMEs into - `demo-._acme.digitalboard.ch`. Bump the SOA serial. - - `make deploy_ns1` to push. - -8. **Makefile** β€” add a new target modelled on - `deploy_site_demo_gymburgdorf` and wire it into - `deploy_site_demo`. - -9. **Smoke test:** - `ansible all -i inventories/demo-/hosts.yml -m ping`. - -10. **Deploy:** Bao login + `make deploy_site_demo_`. - -## 11. 
Known gaps and trade-offs - -- **Optional services without group bindings in `demo-gymburgdorf`:** - `opencloud`, `send`, `opnform`, `homarr`, and `bookstack` are - declared as plays in - [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml) but have no - `_servers` group in the inventory β€” those plays run as - no-ops. If needed, add the group + `host_vars/application/.yml` - as described in [topology.md](topology.md). Mind spelling: - `opnform_servers` (not `openform`/`openforms`). -- **`turn` host:** defined in the DMZ, but no STUN/TURN role in - [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml). Currently provisioned only - via `base` + `traefik`. -- **Idempotency:** roles are Docker-Compose-based; re-runs may trigger - container restarts when compose inputs change. There is no dedicated - rollback mechanism β€” on failure, roll back manually to the previous - state. -- **TLS renewal:** handled internally by Traefik via ACME. There is no - external renewal cron in the repo. -- **CI / testing:** not present in the repo. Smoke test is - `make ping_demo`. -- **Logs:** Traefik runs with `traefik_log_level: DEBUG` in - `demo-gymburgdorf` and `vagrant` (role default is `INFO`) β€” reduce - to `INFO` or `WARN` before adapting for production. -- **TSIG secrets in `knot.conf`:** the `dns-zones` repo currently - stores all four ACME TSIG keys in plaintext in - [`knot/knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf). The Ansible - side reads them from Bao, but the Knot side does not β€” anyone with - read on the `dns-zones` repo can write TXT records under the - matching tenant's ACME sub-tree. For prod, source the Knot keys - from a templated config + secret store, or restrict repo access. 
-- **Demo tenants share `_acme.digitalboard.ch`:** isolation is by - Knot ACL `update-owner-name`, not by zone delegation. A mis-edit - of the ACL list could break ACL-based isolation without breaking - DNS resolution β€” failure is silent. The production zone - (`digitalboard.ch`) uses a properly delegated child zone and is - not affected. diff --git a/architecture/security.md b/architecture/security.md deleted file mode 100644 index 64dc6ec..0000000 --- a/architecture/security.md +++ /dev/null @@ -1,71 +0,0 @@ - -# Security and demo-only defaults - -← Back to [Architecture index](README.md) - -> This repo is explicitly designed for **demo setups**. All default -> values in the roles are insecure and are overridden in `demo-*` -> inventories via Bao lookups or host_vars. For production deployments -> the hardening block further down also applies. - -## Secret pattern (Bao lookup) - -```yaml -# group_vars/.../.yml or host_vars/.../.yml -authentik_secret_key: "{{ lookup('community.hashi_vault.hashi_vault', - vault_mount + '/data/authentik:secret_key', - url=vault_addr) }}" -``` - -- `vault_mount` and `vault_addr` come from - [group_vars/all/vault.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/inventories/demo-gymburgdorf/group_vars/all/vault.yml). -- KV-v2 paths require an explicit `/data/` segment β€” Ansible does not - resolve this automatically. -- `vault_mount` is unique per inventory (`demo-gymburgdorf`, - `demo-phbern`, …) β†’ tenant isolation in Bao via mount + policy. - -## Demo-only defaults β€” override required - -These defaults in `digitalboard.core` are insecure. 
In any -**production-grade** deployment they must be overridden via Bao lookup -or host_var: - -| Variable | Default | Where to override | -|---|---|---| -| `keycloak_admin_password` | `changeme` | host_vars `keycloak_servers` | -| `keycloak_postgres_password` | `changeme` | same | -| `authentik_secret_key` | `changeme-generate-a-random-string` | `host_vars/application/authentik.yml` | -| `authentik_postgres_password` | `changeme` | same | -| `nextcloud_admin_password` | `admin` | `host_vars/application/nextcloud.yml` | -| `nextcloud_postgres_password` | `changeme` | same | -| `nextcloud_s3_key` / `nextcloud_s3_secret` | `changeme` / `changeme` | same | -| `garage_webui_password` | `admin` | `host_vars/storage/garage.yml` | -| `garage_rpc_secret` | `0123…cdef` (64-hex constant) | same | -| `garage_admin_token` | identical to `rpc_secret` | same | -| `garage_metrics_token` | identical to `rpc_secret` | same | - -> **Convention:** every value listed above **must** have a Bao lookup -> in `demo-*/host_vars/.../...yml` before the inventory is considered -> deploy-ready. - -## Threat boundaries (current demo state) - -| Boundary | Status | Notes | -|---|---|---| -| DMZ ↔ Backend (172.16.9 ↔ 172.16.19) | **Plaintext HTTP** | Auth bearers, OIDC codes, session cookies travel unencrypted. Fine for demo; for prod use mTLS or a WireGuard overlay. | -| Host firewall | **missing** | The `base` role does not install UFW/nftables. Segmentation relies on the hypervisor/VLAN. | -| SSH | `ansible_user: root` | No bastion, no jump host. Key distribution out-of-band. | -| Authentik SPOF | **accepted** | IdP and SP services share the same host (`application`). An Authentik outage means a login outage including the LDAP outpost. No break-glass path. 
| -| ACME TSIG key | Bao lookup (in Ansible), **plaintext in [`knot.conf`](https://git.digitalboard.ch/Digitalboard/dns-zones/src/branch/main/knot/knot.conf)** on `ns1` side | One TSIG key per demo tenant, scoped via Knot ACL `update-owner-name` to the tenant's ACME sub-tree. Rotation is manual and must be done on both sides simultaneously (Bao + `knot.conf` + `knotc zone-reload`). | -| Backup / DR | **out of scope** | Garage `replication_factor: 1` (default), no Postgres backup job, no Bao snapshot cron. | - -## To adapt for production, add - -- Host firewall (extend the `base` role or add a dedicated `firewall` - role). -- mTLS or WireGuard between DMZ and backend. -- Authentik on a separate host with a recovery admin token. -- Bao policies per inventory mount (read-only for the deploy token, - write-only for the bootstrap job). -- Backup cron for Postgres + Garage + Bao. -- SSH bastion + key rotation. diff --git a/architecture/setup.md b/architecture/setup.md deleted file mode 100644 index 0c75d18..0000000 --- a/architecture/setup.md +++ /dev/null @@ -1,68 +0,0 @@ - -# Setup and repo layout - -← Back to [Architecture index](README.md) - -## 1. 
Repo layout and role provenance - -```text -reference-ansible/ -β”œβ”€β”€ Makefile # Deploy targets, OIDC login, OBJC fork workaround -β”œβ”€β”€ ansible.cfg # collections_path, remote_user=root, hashi_vault auth_method=token -β”œβ”€β”€ requirements.yml # community.hashi_vault + digitalboard.core (Git) -β”œβ”€β”€ playbooks/site.yml # Play sequence (14 plays, see deploy.md) -β”œβ”€β”€ collections/ # ← installed by `make install`, gitignored -β”‚ └── ansible_collections/ -β”‚ └── digitalboard/core/ -β”‚ └── roles/ # πŸ”‘ Roles live HERE, NOT in the repo root -└── inventories/ - β”œβ”€β”€ demo-gymburgdorf/ # Inventory used throughout this document - β”œβ”€β”€ demo-mbazΓΌrich/ - β”œβ”€β”€ demo-phbern/ - └── vagrant/ # Local test inventory with its own topology -``` - -> **Important:** There is **no** `roles/` directory at the repo root. -> All roles come from the `digitalboard.core` collection (see -> [requirements.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/requirements.yml)), installed via `make install` -> into `./collections/`. Plays reference them by FQCN -> `digitalboard.core.`. - -## 2. Setup and prerequisites - -**Tools on the control node:** - -- `ansible` (Core β‰₯ 2.15) -- `bao` CLI (OpenBao) β€” e.g. 
`sudo pacman -S openbao python-hvac` (Arch) or Homebrew -- `python-hvac` (for `community.hashi_vault` lookups) -- On macOS: `OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` (set in the - [Makefile](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/Makefile); without it Ansible forks crash on Bao lookups) - -**Initial setup:** - -```bash -git clone -cd reference-ansible -make install # Galaxy + digitalboard.core into ./collections/ -``` - -**Before every deploy:** Bao login in the **same shell** that will then -run `ansible-playbook`: - -```bash -export BAO_ADDR=https://bao.digitalboard.ch -bao login -method=oidc -path=Digitalboard -export VAULT_TOKEN=$(bao print token) -``` - -> ⚠️ `make bao` on its own is **not enough**: every `make` target spawns -> a new shell, and the `VAULT_TOKEN` exported in there only lives for -> the duration of `make bao` itself. Either run the three commands -> above manually, or invoke `make bao deploy_site_demo_gymburgdorf` as -> **one** call β€” otherwise the deploy has no token. - -**Smoke test:** - -```bash -make ping_demo # pings all three demo inventories -``` diff --git a/architecture/topology.md b/architecture/topology.md deleted file mode 100644 index c884c4d..0000000 --- a/architecture/topology.md +++ /dev/null @@ -1,110 +0,0 @@ - -# Topology β€” inventory and services - -← Back to [Architecture index](README.md) - -## 4. Inventory topology (`demo-gymburgdorf`) - -```mermaid -flowchart LR - classDef dmz fill:#fee2e2,stroke:#991b1b,color:#000 - classDef app fill:#dcfce7,stroke:#166534,color:#000 - classDef stor fill:#dbeafe,stroke:#1e40af,color:#000 - classDef turn fill:#fef9c3,stroke:#854d0e,color:#000 - - subgraph ALL["group: all_servers"] - direction LR - subgraph DMZ["DMZ 172.16.9.0/24"] - RP["reverseproxy
172.16.9.111
traefik_mode: dmz"]:::dmz
-            TURN["turn
172.16.9.112
(no role in site.yml yet)"]:::turn
-        end
-        subgraph BE["Backend 172.16.19.0/24
group: backend_servers"]
-            APP["application
172.16.19.101
traefik_mode: backend
+ authentik, authentik_outpost_ldap,
nextcloud, collabora, drawio"]:::app
-            ST["storage
172.16.19.102
traefik_mode: backend
+ garage (S3)"]:::stor
-        end
-    end
-
-    RP -.HTTPS in, HTTP out.-> APP
-    RP -.HTTPS in, HTTP out.-> ST
-```
-
-**Group memberships (from [hosts.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/inventories/demo-gymburgdorf/hosts.yml)):**
-
-| Group | Members | Purpose |
-|---|---|---|
-| `all_servers` | `reverseproxy`, `application`, `storage`, `turn` | Base role for all hosts |
-| `traefik_servers` | `children: all_servers` (= all 4 hosts) | Traefik everywhere; DMZ/backend via `traefik_mode` |
-| `backend_servers` | `application`, `storage` | Sets `traefik_mode: backend` via group var |
-| `garage_servers` | `storage` | Single-host wrapper for the Garage role |
-| `nextcloud_servers`, `collabora_servers`, `drawio_servers`, `authentik_servers`, `authentik_outpost_ldap_servers` | `application` only | Single-host wrappers |
-
-> **Difference vs. the `vagrant` inventory:** `vagrant` structures
-> Traefik differently: via the children groups `traefik_servers_dmz`
-> and `traefik_servers_backend` instead of `backend_servers` +
-> `host_vars` override. The two topologies are **structurally
-> incompatible**; a 1:1 mapping is not possible. See
-> [operations.md](operations.md) for the recommended template.
-
-## 5. Service layout and variable placement
-
-```mermaid
-flowchart TB
-    classDef rp fill:#fee2e2,stroke:#991b1b,color:#000
-    classDef ap fill:#dcfce7,stroke:#166534,color:#000
-    classDef st fill:#dbeafe,stroke:#1e40af,color:#000
-    classDef ext fill:#e9d5ff,stroke:#6b21a8,color:#000
-
-    Internet((Internet))
-    DNS["DNS ns1.digitalboard.ch
RFC2136 TSIG
Zone: demo-gymb._acme.digitalboard.ch
CNAME bridge: _acme-challenge.*.gymb.souveredu.ch"]:::ext
-    BAO["OpenBao
bao.digitalboard.ch
mount: demo-gymburgdorf"]:::ext
-
-    subgraph RP["reverseproxy: traefik dmz"]
-        TRDMZ["traefik (file provider)
📁 group_vars/traefik_servers/traefik.yml
📁 host_vars/reverseproxy/traefik.yml
→ traefik_mode: dmz
→ traefik_dmz_exposed_services"]:::rp
-    end
-
-    subgraph APP["application: traefik backend"]
-        TRA["traefik (docker provider)
📁 group_vars/backend_servers/traefik.yml"]:::ap
-        AK["authentik (OIDC + LDAP outpost backend)
📁 host_vars/application/authentik.yml"]:::ap
-        AKO["authentik_outpost_ldap
📁 host_vars/application/authentik_outpost_ldap.yml"]:::ap
-        NC["nextcloud
📁 host_vars/application/nextcloud.yml"]:::ap
-        COL["collabora
📁 host_vars/application/collabora.yml"]:::ap
-        DRW["drawio
📁 host_vars/application/drawio.yml"]:::ap
-    end
-
-    subgraph ST["storage: traefik backend"]
-        TRS["traefik (docker provider)"]:::st
-        GAR["garage (S3)
📁 host_vars/storage/garage.yml"]:::st
-    end
-
-    Internet -->|HTTPS :443| TRDMZ
-    TRDMZ -->|HTTP backend| TRA
-    TRDMZ -->|HTTP backend| TRS
-    TRA --> AK & AKO & NC & COL & DRW
-    TRS --> GAR
-
-    NC -. S3 .-> GAR
-    NC -. OIDC .-> AK
-    NC -. WOPI .-> COL
-    NC -. LDAP .-> AKO
-    AKO -. RPC + token .-> AK
-
-    TRDMZ -. ACME DNS-01 TSIG .-> DNS
-    TRDMZ -. hashi_vault acme-tsig .-> BAO
-    AK -. hashi_vault secrets .-> BAO
-    NC -. hashi_vault secrets .-> BAO
-    GAR -. hashi_vault secrets .-> BAO
-```
-
-> **Note:** `opencloud`, `send`, `opnform`, `homarr`, and `bookstack`
-> are defined as plays in [playbooks/site.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/playbooks/site.yml)
-> but currently have no matching group in
-> [hosts.yml](https://git.digitalboard.ch/Digitalboard/reference-ansible/src/branch/main/inventories/demo-gymburgdorf/hosts.yml) for
-> `demo-gymburgdorf`; those plays therefore run as no-ops. If a
-> tenant needs these services, add the corresponding
-> `_servers` group in `hosts.yml` and a
-> `host_vars/application/.yml` (mind the spelling: the
-> forms role is `opnform`, the LDAP role is `389ds`).
->
-> The `turn` host is in `all_servers` (and therefore in
-> `traefik_servers`) but has **no** service group of its own;
-> currently only the `base` and `traefik` roles run on it.
diff --git a/architecture/variables.md b/architecture/variables.md
deleted file mode 100644
index 66bb648..0000000
--- a/architecture/variables.md
+++ /dev/null
@@ -1,58 +0,0 @@
-
-# Variables: hierarchy and cheatsheet
-
-← Back to [Architecture index](README.md)
-
-## 3. Variable hierarchy
-
-Ansible merges variables from multiple sources.
Simplified model for
-this repo (see the Ansible docs for the full precedence rules):
-
-```mermaid
-flowchart LR
-    classDef role fill:#fef3c7,stroke:#92400e,color:#000
-    classDef group fill:#dbeafe,stroke:#1e40af,color:#000
-    classDef host fill:#dcfce7,stroke:#166534,color:#000
-    classDef vault fill:#fee2e2,stroke:#991b1b,color:#000
-
-    R["role defaults
(lowest precedence)
collections/.../roles/<r>/defaults/main.yml"]:::role
-    GA["group_vars/all/
vault.yml, docker.yml"]:::group
-    GG["group_vars/<group>/
traefik_servers/, backend_servers/
(parallel groups, merged via
ansible_group_priority)"]:::group
-    HV["host_vars/<host>/
(highest of the three inventory sources)"]:::host
-    BAO["OpenBao
lookup at runtime"]:::vault
-
-    R --> |"<overridden by>"| GA
-    GA --> |"<overridden by>"| GG
-    GG --> |"<overridden by>"| HV
-    HV -.community.hashi_vault.-> BAO
-    GG -.community.hashi_vault.-> BAO
-```
-
-**Key properties:**
-
-- Multiple `group_vars/<group>/` directories are **parallel**, not hierarchically
-  nested. `traefik_servers` and `backend_servers` are merged by
-  `ansible_group_priority` (default 1); at equal priority the
-  alphabetically later group name wins.
-- `host_vars/<host>/` beats any group.
-- `host_vars/reverseproxy/traefik.yml` sets `traefik_mode: dmz`. Strictly
-  speaking this does not override `group_vars/backend_servers/`:
-  `reverseproxy` is not a member of `backend_servers`, so the `backend`
-  default never applies to it. If it were a member, the host-level value
-  would still win.
-
-**Bao lookups** are not a precedence layer but **values** inside any
-variable source. See [security.md](security.md) for the pattern.
-
-## 9. Variable cheatsheet
-
-| Variable | Where in `demo-gymburgdorf/` | Why |
-|---|---|---|
-| `vault_addr`, `vault_mount` | `group_vars/all/vault.yml` | Bao endpoint applies site-wide |
-| `docker_registry_mirrors` | `group_vars/all/docker.yml` | Pulls from mirror on all hosts |
-| `traefik_acme_*`, `traefik_use_ssl`, `traefik_cert_mode` | `group_vars/traefik_servers/traefik.yml` | Applies to every Traefik instance (dmz + backend) |
-| `traefik_mode: backend` | `group_vars/backend_servers/traefik.yml` | Default for app + storage |
-| `traefik_mode: dmz` | `host_vars/reverseproxy/traefik.yml` | Host-specific override |
-| `traefik_dmz_exposed_services` | `host_vars/reverseproxy/` | DMZ backend list; only meaningful here |
-| `nextcloud_*`, `authentik_*`, `collabora_*`, `drawio_*` | `host_vars/application/.yml` | Service runs on `application` |
-| `garage_*` | `host_vars/storage/garage.yml` | Service runs on `storage` |
-| Secrets (passwords, tokens, keys) | inline variable using `lookup('community.hashi_vault.hashi_vault', …)` | Single source of truth via Bao |
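
To illustrate the last cheatsheet row, a secret could be wired into a variable file roughly as follows. This is a sketch under assumptions, not the repository's actual configuration: the variable name `nextcloud_admin_password`, the secret path `nextcloud`, and the key `admin_password` are hypothetical, and a KV v2 mount is assumed (hence the `/data/` path segment). `vault_addr` and `vault_mount` are the variables listed above from `group_vars/all/vault.yml`; the token is expected to come from `VAULT_TOKEN` in the calling shell.

```yaml
# host_vars/application/nextcloud.yml (illustrative fragment; names are hypothetical)
# Assumes a KV v2 secrets engine, so the API path contains "/data/".
# The lookup authenticates with the token exported as VAULT_TOKEN.
nextcloud_admin_password: "{{ lookup('community.hashi_vault.hashi_vault', vault_mount ~ '/data/nextcloud:admin_password', url=vault_addr) }}"
```

Because the lookup is an ordinary value, it follows the same precedence rules as any other variable defined in that file.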