feat: services bundle and collection documentation #8

Merged
Tobias-Wuest merged 14 commits from feat/services-bundle-and-docs into main 2026-06-04 09:17:43 +00:00

14 commits

Author SHA1 Message Date
Simon Bärlocher
a8954f525c
fix(opnform): align FRONT_API_SECRET across api and ui SSR path
The api service now also receives FRONT_API_SECRET so AuthenticateJWT
accepts the UI's server-side JWT forwards instead of blacklisting them
on UA mismatch. On the ui service the var is renamed FRONT_API_SECRET ->
NUXT_API_SECRET so Nuxt's runtimeConfig.apiSecret is actually populated
(NUXT_<key> convention) and injected as x-api-secret, short-circuiting
the UA-fingerprint check that otherwise 401s every reload.
2026-06-02 17:05:44 +02:00
Simon Bärlocher
3ace667b6c
feat(services): refine split-horizon OIDC routing and harden nextcloud patch
- authentik: address the rewrite service by compose service name instead
  of a network alias on the public FQDN, which shadowed extra_hosts pins
  and broke OIDC discovery for c-ares-based (Node) resolvers
- homarr: add homarr_extra_hosts to pin the IdP FQDN to a LAN IP so OIDC
  discovery stays in-network while the issuer matches the browser-facing URL
- opnform: add opnform_oidc_sso_redirect_root to 302 the root URL to the
  SSO path (deep-links untouched, /login?bypass=1 break-glass); restart
  ingress via container restart so envsubst re-renders nginx.conf
- nextcloud: make the UserConfig sed workaround fail loud on upstream
  drift instead of silently skipping (nextcloud/server#59629)
- gitignore: exclude the local .ansible/ collection cache
2026-06-02 13:44:08 +02:00
Simon Bärlocher
3236ca332f
docs(collection): document all roles and fix metadata drift
Replace ansible-galaxy init placeholders across the collection and
correct documentation that drifted from the code, after a multi-agent
review of every role README against its defaults, tasks and templates.

Collection level:
- README: role table for all 16 roles, requirements and role-ordering
- galaxy.yml: declare community.docker and community.general deps,
  real description/tags/urls; normalize license to MIT-0
- meta/runtime.yml: requires_ansible '>=2.15.0'
- plugins/README: document the homarr_layout filter and
  garage_credentials lookup instead of scaffold boilerplate

Per-role meta/main.yml and README for the placeholder roles
(389ds, authentik, authentik_outpost_ldap, base, collabora, drawio,
garage, homarr, httpbin, keycloak, nextcloud, opencloud, traefik).

Correctness fixes found during review:
- keycloak: wrong domain default, drop invented keycloak_cert_resolver,
  document the provisioning feature
- garage: root_domain is .s3.<first-entry>, not the bare domain
- opnform: jwt/front_api secrets use `openssl rand -hex 32`; align the
  validation fail_msg in tasks/main.yml accordingly
- send: S3 example references garage_s3_domains[0] (was singular)
- opencloud: document required opencloud_wopi_domain

License normalized to MIT-0 across galaxy.yml, role meta and READMEs to
match the SPDX headers.
2026-05-27 23:12:24 +02:00
Simon Bärlocher
19864d79b2
feat(services): multi-domain routing, split-horizon and OIDC hardening
Bundle of cross-role changes for the gymb services deployment:

- Traefik routers: OR-combine opnform/homarr/bookstack Host rules with new
  *_extra_domains (internal *.int.* FQDNs for a DMZ reverseproxy), and emit
  tls.certresolver only when traefik_cert_mode == acme (drawio, homarr,
  opnform, send).
- Split-horizon: bookstack_extra_hosts / opnform_extra_hosts add container
  /etc/hosts overrides so containers reach the IdP public FQDN over the LAN.
- bookstack: assert the OIDC issuer resolves concretely (reject "//v2.0"),
  allowing non-Entra IdPs that override bookstack_oidc_issuer.
- homarr: derive the bcrypt salt from the password digest so the admin hash
  is idempotent — no spurious template changes / container restarts.
- opnform: PATCH an existing OIDC connection instead of skipping (applies
  corrected inventory on re-run); add OIDC_FORCE_LOGIN (enabled only after
  bootstrap) and an optional direct-SSO ingress entrypoint.

Docs: READMEs and meta/argument_specs.yml updated for all new variables.
2026-05-27 23:12:24 +02:00
Simon Bärlocher
1dcff92240
docs(roles): add argument_specs and README for traefik, authentik, drawio, garage, nextcloud
Each of the five roles touched in this branch now ships:

* meta/argument_specs.yml: typed schema for every variable in
  defaults/main.yml plus the optional inputs surfaced via this
  branch (traefik_extra_hosts, authentik_host_rewrite_domains,
  authentik_proxy_apps.mode / .allowed_groups, drawio_extra_domains,
  drawio_authentik_forward_auth*, garage_webui_authentik_forward_auth*).
  All five specs load cleanly through ansible-core's
  ArgumentSpecValidator.

* README.md: replaces the ansible-galaxy boilerplate (where it was
  still in place) with a focused write-up — service vars, required
  secrets, ForwardAuth/idempotency notes, dependencies, and a working
  example playbook. authentik and garage READMEs are rewritten to cover
  the new knobs while preserving their existing content.
2026-05-27 23:12:24 +02:00
Simon Bärlocher
a9c33baed9
feat(drawio): support extra hostnames via drawio_extra_domains
Add drawio_extra_domains (list, default empty). The traefik Host rule
on the drawio router now expands to Host(<canonical>) || Host(<extra>)
... so the same container can answer on additional FQDNs — e.g. an
internal *.int.* name so a DMZ reverse-proxy can reach drawio via a
backend hostname covered by the local traefik cert.

Empty by default; behaviour unchanged for existing inventories.
2026-05-27 23:12:24 +02:00
Simon Bärlocher
60464e6d23
fix(nextcloud): in-container patch for UserConfig::getValueBool TypeError
nextcloud/server#59629: under PHP 8.x with OPcache,
UserConfig::getValueBool() passes a non-string from getTypedValue()
straight into strtolower(), throwing a TypeError on every authenticated
request once user_ldap is involved. Fix landed in master (PR #59646)
but no stable33 backport made it into 33.0.4.

Discover all compose-managed nextcloud containers, check whether the
`strtolower((string)` cast is already present, and `sed` it into
`lib/private/Config/UserConfig.php` on the ones that still ship the
broken version. Idempotent via grep guard so re-runs are no-ops.

Remove this block once the deployed image >= 33.0.4 ships the upstream fix.
2026-05-27 23:12:23 +02:00
Simon Bärlocher
f0cd8ba432
fix(nextcloud): make occ-driven config tasks idempotent
Every `occ config:app:set` / `ldap:set-config` / `notify_push:setup`
call previously fired on every play, marking changed even when the
stored value already matched. Now we read the current value first and
only invoke the setter when it differs:

* richdocuments (collabora): pre-read wopi_url, public_wopi_url,
  disable_certificate_verification, wopi_allowlist into a fact map;
  guard each `config:app:set` and tag `richdocuments:activate-config`
  with `changed_when: false` since it's a discovery refresh.

* drawio: same pattern for DrawioUrl, DrawioTheme, DrawioOffline,
  comparing as strings (occ stores booleans as "1"/"0").

* user_ldap: pre-read `ldap:show-config s01 --output=json`, parse JSON
  defensively (occ logs interleave on stderr), and skip per-key
  `ldap:set-config` calls when the stored value already equals the
  desired one.

* notify_push: skip `notify_push:setup` when the stored base_endpoint
  already matches the computed URL.

* plugins: `app:install`/`app:enable` were treating "already installed/
  enabled" output as a change. Add the negative match to `changed_when`
  so re-runs of a fully-provisioned site report ok rather than changed.
2026-05-27 23:12:23 +02:00
Simon Bärlocher
3855b3e0e7
fix(garage): make bootstrap & provision idempotent across reruns
* bootstrap: `garage layout show` truncates node IDs to 16 chars, but
  the membership check compared against the full hex. After the first
  successful join, subsequent runs no longer found the short ID in
  `layout show` and re-issued `layout assign`, marking the task
  changed every time. Compare against both the truncated and the full
  form so a configured node stays detected. Also tag the read-only
  `garage node id` / `layout show` probes with `changed_when: false`.

* provision keys: the old parser sliced `stdout_lines[1:]` to drop the
  header but missed that INFO log lines and ANSI escapes can interleave
  with table rows. Replace with an explicit `^GK[0-9a-fA-F]+` filter
  after stripping ANSI, so probe-output noise no longer corrupts the
  existing-keys set and triggers spurious `key new` calls.

* provision buckets: same class of fix — match `^[0-9a-f]{16}\s` data
  rows instead of slicing `[2:]`, which broke when the table header
  wasn't exactly two lines.

* provision permissions: pre-read `bucket info` for each (key, bucket)
  pair and only run `bucket allow` when the current `RWO` flag set for
  that key ID doesn't already match the desired permissions. Previously
  `bucket allow` ran unconditionally and reported changed every play.

* `changed_when: false` on all read-only probes (`key list`, `key info`,
  `bucket list`).
2026-05-27 23:12:23 +02:00
Simon Bärlocher
ce50bdb4d3
feat(drawio,garage): optional Authentik ForwardAuth in front of UIs
Add `*_authentik_forward_auth` + `*_authentik_forward_auth_url` knobs to
both roles. When enabled:

* drawio: traefik attaches a ForwardAuth middleware pointing at the
  authentik embedded outpost; unauthenticated requests get redirected
  to log in and downstream sees X-Authentik-* identity headers.

* garage WebUI: same ForwardAuth wiring, and `AUTH_USER_PASS` is dropped
  from the container env so authentik is the only gate. Tasks now key
  the htpasswd hash workflow off `_garage_webui_htpasswd_active`
  (`webui_enabled AND NOT authentik_forward_auth`); when authentik
  fronts the UI we skip hashing entirely. htpasswd hash is also now
  cached on disk and re-verified via `htpasswd -vbB` so unchanged
  passwords stop showing as `changed=true` on every run.

Both knobs default to `false`, preserving existing htpasswd/plain behaviour.
2026-05-27 23:12:23 +02:00
Simon Bärlocher
6411f94cce
feat(authentik): split-horizon host rewrite + proxy-app mode/group bindings
* `authentik_host_rewrite_domains`: extra hostnames that reach the
  authentik container but make it generate URLs (OIDC issuer, reset
  links) as if requested from the canonical `authentik_domains[0]`.
  Each entry gets its own traefik router and a URL-based loadbalancer
  service that disables passHostHeader and pins X-Forwarded-Host via
  middleware, so server-to-server calls on internal FQDNs keep traffic
  in the LAN while the iss claim stays aligned with the public host.
  Uses a network alias on the canonical FQDN so traefik (sharing the
  network) resolves the URL upstream to this very container.

* proxy-app blueprint:
  - `mode` (default `forward_single`) lets callers pick between proxy,
    forward_single and forward_domain providers in one template.
  - `allowed_groups`: when set, emit one PolicyBinding per group on
    the application; authentik OR-evaluates bindings, so users in any
    listed group pass and others are denied.

Existing inventories with an empty list see no behavioural change.
2026-05-27 23:12:23 +02:00
Simon Bärlocher
99d8968a2e
feat(traefik): configurable extra_hosts for container DNS overrides
Add `traefik_extra_hosts` (list of `host:ip`) that maps straight into
the traefik container's compose `extra_hosts`. Needed when a downstream
middleware (e.g. ForwardAuth to authentik on a sibling LAN) has to
resolve a public FQDN to an internal IP because the DMZ doesn't hairpin
the public address back inside.

Empty by default; behaviour unchanged for existing inventories.
2026-05-27 23:12:23 +02:00
Simon Bärlocher
2104e5fe7d
feat: drop blanket recreates, ACME-DNS knobs, notify_push override
- Drop `recreate: always` from collabora/drawio/homarr/opencloud/traefik
  handlers and the authentik_outpost_ldap start task. `up -d` with
  `state: present` already recreates exactly the services whose
  compose definition changed; the blanket recreate was forcing
  restarts even when nothing relevant moved.
- Rewrite the `*_domains` Traefik Host loop to the `Host(\`a\`) ||
  Host(\`b\`)` form across authentik/collabora/garage/nextcloud so the
  rule still matches when traefik can't normalize the comma-form into
  the same canonical shape.
- Traefik: add `traefik_acme_tcp_only` (sets LEGO_EXPERIMENTAL_DNS_TCP_ONLY)
  and `traefik_acme_disable_ans_checks` (disables lego's authoritative-NS
  propagation check) for environments where the DNS path between the
  traefik container and the zone's nameservers is constrained.
- Traefik DMZ collector: two-step merge so a `traefik_dmz_exposed_services`
  entry that sets its own `backend_host` wins over the host fallback;
  lets a route target an internal FQDN covered by the backend cert's
  SANs instead of the raw IP.
- Nextcloud: add `nextcloud_notify_push_domain` override for the
  `occ notify_push:setup` call so the setup check can hit an internal
  FQDN instead of hairpinning through the DMZ. Push router now matches
  every entry in `nextcloud_domains`.
- Nextcloud: also %2F-escape slashes in the postgres user/password
  inside the notify_push DATABASE_URL.
2026-05-27 23:12:23 +02:00
Simon Bärlocher
c3cf779532
feat: domain list refactor + demo-gymburgdorf fixes
- Refactor: collapse `*_domain` + `*_extra_domains` into a single
  `*_domains` list across authentik, collabora, garage and nextcloud
  roles. First entry is the canonical FQDN (used for OVERWRITEHOST,
  BASE_URL, notify_push setup and garage root_domain).
- Authentik blueprint: guard the OAuth sources block so an empty
  `authentik_login_sources` no longer renders an invalid YAML key.
- Nextcloud: introduce `nextcloud_collabora_public_domain` and set
  Collabora's `public_wopi_url` separately from the server-to-server
  `wopi_url` so browsers can reach Collabora via the public name while
  Nextcloud still talks to it on the internal one.
- Nextcloud: URL-encode the postgres user/password in DATABASE_URL.
2026-05-27 23:12:22 +02:00