# Outbound Webhook Subsystem PRD

**Status:** Draft for implementation
**Last updated:** 2026-04-16
**Priority:** P1 — committed for post-launch delivery

---

## 1) Purpose

Give marketplace operators the ability to react to Guild state changes in real time by registering HTTPS webhook endpoints that receive signed event payloads.

Today, marketplaces must poll `GET /v1/wallet/:id` (or admin endpoints) to see changes. This blocks use cases that need low-latency reaction:
- Marketplace wants to send its own notifications when a user earns
- Marketplace wants to run its own fraud checks when tokens are minted
- Marketplace wants to write its own accounting entries when Guild settles

This is the most-requested integration capability in operator conversations.

---

## 2) Scope

### In scope

- Webhook endpoint registration per tenant (one or more URLs)
- Event subscription (which event types each URL receives)
- Durable delivery with retry + dead-letter (outbox pattern)
- HMAC-SHA256 signed payloads with replay-protection timestamps
- Delivery acknowledgment (2xx = success; 4xx = permanent client error, not retried; any other response, timeout, or connection error = retry)
- Marketplace-facing delivery health endpoint
- Guild-ops admin observability for webhook reliability

### Out of scope for v1 of this subsystem

- Bidirectional sync (Guild does not accept inbound events from marketplaces on this path — they use the existing `/v1/*` API)
- Webhook endpoint testing UI (a separate devex item)
- Custom payload transformation (events emit in their canonical shape)

---

## 3) Event Catalog

Events emitted via the new subsystem:

| Event | When emitted | Payload |
|---|---|---|
| `user.linked` | A tenant_user is created via `POST /v1/users/link` | `tenant_id`, `tenant_user_id`, `guild_user_id`, `referred_by_tenant_user_id`, `created_at` |
| `transaction.reported` | A transaction is successfully reported via `POST /v1/transactions` | `tenant_id`, `transaction_id`, `platform_commission_cents`, `idempotency_key`, `settled_at`, `buyer_tenant_user_id`, `seller_tenant_user_id` |
| `transaction.reversed` | A transaction is reversed (`reverseTransaction` is called) | `tenant_id`, `original_transaction_id`, `reversed_at`, `reversed_amount_cents` |
| `tokens.minted` | Tokens are awarded after a settled transaction | `tenant_id`, `tenant_user_id`, `transaction_id`, `raw_tokens`, `role` |
| `settlement.completed` | Daily tenant settlement runs to completion | `tenant_id`, `settlement_date`, `total_final_tokens`, `token_value_cents`, `distributed_pool_cents` |
| `wallet.updated` | A user's `balance_cents` changes | `tenant_id`, `tenant_user_id`, `old_balance_cents`, `new_balance_cents`, `cause` (`settlement`, `reversal`, `adjustment`) |
| `invoice.generated` | Monthly protocol fee invoice is generated | `tenant_id`, `period_start`, `period_end`, `invoice_id`, `amount_cents` |
| `invoice.paid` | Protocol fee invoice is marked paid | `tenant_id`, `invoice_id`, `paid_at`, `amount_cents` |

Subscriptions are per endpoint: each registered URL receives only the event types listed in its `subscribed_events`.
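To make per-endpoint filtering concrete, here is a minimal Python sketch of the fan-out lookup: given a tenant's endpoints, it returns the active ones subscribed to a given event type. The `Endpoint` class and `endpoints_for_event` function are hypothetical names; in practice this would more likely be a query against `tenant_webhook_endpoints`.

```python
from dataclasses import dataclass, field
from typing import List

# Event types from the catalog above.
EVENT_TYPES = frozenset({
    "user.linked",
    "transaction.reported",
    "transaction.reversed",
    "tokens.minted",
    "settlement.completed",
    "wallet.updated",
    "invoice.generated",
    "invoice.paid",
})


@dataclass
class Endpoint:
    # Hypothetical in-memory mirror of a few tenant_webhook_endpoints columns.
    id: str
    tenant_id: str
    url: str
    subscribed_events: List[str] = field(default_factory=list)
    active: bool = True


def endpoints_for_event(endpoints: List[Endpoint], tenant_id: str, event_type: str) -> List[Endpoint]:
    """Return the tenant's active endpoints that subscribe to `event_type`."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    return [
        ep for ep in endpoints
        if ep.tenant_id == tenant_id and ep.active and event_type in ep.subscribed_events
    ]
```

Each endpoint returned here would get its own pending delivery row, so one endpoint's failures never block delivery to another.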

---

## 4) Payload shape

```json
{
  "id": "evt_01HVZ...",
  "type": "transaction.reported",
  "created_at": "2026-05-01T12:00:00.000Z",
  "tenant_id": "hooks-ly",
  "api_version": "2026-04-01",
  "data": {
    "transaction_id": "txn_01HVZ...",
    "platform_commission_cents": 1000,
    "idempotency_key": "hooks-ly-txn-42",
    "settled_at": "2026-05-01T11:59:57.000Z",
    "buyer_tenant_user_id": "tu_01HVZ...",
    "seller_tenant_user_id": null
  }
}
```

Delivery headers:
- `Guild-Signature: t=1777636800,v1=<hmac>` (Stripe-style `t`/`v1` format)
- `Guild-Event-Id: evt_01HVZ...`
- `Guild-Event-Type: transaction.reported`
- `Guild-Api-Version: 2026-04-01`

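The signature is `HMAC_SHA256(secret, "{timestamp}.{body}")`, computed over the raw request body. The `t` value is the Unix timestamp (in seconds) at which the delivery was signed; receivers should reject deliveries whose `t` differs from their own clock by more than 5 minutes, which limits replay. Each delivery attempt is therefore signed afresh with a new `t`: a retry sent hours later under the original timestamp would fall outside the tolerance window and be rejected.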
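A minimal Python sketch of signing (Guild side) and verification (receiver side), assuming the `v1` value is the hex-encoded HMAC digest (the encoding is not yet specified) and that `body` is the exact raw request body as bytes. `sign` and `verify` are hypothetical helper names, not a published SDK.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_S = 300  # 5-minute replay window


def _digest(secret: bytes, body: bytes, timestamp: int) -> str:
    # HMAC_SHA256(secret, "{timestamp}.{body}"); hex encoding is an assumption.
    return hmac.new(secret, f"{timestamp}.".encode() + body, hashlib.sha256).hexdigest()


def sign(secret: bytes, body: bytes, timestamp: Optional[int] = None) -> str:
    """Sender side: build the Guild-Signature header value. Call once per attempt."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"t={ts},v1={_digest(secret, body, ts)}"


def verify(secret: bytes, body: bytes, header: str,
           now: Optional[int] = None, tolerance_s: int = TOLERANCE_S) -> bool:
    """Receiver side: parse t/v1, reject stale timestamps, compare in constant time."""
    fields = {}
    for item in header.split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    try:
        ts = int(fields["t"])
        received = fields["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    current = int(time.time()) if now is None else now
    if abs(current - ts) > tolerance_s:
        return False  # outside the replay window (too old or too far in the future)
    return hmac.compare_digest(_digest(secret, body, ts).encode(), received.encode())
```

`hmac.compare_digest` is used instead of `==` so that response timing does not reveal how many leading characters of a forged signature matched.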

---

## 5) Design

### Storage

New Postgres tables (projections of immudb):

```
tenant_webhook_endpoints
  id, tenant_id, url, secret (encrypted),
  subscribed_events text[],
  active boolean, created_at, last_success_at, last_failure_at,
  consecutive_failure_count

tenant_webhook_deliveries
  id, endpoint_id, event_id, event_type,
  payload jsonb, signature,
  attempt_count, status ('pending','delivered','failed','dead_lettered'),
  next_attempt_at, last_response_code, last_response_body,
  created_at, delivered_at
```

### Delivery workflow

Reuse the existing hub_event_outbox pattern:

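1. For each active endpoint subscribed to the event type, a row is written to `tenant_webhook_deliveries` with `status='pending'` inside the same DB transaction as the triggering state change. The delivery record therefore exists if and only if the change commits; combined with retries, this gives at-least-once delivery.
2. A Temporal-orchestrated worker polls pending rows whose `next_attempt_at` has passed, attempts an HTTPS POST, and updates the row's status.
3. Retry schedule: roughly exponential backoff over 72 hours, with retries at 30s, 2m, 5m, 15m, 1h, 3h, 6h, 12h, 24h, 48h, and 72h after the first attempt (offsets from the first attempt, not gaps between attempts, since the gaps alone would sum to well over 72h).
4. If the final (72h) retry fails, the delivery's status becomes `dead_lettered` and the endpoint's `consecutive_failure_count` is incremented.
5. If an endpoint's `consecutive_failure_count` exceeds a threshold (default 100), the endpoint is auto-disabled and ops is alerted. A successful delivery resets the count.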
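The retry, dead-letter, and auto-disable rules can be sketched as a pure state transition applied after each attempt. This is a minimal Python sketch under several assumptions: schedule offsets are measured from the delivery's `created_at`; 4xx responses are terminal (as in the testing scenario in §6) and marked `failed`; only terminal failures increment `consecutive_failure_count`, and any success resets it. `Delivery`, `EndpointHealth`, and `record_attempt` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Retry offsets, measured (assumption) from the delivery's created_at.
RETRY_OFFSETS = [
    timedelta(seconds=30), timedelta(minutes=2), timedelta(minutes=5),
    timedelta(minutes=15), timedelta(hours=1), timedelta(hours=3),
    timedelta(hours=6), timedelta(hours=12), timedelta(hours=24),
    timedelta(hours=48), timedelta(hours=72),
]
AUTO_DISABLE_THRESHOLD = 100


@dataclass
class Delivery:
    created_at: datetime
    attempt_count: int = 0
    status: str = "pending"
    next_attempt_at: Optional[datetime] = None


@dataclass
class EndpointHealth:
    active: bool = True
    consecutive_failure_count: int = 0


def is_retryable(status_code: Optional[int]) -> bool:
    # None means timeout or connection error; 4xx is a permanent client error.
    return status_code is None or not 400 <= status_code < 500


def record_attempt(d: Delivery, ep: EndpointHealth, status_code: Optional[int]) -> None:
    """Update delivery and endpoint state after one HTTPS POST attempt."""
    d.attempt_count += 1
    if status_code is not None and 200 <= status_code < 300:
        d.status, d.next_attempt_at = "delivered", None
        ep.consecutive_failure_count = 0  # a success breaks the failure streak
        return
    next_retry = d.attempt_count - 1  # index of the next retry = retries already used
    if is_retryable(status_code) and next_retry < len(RETRY_OFFSETS):
        d.status = "pending"
        d.next_attempt_at = d.created_at + RETRY_OFFSETS[next_retry]
        return
    # Terminal failure: schedule exhausted, or a non-retryable response.
    d.status = "dead_lettered" if is_retryable(status_code) else "failed"
    d.next_attempt_at = None
    ep.consecutive_failure_count += 1
    if ep.consecutive_failure_count > AUTO_DISABLE_THRESHOLD:
        ep.active = False  # the real worker would also alert ops here
```

Under this reading a delivery gets at most 12 attempts (the initial one plus 11 retries), the last at 72h after creation. Keeping the transition free of I/O makes the "exponential backoff schedule correctness" unit test in §6 straightforward.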

### Marketplace-facing APIs

```
POST   /v1/webhooks/endpoints      Register a new endpoint
GET    /v1/webhooks/endpoints      List your endpoints
PATCH  /v1/webhooks/endpoints/:id  Update url/events/active state
DELETE /v1/webhooks/endpoints/:id  Remove endpoint
POST   /v1/webhooks/endpoints/:id/rotate-secret  Generate new signing secret
GET    /v1/webhooks/deliveries?endpoint_id=&status=  Delivery history for debugging
POST   /v1/webhooks/deliveries/:id/retry  Manually retry a dead-lettered delivery
```
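Before persisting a `POST /v1/webhooks/endpoints` request, the API needs to reject non-HTTPS URLs and event types outside the catalog. A minimal Python sketch, assuming hypothetical request fields `url` and `subscribed_events` (named after the table columns) and a 32-byte random secret for both registration and `rotate-secret`; none of these names or sizes are specified yet.

```python
import secrets
from typing import List
from urllib.parse import urlsplit

# Event types from the catalog in section 3.
EVENT_CATALOG = frozenset({
    "user.linked", "transaction.reported", "transaction.reversed", "tokens.minted",
    "settlement.completed", "wallet.updated", "invoice.generated", "invoice.paid",
})


def validate_registration(url: str, subscribed_events: List[str]) -> List[str]:
    """Return validation errors; an empty list means the request is acceptable."""
    errors = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        errors.append("url must use https")
    if not parts.hostname:
        errors.append("url must include a host")
    if not subscribed_events:
        errors.append("subscribed_events must not be empty")
    unknown = sorted(set(subscribed_events) - EVENT_CATALOG)
    if unknown:
        errors.append("unknown event types: " + ", ".join(unknown))
    return errors


def new_signing_secret() -> str:
    """Random signing secret (32 bytes, hex-encoded) from a CSPRNG."""
    return secrets.token_hex(32)
```

Returning every error at once, rather than failing on the first, lets an operator fix a bad registration in a single round trip.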

---

## 6) Testing

- Unit: HMAC signature generation + verification round trip
- Unit: exponential backoff schedule correctness
- Integration: end-to-end event → delivery → retry cycle
- Integration: `consecutive_failure_count` auto-disables endpoint
- Scenario: webhook endpoint returns 5xx → delivery retries → recovers → delivers
- Scenario: webhook endpoint returns 4xx → delivery does NOT retry (client error)
- Scenario: replay attack with a stale timestamp → rejected by the receiver-side verification example (§7)
- Scenario: marketplace rotates secret → new deliveries use new secret, in-flight deliveries fail gracefully

---

## 7) Operator integration guide (draft)

To be written alongside implementation:
- Signature verification example in TypeScript, Python, Go, Ruby
- Idempotent consumer pattern (dedupe by `Guild-Event-Id`)
- How to handle "same event with different payload" during payload schema evolution
- How to test receipt locally (recommended: expose a local server with ngrok, then re-send deliveries with the manual retry endpoint)
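The idempotent consumer pattern, sketched in Python from the receiver's side. SQLite is used only to keep the example self-contained; the assumption is that the receiver records each `Guild-Event-Id` under a unique constraint, in the same transaction as its own side effects. `make_store` and `handle_delivery` are hypothetical names.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    """In-memory store with a uniqueness guarantee on event IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    return conn


def handle_delivery(conn: sqlite3.Connection, event_id: str,
                    process: Callable[[], None]) -> bool:
    """Run `process` at most once per Guild-Event-Id.

    Returns True if the event was processed now, False if it was a duplicate.
    """
    with conn:  # commits on success, rolls back if `process` raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)", (event_id,)
        )
        if cur.rowcount == 0:
            return False  # already recorded: a redelivery of the same event
        process()  # writes made through `conn` commit atomically with the dedupe row
    return True
```

If `process` raises, the transaction rolls back and the event ID is not recorded; the receiver should then answer with a non-2xx status so that the delivery is retried. A duplicate should still be acknowledged with 2xx, since it was already handled.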

---

## 8) Risks and open questions

### Risks

- **Webhook endpoint abuse:** a malicious or misconfigured endpoint that always returns 2xx but never actually processes the payload could silently swallow critical events. Mitigation: require the marketplace to confirm receipt via an out-of-band ping during endpoint registration.
- **Secret leakage:** HMAC secrets stored in the database must be encrypted at rest, using the existing encryption pattern for `tenant_api_keys`.
- **Thundering herd:** after a Guild outage, the backlog of queued events becomes due at once when service recovers. Outbound delivery needs per-endpoint rate limiting.
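Per-endpoint rate limiting for the thundering-herd case could be a token bucket per endpoint: short bursts are allowed, but the sustained request rate to any one receiver is bounded. A minimal Python sketch; the rate and burst values are placeholders, and `TokenBucket` and `PerEndpointLimiter` are hypothetical names. A delivery that is not allowed would simply stay `pending` and be picked up on a later poll.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `burst` requests at once, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(float(self.burst), self.tokens + (now - self.updated) * self.rate_per_s)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerEndpointLimiter:
    """One bucket per endpoint, so each receiver sees a bounded rate while a backlog drains."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self._rate, self._burst, self._clock = rate_per_s, burst, clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, endpoint_id: str) -> bool:
        bucket = self._buckets.get(endpoint_id)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._burst, self._clock)
            self._buckets[endpoint_id] = bucket
        return bucket.try_acquire()
```

The injectable `clock` exists so the limiter can be tested deterministically; a multi-worker deployment would need the bucket state in shared storage rather than in process memory.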

### Open questions

- **Custom headers from marketplace?** Some operators will want to add a custom header (e.g., their own auth token on top of Guild's signature). Probably yes in v2 of the subsystem.
- **Batched delivery option?** Single-event POST per delivery is simpler but higher overhead. Batched POST (up to 100 events per request) could be offered as an opt-in for high-volume tenants.
- **WebSocket alternative?** Some marketplaces may prefer a persistent connection over per-event HTTP webhook delivery. Deferred to post-v1 of this subsystem.

---

## 9) Rollout plan

1. Ship endpoint registration + in-memory delivery (no persistence yet) for internal testing
2. Add Postgres persistence + Temporal worker for durable delivery
3. Add marketplace-facing APIs + docs
4. Beta with 2–3 friendly tenants for 2 weeks
5. Open to all tenants
6. Deprecate the polling recommendation in the Launch Readiness doc

---

## 10) Acceptance criteria

- A marketplace can register a webhook URL, subscribe to events, and receive signed payloads within 60 seconds of the triggering event
- Delivery is at-least-once; consumers can process events idempotently by deduplicating on `Guild-Event-Id`
- Signature verification follows a documented pattern operators can implement in any language
- Dead-lettered deliveries are inspectable and manually retryable
- Endpoint auto-disable prevents runaway retry loops on permanently-broken URLs
- Guild ops staff can see per-endpoint delivery health
