# Tenant Graph Export PRD

**Status:** Draft for implementation
**Last updated:** 2026-04-16
**Priority:** P2 — committed as a trust/anti-lock-in gesture

---

## 1) Purpose

Give marketplace operators a first-class way to export the full state they have accumulated on Guild — referral graph, tenant user records, transaction history, wallet ledger — in a machine-readable format they can take with them.

Two motivations:

1. **Anti-lock-in.** Removing switching-cost concerns makes the integration decision easier. If an operator can leave with their data, they are more willing to commit in the first place.
2. **Disaster-recovery mirror.** Some operators want an independent off-Guild copy of their referral state for their own archival or regulatory needs.

This is the second-most-requested trust-related capability from operator conversations.

---

## 2) Scope

### In scope

- Tenant-scoped export: one marketplace gets its own data, not other tenants'
- Authenticated via tenant API key with explicit `export` scope
- Output format: a tar.gz archive of NDJSON (newline-delimited JSON) files, one record per line, streamed rather than buffered
- Multiple object types in a single export (users, referral edges, transactions, token awards, wallet ledger entries, settlements)
- Async for large tenants (export job → signed download URL when ready)
- Sync for small tenants (direct streaming response under a size threshold)
- Per-tenant rate limiting (operators need periodic snapshots, not continuous exports)

### Out of scope for v1 of this feature

- Cross-tenant export (a `guild_user`'s activity across multiple marketplaces — not scoped to one tenant)
- Incremental / delta exports (each export is a full snapshot)
- Re-import — this is export only; there is no "restore from export" flow
- Custom schema selection — fixed schema in v1

---

## 3) Export schema

The export is a tar.gz archive containing the following NDJSON files plus a `metadata.json` manifest:

### `tenant_users.ndjson`

```json
{"id": "tu_01HVZ...", "guild_user_id": "gu_01HVZ...", "phone_hash": "abc...", "created_at": "2026-01-15T...", "referred_by_tenant_user_id": "tu_01HVW..."}
```

### `referral_edges.ndjson`

One line per directed `REFERRED_ON` edge in the tenant's referral graph (the same edges projected into Neo4j, read from Postgres like every other file):

```json
{"from_tenant_user_id": "tu_01HVW...", "to_tenant_user_id": "tu_01HVZ...", "created_at": "2026-01-15T..."}
```

### `transactions.ndjson`

```json
{"id": "txn_01HVZ...", "idempotency_key": "hooks-ly-42", "buyer_tenant_user_id": "tu_...", "seller_tenant_user_id": null, "platform_commission_cents": 1000, "settled_at": "2026-01-15T...", "reported_at": "2026-01-15T...", "reversed_at": null}
```

### `token_awards.ndjson`

```json
{"id": "tok_01HVZ...", "transaction_id": "txn_...", "tenant_user_id": "tu_...", "raw_tokens": 5, "role": "buyer_direct", "bonus_multiplier": 2.0, "final_tokens": 10}
```

### `wallet_ledger.ndjson`

Append-only ledger entries; a user's current balance is the sum of their `delta_cents`:

```json
{"id": "wal_01HVZ...", "tenant_user_id": "tu_...", "delta_cents": 2177, "cause": "settlement", "reference_id": "settlement_2026-01-15", "created_at": "2026-01-16T00:05:00.000Z"}
```
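Because the ledger is append-only, balances are derived rather than stored. A minimal sketch of that fold, assuming `delta_cents` is a signed integer number of cents and that the input is the raw lines of `wallet_ledger.ndjson`; `balances_from_ledger` is a hypothetical helper, not part of any Guild SDK:

```python
import json
from collections import defaultdict
from collections.abc import Iterable


def balances_from_ledger(lines: Iterable[str]) -> dict[str, int]:
    """Sum delta_cents per tenant_user_id over wallet_ledger.ndjson lines."""
    balances: defaultdict[str, int] = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # tolerate a trailing blank line
        entry = json.loads(line)
        # Assumes delta_cents is a signed integer count of cents.
        balances[entry["tenant_user_id"]] += entry["delta_cents"]
    return dict(balances)


ledger = [
    '{"id": "wal_1", "tenant_user_id": "tu_a", "delta_cents": 2177, "cause": "settlement"}',
    '{"id": "wal_2", "tenant_user_id": "tu_a", "delta_cents": 1000, "cause": "settlement"}',
]
print(balances_from_ledger(ledger))  # {'tu_a': 3177}
```

Passing an open file (`balances_from_ledger(open("wallet_ledger.ndjson"))`) works the same way, since iterating a text file yields lines.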

### `settlements.ndjson`

```json
{"id": "set_01HVZ...", "settlement_date": "2026-01-15", "total_commission_cents": 1800000, "gross_program_allocation_cents": 540000, "protocol_fee_cents": 162000, "distributed_pool_cents": 378000, "total_final_tokens": 23, "token_value_cents_e6": 16434782608}
```
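Operators mirroring settlements may want to sanity-check each row. A minimal sketch, assuming two relationships that hold for the sample row but are not stated as rules here: `distributed_pool_cents` is `gross_program_allocation_cents` minus `protocol_fee_cents`, and `token_value_cents_e6` is the distributed pool in cents, scaled by 10^6 and divided by `total_final_tokens`, rounded down (the rounding mode is also an assumption). `settlement_problems` is a hypothetical name:

```python
def settlement_problems(row: dict) -> list[str]:
    """Return human-readable inconsistencies found in one settlements.ndjson row."""
    problems = []
    # Assumption: the pool is the program allocation left after the protocol fee.
    pool = row["gross_program_allocation_cents"] - row["protocol_fee_cents"]
    if row["distributed_pool_cents"] != pool:
        problems.append(
            f"distributed_pool_cents {row['distributed_pool_cents']} != {pool}"
        )
    tokens = row["total_final_tokens"]
    if tokens > 0:
        # Assumption: per-token value in cents, scaled by 10**6, floored.
        value = row["distributed_pool_cents"] * 10**6 // tokens
        if row["token_value_cents_e6"] != value:
            problems.append(
                f"token_value_cents_e6 {row['token_value_cents_e6']} != {value}"
            )
    return problems
```

Integer arithmetic keeps the check exact; a float division would introduce rounding noise at the 10^6 scale.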

### `metadata.json`

```json
{
  "tenant_id": "hooks-ly",
  "export_generated_at": "2026-05-01T12:00:00.000Z",
  "tenant_created_at": "2025-11-01T00:00:00.000Z",
  "export_format_version": "1.0",
  "record_counts": {
    "tenant_users": 14203,
    "referral_edges": 22191,
    "transactions": 8821,
    "token_awards": 34992,
    "wallet_ledger": 41733,
    "settlements": 163
  }
}
```
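`record_counts` lets a consumer confirm that nothing was truncated. A minimal reader sketch, assuming the member names listed above (optionally under a top-level directory, which it tolerates); `verify_export` is a hypothetical name. It parses every record, which checks well-formedness but not field-level schema:

```python
import json
import tarfile
from collections import Counter

# Member names from the schema above; count keys drop the ".ndjson" suffix.
NDJSON_MEMBERS = {
    "tenant_users.ndjson",
    "referral_edges.ndjson",
    "transactions.ndjson",
    "token_awards.ndjson",
    "wallet_ledger.ndjson",
    "settlements.ndjson",
}


def verify_export(path: str) -> dict[str, int]:
    """Parse every NDJSON record and compare counts with metadata.json.

    Returns the observed per-file counts; raises ValueError on a mismatch.
    """
    counts: Counter[str] = Counter()
    metadata = None
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            name = member.name.rsplit("/", 1)[-1]  # ignore any leading directory
            if name != "metadata.json" and name not in NDJSON_MEMBERS:
                continue
            fh = archive.extractfile(member)
            if name == "metadata.json":
                metadata = json.load(fh)
                continue
            for line in fh:
                if line.strip():
                    json.loads(line)  # raises json.JSONDecodeError on a bad record
                    counts[name.removesuffix(".ndjson")] += 1
    if metadata is None:
        raise ValueError("archive has no metadata.json")
    expected = metadata["record_counts"]
    for key in set(expected) | set(counts):
        if expected.get(key, 0) != counts[key]:
            raise ValueError(
                f"{key}: metadata says {expected.get(key, 0)}, found {counts[key]}"
            )
    return dict(counts)
```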

---

## 4) API surface

### Sync path (small tenants, estimated archive size under 100 MB)

```
GET /v1/tenants/:tenant_id/export
Authorization: Bearer <tenant_api_key_with_export_scope>

→ 200 application/gzip, Content-Disposition: attachment; filename="hooks-ly-export-2026-05-01.tar.gz"
```

### Async path (large tenants)

```
POST /v1/tenants/:tenant_id/exports
Authorization: Bearer <tenant_api_key_with_export_scope>

→ 202 Accepted
{ "export_id": "exp_01HVZ...", "status": "queued", "estimated_duration_seconds": 900 }

GET /v1/tenants/:tenant_id/exports/:export_id
→ 200
{ "export_id": "...", "status": "completed", "download_url": "https://...", "expires_at": "2026-05-08T..." }
```

Signed download URL expires after 7 days; export is deleted from storage after 30 days.
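Async clients need to poll the status endpoint until the job finishes. A minimal client-side sketch, assuming `get_status` is a caller-supplied function that performs `GET /v1/tenants/:tenant_id/exports/:export_id` and returns the parsed JSON body. The `failed` status is hypothetical, since only `queued` and `completed` are specified above. Exponential backoff (5 s doubling to a 60 s cap) turns a roughly 15-minute job into about twenty polls instead of hundreds:

```python
import time
from collections.abc import Callable

TERMINAL_FAILURES = {"failed"}  # hypothetical status; not specified in this PRD


def wait_for_export(
    get_status: Callable[[], dict],
    timeout_s: float = 3600.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll an export job until it completes, doubling the delay between polls."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status["status"] == "completed":
            return status  # carries download_url and expires_at
        if status["status"] in TERMINAL_FAILURES:
            raise RuntimeError(f"export {status.get('export_id')} failed")
        if time.monotonic() + delay > deadline:
            raise TimeoutError("export did not complete before the deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

The injectable `sleep` exists so the loop can be exercised in tests without real waiting.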

---

## 5) Implementation

### Query layer

Exports run against Postgres projections, not immudb or Neo4j directly. Postgres has all the needed data and is optimized for bulk reads with appropriate indexes.

### Performance

- Stream NDJSON rather than buffer in memory
- Batch Postgres reads with server-side cursors (1000 rows per fetch)
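- For the async path, write the tar.gz to the Tigris bucket as it is generated rather than staging the whole archive first
- No blocking locks during export: read every table inside one `REPEATABLE READ, READ ONLY` transaction so all files share a single snapshot (Postgres `READ COMMITTED` takes a new snapshot per statement, so files read in separate statements would not be mutually consistent)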

### Rate limiting

- Sync: 1 export per tenant per day
- Async: 1 export per tenant per hour, max 5 queued
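The limits above can be sketched as a per-tenant check. This sketch assumes rolling windows (a "day" is any 24 hours, not a calendar day), counts an async job against the queue cap from acceptance until it finishes, and keeps state in process memory; a deployment with several API instances would need that state in a shared store such as Postgres. `ExportRateLimiter` and its methods are hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta

SYNC_WINDOW = timedelta(days=1)
ASYNC_WINDOW = timedelta(hours=1)
MAX_QUEUED = 5


class ExportRateLimiter:
    """In-memory sketch of the per-tenant export limits (single process only)."""

    def __init__(self) -> None:
        self._last_sync: dict[str, datetime] = {}
        self._last_async: dict[str, datetime] = {}
        self._queued: defaultdict[str, int] = defaultdict(int)

    def allow_sync(self, tenant_id: str, now: datetime) -> bool:
        last = self._last_sync.get(tenant_id)
        if last is not None and now - last < SYNC_WINDOW:
            return False
        self._last_sync[tenant_id] = now
        return True

    def allow_async(self, tenant_id: str, now: datetime) -> bool:
        if self._queued[tenant_id] >= MAX_QUEUED:
            return False
        last = self._last_async.get(tenant_id)
        if last is not None and now - last < ASYNC_WINDOW:
            return False
        self._last_async[tenant_id] = now
        self._queued[tenant_id] += 1
        return True

    def async_finished(self, tenant_id: str) -> None:
        """Call when a queued job completes or fails, freeing a queue slot."""
        self._queued[tenant_id] = max(0, self._queued[tenant_id] - 1)
```

With async jobs estimated at around 15 minutes, the hourly window will usually bind well before the five-job queue cap does.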

### Audit

Every export is recorded as an `EXPORT_GENERATED` event in immudb with `tenant_id`, `requesting_api_key_id`, `record_counts`, and `export_id`. This creates a durable audit trail of who exported what and when.

---

## 6) Testing

- Unit: NDJSON serialization round-trips cleanly
- Integration: export for a tenant with 1 user, 10 users, 10000 users — confirm all included
- Integration: export respects tenant scoping (tenant A cannot export tenant B's data)
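- Integration: export with in-flight transactions captures exactly the state committed as of the export's snapshot; writes that commit afterward are excluded
- Scenario: rate limit rejects a second sync export within the same day
- Scenario: expired download URL is rejected (410 if the download is served through the Guild API; S3-style presigned URLs typically return 403 once expired)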
- Scenario: export includes reversed transactions with reversed_at timestamp

---

## 7) Risks and open questions

### Risks

- **PII exposure.** `phone_hash` is hashed, but unless the hash is keyed or salted, phone numbers can be recovered by hashing the entire number space; the export also contains behavioral data regulated under GDPR/CCPA. Operators must treat the export as PII and protect it accordingly. Document this in the operator guide.
- **Performance.** A tenant with millions of records could generate a multi-GB archive. Must stream, not buffer. Must not block the main API.
- **Scope creep.** Operators will ask "can you add field X to the export?" Accept only fields that already exist in Postgres projections. No custom computations in v1.

### Open questions

- **Include Neo4j-native graph structure?** The referral_edges file captures the key relationships, but the Neo4j property graph has additional structure (indices, composite edges) that doesn't round-trip. Export flat edges in v1; consider `.cypher` format in v2 if demand arises.
- **Include identity resolution history?** The mapping between `guild_user_id` and `tenant_user_id` is in the export via `tenant_users.ndjson`. Whether to include the full `POTENTIALLY_SAME_AS` / identity-merge history is an open question — privacy considerations are higher here.
- **Should operators be able to self-delete an export before expiry?** Likely yes; if so, add `DELETE /v1/tenants/:tenant_id/exports/:export_id` to the initial implementation.

---

## 8) Rollout plan

1. Ship sync path for small tenants first (most operators)
2. Add async path with Tigris bucket storage
3. Add rate limiting and audit trail
4. Beta with 2–3 friendly tenants
5. Publish operator guide in `docs/operator/` explaining when to use sync vs async
6. Update `docs/operator/risks-and-responsibilities.md` to note the capability has shipped
7. Update `docs/operator/faq.md` question "Can I get my referral graph if I leave Guild?" from "Not today" to "Yes, see export docs"

---

## 9) Acceptance criteria

- An authorized tenant API key can export the tenant's full state via a single API call
- Export is complete and internally consistent at a single point in time
- Export cannot leak one tenant's data to another tenant's key
- Every export generates an immudb audit event
- Large tenant (10k users, 50k transactions) exports in under 15 minutes via async path
- Documentation example round-trips an export through a Python reader without error
