
[FEAT] Webhook Delivery Queue with Retry, Dead Letter, and Ordering #8

@oomokaro1

Priority: High
Difficulty: Hard
Estimated Effort: 3-4 days
Relevant Packages: OrbitStream_backend/, orbitstream_docs/
Labels: enhancement, infrastructure, priority:high

Requirements

1. Job Queue Architecture

  • Implement a Redis-backed job queue (use Bull from @nestjs/bull or a custom implementation)
  • Queue name: webhook-delivery
  • Jobs are enqueued when dispatchWebhook() is called
  • A separate worker process (or same process with @Processor) consumes jobs from the queue
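
A minimal sketch of the enqueue side, assuming dispatchWebhook() only builds a job and hands it to the queue while the worker does the HTTP call later. The job shape (WebhookJobData), the QueueLike interface, the job name "deliver" and the function signature are all hypothetical; QueueLike mirrors only the add(name, data, opts) call shape of Bull's Queue so the example runs without Redis.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical job payload; the issue fixes the behaviour, not these field names.
export interface WebhookJobData {
  deliveryId: string; // generated once here and reused on every retry
  merchantId: string;
  sessionId: string;
  eventType: string;
  sequence: number; // per-session order, persisted in webhook_deliveries.sequence
  payload: Record<string, unknown>;
}

// Only the slice of Bull's Queue#add(name, data, opts) this sketch relies on,
// declared locally so the example runs without Redis or @nestjs/bull.
export interface QueueLike {
  add(name: string, data: WebhookJobData, opts?: { priority?: number }): Promise<unknown>;
}

// Sketch of dispatchWebhook(): build the job and enqueue it. A worker
// (e.g. a class decorated with @Processor("webhook-delivery")) consumes it later.
export async function dispatchWebhook(
  queue: QueueLike,
  event: Omit<WebhookJobData, "deliveryId">,
  priority: number,
): Promise<WebhookJobData> {
  const job: WebhookJobData = { ...event, deliveryId: randomUUID() };
  await queue.add("deliver", job, { priority });
  return job;
}
```

Keeping dispatchWebhook() free of network I/O means the caller (for example, the payment confirmation path) never waits on a merchant's endpoint.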

2. Priority Levels

  • payment.confirmed → priority 1 (highest; merchants need this to fulfill orders)
  • payment.failed → priority 2
  • session.expired → priority 2
  • session.created → priority 3
  • session.cancelled → priority 3

3. Retry with Exponential Backoff + Jitter

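  • Retry schedule: 1min → 5min → 30min → 2hr → 12hr, i.e. up to 5 retries after the initial delivery attempt (6 attempts in total)
  • Add random jitter (±20%) to each interval to prevent a thundering herd
  • Bull's built-in fixed and exponential backoff types cannot reproduce this hand-picked schedule, so implement a custom backoffDelay(attempt) function (for example, registered through Bull's backoffStrategies queue setting)
  • When the 5th retry fails, move the job to the dead letter queue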
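
A sketch of backoffDelay(attempt) for the schedule above, assuming attempt is the 1-based retry number and that ±20% jitter means a uniform multiplicative factor in [0.8, 1.2). Returning null when the schedule runs out is a choice made here, not something the issue specifies; the random parameter is injectable so unit tests can pin the jitter.

```typescript
const MINUTE_MS = 60 * 1000;

// Base delays before retries 1..5: 1min, 5min, 30min, 2hr, 12hr.
export const RETRY_SCHEDULE_MS: readonly number[] = [1, 5, 30, 120, 720].map((m) => m * MINUTE_MS);
export const MAX_RETRIES = RETRY_SCHEDULE_MS.length; // 5
export const JITTER_RATIO = 0.2; // ±20%

/**
 * Delay in ms before retry number `attempt` (1-based), scaled by a uniform
 * random factor in [0.8, 1.2). Returns null once the schedule is exhausted,
 * which the caller treats as "move to the dead letter queue".
 */
export function backoffDelay(attempt: number, random: () => number = Math.random): number | null {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError(`attempt must be a positive integer, got ${attempt}`);
  }
  if (attempt > MAX_RETRIES) return null;
  const base = RETRY_SCHEDULE_MS[attempt - 1];
  // random() is in [0, 1), so factor is in [1 - J, 1 + J).
  const factor = 1 - JITTER_RATIO + 2 * JITTER_RATIO * random();
  return Math.round(base * factor);
}
```

Keeping the schedule in one array means the retry count and the dead-letter cutoff cannot drift apart.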

4. Dead Letter Queue

  • Queue name: webhook-dead-letter
  • Dead letter jobs include: original payload, all delivery attempts with timestamps and error messages, merchant ID, event type
  • Merchants can view dead letter entries via GET /v1/webhooks/dead-letter (JWT-authenticated)
  • Merchants can manually retry dead letter entries via POST /v1/webhooks/dead-letter/:id/retry
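
One possible shape for a dead letter entry carrying the fields listed above (payload, attempt log with timestamps and errors, merchant ID, event type). The field names, the extra deliveryId field (kept so a manual retry could reuse the same X-OrbitStream-Delivery-Id) and the reason discriminator are assumptions for illustration.

```typescript
import { randomUUID } from "node:crypto";

// One recorded delivery attempt (timestamps are ISO 8601 strings).
export interface DeliveryAttempt {
  attemptedAt: string;
  statusCode: number | null; // null for network errors and timeouts
  error: string;
}

// What the worker knows when it gives up on a job (hypothetical shape).
export interface FailedDelivery {
  deliveryId: string;
  merchantId: string;
  eventType: string;
  payload: unknown;
  attempts: DeliveryAttempt[];
}

export type DeadLetterReason = "retries_exhausted" | "client_error";

export interface DeadLetterEntry extends FailedDelivery {
  id: string; // identifier used by /v1/webhooks/dead-letter/:id
  reason: DeadLetterReason;
  deadLetteredAt: string;
}

export function toDeadLetterEntry(
  failed: FailedDelivery,
  reason: DeadLetterReason,
  now: Date = new Date(),
): DeadLetterEntry {
  if (failed.attempts.length === 0) {
    throw new Error("a dead letter entry needs at least one recorded attempt");
  }
  return {
    ...failed,
    // Copy the attempt log so later changes to the source job can't alter it.
    attempts: failed.attempts.map((a) => ({ ...a })),
    id: randomUUID(),
    reason,
    deadLetteredAt: now.toISOString(),
  };
}
```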

5. Idempotency

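  • Each webhook delivery gets a unique X-OrbitStream-Delivery-Id header (UUID v4); the ID is generated once per event and reused on every retry, so merchants can recognize redeliveries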
  • Each delivery also gets X-OrbitStream-Timestamp (ISO 8601)
  • The signature covers: delivery_id + timestamp + payload
  • Merchants must store delivery IDs and reject duplicates
  • Document idempotency in the integration guide
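
A sketch of both sides of the idempotency contract. The issue says the signature covers delivery_id + timestamp + payload but does not fix the algorithm or encoding, so this assumes HMAC-SHA256 with a per-merchant secret, hex output, the three fields joined with ".", and a hypothetical X-OrbitStream-Signature header. The merchant side verifies before de-duplicating, so forged requests cannot pollute the set of seen IDs.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256(secret, `${deliveryId}.${timestamp}.${body}`), hex-encoded.
export function signWebhook(secret: string, deliveryId: string, timestamp: string, body: string): string {
  return createHmac("sha256", secret).update(`${deliveryId}.${timestamp}.${body}`).digest("hex");
}

// Sender side: headers for one delivery attempt.
export function buildWebhookHeaders(
  secret: string,
  deliveryId: string,
  body: string,
  now: Date = new Date(),
): Record<string, string> {
  const timestamp = now.toISOString(); // ISO 8601
  return {
    "Content-Type": "application/json",
    "X-OrbitStream-Delivery-Id": deliveryId,
    "X-OrbitStream-Timestamp": timestamp,
    "X-OrbitStream-Signature": signWebhook(secret, deliveryId, timestamp, body), // hypothetical header
  };
}

export type Verdict = "accepted" | "duplicate" | "bad_signature";

// Merchant side: verify first, then de-duplicate. `seen` stands in for a
// persistent table with a unique constraint on the delivery ID.
export function acceptWebhook(
  secret: string,
  headers: Record<string, string | undefined>,
  body: string,
  seen: Set<string>,
): Verdict {
  const deliveryId = headers["X-OrbitStream-Delivery-Id"];
  const timestamp = headers["X-OrbitStream-Timestamp"];
  const signature = headers["X-OrbitStream-Signature"];
  if (!deliveryId || !timestamp || !signature) return "bad_signature";
  const expected = Buffer.from(signWebhook(secret, deliveryId, timestamp, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) {
    return "bad_signature";
  }
  if (seen.has(deliveryId)) return "duplicate";
  seen.add(deliveryId);
  return "accepted";
}
```

In a real Node HTTP server, incoming header names arrive lower-cased, so merchant code would read them as x-orbitstream-delivery-id and so on; the integration guide should say so.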

6. Webhook Ordering

  • Webhooks for the same session must arrive in order
  • Use a per-session sequence number: add a sequence column to the webhook_deliveries table
  • The worker processes jobs for the same session sequentially (not in parallel)
  • If a job for session X is being processed, subsequent jobs for session X are delayed until the first completes
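
A minimal in-process sketch of "same session sequential, different sessions in parallel": each session ID keeps a promise chain, and a new job starts only after the previous one for that session settles. PerKeySerialExecutor is a hypothetical name. This only holds within one worker process; with several workers, a shared lock (for example in Redis) would be needed instead.

```typescript
/**
 * Runs tasks that share a key (here: a session ID) one at a time, in the
 * order they were submitted; tasks for different keys run concurrently.
 */
export class PerKeySerialExecutor {
  private readonly tails = new Map<string, Promise<void>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Start only after the previous task for this key has settled.
    const result = previous.then(task);
    // The stored tail never rejects, so one failed delivery doesn't wedge the chain.
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(key, tail);
    // Forget the key once its last queued task is done, so the map stays small.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }

  get activeKeys(): number {
    return this.tails.size;
  }
}
```

In this sketch a failed task releases the next one. If a failed delivery is rescheduled with backoff, strict ordering would additionally require the worker to hold later sequence numbers until the earlier one is delivered or dead-lettered; that policy choice is left open in the issue.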

7. Network Handling

  • HTTP timeout: 10 seconds
  • 2xx response: success, mark delivered
  • 4xx response: don't retry (the merchant endpoint is broken); move to the dead letter queue immediately
  • 5xx response: retry with backoff
  • Network error / timeout: retry with backoff
  • Log all delivery attempts with full error details
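
A sketch of one delivery attempt under the rules above: 10-second timeout, 2xx delivered, 4xx straight to dead letter, 5xx and network errors/timeouts retried. FetchLike, deliverOnce and classifyStatus are hypothetical names; the HTTP client is injected so tests need no network, and Node 18+'s global fetch should be assignable to FetchLike. Treating 1xx/3xx as retryable is an assumption, since the issue does not mention them.

```typescript
export type DeliveryOutcome =
  | { kind: "delivered"; status: number }
  | { kind: "retry"; status: number | null; error: string }
  | { kind: "dead_letter"; status: number; error: string };

// Minimal fetch-like signature so the HTTP client can be swapped out in tests.
export type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ status: number }>;

export const DELIVERY_TIMEOUT_MS = 10000;

export function classifyStatus(status: number): DeliveryOutcome {
  if (status >= 200 && status < 300) return { kind: "delivered", status };
  if (status >= 400 && status < 500) return { kind: "dead_letter", status, error: `HTTP ${status}` };
  // 5xx is retried per the issue; 1xx/3xx are treated as retryable here (assumption).
  return { kind: "retry", status, error: `HTTP ${status}` };
}

export async function deliverOnce(
  fetchImpl: FetchLike,
  url: string,
  headers: Record<string, string>,
  body: string,
  timeoutMs: number = DELIVERY_TIMEOUT_MS,
): Promise<DeliveryOutcome> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { method: "POST", headers, body, signal: controller.signal });
    return classifyStatus(res.status);
  } catch (err) {
    // Network errors and timeouts are both retryable; keep the detail for the attempt log.
    const error = controller.signal.aborted
      ? `timeout after ${timeoutMs} ms`
      : err instanceof Error
        ? err.message
        : String(err);
    return { kind: "retry", status: null, error };
  } finally {
    clearTimeout(timer);
  }
}
```

The returned outcome carries the status and error text, which is what the worker would append to the attempt log before deciding between retry and dead letter.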

8. API Endpoints

  • GET /v1/webhooks/deliveries — list recent deliveries (JWT auth, per-merchant)
  • GET /v1/webhooks/dead-letter — list dead letter entries (JWT auth, per-merchant)
  • POST /v1/webhooks/dead-letter/:id/retry — manually retry a dead letter entry
  • DELETE /v1/webhooks/dead-letter/:id — dismiss a dead letter entry
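
A sketch of the service logic behind the three dead-letter endpoints, using an in-memory map in place of real storage. DeadLetterService, DeadLetterRecord and the requeue callback are hypothetical; the merchantId argument is assumed to come from the verified JWT, never from the request body or path, and another merchant's entry is reported as "not found".

```typescript
export interface DeadLetterRecord {
  id: string;
  merchantId: string;
  eventType: string;
  payload: unknown;
  deadLetteredAt: string; // ISO 8601, so string comparison orders by time
}

export class DeadLetterService {
  private readonly entries = new Map<string, DeadLetterRecord>();
  private readonly requeue: (entry: DeadLetterRecord) => Promise<void>;

  // `requeue` would put the payload back on webhook-delivery (assumption).
  constructor(requeue: (entry: DeadLetterRecord) => Promise<void>) {
    this.requeue = requeue;
  }

  add(entry: DeadLetterRecord): void {
    this.entries.set(entry.id, entry);
  }

  // GET /v1/webhooks/dead-letter: newest first, only the caller's entries.
  list(merchantId: string): DeadLetterRecord[] {
    return Array.from(this.entries.values())
      .filter((e) => e.merchantId === merchantId)
      .sort((a, b) => (a.deadLetteredAt < b.deadLetteredAt ? 1 : a.deadLetteredAt > b.deadLetteredAt ? -1 : 0));
  }

  // POST /v1/webhooks/dead-letter/:id/retry; false maps to 404.
  async retry(merchantId: string, id: string): Promise<boolean> {
    const entry = this.owned(merchantId, id);
    if (!entry) return false;
    await this.requeue(entry);
    this.entries.delete(id); // remove only once the job is safely re-enqueued
    return true;
  }

  // DELETE /v1/webhooks/dead-letter/:id; false maps to 404.
  dismiss(merchantId: string, id: string): boolean {
    return this.owned(merchantId, id) !== undefined && this.entries.delete(id);
  }

  // Another merchant's entry looks exactly like a missing one, so IDs can't be probed.
  private owned(merchantId: string, id: string): DeadLetterRecord | undefined {
    const entry = this.entries.get(id);
    return entry !== undefined && entry.merchantId === merchantId ? entry : undefined;
  }
}
```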

9. Testing

  • Unit tests: retry backoff calculation, priority ordering, dead letter placement
  • Integration tests: verify webhook delivery with mocked HTTP endpoint
  • Test 4xx → dead letter (no retry), 5xx → retry
  • Test ordering: verify webhooks for same session arrive in order
  • Test idempotency: verify delivery IDs are unique and included in headers
  • Load test: verify queue handles 100 concurrent webhook deliveries
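
A small in-process harness along the lines of the load and idempotency tests: it fires 100 deliveries at a mocked endpoint and checks that all succeed, all delivery IDs are unique, and all 100 were genuinely in flight at once. MockEndpoint and runLoad are hypothetical; this exercises only the concurrency logic, not Redis or Bull, so a real load test should still push jobs through the webhook-delivery queue.

```typescript
import { randomUUID } from "node:crypto";

// Mock merchant endpoint: records every delivery ID it sees and tracks
// how many requests are in flight at once.
export class MockEndpoint {
  readonly receivedIds: string[] = [];
  private inFlight = 0;
  maxInFlight = 0;

  async handle(headers: Record<string, string>): Promise<number> {
    this.inFlight++;
    this.maxInFlight = Math.max(this.maxInFlight, this.inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated latency
    this.receivedIds.push(headers["X-OrbitStream-Delivery-Id"]);
    this.inFlight--;
    return 200;
  }
}

// Fires `total` deliveries with at most `concurrency` in flight and returns
// how many got a 2xx. Each delivery gets a fresh UUID v4 delivery ID.
export async function runLoad(endpoint: MockEndpoint, total: number, concurrency: number): Promise<number> {
  let started = 0;
  let delivered = 0;
  const worker = async (): Promise<void> => {
    // No await between the check and the increment, so workers never claim the same slot.
    while (started < total) {
      started++;
      const status = await endpoint.handle({ "X-OrbitStream-Delivery-Id": randomUUID() });
      if (status >= 200 && status < 300) delivered++;
    }
  };
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return delivered;
}
```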
