## Error Handling Checklist - [ ] Shell script error handling: - [ ] All scripts use `set -euo pipefail` - [ ] Add proper error handling with trap handlers - [ ] Add retry logic with exponential backoff for transient failures - [ ] Add timeout handling for external calls - [ ] Add proper cleanup on exit (trap EXIT) - [ ] Add structured error messages with context - [ ] Docker health checks: - [ ] Improve health checks for all services (add proper endpoints) - [ ] Add startup probes (not just health checks) - [ ] Add readiness probes - [ ] Configure appropriate timeouts and retries - [ ] Docker Compose resilience: - [ ] Add restart policies (already: unless-stopped) - [ ] Add depends_on with condition: service_healthy - [ ] Add resource limits (CPU, memory limits) - [ ] Add health check intervals/timeouts appropriate for each service - [ ] Service-level resilience: - [ ] Add circuit breakers for external API calls - [ ] Add retry logic with exponential backoff for external calls - [ ] Add timeout configuration for all external calls - [ ] Add dead letter queue for failed operations - [ ] Error monitoring: - [ ] Structured error logging with context - [ ] Error rate alerting - [ ] Dead man's switch for critical services - [ ] Add error handling documentation to docs/ERROR_HANDLING.md
Error Handling Checklist
set -euo pipefail