[INFO][DevOps][Integration] Complete client wiring — real APIs, IT sign-off, hypercare

## Overview

**Type:** INFO — Client integration & go-live playbook
**Audience:** Delivery engineers, client DevOps, client IT/security, CS hypercare
**Related:** [#13 Deployment Case Study](https://github.com/EngineXV/engineX/issues/13) · [#14 Storage Architecture](https://github.com/EngineXV/engineX/issues/14) · [engineX-internal #5 Onboarding runbook](https://github.com/EngineXV/engineX-internal/issues/5) · [engineX-internal #6 Security brief](https://github.com/EngineXV/engineX-internal/issues/6)

This ticket is the **complete client integration guide** for taking a pilot from **deployed EngineX** to **production with real systems** — not mocks. It covers:

1. **Custom agent wiring** — real DB/APIs, tools, credentials
2. **Client IT sign-off** — VPN, TLS, network, security review
3. **Hypercare** — first 2–4 weeks after go-live

Use alongside [#13](https://github.com/EngineXV/engineX/issues/13) (install) and [#14](https://github.com/EngineXV/engineX/issues/14) (storage).

---

## Ticket metadata

| Field | Value |
|-------|-------|
| **Phase** | Phase 1 — pilot delivery |
| **Priority** | P0 — required for every signed client |
| **Blocks** | Design partner go-live, repeatable onboarding |
| **Out of scope** | Building Engine Cloud (#4), multi-tenant platform |

---

## 1. Integration lifecycle (end-to-end)

```mermaid
flowchart LR
 K["Kickoff scope + systems"] --> D["Deploy EngineX #13"]
 D --> W["Wire agent real DB/APIs"]
 W --> I["IT sign-off VPN/TLS/firewall"]
 I --> G["Go-live"]
 G --> H["Hypercare 2–4 weeks"]
 H --> S["Steady state client owns ops"]
```

| Phase | Owner | Duration |
|-------|-------|----------|
| Kickoff + discovery | EngineX + client sponsor | Week 0 |
| Deploy + base config | Client DevOps + EngineX | Week 1 |
| Custom agent wiring | EngineX engineering | Week 1–2 |
| IT / security review | Client IT | Week 1–2 (parallel) |
| UAT + HITL training | Client ops + EngineX | Week 2 |
| Go-live | Joint | End of week 2–3 |
| Hypercare | EngineX CS + client ops | Weeks 3–6 |

---

## 2. Custom agent wiring — real DB/APIs (not mocks)

### 2.1 What “mock → real” means

Shipped templates often include **demo tools** that return hardcoded JSON. Production pilots **replace or extend** these with tools that call the client’s systems.

| Template | Mock today | Production wiring |
|----------|------------|-------------------|
| `hourly_tracking` | `fetch_broker_transactions()`, `fetch_investor_logs()` return static JSON | Client DB/API/ETL for broker + investor feeds |
| `log_monitor` | Ready for real Grafana/Slack **if env vars are set** | `GRAFANA_*`, `SLACK_*`; usually a minimal code change |
| `agreement_analysis` | LLM-only extraction | Document source: S3, SharePoint API, file drop |
| `support_triage` | JSON `--input` | CRM/ticket webhook or API poll (Zendesk, Salesforce) |
| `deep_research` | Demo web search fallback | `BRAVE_SEARCH_API_KEY` or client-approved search API |

**Rule:** EngineX platform state stays on `~/.engine/` ([#14](https://github.com/EngineXV/engineX/issues/14)). The **client’s database** is accessed only through **agent tools** — never as the EngineX platform store.

### 2.2 Integration patterns

```mermaid
flowchart TB
 subgraph ClientVPC["Client VPC"]
 RT["EngineX runtime"]
 Tools["Agent tools.py / MCP"]
 RT --> Tools
 end

 subgraph ClientSystems["Client systems"]
 DB["Postgres / warehouse"]
 API["REST / GraphQL APIs"]
 SaaS["Salesforce / Grafana / Slack"]
 Files["S3 / SFTP / file drop"]
 end

 Tools --> DB
 Tools --> API
 Tools --> SaaS
 Tools --> Files
```

| Pattern | When to use | Example |
|---------|-------------|---------|
| **Env + API key** | Simple REST, webhooks | Grafana, Slack, PagerDuty |
| **OAuth (dashboard Connect)** | SaaS with user consent | HubSpot, Google Calendar, Zoho |
| **Custom `@tool` in `tools.py`** | Client-specific SQL/HTTP | Hourly tracking → Postgres read |
| **MCP server** | Reusable connector, stdio/HTTP | Calendar, internal microservice |
| **Read-only DB user** | Reporting / reconciliation | `SELECT` on replica, not primary write |
| **API gateway in front of DB** | Client security preference | EngineX calls internal API, not raw SQL |
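For the **API gateway in front of DB** and **Internal REST API** cases, the tool calls the client's HTTP API instead of issuing SQL. The following is a minimal stdlib sketch that assumes a JSON endpoint and a bearer token. `build_internal_request`, `call_internal_api`, and the `CLIENT_API_BASE_URL` / `CLIENT_API_TOKEN` env vars are hypothetical names, not EngineX APIs. The decorated `@tool` function would simply return `call_internal_api(...)`.

```python
import json
import os
import urllib.request
from typing import Any, Optional
from urllib.parse import urlencode


def build_internal_request(base_url: str, path: str, token: str,
                           params: Optional[dict[str, str]] = None) -> urllib.request.Request:
    """GET request to the client's internal API, authenticated with a bearer token."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def call_internal_api(path: str, params: Optional[dict[str, str]] = None,
                      timeout: float = 10.0) -> Any:
    """Call the gateway and decode JSON; HTTP errors and timeouts raise."""
    req = build_internal_request(
        os.environ["CLIENT_API_BASE_URL"],  # hypothetical env var names,
        path,                               # injected from the client secret manager
        os.environ["CLIENT_API_TOKEN"],
        params,
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If the client requires mTLS, one option is to build an `ssl.create_default_context()`, call `load_cert_chain(certfile, keyfile)` on it with the mounted cert, and pass it to `urlopen(..., context=ctx)`.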

### 2.3 Custom tool implementation checklist

For each data source the agent needs:

- [ ] **Discovery** — schema, rate limits, auth method, PII fields, retention policy
- [ ] **Credential** — stored in `~/.engine/credentials/` or client secret manager → env injection
- [ ] **Tool contract** — input/output JSON shape matches node `output_keys`
- [ ] **Read vs write** — prefer read-only for pilots; explicit approval for writes
- [ ] **Error handling** — timeouts, retries, idempotency documented
- [ ] **Network** — outbound allowlist from EngineX subnet to endpoint (see Section 3)
- [ ] **Validate** — `./engine validate <agent>` passes with new tools registered
- [ ] **Test run** — `./engine run <agent> --input '...'` against **staging** data first
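The error-handling item can be made concrete with a small retry wrapper around each external call. This is a minimal sketch, and `with_retries` is a hypothetical helper, not part of EngineX. It assumes that transient failures surface as `TimeoutError` or `ConnectionError`, so adjust `retry_on` to the client library's exception types (for example, add `urllib.error.URLError`). Set the per-call timeout on the underlying client itself. Only wrap idempotent reads; a write needs an idempotency key before it is safe to retry.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the agent run
            # Delay doubles each attempt (base, 2x base, 4x base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

Exceptions outside `retry_on`, such as a `ValueError` from bad input, propagate immediately and are not retried.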

**Code locations:**

- Agent-local tools: `examples/templates/<agent>/tools.py` (`@tool` decorator)
- MCP config: `examples/templates/<agent>/mcp_servers.json`
- Credential wizard: `./engine setup-credentials examples/templates/<agent>`
- Integration index: `examples/templates/integrations/README.md`

### 2.4 Example: hourly_tracking → real Postgres (illustrative)

Replace mock fetch tools with read-only SQL against client replica:

```python
# examples/templates/hourly_tracking/tools.py (client fork)
from typing import Any

# `tool` is the EngineX decorator the template's tools.py already imports;
# keep that existing import line unchanged.


@tool(description="Fetch broker transactions from last hour.")
def fetch_broker_transactions() -> dict[str, Any]:
    # Credentials from env: CLIENT_DB_URL (injected by client secret manager)
    # SELECT ... WHERE created_at > now() - interval '1 hour'
    ...
```

**Client provides:**

- Read-only DB user + connection string (or internal HTTP API wrapping the query)
- VPN/peering if DB is not reachable from EngineX subnet
- Sample row schema + test dataset for UAT
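One way to fill in the `fetch_broker_transactions` stub above is a parameterized, read-only query over a DB-API 2.0 connection. This is a minimal sketch under these assumptions:

- The `broker_transactions` table and its `id`, `broker`, `amount`, and `created_at` columns are hypothetical. Use the schema collected in discovery.
- The output keys are placeholders and must match the node's `output_keys`.
- It runs against `sqlite3` so it is self-contained, with timestamps stored as ISO text. Against client Postgres, the `@tool` function would open the connection from `CLIENT_DB_URL` with a driver such as psycopg, switch the `?` placeholder to `%s`, and compare a `timestamptz` column.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def query_recent_broker_transactions(conn: Any, now: Optional[datetime] = None,
                                     window: timedelta = timedelta(hours=1)) -> dict[str, Any]:
    """Read-only fetch of the last `window` of broker transactions as JSON-safe dicts."""
    now = now or datetime.now(timezone.utc)
    since = (now - window).isoformat()
    cur = conn.cursor()
    try:
        # Parameterized query: never interpolate values into the SQL string.
        cur.execute(
            "SELECT id, broker, amount, created_at FROM broker_transactions "
            "WHERE created_at > ? ORDER BY created_at",
            (since,),
        )
        columns = [col[0] for col in cur.description]
        rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
    # Illustrative output shape; align it with the node's output_keys.
    return {"transactions": rows, "count": len(rows), "window_start": since}


if __name__ == "__main__":
    # Self-contained demo against an in-memory SQLite table.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE broker_transactions (id INTEGER, broker TEXT, amount REAL, created_at TEXT)")
    recent = (datetime.now(timezone.utc) - timedelta(minutes=5)).isoformat()
    demo.execute("INSERT INTO broker_transactions VALUES (1, 'broker_a', 125.0, ?)", (recent,))
    print(query_recent_broker_transactions(demo))
```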

### 2.5 Example: log_monitor (minimal wiring)

Often **no custom code** — configure env only:

```bash
GRAFANA_URL=https://grafana.client.internal
GRAFANA_API_TOKEN=...
GRAFANA_DATASOURCE_UID=...
SLACK_WEBHOOK_URL=...
```

Deploy with systemd: `examples/templates/log_monitor/deploy/engine-log-monitor.service`
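Because `log_monitor` is configured entirely through env, a startup preflight that names any missing variables can catch most wiring mistakes before the service runs. This is a minimal sketch. The variable list mirrors the env block above, and `missing_env` is a hypothetical helper, not part of the template.

```python
import os
from typing import Iterable, Mapping

REQUIRED_LOG_MONITOR_ENV = (
    "GRAFANA_URL",
    "GRAFANA_API_TOKEN",
    "GRAFANA_DATASOURCE_UID",
    "SLACK_WEBHOOK_URL",
)


def missing_env(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Names that are unset or blank; values are never printed or returned."""
    return [name for name in required if not env.get(name, "").strip()]
```

A wrapper script could print `missing_env(REQUIRED_LOG_MONITOR_ENV)` and exit non-zero when the list is not empty. One place to run it is a systemd `ExecStartPre=` step, assuming the shipped unit file can be extended that way.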

### 2.6 Per-integration credential matrix

| Integration | Client secret | How EngineX loads it |
|-------------|---------------|----------------------|
| LLM (Anthropic/OpenAI) | API key | `.env` or `~/.engine/credentials` |
| Grafana | API token | Agent env file |
| Slack | Webhook or bot token | Env / credentials store |
| HubSpot / Zoho | OAuth client ID + secret | Dashboard **Connect** + encrypted vault |
| Google Calendar | OAuth | Dashboard **Connect** + MCP |
| Client Postgres | Connection string | Secret manager → env (never in git) |
| Internal REST API | Bearer / mTLS cert | Env or mounted cert volume |
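Hypercare triage often means pasting config and log lines into tickets, so connection strings should be redacted before they are logged. This is a minimal sketch using only `urllib.parse`. `redact_dsn` is a hypothetical helper, and it masks only the password in URL-style DSNs (`scheme://user:pass@host/db`). Key=value DSNs would need separate handling.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_dsn(dsn: str) -> str:
    """Mask the password in a URL-style connection string; keep everything else."""
    parts = urlsplit(dsn)
    if "@" not in parts.netloc:
        return dsn  # no credentials embedded
    userinfo, hostport = parts.netloc.rsplit("@", 1)
    if ":" not in userinfo:
        return dsn  # username only, nothing secret to mask
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}:***@{hostport}"))
```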

---

## 3. Client IT sign-off — VPN, TLS, network

### 3.1 Architecture with security boundary

```mermaid
flowchart TB
 subgraph Internet["Internet / corporate network"]
 Users["Analysts / approvers"]
 end

 subgraph Edge["Client edge"]
 VPN["VPN / Zero Trust"]
 Proxy["Reverse proxy TLS + SSO"]
 end

 subgraph Private["Private subnet"]
 Eng["EngineX :8787 internal only"]
 Vol["~/.engine volume"]
 Eng --- Vol
 end

 subgraph Data["Client data plane"]
 DB["DB / APIs"]
 SaaS["External SaaS"]
 end

 Users --> VPN --> Proxy --> Eng
 Eng --> DB
 Eng --> SaaS
```

**Principle:** `:8787` is **never** exposed directly to the public internet in production.

### 3.2 IT security review checklist (client IT)

Send this checklist to the client security team for sign-off:

#### Network
- [ ] EngineX runs in **private subnet** (no public IP on runtime, or restricted SG)
- [ ] Inbound: only from **VPN / Zero Trust / corporate IP range** to reverse proxy
- [ ] Outbound: allowlist to LLM APIs + agreed integration endpoints
- [ ] No inbound connections from the client data plane (DB/APIs) to EngineX; **EngineX initiates outbound** connections only
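The outbound allowlist is enforced at the network layer (SG, firewall, egress proxy). Keeping the approved hosts next to the agent config as well lets a tool refuse an unexpected destination early and gives IT a reviewable list. This is a minimal sketch for HTTP(S) integrations. `is_allowed_destination` and the example hostnames are hypothetical, HTTPS-only is an assumption, and a leading `*.` entry matches subdomains only.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_destination(url: str, allowlist: Iterable[str]) -> bool:
    """True if the URL uses HTTPS and its host matches an allowlist entry."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https" or not host:
        return False
    for entry in allowlist:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.client.internal" matches "grafana.client.internal", not "client.internal"
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False
```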

#### Dashboard access (`./engine serve`)
- [ ] **TLS terminated** at reverse proxy (nginx, ALB, Cloudflare, etc.)
- [ ] **SSO / SAML / OIDC** in front of dashboard (client IdP) **or** VPN-only access
- [ ] Session cookies / headers per client security policy
- [ ] Optional: IP allowlist for `:8787` backend

#### Example reverse proxy (nginx sketch)

```nginx
server {
    listen 443 ssl;
    server_name enginex.client.internal;

    ssl_certificate     /etc/ssl/certs/client.crt;
    ssl_certificate_key /etc/ssl/private/client.key;

    location / {
        proxy_pass http://127.0.0.1:8787;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

EngineX bind (internal only):

```bash
./engine serve --host 127.0.0.1 --port 8787
# or --host 0.0.0.0 only if SG restricts to proxy subnet
```
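Before starting `./engine serve`, you can double-check the `--host` value by classifying the bind address. This is a minimal sketch using the standard `ipaddress` module, and `classify_bind_host` is a hypothetical helper. It only inspects the address. The SG and firewall rules are what actually control exposure.

```python
import ipaddress


def classify_bind_host(host: str) -> str:
    """Classify a serve --host value by how widely it can be reached."""
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "hostname: resolve and review manually"
    if addr.is_loopback:
        return "loopback"  # reachable only through the local reverse proxy
    if addr.is_unspecified:
        return "all interfaces: requires SG restricted to proxy subnet"
    if addr.is_private:
        return "private"
    return "public: do not use in production"
```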

#### Secrets & data
- [ ] Secrets from **client secret manager** (AWS SM, GCP SM, Vault) — not in image/git
- [ ] `ENGINE_CREDENTIAL_KEY` set; `~/.engine` on **encrypted volume** ([#14](https://github.com/EngineXV/engineX/issues/14))
- [ ] Backup/snapshot policy for `~/.engine` documented
- [ ] LLM data handling reviewed (prompts may contain client data — DPA with LLM vendor)
- [ ] No EngineX platform database; client data stays in the client's VPC on client-owned disk

#### Access & audit
- [ ] Named **HITL approvers** with dashboard access
- [ ] Run history / ops console retention acceptable to client
- [ ] Incident contact list (EngineX + client ops)

#### Sign-off artifact
- [ ] Client IT **written approval** (email or security ticket closure) before production go-live
- [ ] Reference [engineX-internal #6](https://github.com/EngineXV/engineX-internal/issues/6) security brief when available

### 3.3 Headless-only clients (no dashboard)

If the pilot is **headless only** (`./engine run --daemon`):

- [ ] No `:8787` exposure required
- [ ] IT review focuses on outbound network + secrets + volume encryption
- [ ] HITL may be terminal-based or deferred; document which approach the pilot uses

---

## 4. Hypercare — first 2–4 weeks after go-live

### 4.1 Hypercare model

```mermaid
gantt
 title Pilot hypercare timeline
 dateFormat YYYY-MM-DD
 section Go-live
 Production cutover :milestone, m1, 2026-01-15, 0d
 section Hypercare
 Daily check-ins (week 1) :a1, 2026-01-15, 7d
 Every-other-day (week 2) :a2, after a1, 7d
 Weekly (weeks 3–4) :a3, after a2, 14d
 section Steady state
 Client-owned ops :milestone, m2, after a3, 0d
```

| Week | EngineX involvement | Client involvement |
|------|---------------------|-------------------|
| **1** | Daily 15-min standup; monitor runs/alerts; hotfix tools | Approvers available; report issues same day |
| **2** | Every-other-day check-in; tune prompts/thresholds | Ops owns dashboard; IT on call for network |
| **3–4** | Weekly review; document runbook handoff | Client primary on-call |
| **After** | SLA per contract; escalation path only | Full ownership |
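The cadence in the table is daily in week 1, every other day in week 2, and weekly in weeks 3–4. It can be turned into concrete calendar dates for the hypercare log. This is a minimal sketch. `hypercare_checkins` is a hypothetical helper, and day 0 is the go-live day, as in the gantt chart.

```python
from datetime import date, timedelta


def hypercare_checkins(go_live: date, weeks: int = 4) -> list[tuple[date, str]]:
    """Check-in dates: daily (week 1), every other day (week 2), weekly after that."""
    offsets: list[tuple[int, str]] = [(d, "daily") for d in range(0, 7)]
    if weeks >= 2:
        offsets += [(d, "every-other-day") for d in range(7, 14, 2)]
    if weeks >= 3:
        offsets += [(d, "weekly") for d in range(14, weeks * 7, 7)]
    return [(go_live + timedelta(days=d), kind) for d, kind in offsets]
```

For a 2026-01-15 go-live and a 4-week window, this produces 13 check-ins: 7 daily, 4 every other day, and 2 weekly.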

### 4.2 Hypercare daily checklist (EngineX)

- [ ] Agent runs completed successfully (daemon/cron logs clean)
- [ ] No stuck HITL sessions blocking pipeline
- [ ] Integration errors (API 4xx/5xx, DB timeouts) triaged
- [ ] Checkpoint/resume tested if client uses dashboard
- [ ] `~/.engine` disk usage within limits
- [ ] LLM cost / rate limits normal
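The disk-usage item can be scripted as part of the daily check. This is a minimal sketch using `shutil.disk_usage`. The 80% threshold is an assumption, and the check measures the whole filesystem holding `~/.engine`, not the size of the directory itself.

```python
import os
import shutil
from typing import Tuple


def engine_disk_ok(path: str = "~/.engine", max_used_pct: float = 80.0) -> Tuple[bool, float]:
    """Return (ok, used_pct) for the filesystem that holds the EngineX state volume."""
    usage = shutil.disk_usage(os.path.expanduser(path))
    used_pct = usage.used / usage.total * 100
    return used_pct <= max_used_pct, round(used_pct, 1)
```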

### 4.3 Hypercare daily checklist (client)

- [ ] HITL queue reviewed (approvers acted on pending items)
- [ ] Business outcome spot-check (e.g. reconciliations correct, alerts meaningful)
- [ ] Escalate to EngineX with logs / session ID if anomaly

### 4.4 Common hypercare issues

| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| Agent can’t reach DB | SG / VPN / wrong connection string | IT opens path; rotate creds |
| OAuth Connect fails | Redirect URI / client ID mismatch | Fix OAuth app config |
| HITL never resumes | Approver not trained / pause node stuck | Training; dashboard inject |
| Mock data still appearing | Old tools.py not deployed | Redeploy client agent fork |
| Dashboard 502 | Proxy misconfig / serve not running | Check nginx → `:8787` |
| LLM errors | Key expired / model access | Rotate key; check quota |
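For "Agent can't reach DB" and "Dashboard 502", a TCP probe run from the right host separates network problems (SG, VPN, routing, serve not listening) from credential or application problems. This is a minimal sketch with the standard `socket` module, and `can_reach` is a hypothetical helper.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

For example, run `can_reach("db.client.internal", 5432)` from the EngineX host (placeholder hostname), or `can_reach("127.0.0.1", 8787)` on the proxy host. If the probe succeeds but the agent still fails, suspect the connection string or credentials rather than the network path.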

### 4.5 Hypercare exit criteria (handoff to steady state)

- [ ] **14 days** minimum hypercare completed (extend to 28 for finance clients)
- [ ] **7 consecutive days** without P1 incident
- [ ] Client ops can: restart service, read logs, approve HITL, rotate env secrets
- [ ] **Runbook** delivered (deploy, rollback, contacts, escalation)
- [ ] **Success metrics** vs kickoff goals documented (see Section 5)
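The two time-based exit criteria are the minimum duration and 7 consecutive days without a P1. Both are easy to miscount by a day, so a small check helps when closing out the hypercare log. This is a minimal sketch. `hypercare_exit_ready` is a hypothetical helper, and a "P1-free day" is assumed to be a calendar day after the last P1. The operational criteria (runbook, client ops skills, success metrics) stay manual.

```python
from datetime import date
from typing import Iterable


def hypercare_exit_ready(go_live: date, today: date, p1_dates: Iterable[date],
                         min_days: int = 14, quiet_days: int = 7) -> bool:
    """Check minimum hypercare duration and the P1-free streak (min_days=28 for finance)."""
    if (today - go_live).days < min_days:
        return False
    past_p1 = [d for d in p1_dates if d <= today]
    if not past_p1:
        return True
    # Full calendar days after the last P1, up to and including today.
    return (today - max(past_p1)).days >= quiet_days
```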

---

## 5. Kickoff discovery worksheet (complete before wiring)

Collect these answers in the kickoff call; custom integration work is blocked until they are filled in:

| # | Question | Answer (client) |
|---|----------|-----------------|
| 1 | Primary agent / workflow? | |
| 2 | Success metric at 30 days? | |
| 3 | Data sources (DB name, API docs, owner)? | |
| 4 | Read-only or read-write? | |
| 5 | Staging environment available? | |
| 6 | HITL approvers (names, roles)? | |
| 7 | Headless, dashboard, or hybrid? | |
| 8 | How analysts reach dashboard (VPN/SSO URL)? | |
| 9 | LLM provider allowed (Anthropic/OpenAI/Ollama in VPC)? | |
| 10 | IT security contact for sign-off? | |
| 11 | Hypercare window (2 vs 4 weeks)? | |
| 12 | Go-live date target? | |

---

## 6. Go-live checklist (joint)

### Pre-go-live
- [ ] `./engine validate <agent>` passes in **staging**
- [ ] Real integration tested against staging data (not mocks)
- [ ] Production secrets injected (not staging keys)
- [ ] `~/.engine` volume mounted + backup verified
- [ ] IT sign-off complete (Section 3)
- [ ] HITL approvers trained on dashboard
- [ ] Rollback plan documented (stop systemd/cron; restore volume snapshot)
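To confirm "Production secrets injected (not staging keys)" without pasting secret values into a ticket, each environment can publish short fingerprints of its secrets, and the two lists can then be compared. This is a minimal sketch, and the helper names are hypothetical. Short SHA-256 prefixes are only appropriate for high-entropy values such as API keys, because a low-entropy password could be guessed from its hash.

```python
import hashlib
from typing import Iterable, Mapping


def fingerprint(secret: str) -> str:
    """Short SHA-256 prefix, suitable for comparing high-entropy keys across hosts."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()[:12]


def secret_fingerprints(env: Mapping[str, str], names: Iterable[str]) -> dict[str, str]:
    """Fingerprint each named secret that is set; run separately on each host."""
    return {name: fingerprint(env[name]) for name in names if env.get(name)}


def reused_secrets(prod_fps: Mapping[str, str], staging_fps: Mapping[str, str]) -> list[str]:
    """Names whose production fingerprint matches staging (a staging key reused in prod)."""
    return sorted(name for name, fp in prod_fps.items() if staging_fps.get(name) == fp)
```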

### Go-live day
- [ ] Enable production schedule / daemon
- [ ] EngineX + client on shared bridge call
- [ ] First production run monitored end-to-end
- [ ] HITL test case executed (if applicable)

### Post-go-live
- [ ] Hypercare schedule confirmed (Section 4)
- [ ] Incident channel live (Slack/Teams/email)

---

## 7. Deliverables (this ticket)

### Documentation
- [ ] This issue = canonical integration reference on GitHub
- [ ] Link from [#13](https://github.com/EngineXV/engineX/issues/13) Section 12 Support / Handoff
- [ ] Update [engineX-internal #5](https://github.com/EngineXV/engineX-internal/issues/5) — point hypercare + wiring here
- [ ] Kickoff worksheet (Section 5) as copy-paste template for CS

### Per-client (each pilot)
- [ ] Completed discovery worksheet
- [ ] Custom tools/MCP deployed (mock → real)
- [ ] IT sign-off record
- [ ] Hypercare log (daily notes weeks 1–4)
- [ ] Handoff runbook at exit

### Engineering (as needed per vertical)
- [ ] Client agent fork or export with real `tools.py`
- [ ] Staging + prod env file templates (no secrets in repo)

---

## 8. Definition of done

**Ticket done (playbook published):**
- [ ] Team can execute mock → real wiring using Section 2
- [ ] IT checklist (Section 3) sendable to client security unchanged
- [ ] Hypercare schedule (Section 4) used on every pilot

**Per-client integration done:**
- [ ] Real systems connected; no mock tools in production path
- [ ] IT written sign-off
- [ ] 2–4 weeks hypercare completed with exit criteria met
- [ ] Client ops owns steady-state runbook

---

## 9. Related links

- Deploy: [#13](https://github.com/EngineXV/engineX/issues/13)
- Storage: [#14](https://github.com/EngineXV/engineX/issues/14)
- Onboarding GTM: [engineX-internal #5](https://github.com/EngineXV/engineX-internal/issues/5)
- Security brief: [engineX-internal #6](https://github.com/EngineXV/engineX-internal/issues/6)
- Integrations sales doc: [engineX-internal #7](https://github.com/EngineXV/engineX-internal/issues/7)
- Design partner: [engineX-internal #2](https://github.com/EngineXV/engineX-internal/issues/2)
- Repo integrations: `examples/templates/integrations/README.md`

---

*Version: 2026-06-28 — Complete client integration, IT sign-off, and hypercare*
