[INFO][Security] Data handling & client IT sign-off — VPC install model #16

Description

@mishrapravin114

Overview

Type: INFO — Security & data handling (client IT / finance review)
Audience: Client security teams, finance/compliance reviewers, delivery engineers
Related: #13 Deployment · #14 Storage · #15 Integration + IT sign-off · #4 Multi-tenant (Phase 2 — not required for VPC pilots)

This ticket is the canonical security reference for EngineX OSS deployed in the client's cloud (recommended pilot model). Use it for security questionnaires, IT sign-off, and finance due diligence.


Executive summary (for security reviewers)

| Question | Answer |
|---|---|
| Where does customer data live? | Client's cloud, on disk at ~/.engine/ (or ENGINE_HOME) |
| Does EngineX require a customer database? | No. Platform state is JSON files, not client Postgres |
| Is data sent to EngineX vendor cloud by default? | No. OSS runs entirely in client VPC unless optional Engine Cloud is configured |
| Multi-tenant isolation in OSS? | No. One install = one organization; separate install per client (#14) |
| Are credentials encrypted at rest? | Yes. Fernet encryption when ENGINE_CREDENTIAL_KEY is set |
| Is the dashboard authenticated out of the box? | No. OSS ./engine serve has no built-in login; client must use VPN/SSO/reverse proxy (#15 Section 3) |
| Does agent data go to LLM providers? | Yes, if using a cloud LLM (Anthropic/OpenAI). Prompts may contain business data; client chooses provider and DPA |
| SOC2 / ISO certified? | Not claimed. Client evaluates risk for pilot; see checklist below |

Deployment model (security boundary)

flowchart TB
 subgraph ClientVPC["Client VPC / private network"]
 Proxy["Reverse proxy<br/>TLS + SSO/VPN"]
 Eng["EngineX runtime<br/>CLI + optional :8787"]
 Vol["Encrypted volume<br/>~/.engine/"]
 Eng --> Vol
 Proxy --> Eng
 end

 subgraph External["Outbound only (allowlisted)"]
 LLM["LLM API<br/>(client's choice)"]
 Int["Integrations<br/>Slack, Grafana, CRM, …"]
 end

 Users["Client users / approvers"] --> Proxy
 Eng --> LLM
 Eng --> Int

Key principle: EngineX vendor does not host customer production data in the default OSS model. The client controls network, disk encryption, access, and backup.


Data classification

| Data type | Location | Encrypted at rest | Leaves client VPC? |
|---|---|---|---|
| Session state, checkpoints | ~/.engine/agents/.../ | Volume encryption (client responsibility) | No |
| OAuth tokens, API keys | ~/.engine/credentials/ | Yes (Fernet + ENGINE_CREDENTIAL_KEY) | No |
| Credential bootstrap key | ~/.engine/secrets/credential_key or env | File permissions / secret manager | No |
| Agent business inputs (documents, tickets) | Session memory + checkpoint JSON | Same as volume | Only if sent to LLM or integration APIs |
| Runtime logs | ~/.engine/.../runtime_logs/ | Volume encryption | No |
| LLM prompts/completions | Transient / provider-side | Provider-dependent | Yes, to chosen LLM vendor |
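
Where the bootstrap key is protected by file permissions, a pre-flight check along these lines may help on POSIX hosts. This is a minimal sketch using only the Python standard library; the helper name is_owner_only is hypothetical and not part of EngineX.

```python
import stat
from pathlib import Path


def is_owner_only(path: Path) -> bool:
    """Return True if neither group nor others hold any permission bits."""
    mode = path.stat().st_mode
    # S_IRWXG / S_IRWXO cover read, write and execute for group / others.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Example use: call is_owner_only(Path.home() / ".engine" / "secrets" / "credential_key") at service start and refuse to run if it returns False (for instance when the file mode is 0644 rather than 0600).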

Client business databases (Postgres, warehouse, ERP) are accessed only if an agent tool is wired to them — not for EngineX platform storage (#15).


Credentials & secrets

Encryption at rest

  • ENGINE_CREDENTIAL_KEY — Fernet key; encrypts OAuth tokens and stored API keys
  • Storage: core/engine/credentials/EncryptedFileStorage, CredentialStore.with_encrypted_storage()
  • Bootstrap: key from env, ~/.engine/secrets/credential_key, or generated on first setup
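
For reviewers who want to see what a Fernet key looks like, the sketch below generates and validates one with the Python standard library only. It relies on the published Fernet key format (32 random bytes, URL-safe base64-encoded); the function names are hypothetical and this is not the EngineX bootstrap code.

```python
import base64
import os


def generate_fernet_key() -> bytes:
    """Return 32 random bytes in URL-safe base64 (the Fernet key format)."""
    return base64.urlsafe_b64encode(os.urandom(32))


def is_valid_fernet_key(value: str) -> bool:
    """Check that a candidate ENGINE_CREDENTIAL_KEY decodes to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(value.encode("ascii"))
    except (ValueError, UnicodeEncodeError):
        # binascii.Error (bad padding) is a subclass of ValueError.
        return False
    return len(raw) == 32
```

A generated key is 44 characters long. Store it in the secret manager, not in git or container images.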

Secret injection (client responsibility)

  • LLM keys (ANTHROPIC_API_KEY, OPENAI_API_KEY) via env or secret manager — not in git/images
  • Integration secrets (Grafana, Slack, CRM) via env or encrypted vault
  • OAuth client secrets (HUBSPOT_CLIENT_ID / SECRET, etc.) for dashboard Connect flow
  • Rotate keys per environment (dev/staging/prod separate installs)
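
Since secrets arrive via environment variables, a fail-fast check before starting the runtime can catch a missing injection early. A minimal sketch, assuming the client decides which variables are mandatory; missing_secrets is a hypothetical helper, not an EngineX command.

```python
import os
from typing import Iterable, Mapping


def missing_secrets(required: Iterable[str],
                    env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

Example use: missing_secrets(["ENGINE_CREDENTIAL_KEY", "ANTHROPIC_API_KEY"]) returns an empty list when both are present; otherwise log the missing names (never the values) and exit.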

What EngineX does NOT do (OSS today)

  • No built-in secrets rotation scheduler
  • No HSM integration (client can inject via env from their vault)
  • No automatic redaction of PII in logs (operational discipline required)

Network security

Recommended production controls (#15 Section 3)

| Control | Requirement |
|---|---|
| Dashboard (:8787) | Never exposed to the public internet without TLS + auth |
| Access path | VPN, Zero Trust, or corporate network + reverse proxy |
| TLS | Terminated at nginx/ALB/Cloudflare in front of EngineX |
| Bind address | Prefer 127.0.0.1:8787 behind proxy |
| Inbound firewall | Deny by default; allow only proxy/VPN sources |
| Outbound firewall | Allowlist LLM endpoints + agreed integration APIs |
| Headless-only mode | No :8787 exposure; outbound-only |
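
The bind-address control can be verified in a deploy script. The sketch below checks whether a host:port string points at a loopback interface, using the standard ipaddress module; is_loopback_bind is a hypothetical helper, and how the bind address is passed to ./engine serve is not covered here.

```python
import ipaddress


def is_loopback_bind(bind: str) -> bool:
    """True if a 'host:port' (or bare host) string binds to loopback only."""
    host, sep, _port = bind.rpartition(":")
    if not sep:
        host = bind  # no port given
    host = host.strip("[]")  # IPv6 literals are written as [::1]:8787
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # unknown hostnames are treated as not loopback
```

A value such as 0.0.0.0:8787 fails the check, which is the case the table warns about.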

OSS dashboard authentication gap (honest)

./engine serve today:

  • No built-in user login / RBAC on the HTTP API
  • Mitigation: Client IT places SSO (OIDC/SAML) or VPN in front of dashboard; or run headless only
  • Future: Multi-tenant auth is #4 Phase 2 / Engine Cloud — not required for dedicated VPC installs

Isolation & tenancy

| Model | Isolation mechanism | DB required? |
|---|---|---|
| Client self-host (recommended) | Separate install + separate ~/.engine volume per client | No |
| Dev vs prod | Separate installs or separate volumes | No |
| Multi-tenant SaaS | Not in OSS; see #4 Phase 2 | |

Never run unrelated clients on one OSS install expecting tenant isolation — OSS does not enforce tenant_id on storage paths today.
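
Because isolation rests on each install having its own volume, delivery engineers may want a quick inventory check that no two installs point at the same ENGINE_HOME. A minimal sketch; the inventory mapping and the find_shared_homes helper are hypothetical, and the check compares paths only, so it will not notice one volume mounted at two different paths.

```python
from collections import defaultdict
from pathlib import Path
from typing import Mapping


def find_shared_homes(installs: Mapping[str, str]) -> dict[str, list[str]]:
    """Map each ENGINE_HOME used by more than one install to the install names."""
    by_home: dict[str, list[str]] = defaultdict(list)
    for name, home in installs.items():
        # Normalise "~" and "./" so equivalent spellings compare equal.
        by_home[str(Path(home).expanduser().resolve())].append(name)
    return {home: names for home, names in by_home.items() if len(names) > 1}
```

An empty result means every install has a distinct path; any entry in the result names installs that would share session and credential files.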


LLM & third-party data flow

sequenceDiagram
 participant Agent as EngineX agent
 participant Disk as ~/.engine (client VPC)
 participant LLM as LLM provider
 participant SaaS as Client integrations

 Agent->>Disk: Sessions, credentials, checkpoints
 Agent->>LLM: Prompts (may include business text)
 LLM-->>Agent: Completions
 Agent->>SaaS: API calls (Grafana, Slack, CRM)

Client decisions:

  • Approved LLM vendor + DPA in place (Anthropic, OpenAI, or local Ollama in VPC for no external LLM)
  • Data residency requirements → prefer in-VPC Ollama or approved regional API
  • Minimize PII in prompts where possible
  • Review integration scopes (OAuth scopes for HubSpot, Google, etc.)

Human-in-the-loop (HITL)

  • Approvers use dashboard (or terminal in dev) to approve/reject at pause_nodes
  • Approval actions stored in session/checkpoint JSON on client disk
  • Access control: Same as dashboard — VPN/SSO; named approver list is operational policy

Backup, retention & deletion

| Item | Guidance |
|---|---|
| Backup | Snapshot/replicate ~/.engine volume (client backup tooling) |
| Retention | Client defines; no global retention policy in OSS |
| Deletion | Remove session dirs under ~/.engine/agents/<agent>/sessions/; rotate credentials via dashboard/CLI |
| Right to erasure | Client controls disk: delete volume snapshot + session files |
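
Since OSS has no retention policy of its own, a client-side sweep can list sessions past the agreed age. The sketch below is a dry run that only reports candidates. It assumes one subdirectory per session under agents/<agent>/sessions/ (the exact on-disk layout is an assumption), and the function names are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def newest_mtime(directory: Path) -> float:
    """Latest modification time of the directory or anything inside it."""
    times = [directory.stat().st_mtime]
    times.extend(p.stat().st_mtime for p in directory.rglob("*"))
    return max(times)


def expired_sessions(engine_home: Path, max_age_days: float,
                     now: Optional[float] = None) -> list[Path]:
    """List session directories with no change in the last max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    # Assumed layout: <engine_home>/agents/<agent>/sessions/<session>/
    candidates = engine_home.glob("agents/*/sessions/*")
    return sorted(d for d in candidates if d.is_dir() and newest_mtime(d) < cutoff)
```

Review the returned list before deleting anything (for example with shutil.rmtree), and remember that erasure also has to cover volume snapshots and backups.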

Security checklist (client IT sign-off)

Copy for security ticket / sign-off record:

Infrastructure

  • EngineX in private subnet (no unnecessary public IP)
  • Encrypted volume for ~/.engine / ENGINE_HOME
  • Secrets from cloud secret manager, not baked into images or git
  • Separate install for dev / staging / prod

Application

  • ENGINE_CREDENTIAL_KEY set and rotation documented
  • ./engine setup-credentials used for agent secrets
  • Dashboard behind TLS + VPN/SSO (if UI enabled)
  • Headless workers use least-privilege integration credentials (read-only DB where possible)

Network

  • Outbound allowlist documented (LLM + integrations)
  • Inbound restricted to approved paths only

LLM & compliance

  • LLM vendor approved; DPA signed if required
  • Alternative: Ollama/local model in VPC (no external LLM)
  • Business data in prompts reviewed for pilot scope

Operations

  • Backup/restore tested for ~/.engine
  • Incident contacts documented (client ops + EngineX delivery)
  • Hypercare period defined (#15 Section 4)

Known limitations (OSS — disclose in reviews)

| Limitation | Mitigation |
|---|---|
| No dashboard login/RBAC | VPN/SSO proxy; or headless-only |
| Filesystem storage (no DB) | Volume backup; adequate for pilot scale (#14) |
| Single-tenant per install | One client per deployment |
| LLM sends prompts externally | Client chooses vendor or local LLM |
| No SOC2 on EngineX OSS itself | Client VPC model reduces vendor data custody |

What is out of scope for this ticket (Phase 2+)

  • Multi-tenant auth & tenant isolation (#4)
  • Engine Cloud hosted control plane (private)
  • SOC2/ISO certification program
  • Automated penetration test reports (client may run their own)

Deliverables

  • This issue = canonical security FAQ on GitHub
  • Link from #13 and #15
  • Optional: 1-page PDF export for finance (GTM — not OSS dev assignment)
  • Optional: examples/deploy/SECURITY.md mirror in repo (Dev 2 deploy pack)

Definition of done

  • Client security team can review this issue without a live call
  • Finance/legal understands: data in client VPC, no platform DB, LLM is main external dependency
  • IT sign-off checklist (above) used on every pilot (#15)


Client FAQ — questions IT, finance, legal & procurement ask

Use this section to fill security questionnaires (SIG Lite, vendor risk forms, finance due diligence). Answers assume recommended model: EngineX OSS self-hosted in client VPC (#13).


A. Hosting & data location

Q: Where is our data stored?
A: On your infrastructure — a persistent disk/volume mounted at ~/.engine/ (or ENGINE_HOME) inside your VPC. Sessions, checkpoints, credentials, and logs are JSON/files on that volume. EngineX vendor does not host your production data in the default OSS model.

Q: Do you store our data in your cloud?
A: No, for standard client VPC deployment. The runtime runs in your AWS/GCP/Azure/on-prem network. Optional future Engine Cloud (vendor-hosted control plane) is separate and not required for pilots.

Q: Which regions / countries is data processed in?
A: Wherever you deploy the EngineX VM/container. You choose region, subnet, and data residency. If you use a cloud LLM (Anthropic/OpenAI), prompt data also flows to that provider's regions per their DPA — use Ollama in-VPC if prompts must not leave your network.

Q: Can we keep everything inside our VPC with no external calls?
A: Partially. Platform state stays in VPC. Agents still need outbound access to integrations you configure (Grafana, Slack, CRM, your APIs). For zero external AI, run a local LLM (Ollama) and only allowlist your internal APIs — no cloud LLM key required.

Q: Do we need to give you VPN access to our network?
A: Not for production. You operate the install. EngineX delivery may request time-boxed staging access during integration/hypercare (#15) — document in SOW; prefer jump host + least privilege.

Q: Can we use our existing Kubernetes / VM standards?
A: Yes. EngineX is a Python process (./engine run or ./engine serve) in your container or VM. You apply your golden images, patching, and pod security policies.


B. Database & storage

Q: Do we need to provision a database for EngineX?
A: No. Platform persistence is filesystem JSON, not Postgres/MySQL (#14).

Q: Will EngineX connect to our production database?
A: Only if you configure an agent to do so (e.g. read-only reconciliation queries). EngineX does not require DB credentials for itself. Prefer read replicas, API gateways, or scoped service accounts — see #15.

Q: How do we backup EngineX data?
A: Snapshot or replicate the ~/.engine volume with your standard backup tool (EBS snapshot, Velero, rsync to DR). Test restore before go-live.

Q: What happens if the disk fills up?
A: Session/checkpoint writes may fail. Monitor disk usage; set retention policy (archive/delete old sessions). Size volume for pilot + growth (typical pilot: 10–50 GB depending on log volume).
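
A basic disk check for the ENGINE_HOME volume can be scripted with the standard library and wired into whatever monitoring the client already runs. A minimal sketch; the 80% default threshold and the helper names are assumptions, not EngineX settings.

```python
import shutil
from pathlib import Path


def disk_usage_percent(path: Path) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def needs_attention(path: Path, threshold_percent: float = 80.0) -> bool:
    """True when usage is at or above the alert threshold."""
    return disk_usage_percent(path) >= threshold_percent
```

Example use: run needs_attention(Path.home() / ".engine") from cron or a sidecar and raise an alert before writes start failing.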

Q: Is there a maximum data retention period?
A: You define it. OSS does not auto-purge. Delete or archive old session directories per your retention policy.


C. Authentication & access control

Q: Who can log into the EngineX dashboard?
A: OSS ./engine serve has no built-in user accounts. You control access via VPN, Zero Trust, and/or SSO/reverse proxy in front of :8787 (#15 Section 3). Named HITL approvers are an operational list your team maintains.

Q: Do you support SSO (Okta, Azure AD, SAML)?
A: Not natively in OSS today. Place your IdP in front of the reverse proxy (standard pattern). Native SSO is planned for multi-tenant / Engine Cloud (#4 Phase 2).

Q: Role-based access control (RBAC)?
A: Limited in OSS — no per-user roles in the product. Mitigate with network access, separate dev/prod installs, and headless workers without UI for automated jobs.

Q: How are API keys managed?
A: Stored in ~/.engine/credentials/ (encrypted with ENGINE_CREDENTIAL_KEY) or injected via your secret manager into environment variables. Setup wizard: ./engine setup-credentials <agent>.

Q: Can we use AWS Secrets Manager / HashiCorp Vault?
A: Yes — inject secrets at runtime into env vars or files; EngineX reads from env and encrypted local store. No vendor-specific Vault plugin required for pilot.


D. Encryption

Q: Is data encrypted at rest?
A: Credentials/tokens: yes — Fernet encryption when ENGINE_CREDENTIAL_KEY is set. Session/checkpoint files: rely on your volume encryption (EBS encrypted, GCP CMEK, Azure disk encryption).

Q: Is data encrypted in transit?
A: Your responsibility for dashboard: TLS at reverse proxy. Outbound: HTTPS to LLM and integration APIs (standard TLS). Do not expose :8787 without TLS.
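
For custom agent tools that make their own outbound HTTPS calls, a strict client context can be built with Python's standard ssl module. This is a minimal sketch; strict_client_context is a hypothetical helper, and it does not describe how EngineX itself configures TLS.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client TLS context: certificate + hostname checks on, TLS 1.2 or newer."""
    ctx = ssl.create_default_context()  # verifies certificates and hostnames
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can be passed to standard-library clients, for example urllib.request.urlopen(url, context=strict_client_context()).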

Q: Who holds the encryption keys?
A: ENGINE_CREDENTIAL_KEY — you generate/store in env or secret manager. Volume encryption keys — your cloud KMS. EngineX vendor does not hold your production keys in VPC model.

Q: Can we bring our own KMS (CMK)?
A: For disk/volume — yes, via cloud provider. For credential vault — OSS uses Fernet key you supply; HSM integration is not built-in (future/custom).


E. LLM & AI-specific questions

Q: Does our data train your or OpenAI's models?
A: EngineX OSS does not train models. For Anthropic/OpenAI, data handling is governed by your API agreement with that provider (typically API data not used for training on enterprise terms — verify your contract).

Q: What data is sent to the LLM?
A: Prompts built from agent context — may include document text, ticket content, log snippets, or reconciliation data depending on workflow. Minimize PII in pilot scope; use redaction where possible.
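
As an illustration of prompt redaction, the sketch below masks email addresses and long digit runs before text is handed to an LLM call. The patterns are deliberately simple and will miss many PII forms; redact is a hypothetical helper, not an EngineX feature, and regex masking does not replace a reviewed data-minimization policy.

```python
import re

# Simple illustrative patterns; real deployments need a reviewed PII policy.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_NUMBER = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits


def redact(text: str) -> str:
    """Mask email addresses and card-length digit runs in free text."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_NUMBER.sub("[NUMBER]", text)
```

For example, redact("Contact jane.doe@example.com about invoice 42") returns "Contact [EMAIL] about invoice 42"; short numbers such as invoice IDs are left alone.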

Q: Can we use a private / on-prem LLM?
A: Yes. Use Ollama or compatible local models, configured via ~/.engine/configuration.json. Prompts stay in your VPC.

Q: Do you log prompts externally?
A: Prompts/completions may appear in local runtime logs on your disk. They are not sent to EngineX vendor by default. Cloud LLM providers may log per their policy.

Q: How do we prevent agents from hallucinating sensitive actions?
A: Use HITL pause_nodes for approvals, validate→fix loops, and least-privilege tools (read-only DB). See #10 goal vs node criteria.


F. Integrations & third parties

Q: What third-party services can EngineX call?
A: Only what you configure: LLM API, Grafana, Slack, HubSpot, Google Calendar, your internal REST APIs, etc. (#13 Section 5.3). Outbound firewall allowlist is recommended.
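
The firewall allowlist can be mirrored at application level inside custom agent tools as a second line of defence. A minimal sketch using the standard library; is_allowed and the example hosts are illustrative assumptions, and this does not replace the network-level control.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, allowlist: set[str]) -> bool:
    """True only for HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowlist
```

Matching on the exact hostname (not a substring or suffix) means look-alike domains such as api.anthropic.com.evil.example are rejected.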

Q: Do you pass our data to sub-processors?
A: In client VPC model, you choose LLM and integration vendors. EngineX vendor is not a data processor hosting your files. List LLM + SaaS integrations in your sub-processor register.

Q: OAuth — where are refresh tokens stored?
A: Encrypted in ~/.engine/credentials/ on your disk after dashboard Connect flow (HubSpot, Google, Zoho).

Q: Can integrations be read-only?
A: Yes — recommended for pilots (read-only DB user, Grafana read token, Slack webhook post only). Write access requires explicit agent design + approval.


G. Compliance & certifications

Q: Are you SOC 2 / ISO 27001 certified?
A: EngineX OSS as software — not a certified hosted service. Client VPC deployment means you operate the environment under your compliance program. We provide this security reference; you assess risk.

Q: GDPR / right to erasure?
A: Personal data in sessions/checkpoints is on your disk. Erasure = delete session files + backups per your GDPR process. EngineX vendor does not retain copies in VPC model.

Q: HIPAA / PHI?
A: Not certified for HIPAA out of the box. If PHI is in prompts or documents, require BAA with LLM provider, encrypt volume, restrict access, and legal review. Many pilots avoid PHI in scope.

Q: PCI / card data?
A: Do not process PAN/CVV through agents unless explicitly designed and approved. Not in default templates.

Q: Audit trail for approvals?
A: HITL decisions stored in session/checkpoint JSON + runtime logs on your volume. Export via ops console / session history. Retention = your policy.

Q: Can we get a penetration test report?
A: Client may pentest their own deployment in their VPC. Vendor pentest report — not published for OSS; scope for enterprise agreement if needed later.


H. Operations & incident response

Q: What is your SLA / uptime?
A: Pilot: defined in your ops (systemd/K8s restart policies) + hypercare (#15). EngineX vendor SLA applies to support contract, not your VPC infrastructure.

Q: How do we patch vulnerabilities?
A: Pull the latest commit or release tag from EngineXV/engineX, run uv sync, and restart the service. You control patch cadence on your VM/container image.

Q: What if EngineX is compromised?
A: Isolate VM, rotate ENGINE_CREDENTIAL_KEY and all integration secrets, restore volume from clean backup, review runtime logs. Incident runbook is joint (client infra + EngineX delivery contact).

Q: Do you have access to production after go-live?
A: By agreement only — time-boxed support. Default: client owns production access.

Q: Logging & SIEM integration?
A: Forward VM/container logs and optionally ~/.engine/.../runtime_logs/ to your SIEM (CloudWatch, Datadog, Splunk). No built-in SIEM agent; standard file/stdout shipping.


I. Commercial & procurement

Q: What are we licensing?
A: EngineX OSS — open-source runtime (MIT License — verify on repo). Optional commercial support / Engine Cloud — separate agreement.

Q: Is our data used to improve your product?
A: Not from your VPC install — we do not receive your session files unless you explicitly share logs for support tickets.

Q: Vendor lock-in?
A: Agents are Python graphs in your repo/fork; data is JSON on your disk. You can export session data, stop the service, and delete the volume. LLM and integration choices are yours.

Q: Can we review source code?
A: Yes — public repo github.com/EngineXV/engineX.

Q: Insurance / cyber liability?
A: Vendor policy — address in commercial contract; not covered by this technical ticket.


J. Architecture choices clients ask about

Q: Headless vs dashboard — which is more secure?
A: Headless (./engine run --daemon) has the smaller attack surface: no :8787. The dashboard is needed for OAuth Connect and browser HITL; secure it with TLS + VPN/SSO (#15 Section 3).

Q: One install for dev and prod?
A: No — use separate installs or volumes to avoid credential and data bleed.

Q: Multi-tenant — can we share one EngineX for two business units?
A: OSS does not isolate business units on one install. Use separate installs per unit or wait for #4.

Q: Do you need inbound ports open from the internet?
A: No for headless. For dashboard: only through your proxy/VPN — not raw public :8787.

Q: What ports are used?
A: 8787 (dashboard/API, if enabled), outbound 443 to LLM and integrations. No inbound agent port required for headless workers.


K. Questionnaire quick-fill (copy-paste block)

| Security questionnaire field | Standard answer (VPC model) |
|---|---|
| Data storage location | Customer cloud, customer-controlled region |
| Data processor role | Customer is controller; vendor provides software only |
| Encryption at rest | Yes: credentials encrypted; volume encryption by customer |
| Encryption in transit | TLS (customer-configured) + HTTPS outbound |
| Authentication | Customer VPN/SSO in front of UI; no native OSS login |
| Multi-tenancy | Dedicated install per customer (OSS) |
| Database required | No platform database |
| Sub-processors | Customer-selected LLM and SaaS APIs only |
| SOC2 | Customer-operated environment; OSS not a certified hosted SaaS |
| Pen test | Customer may test their deployment |
| Backup | Customer snapshots ~/.engine volume |
| DR | Customer DR policy on VM/volume |
| Access to production | Customer-controlled |

L. Red flags — when to escalate to architecture review

Escalate with EngineX delivery + client CISO if the client requires any of:

  • Multi-tenant SaaS on single URL for unrelated external customers → #4
  • PHI/PCI in agent scope without local LLM and legal sign-off
  • Public internet dashboard without SSO
  • Write access to production financial DB without HITL
  • Requirement that vendor hosts and retains all session data (Engine Cloud scoping)

Related links

  • Deployment: #13
  • Storage (no DB): #14
  • Integration + TLS/VPN: #15
  • Multi-tenant (future): #4
  • Code: core/engine/credentials/, core/engine/server/, core/engine/storage/

Version: 2026-06-28 — EngineX OSS security model + client FAQ (client VPC install)

Labels: documentation, duplicate, phase-1, priority-p1, type-docs