
[INFO][Platform][Storage] Filesystem JSON vs database — per-client VPC model #14


@mishrapravin114

Overview

Type: INFO — Platform storage architecture / decision record
Audience: DevOps, client platform teams, sales, engineering
Related: #4 Multi-Tenant Architecture · #13 Client Deployment Case Study

This ticket documents how EngineX stores sessions, checkpoints, and credentials today, whether a database is required, and when (if ever) to add one, based on the product design and the pilot deployment model.


Executive summary

| Question | Answer |
|---|---|
| Does EngineX use a database today? | No. Filesystem JSON under `~/.engine/` |
| Does each client need to provision a DB for EngineX? | No, for OSS pilots (Model A/B) |
| Where does each client's data live? | In their cloud, on a persistent volume mounted at `~/.engine/` |
| Is JSON enough for production pilots? | Yes, with one install and one volume per client |
| When would we need a DB? | Phase 2+: multi-tenant Engine Cloud SaaS (#4), not per-client VPC |

Sales one-liner: EngineX runs in your cloud; platform state lives on your encrypted disk. We do not ask you to provision a database for EngineX.


Current storage model (OSS)

EngineX persists all platform state as JSON files on disk. There is no Postgres, SQLite, or other DB adapter in the OSS runtime.

| Data | Format | Path |
|---|---|---|
| Session state | `state.json` | `~/.engine/agents/<agent>/sessions/<session_id>/` |
| Checkpoints | `*.json` + `index.json` | `.../checkpoints/` |
| Runtime logs | JSON / log files | `.../runtime_logs/` |
| Credentials (encrypted) | Encrypted files | `~/.engine/credentials/` |
| Bootstrap encryption key | File | `~/.engine/secrets/credential_key` |
| Default LLM config | JSON | `~/.engine/configuration.json` |
| Custom skills | Markdown / files | `~/.engine/skills/` |
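Because the layout above is plain files, crash safety comes down to how each file is written. The sketch below shows one common pattern: write to a temp file, then rename it over the target. `session_dir`, `save_session_state`, and `load_session_state` are hypothetical names, and this is not the code in `core/engine/storage/session_store.py`:

```python
import json
import os
import tempfile
from pathlib import Path


def session_dir(root: Path, agent: str, session_id: str) -> Path:
    """Hypothetical helper mirroring <root>/agents/<agent>/sessions/<session_id>/."""
    return root / "agents" / agent / "sessions" / session_id


def save_session_state(root: Path, agent: str, session_id: str, state: dict) -> Path:
    """Write state.json so readers see either the old or the new file, never a partial one."""
    target_dir = session_dir(root, agent, session_id)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "state.json"
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=str(target_dir), prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, target)  # atomic replacement of the old file
    except BaseException:
        Path(tmp_path).unlink(missing_ok=True)  # do not leave stray temp files behind
        raise
    return target


def load_session_state(root: Path, agent: str, session_id: str) -> dict:
    path = session_dir(root, agent, session_id) / "state.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

With `root` set to `~/.engine`, this produces exactly the session path in the table. The same rename pattern applies to any other JSON file in the tree.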

Code references:

  • `core/engine/storage/concurrent.py`: `ConcurrentStorage` (file-backed)
  • `core/engine/storage/session_store.py`: `state.json` per session
  • `core/engine/storage/checkpoint_store.py`: checkpoint JSON + index
  • `core/engine/credentials/storage.py`: `EncryptedFileStorage`
  • `core/engine/runner/runner.py`: default path `~/.engine/agents/<agent_name>/`
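Checkpoints pair per-checkpoint JSON files with an `index.json`. The sketch below shows how such a store could work. It assumes one file per checkpoint id and a single writer per checkpoints directory, because the index update is a plain read-modify-write and concurrent writers would need a lock. `CheckpointDir` is a hypothetical class, not the API of `checkpoint_store.py`:

```python
import json
import os
import time
from pathlib import Path


def _atomic_write_json(path: Path, data) -> None:
    # Write a sibling temp file, then rename it over the target in one step.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    os.replace(tmp, path)


class CheckpointDir:
    """Hypothetical file-backed checkpoint store: one JSON file per checkpoint plus index.json."""

    def __init__(self, checkpoints_dir: Path):
        self.dir = checkpoints_dir
        self.dir.mkdir(parents=True, exist_ok=True)
        self.index_path = self.dir / "index.json"

    def _read_index(self) -> list:
        if not self.index_path.exists():
            return []
        return json.loads(self.index_path.read_text(encoding="utf-8"))

    def save(self, checkpoint_id: str, payload: dict) -> None:
        # Write the checkpoint body first so the index never points at a missing file.
        _atomic_write_json(self.dir / f"{checkpoint_id}.json", payload)
        index = [e for e in self._read_index() if e["id"] != checkpoint_id]
        index.append({"id": checkpoint_id, "created_at": time.time()})
        _atomic_write_json(self.index_path, index)

    def list_ids(self) -> list:
        # The index keeps listing cheap: no directory scan, no parsing of every checkpoint.
        return [e["id"] for e in self._read_index()]

    def load(self, checkpoint_id: str) -> dict:
        return json.loads((self.dir / f"{checkpoint_id}.json").read_text(encoding="utf-8"))
```

Re-saving an existing id overwrites its file and moves it to the end of the index. At very high checkpoint counts, rewriting the whole index on each save is the cost that the "Large checkpoint audit trails" scenario later in this ticket refers to.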

Override the storage root with ENGINE_HOME, or per run with storage_path. Either way, state is still stored as files, not in a DB.
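Root resolution can be sketched as follows. The precedence shown (a per-run `storage_path` first, then `ENGINE_HOME`, then `~/.engine`) and the helper names are assumptions for illustration. The sketch also assumes that `storage_path` replaces the agent directory rather than the root. Check `runner.py` for the actual behavior:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_engine_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Assumed rule: ENGINE_HOME if set and non-empty, else ~/.engine."""
    env = os.environ if env is None else env
    value = env.get("ENGINE_HOME")
    return Path(value).expanduser() if value else Path.home() / ".engine"


def resolve_agent_storage(agent_name: str,
                          storage_path: Optional[str] = None,
                          env: Optional[Mapping[str, str]] = None) -> Path:
    """Assumed rule: a per-run storage_path wins; otherwise <ENGINE_HOME>/agents/<agent_name>/."""
    if storage_path:
        return Path(storage_path).expanduser()
    return resolve_engine_home(env) / "agents" / agent_name
```

Passing `env` explicitly keeps the function testable without touching the real process environment. For a per-client deployment, pointing `ENGINE_HOME` at the mounted volume is all that is needed.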


Per-client cloud deployment (why no DB is needed)

Recommended pilot model (Model A/B from #13):

```mermaid
flowchart TB
 subgraph ClientA["Client A — AWS"]
 A1["EngineX VM / pod"]
 A2["Persistent volume<br/>~/.engine — JSON files"]
 A1 --- A2
 end

 subgraph ClientB["Client B — GCP"]
 B1["EngineX VM / pod"]
 B2["Persistent volume<br/>~/.engine — JSON files"]
 B1 --- B2
 end

 ClientA -.-x ClientB
```

Each client:

  • Gets a separate EngineX install in their VPC
  • Gets a separate persistent volume (~/.engine or ENGINE_HOME)
  • Has their own JSON files for sessions, checkpoints, credentials
  • Does not share storage with other clients
  • Does not need to provision a database for EngineX platform state

Isolation model: separate deployment + separate disk — not row-level multi-tenancy in one database.

Client backup: snapshot or replicate the ~/.engine volume (see #13 Section 8).
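Volume snapshots remain the recommended backup path. Where a snapshot is not available, a file-level archive is a possible fallback. The sketch below assumes the runtime is stopped or idle during the copy, because files written mid-archive may be captured inconsistently. `archive_engine_home` is a hypothetical helper. By default it excludes `secrets/`, on the assumption that the bootstrap key should be stored separately from the encrypted credentials it protects:

```python
import tarfile
import time
from pathlib import Path
from typing import Optional


def archive_engine_home(engine_home: Path, backup_dir: Path,
                        exclude_secrets: bool = True) -> Path:
    """Write a timestamped .tar.gz of engine_home into backup_dir and return its path.

    backup_dir should live outside engine_home, or the archive would include itself.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = backup_dir / f"engine-home-{stamp}.tar.gz"

    def _filter(info: tarfile.TarInfo) -> Optional[tarfile.TarInfo]:
        # Member names are relative to the archive root "."; returning None
        # skips the member, and for a directory tarfile also skips its contents.
        parts = Path(info.name).parts
        if exclude_secrets and parts and parts[0] == "secrets":
            return None
        return info

    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(engine_home), arcname=".", filter=_filter)
    return archive
```

Restoring means extracting the archive into an empty `ENGINE_HOME` and separately putting the key back in place.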


Client database vs EngineX storage (common confusion)

| Layer | Uses client's DB? | Notes |
|---|---|---|
| EngineX sessions, checkpoints, credentials | No | Always `~/.engine/` on the runtime disk |
| Agent business data (transactions, ERP, warehouse) | Optional | Only if the agent workflow is wired to query the client's systems via tools/MCP/API |

EngineX is not designed as: "Client plugs in Postgres and EngineX uses it as the platform store."

EngineX is designed as: "Client runs EngineX in their cloud on a persistent disk; agents may optionally integrate with their business systems."

Examples today:

  • log_monitor → Grafana + Slack APIs (not client SQL)
  • hourly_tracking → mock tools today; real DB integration would be agent-level custom work
  • No shipped template uses client Postgres for platform persistence

When JSON files are enough (keep current design)

No DB required for:

  • Single company, one VPC, one EngineX install
  • Low–moderate session volume (typical ops/finance pilots)
  • One or a few runtime nodes sharing one volume
  • Client wants minimal infra (no DB to patch/migrate for the platform)
  • Headless workers + optional dashboard (Models A/B)

This covers Phase 1 pilots and managed single-tenant hosting (you run one VM per client, each with its own volume).


When a database may be needed (future — not blocking pilots)

Consider a DB (or object store) only when moving to:

| Scenario | Why files may not scale | Phase |
|---|---|---|
| Multi-tenant SaaS (many orgs on one hosted product) | Tenant/user/RBAC queries; org-scoped credential vault | Phase 2 (#4) |
| Engine Cloud control plane | Central OAuth vault, user directory, cross-tenant admin | Phase 2 (private repo) |
| High concurrent writers | Many dashboard users / runtimes without shared-volume locking | Phase 2+ |
| Cross-install analytics | Aggregate runs across many clients from one control plane | Phase 2+ |
| Active-active HA | Multiple nodes without a shared filesystem | Phase 2+ |
| Large checkpoint audit trails | Search/list at very high volume | Phase 2+ (or S3 for blobs) |
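The "high concurrent writers" row is about coordination. When several processes share one volume, their read-modify-write updates must be serialized, or one writer can silently overwrite another's changes. The sketch below is POSIX-only and uses advisory `fcntl.flock` locks. `exclusive_lock` and `append_to_index` are hypothetical helpers. Advisory locks only protect writers that also take the lock, and their behavior on network filesystems depends on the filesystem and mount options. That is part of why multi-node setups without a shared, lock-capable volume are listed above as a reason to consider a DB:

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def exclusive_lock(lock_path: Path):
    """Hold an advisory exclusive flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def append_to_index(index_path: Path, entry: dict) -> None:
    """Read-modify-write of a shared JSON index, serialized by a sidecar lock file."""
    with exclusive_lock(index_path.with_name(index_path.name + ".lock")):
        entries = json.loads(index_path.read_text()) if index_path.exists() else []
        entries.append(entry)
        tmp = index_path.with_name(index_path.name + ".tmp")
        tmp.write_text(json.dumps(entries))
        os.replace(tmp, index_path)
```

Each call opens its own file descriptor, so the lock also serializes threads within one process, not just separate processes.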

Important: Even with Engine Cloud, runtimes in the client VPC can keep a local JSON cache on disk. The DB is primarily for the control plane (tenants, users, central vault) and does not necessarily replace every runtime's ~/.engine/.

Open question from #4: Postgres vs filesystem for tenant/user storage in the MVP. The answer does not affect OSS per-client VPC pilots.


Deployment models vs storage

| Model | Who runs the runtime | Platform storage | DB for EngineX? |
|---|---|---|---|
| A: Client self-host | Client DevOps | Client volume `~/.engine` | No |
| B: Managed single-tenant | You (one VM per client) | Per-client volume | No |
| C: OSS runtime + Engine Cloud | Client or you | Local JSON cache + Cloud vault | Cloud side: likely yes (#4) |

Default for first pilots: Model A or B — JSON on disk is the intended production store.


Operational requirements (no DB)

Clients / DevOps should provide:

  1. Persistent encrypted volume mounted at ~/.engine or ENGINE_HOME
  2. Backup policy — volume snapshots / replication
  3. ENGINE_CREDENTIAL_KEY for encrypted credentials at rest
  4. Separate installs per client and per environment (dev vs prod)

They should not be asked to:

  • Provision Postgres/MySQL for EngineX platform state
  • Share one EngineX install across unrelated clients (OSS has no tenant isolation in one install)
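Some of the "should provide" items above can be checked at startup. The sketch below is a hypothetical preflight check. It assumes the `ENGINE_HOME` / `~/.engine` fallback described earlier and treats `ENGINE_CREDENTIAL_KEY` as required, per the list. It cannot confirm that the volume is encrypted or backed up, so those remain DevOps checks:

```python
import os
import tempfile
from pathlib import Path
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    env = os.environ if env is None else env
    problems: List[str] = []
    home_value = env.get("ENGINE_HOME")
    home = Path(home_value).expanduser() if home_value else Path.home() / ".engine"
    if not home.is_dir():
        problems.append(f"storage root {home} does not exist or is not a directory")
    else:
        try:
            # Prove the volume is writable by creating (and auto-deleting) a temp file.
            with tempfile.NamedTemporaryFile(dir=str(home)):
                pass
        except OSError as exc:
            problems.append(f"storage root {home} is not writable: {exc}")
    if not env.get("ENGINE_CREDENTIAL_KEY"):
        problems.append("ENGINE_CREDENTIAL_KEY is not set")
    return problems
```

A deploy script could run this before starting the runtime and fail fast on a non-empty result. That surfaces a missing volume mount before any session state is written to the container's ephemeral disk.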

FAQ (client / investor)

Q: Do we need a database?
A: No, for standard self-hosted pilots. Persistent disk is sufficient.

Q: Can we use our existing Postgres?
A: Not for EngineX internals. Only if you build an agent that reads/writes your business data.

Q: How is data isolated between clients?
A: Separate deployments and separate ~/.engine volumes — not a shared multi-tenant DB.

Q: Is JSON a temporary hack?
A: No. It's the deliberate OSS design for single-tenant installs. A database is a Phase 2 consideration for SaaS and the control plane.

Q: What about compliance / finance reviews?
A: Data stays in client VPC on encrypted volume; no platform DB to audit. See #13 Section 9 Security Checklist.


Deliverables (this ticket)

Documentation

  • Link this issue from #13 Section 8 Storage & Backup
  • Add short "Storage" subsection to public README or ARCHITECTURE doc (when docs are published)
  • Cross-reference #4 — DB scope limited to Phase 2 SaaS, not OSS pilots

Engineering (Phase 1 — no change required)

  • Filesystem storage works for pilots — no DB implementation needed
  • Optional: document ENGINE_HOME and volume sizing guidance in deploy case study
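For volume sizing guidance, one option is to measure a running pilot and extrapolate: average bytes per session × sessions per day × retention days, plus headroom. The hypothetical helper below reports the bytes used under each agent directory, assuming the `<ENGINE_HOME>/agents/<agent>/` layout from the storage table:

```python
from pathlib import Path
from typing import Dict


def usage_by_agent(engine_home: Path) -> Dict[str, int]:
    """Total bytes of regular files under each <engine_home>/agents/<agent>/ directory."""
    agents_dir = engine_home / "agents"
    totals: Dict[str, int] = {}
    if not agents_dir.is_dir():
        return totals
    for agent_dir in sorted(p for p in agents_dir.iterdir() if p.is_dir()):
        # rglob walks sessions, checkpoints, and runtime logs alike.
        totals[agent_dir.name] = sum(
            f.stat().st_size for f in agent_dir.rglob("*") if f.is_file()
        )
    return totals
```

Dividing an agent's total by its session count gives the per-session figure for the estimate above. Sampling it weekly during a pilot shows whether growth is linear.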

Engineering (Phase 2 — only if #4 proceeds)

  • Evaluate Postgres vs object store for Engine Cloud tenant/user/vault
  • Define whether runtime ~/.engine remains file-based with Cloud sync vs full DB migration
  • Storage backend abstraction (if multiple backends needed)
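If multiple backends are ever needed, one way to keep the runtime agnostic is a small interface that both the file store and any future DB store satisfy. The sketch below uses `typing.Protocol`. `SessionBackend`, `FileSessionBackend`, `InMemorySessionBackend`, and `resume_step` are hypothetical names, and the in-memory class only stands in for a DB-backed implementation:

```python
import json
from pathlib import Path
from typing import Dict, Optional, Protocol, Tuple


class SessionBackend(Protocol):
    """Hypothetical interface a runtime could code against; not an existing EngineX API."""

    def put(self, agent: str, session_id: str, state: dict) -> None: ...

    def get(self, agent: str, session_id: str) -> Optional[dict]: ...


class FileSessionBackend:
    """Today's model: one state.json per session under a root directory."""

    def __init__(self, root: Path):
        self.root = root

    def _path(self, agent: str, session_id: str) -> Path:
        return self.root / "agents" / agent / "sessions" / session_id / "state.json"

    def put(self, agent: str, session_id: str, state: dict) -> None:
        path = self._path(agent, session_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Plain write for brevity; a real store would use temp-file + rename.
        path.write_text(json.dumps(state), encoding="utf-8")

    def get(self, agent: str, session_id: str) -> Optional[dict]:
        path = self._path(agent, session_id)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))


class InMemorySessionBackend:
    """Stand-in for a future DB-backed implementation; also handy in tests."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def put(self, agent: str, session_id: str, state: dict) -> None:
        self._rows[(agent, session_id)] = dict(state)

    def get(self, agent: str, session_id: str) -> Optional[dict]:
        row = self._rows.get((agent, session_id))
        return None if row is None else dict(row)


def resume_step(backend: SessionBackend, agent: str, session_id: str) -> int:
    # Caller code depends only on the protocol, so the backend can be swapped.
    state = backend.get(agent, session_id) or {}
    return state.get("step", 0)
```

Keeping the interface this narrow means a Phase 2 backend only has to implement two methods. Phase 1 installs keep the file store unchanged.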

Definition of done (Phase 1)

  • Team can explain: JSON per client VPC, no platform DB, client business DB is separate
  • Sales/DevOps FAQ aligned with #13
  • No engineering work blocked on "waiting for database" for pilot clients

Out of scope (Phase 1): Building a database storage backend for OSS single-tenant installs.


Decision record version: 2026-06-28

Metadata


Labels

  • documentation: Improvements or additions to documentation
  • phase-1: Phase 1, pilot / GTM
  • phase-2: Phase 2, platform scale (post-pilot)
  • type-docs: Documentation
  • type-platform: Platform / infrastructure
