
[Initiative] Separate Markdown Parsing Layer from Catalog Validation #134

@juliuskrah

Description


Summary

Introduce a dedicated, pure parsing layer for catalog Markdown files inside gitstore-api, fully decoupled from both catalog mutation and field-level validation. This is required so that admission control, which is triggered by the hook callbacks that gitstore-git-service sends to gitstore-api during git-receive-pack, can parse and inspect pushed files without touching live catalog state.

Scope

In Scope

  • Define public parser functions for each catalog entity type (product, variant, category, collection) that accept raw Markdown content and return a typed intermediate representation
  • Separate frontmatter extraction and YAML unmarshalling from the catalog's Add*FromMarkdown methods, which currently combine frontmatter extraction, YAML unmarshalling, and catalog mutation in a single call
  • Update Add*FromMarkdown to delegate to the new parsers as thin wrappers
  • Unit test coverage for the parsing layer in isolation (valid input, missing frontmatter delimiters, malformed YAML, empty body)
  • Update the validation layer ([Initiative] Catalog Validation in gitstore-api #105) to accept the parsed intermediate types as input rather than raw strings

Out of Scope

Acceptance Criteria

  • A public parsing API exists in gitstore-api for products, variants, categories, and collections
  • Parsing functions are pure: they accept a string and return a typed struct or an error — no side effects on catalog state
  • The existing Add*FromMarkdown methods delegate to the new parsers internally
  • No frontmatter-splitting logic is duplicated between the parsing layer and catalog methods
  • Unit tests cover: valid input, missing opening delimiter, missing closing delimiter, invalid YAML, empty Markdown body
  • The validation layer ([Initiative] Catalog Validation in gitstore-api #105) accepts parsed structs rather than raw content strings
  • The admission hook path ([Initiative] API Admission Controller (Validating Phase Only) #123) can invoke the parser without instantiating a Catalog
  • All existing catalog-loading tests continue to pass without modification

Implementation Notes

Why this separation is necessary

With #103 and #104 moving the Git smart HTTP and SSH protocol handling into gitstore-api, gitstore-git-service becomes a pure gRPC storage node. It no longer owns any protocol logic. When a client pushes, gitstore-api proxies the packfile to gitstore-git-service, which quarantines the incoming objects and fires a hook callback event back to gitstore-api over gRPC before finalising the ref update.

gitstore-api must handle this callback by parsing the affected Markdown files from the push and deciding whether to approve or reject — all without mutating the live catalog. The current design makes this impossible: the only available entry point combines parsing, implicit validation, and catalog mutation in a single call. Splitting these concerns into a parser-first design gives the hook handler a clean, stateless path to inspect content.

Parsing vs. validation

Parsing is a structural concern: does this file have valid frontmatter, can its YAML be decoded into a known shape? Validation is a semantic concern: do the field values satisfy business rules? These two phases must be independent. The parser's output becomes the input contract for the validator, not raw file bytes. This also means validation (#105) and admission control (#123) can evolve independently of how files are read off disk or streamed via gRPC.
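To make that input contract concrete, here is a sketch of the validation side accepting an already-parsed struct rather than file bytes. `ParsedVariant`, `FieldError`, `ValidateVariant` and the three rules (non-empty SKU, a product reference, a positive price in minor units) are illustrative assumptions, not the rules #105 defines.

```go
package main

import (
	"fmt"
)

// ParsedVariant is what a hypothetical ParseVariant would return. Parsing has
// already guaranteed the shape; only the values are checked here.
type ParsedVariant struct {
	SKU        string
	ProductID  string
	PriceMinor int64 // price in minor currency units, e.g. cents
}

// FieldError names the offending field so a hook can report it to the pusher.
type FieldError struct {
	Field  string
	Reason string
}

func (e FieldError) Error() string { return e.Field + ": " + e.Reason }

// ValidateVariant applies business rules to a parsed struct. It never sees raw
// Markdown, so it behaves the same for files read from disk or streamed via gRPC.
func ValidateVariant(v ParsedVariant) []FieldError {
	var errs []FieldError
	if v.SKU == "" {
		errs = append(errs, FieldError{"sku", "must not be empty"})
	}
	if v.ProductID == "" {
		errs = append(errs, FieldError{"product_id", "must reference a product"})
	}
	if v.PriceMinor <= 0 {
		errs = append(errs, FieldError{"price", "must be positive"})
	}
	return errs
}

func main() {
	for _, e := range ValidateVariant(ParsedVariant{SKU: "MUG-BLU", PriceMinor: 0}) {
		fmt.Println(e)
	}
}
```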

Hook event flow (after #103 + #104)

A git push triggers git-receive-pack inside gitstore-git-service. The git-service quarantines the incoming objects and emits a gRPC hook callback to gitstore-api containing the changed file paths and their raw content. gitstore-api invokes the parsing layer on each relevant file, then feeds the resulting structs to the validation layer. The decision (approve or reject) is returned to the git-service, which either promotes the quarantined objects into the live repository or discards them.

Dependency order

This initiative must be completed before #105 (Catalog Validation) and #123 (Admission Validation Contract) can be implemented correctly, since both depend on having a typed, parsed representation as their input boundary.

Dependencies

Metadata


Assignees: No one assigned
Type: No type
Projects: Status Done
Relationships: None yet
Development: No branches or pull requests
