[BUG] Artifactory + GitLab subgroups: apm install fails for 3+ segment project paths #1498

@chkp-roniz

Summary

apm install against an Artifactory VCS proxy fronting GitLab cannot resolve projects whose path has three or more segments, i.e. projects that live inside one or more subgroups (e.g. group/subgroup/project, group/sub-a/sub-b/project). The download fails with HTTP 404 because the parser cuts the path after the second segment and treats the rest as an in-repo virtual sub-path, so the proxy receives the wrong archive URL.

PR #1472 addresses this by replacing the parse-time heuristic with an authoritative install-time boundary probe.

Background

JFrog Artifactory exposes upstream Git hosts via a VCS-remote endpoint:

https://<artifactory-host>/<repo-key>/<owner>/<project>/archive/refs/heads/<ref>.zip

When the upstream behind a VCS remote is GitHub, paths always have the fixed owner/repo shape (two segments). When the upstream is GitLab, a project can sit at any subgroup depth:

GitHub :  acme/widget
GitLab :  acme/widget                      # flat
GitLab :  acme/platform/widget             # 1-level subgroup
GitLab :  acme/platform/auth/widget        # 2-level subgroup
GitLab :  acme/platform/auth/v2/widget     # arbitrary depth, no upper bound

APM exposes this proxy with two env vars:

  • PROXY_REGISTRY_URL=https://art.example.com/artifactory/<repo-key> — base URL for the proxy.
  • PROXY_REGISTRY_ONLY=1 — strict mode; refuse direct VCS fallback.
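
As a rough illustration of how the base URL and the archive template above fit together, the sketch below composes an archive URL from a proxy base and a project path of any depth. The helper name `archive_url` is hypothetical, and the sketch assumes the ref is a branch (refs/heads); it is not APM's actual code.

```python
def archive_url(base_url: str, project_path: str, ref: str) -> str:
    """Compose the Artifactory VCS archive URL for a branch ref.

    base_url is what PROXY_REGISTRY_URL would hold; project_path may have
    two segments (GitHub-style) or more (GitLab subgroups).
    """
    base = base_url.rstrip("/")      # tolerate a trailing slash on the base
    path = project_path.strip("/")   # tolerate stray slashes on the path
    return f"{base}/{path}/archive/refs/heads/{ref}.zip"


# Example: a project one subgroup deep.
print(archive_url("https://art.example.com/artifactory/apm",
                  "acme/platform/widget", "main"))
```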

The bug

For a 3+ segment dep under proxy-only mode:

# apm.yml
dependencies:
  apm:
    - group/subgroup/project#main

PROXY_REGISTRY_URL=https://art.example.com/artifactory/apm \
PROXY_REGISTRY_ONLY=1 \
apm install

APM (pre-fix) parses group/subgroup/project as owner=group, repo=subgroup, virtual_path=project — treating the third segment as an in-repo sub-path. It then asks the proxy for:

https://art.example.com/artifactory/apm/group/subgroup/archive/refs/heads/main.zip

which 404s on the proxy because no project sits at group/subgroup — the real project is one level deeper at group/subgroup/project. The install fails:

Failed to download package group/subgroup#main from Artifactory
(art.example.com/artifactory/apm). Last error: HTTP 404 ...
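
To make the failure concrete, here is a minimal sketch of the fixed two-segment split described above. `naive_parse` is a hypothetical stand-in for the pre-fix parser, not its real code; it only reproduces the described behaviour of taking the first two segments as owner/repo.

```python
def naive_parse(dep: str):
    """Hypothetical pre-fix split: owner/repo are always the first two segments."""
    path, _, ref = dep.partition("#")
    segments = path.split("/")
    owner, repo = segments[0], segments[1]
    # Everything after the second segment is assumed to be an in-repo path.
    virtual_path = "/".join(segments[2:]) or None
    return owner, repo, virtual_path, ref or None


owner, repo, virtual_path, ref = naive_parse("group/subgroup/project#main")
base = "https://art.example.com/artifactory/apm"
url = f"{base}/{owner}/{repo}/archive/refs/heads/{ref}.zip"
print(url)  # the archive URL for group/subgroup, not group/subgroup/project
```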

Why parse-time heuristics can't solve this

Earlier attempts tried to detect the boundary by inspecting segments for "well-known" marker directory names (skills/, prompts/, agents/, collections/, instructions/) and virtual file extensions (.prompt.md, .instructions.md, .chatmode.md, .agent.md). The list-based approach has two structural problems:

  1. Marker names are ambiguous. A GitLab subgroup or repo legitimately named agents (or prompts, etc.) is indistinguishable from the marker. Given acme/platform/agents/widget, parse-time can't tell whether agents is a real subgroup (the project is acme/platform/agents/widget) or a marker (agents/widget is a virtual sub-path of repo acme/platform).
  2. The hard-coded list drifts. Every new APM primitive that introduces a marker has to be added to the constants in two places (the parser and the resolver). New file extensions need the same dual-update.
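
The ambiguity in the first point can be shown with a small sketch of a marker-based guess. The marker names and extensions are the ones listed above; the function `guess_boundary` and its fallback (no marker found means the whole path is the repo) are assumptions made for illustration, not the earlier implementation.

```python
MARKER_DIRS = {"skills", "prompts", "agents", "collections", "instructions"}
VIRTUAL_EXTS = (".prompt.md", ".instructions.md", ".chatmode.md", ".agent.md")


def guess_boundary(path: str):
    """Hypothetical parse-time guess: the repo ends before the first marker-like segment."""
    segments = path.split("/")
    # owner/repo needs at least two segments, so only look from index 2 on.
    for i in range(2, len(segments)):
        seg = segments[i]
        if seg in MARKER_DIRS or seg.endswith(VIRTUAL_EXTS):
            return "/".join(segments[:i]), "/".join(segments[i:])
    return path, None  # assumed fallback: treat the whole path as the repo


# If "agents" is a real GitLab subgroup, this guess is wrong, and nothing in
# the string itself can reveal that.
print(guess_boundary("acme/platform/agents/widget"))
```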

The result was a parse-time guess that was wrong often enough for nested-group paths that an end-user apm install would 404 with a confusing error pointing at the wrong owner/repo split.

The fix (PR #1472)

The boundary is determined at install time, not at parse time, by HEAD-probing the Artifactory archive URLs:

  1. Enumerate every plausible (owner, repo, virtual_path) split, shallow-first.
  2. For each candidate, HEAD the archive URL on the proxy (allow_redirects=False so a Bearer token can't leak cross-host on a redirect).
  3. The first candidate that responds 2xx or 3xx is the verified boundary.
  4. Rebuild the dependency reference at that split.
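
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `candidate_splits` and `resolve_boundary` are hypothetical names, the ref is assumed to be a branch, and the HEAD request is injected as a callable `head(url) -> status code` so the sketch stays free of network and library dependencies. In practice that callable might wrap something like `requests.head(url, headers=..., allow_redirects=False).status_code`.

```python
from typing import Callable, Iterator, Optional, Tuple

Split = Tuple[str, str, Optional[str]]  # (owner, repo, virtual_path)


def candidate_splits(path: str) -> Iterator[Split]:
    """Yield every (owner, repo, virtual_path) split, shallowest repo first."""
    segments = [s for s in path.split("/") if s]
    for depth in range(2, len(segments) + 1):
        owner = "/".join(segments[: depth - 1])   # may itself contain subgroups
        repo = segments[depth - 1]
        virtual = "/".join(segments[depth:]) or None
        yield owner, repo, virtual


def resolve_boundary(base_url: str, path: str, ref: str,
                     head: Callable[[str], int]) -> Split:
    """Return the first split whose archive URL answers 2xx or 3xx."""
    base = base_url.rstrip("/")
    for owner, repo, virtual in candidate_splits(path):
        url = f"{base}/{owner}/{repo}/archive/refs/heads/{ref}.zip"
        if 200 <= head(url) < 400:
            return owner, repo, virtual
    raise LookupError(f"no candidate split of {path!r} exists on the proxy")


# Usage with a fake proxy that only knows the three-segment project.
existing = {"https://art.example.com/artifactory/apm/group/subgroup/project"
            "/archive/refs/heads/main.zip"}
fake_head = lambda url: 200 if url in existing else 404
print(resolve_boundary("https://art.example.com/artifactory/apm",
                       "group/subgroup/project", "main", fake_head))
# -> ('group/subgroup', 'project', None)
```

Injecting the probe keeps the enumeration logic testable offline; the shallow-first order means a two-segment GitHub-style path resolves on the first request.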

If every candidate is rejected the resolver raises — explicitly distinguishing "missing repo" (every 4xx) from "auth problem" (every 401/403) so a misconfigured PROXY_REGISTRY_TOKEN no longer masquerades as a missing repo. There is no silent fallback to a guess.
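
One possible way to express that distinction is sketched below. The exception names and the function `raise_for_rejections` are hypothetical, and the handling of mixed results (some 401/403, some other 4xx) is an assumption: here only an all-401/403 outcome counts as an auth problem, and anything else is reported as a missing repo.

```python
from typing import Sequence


class ProxyAuthError(Exception):
    """Every probe was rejected with 401/403: likely a token problem."""


class RepoNotFoundError(Exception):
    """No candidate split exists on the proxy."""


def raise_for_rejections(path: str, statuses: Sequence[int]) -> None:
    """Raise the appropriate error once every candidate has been rejected."""
    if statuses and all(s in (401, 403) for s in statuses):
        raise ProxyAuthError(
            f"all probes for {path!r} returned 401/403; check PROXY_REGISTRY_TOKEN"
        )
    # Assumption: mixed or non-auth rejections are treated as a missing repo.
    raise RepoNotFoundError(f"no repository found for any split of {path!r}")
```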

The mechanism mirrors the existing native-GitLab pattern (_try_resolve_gitlab_direct_shorthand) but uses the archive URL itself as the existence signal, so no separate metadata API call is needed against the proxy.

Coverage:

  • Mode 1 — explicit FQDN deps (<artifactory-host>/<prefix>/<owner>/<project>[/<more>]).
  • Mode 2 — bare shorthand (<owner>/<project>[/<more>]) under PROXY_REGISTRY_URL + PROXY_REGISTRY_ONLY=1. Audience-correct auth: Mode 2 uses the proxy's own PROXY_REGISTRY_TOKEN, not the upstream Git host token.

For users who need an explicit, deterministic answer without probing (e.g. air-gapped CI), the // empty-segment notation marks the repo/virtual boundary unambiguously:

<artifactory-host>/<prefix>/<owner>/<deep>/<project>//<virtual/sub-path>
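
A minimal sketch of how such a reference could be split without any probing is shown below. The helper `split_explicit_boundary` is hypothetical, and it assumes the reference carries no URL scheme (an `https://` prefix would otherwise be mistaken for the boundary).

```python
def split_explicit_boundary(ref: str):
    """Split 'repo/path//virtual/path' at the first '//'.

    Returns (repo_path, virtual_path); virtual_path is None when the
    reference has no '//' or nothing follows it.
    """
    repo_path, sep, virtual = ref.partition("//")
    if not sep:
        return ref, None
    return repo_path.rstrip("/"), virtual.strip("/") or None


print(split_explicit_boundary(
    "art.example.com/artifactory/apm/acme/platform/widget//skills/review"))
# -> ('art.example.com/artifactory/apm/acme/platform/widget', 'skills/review')
```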
