[Refactor]: Git URL path duplicates build dependency extraction with iterative phase machine #1220

@LalatenduMohanty

Description

When bootstrapping a package from a git URL without a version tag (e.g., stevedore @ git+https://...), fromager performs build dependency extraction twice:

  1. During version discovery: _resolve_version_from_git_url → _get_version_from_package_metadata → _prepare_build_dependencies calls PEP 517 hooks (get_requires_for_build_wheel, get_requires_for_build_sdist) to set up a build environment and extract the version from metadata.

  2. During normal phase processing — After version discovery, the package enters the iterative phase machine. _phase_prepare_source and _phase_prepare_build call the same dependencies.get_build_*_dependencies() functions, invoking the same PEP 517 hooks again.
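
For readers unfamiliar with the metadata step in item 1, here is a minimal sketch of reading a version from the output of prepare_metadata_for_build_wheel. Per PEP 517, that hook creates a *.dist-info directory containing a METADATA file in email-header format. The function name version_from_metadata_dir is hypothetical and this is not fromager's actual code.

```python
from email.parser import Parser
from pathlib import Path


def version_from_metadata_dir(metadata_dir: Path) -> str:
    """Read the Version field from the METADATA file under metadata_dir.

    Hypothetical helper: assumes metadata_dir is the directory that was
    passed to prepare_metadata_for_build_wheel and now contains exactly
    one *.dist-info directory.
    """
    dist_info = next(metadata_dir.glob("*.dist-info"))
    text = (dist_info / "METADATA").read_text(encoding="utf-8")
    # METADATA uses RFC 822-style headers, so the email parser can read it.
    message = Parser().parsestr(text)
    version = message["Version"]
    if version is None:
        raise ValueError(f"no Version field in {dist_info / 'METADATA'}")
    return version
```

Only the build-system and build-wheel requirements have to be installed before this hook can run, which is why the sdist requirements are not needed at this stage.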

The dependency cache files (build-system-requirements.txt, build-backend-requirements.txt, build-sdist-requirements.txt) that would prevent re-extraction are written into a TemporaryDirectory during step 1 and destroyed when the git clone is moved to its final working directory. So the phase machine never finds them and re-invokes the hooks.

What should happen vs what actually happens

EXPECTED (PyPI packages — single pass):

    resolve version (from PyPI)
         │
         ▼
    phase machine: PREPARE_SOURCE ─► extract build-system deps    ──► once
                   PREPARE_BUILD  ─► extract backend/sdist deps   ──► once
                   BUILD          ─► build wheel


ACTUAL (git URL without version tag — double pass):

    resolve version:
    ├── clone repo
    ├── extract build-system deps    ──► first time    ┐
    ├── extract backend/sdist deps   ──► first time    ├── cache files written
    ├── run metadata hook            ◄─────────────────┘   to tmpdir, then
    └── move clone, destroy tmpdir   ──► cache files LOST  destroyed
         │
         ▼
    phase machine: PREPARE_SOURCE ─► extract build-system deps    ──► second time
                   PREPARE_BUILD  ─► extract backend/sdist deps   ──► second time
                   BUILD          ─► build wheel
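
The double pass in the ACTUAL diagram can be reproduced in isolation. The sketch below is a simplified model, not fromager code: get_build_deps and simulate are hypothetical names, the hook call is replaced by a recorded string, and the cache file is assumed to live in the parent directory of the source tree.

```python
import shutil
import tempfile
from pathlib import Path

CACHE_NAME = "build-backend-requirements.txt"


def get_build_deps(source_dir: Path, calls: list[str]) -> list[str]:
    """Return build deps, reusing a cache file next to the source tree."""
    cache = source_dir.parent / CACHE_NAME  # assumed cache location
    if cache.exists():
        return cache.read_text().splitlines()
    # Stand-in for invoking the real PEP 517 hook.
    calls.append("get_requires_for_build_wheel")
    deps = ["setuptools"]  # placeholder result, not real data
    cache.write_text("\n".join(deps) + "\n")
    return deps


def simulate(work_dir: Path) -> list[str]:
    """Model version discovery followed by the phase machine."""
    calls: list[str] = []
    final = work_dir / "pkg" / "src"
    final.parent.mkdir(parents=True)
    with tempfile.TemporaryDirectory() as tmp:
        clone = Path(tmp) / "clone"
        clone.mkdir()
        # Version discovery: first extraction, cache written into tmp.
        get_build_deps(clone, calls)
        # Only the clone is moved; the cache file stays behind in tmp.
        shutil.move(str(clone), str(final))
    # tmp has been deleted, so the phase machine misses the cache.
    get_build_deps(final, calls)
    return calls
```

Running simulate against an empty directory records two hook calls, one per pass. If the cache file were moved along with the clone, the second call would hit the cache instead.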

Impact

  • Redundant PEP 517 hook calls: get_requires_for_build_wheel and get_requires_for_build_sdist are called twice for every git URL package without a version tag.
  • Redundant dependency resolution — Build dependencies are resolved twice (though _has_been_seen() prevents them from being fully rebuilt).
  • Two separate BuildEnvironment instances are created — one temporary for metadata extraction, one permanent for the actual build.
  • Build-sdist dependencies are bootstrapped unnecessarily during version discovery — they are not needed for prepare_metadata_for_build_wheel.

Context

The git URL code path predates the iterative phase machine refactor (43bf6a8). The recursive-to-iterative conversion commit explicitly deferred unifying the two paths. Three methods exist solely for the git URL path:

  • _prepare_build_dependencies — mirrors what PREPARE_SOURCE + PREPARE_BUILD do
  • _handle_build_requirements — eager sequential bootstrap
  • _bootstrap_one — single-requirement iterative loop wrapper

Note: The "version known" case (git+https://...@1.2.3) does not have this problem — _get_version_from_package_metadata is never called, and the phase machine handles everything in a single pass.
