typing caches keep extension types alive past shutdown #151728

@wjakob

Bug report

Bug description:

typing caches every subscripted type in a global lru_cache. In real-world applications, this almost unavoidably leads to interpreter-wide leaks across extensions, which leak checkers like valgrind and binding tools like nanobind then complain about.

The problem has several components:

  1. An expression like typing.List[ext.Widget] creates a cache entry that holds a reference to Widget. In principle this should be cleaned up when typing is garbage-collected at interpreter shutdown.

  2. Many of the big ML frameworks (e.g., torch) hold references to typing that are never released when Python shuts down. typing is an extremely commonly used module, and many existing extensions have benign leaks at shutdown time. This problem is unfixable: correcting every Python extension out there is a "let's boil the ocean" kind of impossible task.

  3. typing greatly exacerbates these leaks, because it also keeps an innocent, well-behaved extension from shutting down in an orderly manner. This often comes up with the nanobind binding generator, which loudly complains about any types, functions, or instances leaked at shutdown time. The behavior above can generate thousands of warnings about issues that are ultimately caused by the dense web of uncollectable references created through typing.
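The retention described in point 1 can be observed from pure Python. The following is only a rough sketch, not part of the reproducer: the helper name widget_survives_until_cache_clear is made up for illustration, and typing._cleanups is a private attribute whose presence is assumed here.

```python
import gc
import typing
import weakref


def widget_survives_until_cache_clear():
    """Hypothetical helper: report whether a class is alive before and
    after typing's caches are cleared."""

    class Widget:
        pass

    # Subscripting goes through typing's lru_cache, which stores Widget
    # both in the cache key and in the cached alias object.
    typing.List[Widget]

    ref = weakref.ref(Widget)
    del Widget
    gc.collect()  # classes sit in reference cycles, so force a collection
    alive_with_cache = ref() is not None

    # typing._cleanups is private: a list of cache_clear callables.
    for clear in typing._cleanups:
        clear()
    gc.collect()
    alive_after_clear = ref() is not None

    return alive_with_cache, alive_after_clear


alive_with_cache, alive_after_clear = widget_survives_until_cache_clear()
print(alive_with_cache, alive_after_clear)
```

On the CPython builds I would expect this to print True followed by False: the class outlives its last name only because of the cache entry.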

A single such reference can keep a whole extension from being torn down, producing a burst of warnings like the following, none of which point to an actual bug in the extension:

nanobind: leaked 287 types!
 - leaked type "mymodule.Widget"
 - leaked type "mymodule.Mesh"
 - leaked type "mymodule.Transform"
 - ... skipped remainder
nanobind: leaked 1844 functions!
 - leaked function "Widget.__init__"
 - leaked function "Mesh.subdivide"
 - ... skipped remainder
nanobind: this is likely caused by a reference counting issue in the binding code.
See https://nanobind.readthedocs.io/en/latest/refleaks.html

This issue already came up once before in #98253, but the fix made there only broke a cycle inside _tp_cache and left the retention in place, so it was not enough.

Reproducer

A minimal CPython extension demonstrating the problem is available at https://github.com/wjakob/typing-leak.

ext.c is a tiny extension with just two pieces: Box, an object with no tp_traverse (so a cycle through it is invisible to the GC), and Meta, a metaclass that prints when one of its classes is allocated and deleted.

pip install -e .
python good.py    # Widget allocated / Widget deleted
python bad.py     # Widget allocated   (never deleted: leaked)

good.py defines a class and mentions it in one annotation:

import typing, ext

class Widget(metaclass=ext.Meta):
    pass

def f() -> typing.List[Widget]:
    ...

Widget is freed at shutdown, as it should be. bad.py is the same, plus an uncollectable cycle that holds one unrelated annotation:

box = ext.Box()
box.ref = [box, typing.Dict[str, int]]
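Why does one unrelated annotation matter? A rough sketch of the reference chain, using only introspection: the leaked alias keeps its class alive, the methods of that class keep the typing module namespace alive, and the caches are reachable from that namespace. The variable names below are illustrative, and typing._cleanups is a private attribute assumed to exist.

```python
import typing

# The one annotation that ends up stuck in the uncollectable cycle.
alias = typing.Dict[str, int]

# Its class is a private typing class implemented in Python.
alias_type = type(alias)

# Every Python-level method carries a reference to its module globals.
typing_globals = alias_type.__init__.__globals__

# So a single leaked alias pins the whole typing namespace ...
pins_typing_namespace = typing_globals is vars(typing)

# ... including the bookkeeping list of cache_clear callables.
reaches_cleanups = typing_globals.get("_cleanups") is typing._cleanups

print(pins_typing_namespace, reaches_cleanups)
```

Once the namespace is pinned, every cache entry created anywhere in the process stays alive with it, which is how Widget ends up leaked in bad.py.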

Fix

typing cannot rely on being garbage-collected at shutdown, so it should clear its caches explicitly. The cache_clear callables are already gathered in typing._cleanups; they just never run. Registering them with atexit is enough:

# Lib/typing.py
import atexit
atexit.register(lambda: [clear() for clear in _cleanups])

It is also the same fix I suggested the last time this came up, when it was not considered nice enough. Please reconsider it: I really do think this needs to be fixed, and I hope this report makes the case for it.

Affected Python versions

I reproduced the issue on Python 3.10 through 3.14, free-threaded Python included.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux
