email: uncaught IndexError in the header parser on empty-input edge cases #151857

@tonghuaroot

Description

The modern email header parser (email._header_value_parser, used under EmailPolicy / email.policy.default) raises a bare IndexError instead of a HeaderParseError / defect on two malformed inputs. Both share one root cause: an empty parser token list or string is indexed (value[0] / res[-1]) without first checking that it is non-empty, so the exception escapes the parser's own error-recovery contract.

Instance 1 — MIME parameter name ending with *

import email, email.policy
email.message_from_string("Content-Type: text/plain; name*\n\n",
                          policy=email.policy.default)['content-type'].params

get_parameter consumes the extended-parameter marker *, leaving value == '', then evaluates value[0], which raises IndexError. parse_mime_parameters only catches HeaderParseError, so the IndexError escapes. Also reachable via name*0* (sectioned) and a trailing x=1; name*.
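The failure mode can be illustrated with a small stand-alone model. This is only a sketch: `parse_param` and its `guarded` flag are hypothetical names, not the stdlib's get_parameter, and the guard shown is one plausible shape for the check rather than the actual patch.

```python
from email.errors import HeaderParseError


def parse_param(text, guarded=True):
    """Toy parser for 'name=value' or 'name*=value' (hypothetical helper)."""
    name, star, value = text.partition("*")
    if not star:
        # No extended-parameter marker: plain name=value.
        name, _, value = text.partition("=")
        return name, value
    # The '*' marker has been consumed; `value` is whatever follows it.
    if guarded and not value:
        # Guard: report a parse error instead of indexing an empty string.
        raise HeaderParseError("expected '=' after extended parameter marker")
    if value[0] != "=":  # raises IndexError when value == ''
        raise HeaderParseError("expected '=' after extended parameter marker")
    return name, value[1:]
```

With guarded=False the input "name*" reaches value[0] on an empty string and raises IndexError; with the guard it raises HeaderParseError, which a caller that only catches HeaderParseError can recover from.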

Instance 2 — address display name that is only a comment

import email, email.policy
# Each lookup triggers header parsing under the modern policy.
email.message_from_string("To: (c):\n\n", policy=email.policy.default)['to']
email.message_from_string("Cc: (x): a@b.com;\n\n", policy=email.policy.default)['cc']

DisplayName.display_name guards the empty-list case, but when the name is a single cfws token, res.pop(0) empties res and the subsequent res[-1] raises IndexError (this also propagates through the .value property). The second example is a syntactically valid RFC 5322 group whose display name is a comment.
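The pop-then-index sequence described above can be modelled with plain lists. This is a simplified sketch, not the stdlib code: tokens are assumed to be (token_type, text) pairs, and `display_name` with its `guarded` flag is a hypothetical helper.

```python
def display_name(tokens, guarded=True):
    """Strip leading/trailing 'cfws' tokens and join the rest (toy model)."""
    res = list(tokens)
    if not res:
        # The empty-list case is already handled.
        return ""
    if res[0][0] == "cfws":
        res.pop(0)  # may leave res empty when the name is a lone comment
    if guarded and not res:
        # Guard: a comment-only name degrades to an empty display name.
        return ""
    if res[-1][0] == "cfws":  # raises IndexError when res is empty
        res.pop()
    return "".join(text for _, text in res)
```

For a comment-only name such as [("cfws", "(c)")], the unguarded path empties the list and then fails on res[-1]; the guarded path returns an empty string.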

Both are reachable from ordinary parsing of untrusted/malformed address and parameter headers under the modern policies.

Fix

Add minimal guards so each input degrades to the existing recovery path (a parse defect / empty display name), matching how the parser already treats other malformed input.

After the fix, 100k+ randomized message_from_string parses over malformed address and parameter headers produce no escaping non-HeaderParseError exception. (The empty-string IndexError still produced by directly calling the low-level get_* token parsers is their documented non-empty precondition and is not reachable from public header parsing — every caller checks non-empty first.)
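A randomized check of this kind could look roughly like the following. It is a sketch under assumptions: the alphabet, input lengths, iteration count, and the `fuzz_headers` name are illustrative and are not the harness used for the numbers above.

```python
import email
import email.policy
import random
from email.errors import HeaderParseError

# Characters and fragments likely to stress address and parameter parsing.
ALPHABET = list('ab@.;:,<>()"*=\\ ') + ["name", "x=1"]


def fuzz_headers(iterations=1000, seed=0):
    """Return (raw_message, exception_name) pairs for escaping exceptions."""
    rng = random.Random(seed)
    escaped = []
    for _ in range(iterations):
        body = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 12)))
        for name, prefix in (("To", ""), ("Content-Type", "text/plain; ")):
            raw = f"{name}: {prefix}{body}\n\n"
            try:
                msg = email.message_from_string(raw, policy=email.policy.default)
                header = msg[name]  # header lookup triggers parsing
                str(header)
                getattr(header, "params", None)
                getattr(header, "addresses", None)
            except HeaderParseError:
                pass  # the documented failure mode
            except Exception as exc:
                escaped.append((raw, type(exc).__name__))
    return escaped
```

Calling fuzz_headers() returns every input that raised something other than HeaderParseError; an empty list means no exception escaped for that seed and iteration count.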

Affected versions

Both instances reproduce on main and on the maintained 3.13 / 3.14 / 3.15 bugfix branches; the affected code (get_parameter and DisplayName.display_name) has been present since well before 3.9.

Linked PR

A PR fixing both instances follows.

Labels

stdlib, topic-email, type-bug