[LHF-02] Non-deterministic keys across process restarts via unstable hash() #111

Description

@sheepdestroyer

Overview

The _make_key function in router/memory_mcp.py uses Python's built-in hash() function to generate uniqueness suffixes for memory storage keys. Since Python 3.3, hash randomization is enabled by default: str and bytes hashes are salted with a random seed chosen at process start (unless PYTHONHASHSEED is set to a fixed value), as a defense against hash-collision DoS attacks. As a result, the same data and timestamp produce a different key suffix after a restart, which breaks key stability guarantees.
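The restart behaviour described above can be observed directly. The following is a minimal sketch and not part of router/memory_mcp.py: the helper name hash_in_fresh_process is hypothetical, and it only launches a child interpreter with a chosen PYTHONHASHSEED and reports hash() of a string.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new interpreter started with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    sample = "remember this"
    # Two different seeds stand in for two separate process starts.
    print(hash_in_fresh_process(sample, "1"))
    print(hash_in_fresh_process(sample, "2"))
    # Pinning the seed makes the value repeatable again.
    print(hash_in_fresh_process(sample, "1"))
```

Pinning PYTHONHASHSEED would also stabilize the value, but that is an environment setting rather than a code fix, and the result can still differ between Python versions and builds.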

Location

File: router/memory_mcp.py, _make_key(), Lines 38–44

Current Code

def _make_key(category: str, is_global: bool, data: str) -> str:
    """Build a unique key from memory attributes."""
    scope = SCOPE_GLOBAL if is_global else SCOPE_LOCAL
    ts = int(time.time() * 1000)
    # Use first 12 chars of a basic hash for uniqueness within the same second
    h = str(hash(data + str(ts)))[:12].replace("-", "x")
    return f"{PREFIX}:{scope}:{category}::{ts}:{h}"

Problems

  1. hash() is non-deterministic: Same data + ts input → different h values after restart
  2. Truncation is fragile: str(hash(...))[:12] takes the most-significant digits of the decimal representation, which vary wildly across seeds
  3. Negative sign handling: The replace("-", "x") workaround for negative hashes produces inconsistent key formats (a negative hash yields "x" plus 11 digits, a positive one yields 12 digits)
  4. Not collision-resistant: hash() is designed for hash table bucket distribution, not uniqueness guarantees
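To make problems 2 and 3 concrete, here is a small sketch that applies the same slicing and replacement as the current code to fixed integers instead of live hash() results. The function name legacy_suffix is hypothetical and exists only for illustration.

```python
def legacy_suffix(hash_value: int) -> str:
    """Mirror the suffix logic of the current _make_key for a given hash value."""
    return str(hash_value)[:12].replace("-", "x")


if __name__ == "__main__":
    print(legacy_suffix(1234567890123456789))   # 123456789012 (12 digits)
    print(legacy_suffix(-1234567890123456789))  # x12345678901 ("x" plus 11 digits)
    print(legacy_suffix(42))                    # 42 (shorter than 12 characters)
```

The suffix therefore has no fixed alphabet or length, which makes the key format harder to validate or parse.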

Impact

  • Severity: 🟡 Medium (Data Integrity)
  • Keys generated before a restart can never be reproduced, making deduplication or lookup-by-key unreliable
  • If any external system or debug workflow relies on key stability, it silently breaks
  • The replace("-", "x") hack masks an underlying design issue

Recommendations

Replace hash() with hashlib.blake2b, a modern cryptographic hash function that is:

  • Part of Python stdlib since 3.6 (zero dependencies)
  • Deterministic — same input always produces the same output
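  • Fast in software: generally faster than MD5 and SHA-256 on 64-bit CPUs (roughly 3× faster than SHA-256, although CPUs with dedicated SHA instructions can narrow or reverse that gap)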
  • Cryptographically secure — not obsolete like MD5/SHA-1
  • Configurable digest size — perfect for short key suffixes

import hashlib
import time

def _make_key(category: str, is_global: bool, data: str) -> str:
    """Build a unique key from memory attributes."""
    scope = SCOPE_GLOBAL if is_global else SCOPE_LOCAL
    ts = int(time.time() * 1000)
    # BLAKE2b (stdlib): deterministic across restarts, unlike the built-in hash()
    h = hashlib.blake2b((data + str(ts)).encode("utf-8"), digest_size=6).hexdigest()
    return f"{PREFIX}:{scope}:{category}::{ts}:{h}"

digest_size=6 produces a 12-character hex string — same length as the current suffix, but uniform, clean, and stable.
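The relationship between digest_size and suffix length can be checked with a short sketch. The helper name blake2b_suffix and its digest_size parameter are illustrative assumptions; the hashing call itself is the one proposed above.

```python
import hashlib


def blake2b_suffix(data: str, ts: int, digest_size: int = 6) -> str:
    """Hash data plus timestamp with BLAKE2b and return the hex digest."""
    payload = (data + str(ts)).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=digest_size).hexdigest()


if __name__ == "__main__":
    ts = 1700000000000  # fixed example timestamp in milliseconds
    for size in (4, 6, 8):
        suffix = blake2b_suffix("remember this", ts, digest_size=size)
        # Each digest byte becomes two hex characters.
        print(size, len(suffix), suffix)
```

Six bytes give 48 bits of suffix. Because the millisecond timestamp is already part of the key, the suffix only has to separate entries created within the same millisecond; a larger digest_size is a cheap way to add margin if that ever becomes a concern.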

Why BLAKE2b over alternatives

| Aspect | hash() | MD5 | SHA-256 | BLAKE2b |
|---|---|---|---|---|
| Deterministic | ❌ | ✅ | ✅ | ✅ |
| Cryptographically secure | ❌ | ❌ (broken) | ✅ | ✅ |
| Speed (cycles/byte) | ~1 | ~5 | ~11 | ~3 |
| Python stdlib | ✅ | ✅ | ✅ | ✅ (3.6+) |
| Configurable output size | ❌ | ❌ (fixed 128-bit) | ❌ (fixed 256-bit) | ✅ (1–64 bytes) |
| Negative values | Yes | No | No | No |
| RFC standardized | No | RFC 1321 | FIPS 180-4 | RFC 7693 |
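The speed figures above are approximate and depend on the CPU and on how Python's hashlib was built, so it may be worth measuring locally. This is a rough timing sketch, not a rigorous benchmark; the function name time_hashes and the payload size are arbitrary choices.

```python
import hashlib
import timeit


def time_hashes(payload: bytes, repeats: int = 2000) -> dict[str, float]:
    """Return total seconds spent hashing payload `repeats` times per algorithm."""
    timings: dict[str, float] = {}
    for name in ("md5", "sha256", "blake2b"):
        try:
            hashlib.new(name, payload)  # probe: some builds (e.g. FIPS mode) disable MD5
        except ValueError:
            continue
        timings[name] = timeit.timeit(
            lambda: hashlib.new(name, payload).digest(), number=repeats
        )
    return timings


if __name__ == "__main__":
    for name, seconds in time_hashes(b"x" * 4096).items():
        print(f"{name}: {seconds:.4f} s")
```

For the short inputs _make_key handles, call overhead is likely to dominate, so determinism and a clean output format matter more here than raw throughput.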

Acceptance Criteria

  • _make_key() produces identical keys for identical inputs (same category, scope, data, and timestamp) across process restarts
  • Key suffix uses clean hex characters (no x substitution hack)
  • Hash algorithm is BLAKE2b (not MD5 or other obsolete hashes)
  • Existing tests in test_memory_mcp.py pass without modification
  • Key uniqueness guarantees are preserved
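Because _make_key reads the clock internally, the first two criteria are easiest to check when the timestamp can be fixed. The sketch below assumes a variant with an optional ts parameter, which is not part of the proposal above; the PREFIX and scope constants are placeholder values, since the real ones live in router/memory_mcp.py and are not shown in this issue.

```python
import hashlib
import re
import time
from typing import Optional

# Placeholder values (assumptions); the real constants are defined in router/memory_mcp.py.
PREFIX = "mem"
SCOPE_GLOBAL = "global"
SCOPE_LOCAL = "local"


def make_key(category: str, is_global: bool, data: str, ts: Optional[int] = None) -> str:
    """Variant of the proposed _make_key with an injectable timestamp (hypothetical parameter)."""
    scope = SCOPE_GLOBAL if is_global else SCOPE_LOCAL
    if ts is None:
        ts = int(time.time() * 1000)
    h = hashlib.blake2b((data + str(ts)).encode("utf-8"), digest_size=6).hexdigest()
    return f"{PREFIX}:{scope}:{category}::{ts}:{h}"


def check_key(key: str) -> bool:
    """True if the key ends in a numeric timestamp followed by 12 lowercase hex characters."""
    return re.search(r"::\d+:[0-9a-f]{12}$", key) is not None


if __name__ == "__main__":
    first = make_key("note", True, "hello", ts=1700000000000)
    second = make_key("note", True, "hello", ts=1700000000000)
    print(first == second, check_key(first))
```

Running the same comparison in two separate interpreter processes would cover the "across process restarts" part of the first criterion.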

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), code-quality (Code quality improvement), jules (Issues for Jules)
