Skip to content

Bug: Boyer-Moore bad character shift has no effect (for-loop variable reassignment) #14844

@devladpopov

Description

@devladpopov

Description

In strings/boyer_moore_search.py, the bad_character_heuristic() method attempts to skip positions using the bad character rule, but the shift has no effect because it reassigns the for-loop variable i:

def bad_character_heuristic(self) -> list[int]:
    positions = []
    for i in range(self.textLen - self.patLen + 1):
        mismatch_index = self.mismatch_in_text(i)
        if mismatch_index == -1:
            positions.append(i)
        else:
            match_index = self.match_in_pattern(self.text[mismatch_index])
            i = (mismatch_index - match_index)  # <-- THIS HAS NO EFFECT
    return positions

In Python, reassigning the loop variable inside a for loop does not change the iteration. The next iteration always takes the next value from range(). This means:

  • The bad character shift is completely ignored
  • The algorithm checks every position sequentially
  • It behaves as brute-force O(nm), not Boyer-Moore O(n/m)
  • Results are correct, but performance is identical to naive search

Reproduction

class InstrumentedBMS(BoyerMooreSearch):
    def bad_character_heuristic(self):
        positions = []
        self.iterations = 0
        for i in range(self.textLen - self.patLen + 1):
            self.iterations += 1
            mismatch_index = self.mismatch_in_text(i)
            if mismatch_index == -1:
                positions.append(i)
            else:
                match_index = self.match_in_pattern(self.text[mismatch_index])
                i = (mismatch_index - match_index)
        return positions

text = "ABCDEFGHIJKLMNOPABCDEFGHIJKLMNOP"  # 32 chars
pattern = "MNOP"
bms = InstrumentedBMS(text, pattern)
positions = bms.bad_character_heuristic()
print(f"Iterations: {bms.iterations}")  # 29 (brute-force)
print(f"Expected for BM: ~8")           # should skip

Output: Iterations: 29 — same as brute-force. The shift assignment is dead code.

Suggested Fix

Replace for loop with while loop to allow the shift to take effect:

def bad_character_heuristic(self) -> list[int]:
    positions = []
    i = 0
    while i <= self.textLen - self.patLen:
        mismatch_index = self.mismatch_in_text(i)
        if mismatch_index == -1:
            positions.append(i)
            i += 1
        else:
            match_index = self.match_in_pattern(self.text[mismatch_index])
            i = max(i + 1, mismatch_index - match_index)
    return positions

Impact

This repository has 190k+ stars and is widely used as an educational reference. Students copying this implementation believe they have a Boyer-Moore algorithm with O(n/m) average performance, but actually have O(nm) brute-force search.

Found during systematic algorithm audit: https://github.com/devladpopov/algorithm-autopsy

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions