Reassemble inbound multipart SMS before public delivery #10

Description

@Justinabox

Motivation

Callstack now has multipart UDH parsing groundwork, but SMSService still persists/emits every inbound +CMT/+CMTI part as if it were a complete message. That is dangerous for MFA, alerting, and automation flows: a long SMS can reach webhooks, future WebSocket subscribers, and storage as partial fragments that look final.

This decomposes the v0.3 roadmap item from #9 into a single implementable PR: reassemble inbound multipart SMS parts before emitting the public incoming-message event.

User journey

A user leaves Callstack running on a Raspberry Pi and receives a long carrier/MFA/automation SMS split into multiple segments. Their integration should see one logical incoming SMS with the full body, not two or more fragment events. Single-part SMS behavior must remain unchanged.

API / UX sketch

  • Existing user-facing subscriptions keep working:
    • sms_service.on_message(handler) receives exactly one IncomingSMSEvent for a complete multipart message.
    • async with sms_service.messages() yields the complete message only after all parts arrive.
  • Storage should save one completed logical message by default.
  • Optional implementation detail: expose non-public part metadata on internal dataclasses, but avoid adding public API unless tests prove it is needed.
  • Logs may say a multipart group is waiting for missing parts, but must not log phone numbers or message bodies at info/warning level.
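
A minimal sketch of the logging rule above, assuming a hypothetical logger name and two hypothetical helpers (`group_tag`, `log_waiting`) that are not part of Callstack today. It logs a truncated hash of the group key plus the missing sequence numbers, so neither the phone number nor the body reaches info-level logs. A truncated hash is not strong anonymization; it only keeps raw numbers out of log lines.

```python
import hashlib
import logging
from typing import Iterable, List

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("callstack.sms.multipart")


def group_tag(sender: str, reference: int, total_parts: int) -> str:
    """Return a short hex tag for a multipart group without exposing the sender."""
    raw = f"{sender}|{reference}|{total_parts}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


def log_waiting(sender: str, reference: int, total_parts: int,
                received: Iterable[int]) -> List[int]:
    """Log which sequence numbers are still missing and return them."""
    missing = sorted(set(range(1, total_parts + 1)) - set(received))
    # Only the hashed tag and part numbers are logged: no sender, no body.
    logger.info(
        "multipart group %s waiting for parts %s of %d",
        group_tag(sender, reference, total_parts),
        missing,
        total_parts,
    )
    return missing
```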

Technical approach

  1. Add a small multipart accumulator inside callstack/sms/service.py or a focused helper such as callstack/sms/multipart.py.
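  2. Key groups by sender + concatenation reference + total-part count. Include the reference width (8-bit vs 16-bit) in the key so that an 8-bit and a 16-bit reference with the same numeric value cannot collide.
  3. Accept parts out of order, handle duplicate sequence numbers deterministically (pick one rule, either always keep the first copy or always replace it with the latest, and test it), and emit only when all sequences 1..total_parts are present.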
  4. Add bounded cleanup for incomplete groups using a configurable TTL or max-age constant so a process cannot leak memory forever.
  5. Extend PDU/text-mode parsing only as far as needed for receive reassembly. If text-mode +CMT cannot reliably expose UDH for the supported modem path, add/route a PDU-mode receive path for multipart messages rather than guessing.
  6. Ensure normal single-part +CMT and +CMTI flows still persist/delete SIM messages exactly as they do today.
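
Steps 1 to 4 could be sketched roughly as follows. This is an illustration under stated assumptions, not the final design: `MultipartAccumulator`, `add_part`, `expire`, `pending`, and the 300-second default are all hypothetical names and values, and the sketch picks "first copy wins" for duplicates. The clock is injectable so tests can drive expiry without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# (sender, reference, total_parts, ref_is_16bit): hypothetical key layout.
GroupKey = Tuple[str, int, int, bool]


@dataclass
class _Group:
    created_at: float
    parts: Dict[int, str] = field(default_factory=dict)


class MultipartAccumulator:
    """Collects parts in memory and returns the full body once complete."""

    def __init__(self, max_age_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age_s = max_age_s  # assumed default, not a project constant
        self._clock = clock
        self._groups: Dict[GroupKey, _Group] = {}

    def add_part(self, sender: str, reference: int, total_parts: int,
                 sequence: int, body: str,
                 ref_is_16bit: bool = False) -> Optional[str]:
        """Return the concatenated body when the group completes, else None."""
        self.expire()
        if total_parts < 1 or not 1 <= sequence <= total_parts:
            return None  # malformed part: never emit it as a message
        key = (sender, reference, total_parts, ref_is_16bit)
        group = self._groups.setdefault(key, _Group(created_at=self._clock()))
        group.parts.setdefault(sequence, body)  # duplicate: first copy wins
        if len(group.parts) < total_parts:
            return None
        del self._groups[key]
        return "".join(group.parts[i] for i in range(1, total_parts + 1))

    def expire(self) -> int:
        """Drop groups older than max_age_s; return how many were dropped."""
        now = self._clock()
        stale = [k for k, g in self._groups.items()
                 if now - g.created_at > self._max_age_s]
        for k in stale:
            del self._groups[k]
        return len(stale)

    def pending(self) -> int:
        return len(self._groups)
```

One consequence of this shape: a duplicate that arrives after its group has already been emitted opens a fresh group, which never completes and is dropped by `expire()`, so it does not produce a second public message.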

Affected modules

  • callstack/sms/service.py — route inbound parts through the accumulator before public emit/store.
  • callstack/sms/pdu.py — reuse/extend MultipartInfo and PDUDecoder.parse_concatenation_udh; add UDH-aware deliver decoding if necessary.
  • callstack/sms/types.py — only if a small internal/public part metadata type is needed.
  • callstack/sms/store.py — only if partial part persistence is chosen for this PR; otherwise explicitly document partial persistence as out of scope.
  • tests/test_sms_service.py and tests/test_sms_pdu.py — regression coverage.

Hardware / modem caveats

  • 3GPP TS 23.040 defines 8-bit and 16-bit concatenation references in UDH; the reference, total parts, and sequence number are what Callstack should use for reassembly.
  • Some modem APIs/text-mode paths hide or reshape UDH. Quectel's SMS guidance notes that received long SMS should be read in PDU format so the receiver can parse the UDH reference, total, and sequence for merging.
  • Do not assume SIMCOM text-mode behavior generalizes to Quectel/Huawei/Sierra devices. If reliable UDH extraction is unavailable in text mode, use PDU-mode receive for multipart support or gate it behind a documented capability.
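
For reference, the two concatenation information elements in TS 23.040 are IEI 0x00 (8-bit reference, 3 data octets: reference, total, sequence) and IEI 0x08 (16-bit reference, 4 data octets: reference high, reference low, total, sequence). The sketch below walks a raw UDH, including its leading UDHL octet, and extracts whichever is present. `ConcatInfo` and `parse_concat_udh` are hypothetical names for illustration; they are not the existing `MultipartInfo` or `PDUDecoder.parse_concatenation_udh`, whose signatures this issue does not spell out.

```python
from typing import NamedTuple, Optional


class ConcatInfo(NamedTuple):
    """Hypothetical container for the concatenation fields of one part."""
    reference: int
    total_parts: int
    sequence: int
    ref_is_16bit: bool


def parse_concat_udh(udh: bytes) -> Optional[ConcatInfo]:
    """Parse a UDH (starting at the UDHL octet) for a concatenation IE."""
    if not udh:
        return None
    end = 1 + udh[0]  # UDHL counts the octets that follow it
    if len(udh) < end:
        return None  # header is truncated
    pos = 1
    while pos + 2 <= end:
        iei, iedl = udh[pos], udh[pos + 1]
        if pos + 2 + iedl > end:
            return None  # information element runs past the header
        data = udh[pos + 2:pos + 2 + iedl]
        info = None
        if iei == 0x00 and iedl == 3:  # 8-bit reference
            info = ConcatInfo(data[0], data[1], data[2], False)
        elif iei == 0x08 and iedl == 4:  # 16-bit reference
            info = ConcatInfo((data[0] << 8) | data[1], data[2], data[3], True)
        if info is not None:
            # Reject impossible numbering instead of guessing.
            ok = info.total_parts >= 1 and 1 <= info.sequence <= info.total_parts
            return info if ok else None
        pos += 2 + iedl  # skip information elements we do not care about
    return None
```

Returning `ref_is_16bit` alongside the reference lets the caller fold the reference width into the group key.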

Acceptance criteria

  • Single-part direct +CMT still emits and stores one message immediately.
  • Single-part notification +CMTI/AT+CMGR still emits/stores one message and deletes the SIM slot after successful read.
  • Two-part 8-bit-reference multipart SMS arriving in order emits exactly one public IncomingSMSEvent with the concatenated body.
  • Two/three-part multipart SMS arriving out of order emits exactly one public event after the final missing part arrives.
  • 16-bit-reference multipart SMS is supported.
  • Duplicate part delivery does not emit duplicate public messages.
  • Incomplete groups are expired/cleaned deterministically and do not grow without bound.
  • Partial parts are not delivered to webhooks, future WebSocket streams, or sms_service.messages() as final messages.
  • Tests do not require real hardware; use mock transport/event injection.

Exact gates

git diff --check
PYTHONPATH=. uv run --no-project --with pytest --with pytest-asyncio --with pytest-aiohttp --with pyserial-asyncio --with aiosqlite pytest tests/test_sms_pdu.py tests/test_sms_service.py -q
PYTHONPATH=. uv run --no-project --with pytest --with pytest-asyncio --with pytest-aiohttp --with pyserial-asyncio --with aiosqlite pytest tests/ -q

Non-goals for this PR

  • Outbound long-SMS splitting.
  • Full UCS2/PDU send support.
  • WebSocket or webhook retry/signature work.
  • Persistent storage of incomplete multipart groups, unless it fits naturally within this small slice.
