Skip to content

Unit tests for osism/commands/ — task lifecycle (task, worker, reconciler, lock, service, container, configuration, set, noset) #2362

Description

@berendt

Background

Follow-up to #2192 (foundation) and PR #2193 (pytest + Zuul infrastructure). Part of Tier 8 (#2199). Combined issue for the nine small task-lifecycle CLI command modules under osism/commands/ (~579 LOC total): task.py (20 LOC, revoke Celery tasks), worker.py (57 LOC, start Celery workers), reconciler.py (71 LOC, deprecated worker runner + inventory sync), lock.py (91 LOC, task-lock management), service.py (100 LOC, service launcher incl. inventory watchdog), container.py (57 LOC, remote docker commands via SSH), configuration.py (39 LOC, configuration sync), set.py and noset.py (72 LOC each, maintenance/bootstrap state toggles). All are thin cliff Command classes orchestrating Celery .delay() calls, subprocess, and the Redis task lock — mock-heavy but individually small.

Scope

Add tests/unit/commands/test_task.py, test_worker.py, test_lock.py, test_service.py, test_container.py, test_configuration.py, test_set.py, test_noset.py and extend the existing tests/unit/commands/test_reconciler.py.

Already covered (do not duplicate): tests/unit/commands/test_reconciler.py has one test — Sync.take_action returns 1 when utils.fetch_task_output raises TimeoutError. Only the gap cases listed below are in scope for that module.

Follow the established pattern from test_reconciler.py / test_wait.py: cmd = module.Class(MagicMock(), MagicMock()), then parsed_args = cmd.get_parser("test").parse_args([...]).

This issue may be split further during implementation if it turns out too large (natural sub-chunks: set/noset, lock, worker/service/reconciler) — Tier 3 issues did the same.

Test targets

Revoketask.py:6

Patch celery.Celery at the source (lazy import inside take_action).

  • Parser: positional task with nargs=1parse_args(["abc"]) yields parsed_args.task == ["abc"]
  • take_action: Celery constructed with name "task", config_from_object called with osism.tasks.Config, app.control.revoke called with terminate=True
  • Quirk to pin down: task = parsed_args.task is the one-element list (nargs=1 is never unpacked), so control.revoke receives ["abc"], not "abc" — assert current behavior and note the latent bug in a comment

Run (worker) — worker.py:11

Patch osism.commands.worker.subprocess.Popen, osism.commands.worker.utils.check_task_lock_and_exit; env via monkeypatch.

get_parser()worker.py:12

  • Default --number-of-workers is min(cpu_count, 4) when OSISM_CELERY_CONCURRENCY is unset (use monkeypatch.delenv(..., raising=False); patch multiprocessing.cpu_count for determinism)
  • OSISM_CELERY_CONCURRENCY=7 set before get_parser() → default is 7 (the env var is read at parser-build time)

take_action()worker.py:30

  • utils.check_task_lock_and_exit is called before anything else
  • Queue mapping: openstack / netbox / conductorcelery -A osism.tasks.<queue> worker -n <queue> ... -Q <queue>
  • osism-kubernetes → both tasks module and queue become kubernetes
  • Suffix stripping: kolla-ansible-A osism.tasks.kolla ... -Q kolla-ansible; ceph-ansibleosism.tasks.ceph
  • Special case osism-ansible → tasks "osism" triggers the dedicated branch: -A osism.tasks.ansible worker -n osism-ansible ... -Q osism-ansible
  • -c <n> in the command string reflects --number-of-workers 3
  • p.wait() is called on the Popen return value
  • Edge documenting current behavior: an unknown short type (e.g. "foo") yields "foo"[:-8] == "" → command celery -A osism.tasks. worker ... — assert as-is with a comment (latent bug, no validation)

Run (reconciler, deprecated) — reconciler.py:13 — gap

Patch osism.commands.reconciler.subprocess.Popen; env via monkeypatch.

  • take_action (reconciler.py:14) logs the deprecation warning ("deprecated ... Use osism service reconciler")
  • Concurrency: OSISM_CELERY_CONCURRENCY unset → min(cpu_count, 4); set to 2-c 2 in the command string
  • Command string is celery -A osism.tasks.reconciler worker -n reconciler --loglevel=INFO -Q reconciler -c <n> and p.wait() is called

Sync (reconciler) — reconciler.py:30 — gaps only

Patch osism.tasks.reconciler.run.delay (lazy import at the call site), osism.commands.reconciler.utils.fetch_task_output, osism.commands.reconciler.utils.check_task_lock_and_exit. Timeout → 1 is already covered.

  • Wait path (default): delay called with publish=True, fetch_task_output called with the task id and timeout=300; its return value is returned by take_action
  • --no-wait: delay called with publish=False, fetch_task_output not called, returns None, "No more output" logged
  • --task-timeout 60fetch_task_output called with timeout=60
  • OSISM_TASK_TIMEOUT=600 set before get_parser() → parsed default is int 600 (argparse applies type=int to string defaults)
  • check_task_lock_and_exit invoked before scheduling

Locklock.py:9

Patch osism.commands.lock.utils.is_task_locked, osism.commands.lock.utils.set_task_lock.

  • Already locked ({"locked": True, "user": "alice", "timestamp": "..."}) → warning logged, returns None, set_task_lock not called; with reason present an additional "Existing reason" warning is logged
  • Not locked (None or {"locked": False}), set_task_lock returns True → returns None; --user bob --reason maintenance passed through to set_task_lock("bob", "maintenance")
  • --user omitted → defaults to settings.OPERATOR_USER ("dragon" unless OSISM_OPERATOR_USER set — patch osism.commands.lock.settings.OPERATOR_USER for determinism); note the default is also baked into the --user help string at parser-build time
  • set_task_lock returns False → "Failed to set task lock" logged, returns 1

Unlocklock.py:51

Patch osism.commands.lock.utils.is_task_locked, osism.commands.lock.utils.remove_task_lock.

  • Not locked (None or {"locked": False}) → "Tasks are not currently locked", returns None, remove_task_lock not called
  • Locked, remove_task_lock returns True → returns None, log mentions previous user/timestamp; previous reason logged only when present
  • remove_task_lock returns False → returns 1

LockStatuslock.py:78

Patch osism.commands.lock.utils.is_task_locked.

  • Locked with reason → "Tasks are LOCKED by ..." plus "Reason: ..." logged
  • Locked without reason and with missing user/timestamp keys → falls back to "unknown"
  • Not locked → "Tasks are UNLOCKED"

Run (service) — service.py:13

Patch osism.commands.service.subprocess.Popen, osism.commands.service.utils.check_task_lock_and_exit; env via monkeypatch.

  • take_action (service.py:19) calls check_task_lock_and_exit first for every service type
  • apiuvicorn osism.api:app --host 0.0.0.0 --port 8000, p.wait() called
  • listenerpython3 -c 'from osism.services import listener; listener.main()'
  • beat → exactly seven Popen calls, one per task module (ansible, ceph, conductor, kolla, netbox, openstack, reconciler), each with --broker=redis://redis beat -s /tmp/celerybeat-schedule-<module>.db; wait() called on every process (inspect call_args_list)
  • flowercelery --broker=redis://redis flower
  • reconciler → same concurrency/env logic and command string as reconciler.py:14
  • Unknown service type (e.g. "bogus") → no Popen call, returns None (silent no-op; assert current behavior)

watchdog branch

Patch watchdog.observers.polling.PollingObserver at the source (lazy import inside the branch) and make osism.commands.service.time.sleep raise a sentinel exception to escape while True.

  • Observer scheduled on /opt/configuration/inventory with recursive=True, start() called
  • The finally block calls observer.stop() and observer.join() even when the loop exits via the sentinel exception (use pytest.raises)
  • The event handler's on_any_event is rebound to self.watchdog_inventory_on_any_event

Run.watchdog_inventory_on_any_event()service.py:97

Patch osism.tasks.reconciler.run.delay (lazy import).

  • Any event object → reconciler.run.delay() called exactly once with no arguments

Run (container) — container.py:14

Patch osism.commands.container.subprocess.call, osism.commands.container.ensure_known_hosts_file, osism.commands.container.prompt (imported at module top from prompt_toolkit).

  • Parser (container.py:15): host nargs=1 plus command as argparse.REMAINDERparse_args(["node1", "ps", "-a"]) yields command == ["ps", "-a"]
  • Non-interactive: command given → exactly one subprocess.call whose command string contains /usr/bin/ssh -i /ansible/secrets/id_rsa.operator, -o UserKnownHostsFile=/share/known_hosts (KNOWN_HOSTS_PATH), <OPERATOR_USER>@node1 and docker ps -a; prompt not called
  • ensure_known_hosts_file returns False → warning logged, SSH call still executed
  • Interactive: empty commandprompt loop; side_effect=["ps", "exit"] → one subprocess.call with docker ps, then loop terminates
  • Loop exit keywords: "Exit", "exit", "EXIT" each break the loop immediately (parametrize); prompt returning the keyword first → zero subprocess.call invocations
  • Operator user comes from settings.OPERATOR_USER — patch osism.commands.container.settings.OPERATOR_USER for a deterministic assertion

Sync (configuration) — configuration.py:11

Patch osism.tasks.ansible.run.delay, osism.tasks.handle_task (both lazily imported inside take_action), osism.commands.configuration.utils.check_task_lock_and_exit.

  • Parser (configuration.py:12): REMAINDER argumentsparse_args(["-e", "foo=1"]) forwards ["-e", "foo=1"]
  • take_action (configuration.py:19): delay called with ("manager", "configuration", arguments) and auto_release_time=60
  • handle_task called with the task object, True, and timeout 60; its return value is returned as the command's rc (e.g. 0 and 2)
  • check_task_lock_and_exit invoked before scheduling
  • Quirk to pin down: the third handle_task argument is the Python builtin format function, not a format string like "log" (rc = handle_task(t, True, format, 60)) — assert current behavior and note the latent bug in a comment

Maintenance / Bootstrapset.py:9 / set.py:42

Patch osism.tasks.ansible.run.delay (lazy import), osism.commands.set.utils.check_task_lock_and_exit.

  • Maintenance.take_action (set.py:20): host from nargs=1; delay called with ("generic", "state-maintenance", ["-e status=True", "-l node1"])
  • Bootstrap.take_action (set.py:53): same but playbook "state-bootstrap"
  • check_task_lock_and_exit called before scheduling in both
  • Parametrize over the two classes — they differ only in the playbook name

NoMaintenance / NoBootstrapnoset.py:9 / noset.py:42

Patch osism.tasks.ansible.run.delay, osism.commands.noset.utils.check_task_lock_and_exit.

  • NoMaintenance.take_action (noset.py:20): delay called with ("generic", "state-maintenance", ["-e status=False", "-l node1"])
  • NoBootstrap.take_action (noset.py:53): playbook "state-bootstrap", -e status=False
  • check_task_lock_and_exit called before scheduling in both
  • Mirror the parametrized structure of test_set.py (status=True vs. status=False is the only difference between the module pairs)

Mocking hints

  • Instantiate commands as Cmd(MagicMock(), MagicMock()) and build args via cmd.get_parser("test").parse_args([...]), matching the existing tests/unit/commands/test_reconciler.py.
  • utils.check_task_lock_and_exit calls exit(1) when locked — always patch it at the command module (osism.commands.<mod>.utils.check_task_lock_and_exit) and only assert it was invoked; the lock logic itself is covered by the osism.utils tier. Do not test the locked/SystemExit path here.
  • Lazy imports inside take_action mean patching at the source module: osism.tasks.ansible.run.delay, osism.tasks.reconciler.run.delay, osism.tasks.handle_task, celery.Celery. The Celery task objects are imported attributes, so mocker.patch("osism.tasks.ansible.run.delay") works without a broker — tests never execute the task bodies.
  • delay() return values: a MagicMock() suffices; Sync only reads t.task_id / t.id.
  • subprocess.Popen / subprocess.call are patched per command module; assert the exact shell command via mock.call_args[0][0] and shell=True where relevant.
  • Env-dependent parser defaults (OSISM_CELERY_CONCURRENCY, OSISM_TASK_TIMEOUT) are evaluated at get_parser() time — set/del env vars with monkeypatch before building the parser. argparse applies type=int to string defaults, so env values arrive as int.
  • set shadows a builtin module name — import as from osism.commands import set as set_cmd (and noset normally).
  • For the watchdog loop, a custom class _Stop(Exception) as time.sleep side effect keeps the test independent of KeyboardInterrupt handling.

Definition of Done

  • tests/unit/commands/test_task.py, test_worker.py, test_lock.py, test_service.py, test_container.py, test_configuration.py, test_set.py, test_noset.py created
  • tests/unit/commands/test_reconciler.py extended with the listed gap cases (no duplication of the existing timeout test)
  • All listed cases covered
  • pytest --cov shows ≥ 90 % for each module in scope
  • pipenv run pytest tests/unit/commands/ passes locally
  • flake8, mypy, python-black remain green
  • Zuul job python-osism-unit-tests passes

Dependencies

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Ready

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions