Background
Follow-up to #2192 (foundation) and PR #2193 (pytest + Zuul infrastructure). Part of Tier 8 (#2199). Combined issue for the nine small task-lifecycle CLI command modules under osism/commands/ (~579 LOC total): task.py (20 LOC, revoke Celery tasks), worker.py (57 LOC, start Celery workers), reconciler.py (71 LOC, deprecated worker runner + inventory sync), lock.py (91 LOC, task-lock management), service.py (100 LOC, service launcher incl. inventory watchdog), container.py (57 LOC, remote docker commands via SSH), configuration.py (39 LOC, configuration sync), set.py and noset.py (72 LOC each, maintenance/bootstrap state toggles). All are thin cliff Command classes orchestrating Celery .delay() calls, subprocess, and the Redis task lock — mock-heavy but individually small.
Scope
Add tests/unit/commands/test_task.py, test_worker.py, test_lock.py, test_service.py, test_container.py, test_configuration.py, test_set.py, test_noset.py and extend the existing tests/unit/commands/test_reconciler.py.
Already covered (do not duplicate): tests/unit/commands/test_reconciler.py has one test — Sync.take_action returns 1 when utils.fetch_task_output raises TimeoutError. Only the gap cases listed below are in scope for that module.
Follow the established pattern from test_reconciler.py / test_wait.py: cmd = module.Class(MagicMock(), MagicMock()), then parsed_args = cmd.get_parser("test").parse_args([...]).
This issue may be split further during implementation if it turns out too large (natural sub-chunks: set/noset, lock, worker/service/reconciler) — Tier 3 issues did the same.
Test targets
Revoke — task.py:6
Patch celery.Celery at the source (lazy import inside take_action).
- Parser: positional
task with nargs=1 → parse_args(["abc"]) yields parsed_args.task == ["abc"]
take_action: Celery constructed with name "task", config_from_object called with osism.tasks.Config, app.control.revoke called with terminate=True
- Quirk to pin down:
task = parsed_args.task is the one-element list (nargs=1 is never unpacked), so control.revoke receives ["abc"], not "abc" — assert current behavior and note the latent bug in a comment
Run (worker) — worker.py:11
Patch osism.commands.worker.subprocess.Popen, osism.commands.worker.utils.check_task_lock_and_exit; env via monkeypatch.
get_parser() — worker.py:12
- Default
--number-of-workers is min(cpu_count, 4) when OSISM_CELERY_CONCURRENCY is unset (use monkeypatch.delenv(..., raising=False); patch multiprocessing.cpu_count for determinism)
OSISM_CELERY_CONCURRENCY=7 set before get_parser() → default is 7 (the env var is read at parser-build time)
take_action() — worker.py:30
utils.check_task_lock_and_exit is called before anything else
- Queue mapping:
openstack / netbox / conductor → celery -A osism.tasks.<queue> worker -n <queue> ... -Q <queue>
osism-kubernetes → both tasks module and queue become kubernetes
- Suffix stripping:
kolla-ansible → -A osism.tasks.kolla ... -Q kolla-ansible; ceph-ansible → osism.tasks.ceph
- Special case
osism-ansible → tasks "osism" triggers the dedicated branch: -A osism.tasks.ansible worker -n osism-ansible ... -Q osism-ansible
-c <n> in the command string reflects --number-of-workers 3
p.wait() is called on the Popen return value
- Edge documenting current behavior: an unknown short type (e.g.
"foo") yields "foo"[:-8] == "" → command celery -A osism.tasks. worker ... — assert as-is with a comment (latent bug, no validation)
Run (reconciler, deprecated) — reconciler.py:13 — gap
Patch osism.commands.reconciler.subprocess.Popen; env via monkeypatch.
take_action (reconciler.py:14) logs the deprecation warning ("deprecated ... Use osism service reconciler")
- Concurrency:
OSISM_CELERY_CONCURRENCY unset → min(cpu_count, 4); set to 2 → -c 2 in the command string
- Command string is
celery -A osism.tasks.reconciler worker -n reconciler --loglevel=INFO -Q reconciler -c <n> and p.wait() is called
Sync (reconciler) — reconciler.py:30 — gaps only
Patch osism.tasks.reconciler.run.delay (lazy import at the call site), osism.commands.reconciler.utils.fetch_task_output, osism.commands.reconciler.utils.check_task_lock_and_exit. Timeout → 1 is already covered.
- Wait path (default):
delay called with publish=True, fetch_task_output called with the task id and timeout=300; its return value is returned by take_action
--no-wait: delay called with publish=False, fetch_task_output not called, returns None, "No more output" logged
--task-timeout 60 → fetch_task_output called with timeout=60
OSISM_TASK_TIMEOUT=600 set before get_parser() → parsed default is int 600 (argparse applies type=int to string defaults)
check_task_lock_and_exit invoked before scheduling
Lock — lock.py:9
Patch osism.commands.lock.utils.is_task_locked, osism.commands.lock.utils.set_task_lock.
- Already locked (
{"locked": True, "user": "alice", "timestamp": "..."}) → warning logged, returns None, set_task_lock not called; with reason present an additional "Existing reason" warning is logged
- Not locked (
None or {"locked": False}), set_task_lock returns True → returns None; --user bob --reason maintenance passed through to set_task_lock("bob", "maintenance")
--user omitted → defaults to settings.OPERATOR_USER ("dragon" unless OSISM_OPERATOR_USER set — patch osism.commands.lock.settings.OPERATOR_USER for determinism); note the default is also baked into the --user help string at parser-build time
set_task_lock returns False → "Failed to set task lock" logged, returns 1
Unlock — lock.py:51
Patch osism.commands.lock.utils.is_task_locked, osism.commands.lock.utils.remove_task_lock.
- Not locked (
None or {"locked": False}) → "Tasks are not currently locked", returns None, remove_task_lock not called
- Locked,
remove_task_lock returns True → returns None, log mentions previous user/timestamp; previous reason logged only when present
remove_task_lock returns False → returns 1
LockStatus — lock.py:78
Patch osism.commands.lock.utils.is_task_locked.
- Locked with reason → "Tasks are LOCKED by ..." plus "Reason: ..." logged
- Locked without reason and with missing
user/timestamp keys → falls back to "unknown"
- Not locked → "Tasks are UNLOCKED"
Run (service) — service.py:13
Patch osism.commands.service.subprocess.Popen, osism.commands.service.utils.check_task_lock_and_exit; env via monkeypatch.
take_action (service.py:19) calls check_task_lock_and_exit first for every service type
api → uvicorn osism.api:app --host 0.0.0.0 --port 8000, p.wait() called
listener → python3 -c 'from osism.services import listener; listener.main()'
beat → exactly seven Popen calls, one per task module (ansible, ceph, conductor, kolla, netbox, openstack, reconciler), each with --broker=redis://redis beat -s /tmp/celerybeat-schedule-<module>.db; wait() called on every process (inspect call_args_list)
flower → celery --broker=redis://redis flower
reconciler → same concurrency/env logic and command string as reconciler.py:14
- Unknown service type (e.g.
"bogus") → no Popen call, returns None (silent no-op; assert current behavior)
watchdog branch
Patch watchdog.observers.polling.PollingObserver at the source (lazy import inside the branch) and make osism.commands.service.time.sleep raise a sentinel exception to escape while True.
- Observer scheduled on
/opt/configuration/inventory with recursive=True, start() called
- The
finally block calls observer.stop() and observer.join() even when the loop exits via the sentinel exception (use pytest.raises)
- The event handler's
on_any_event is rebound to self.watchdog_inventory_on_any_event
Run.watchdog_inventory_on_any_event() — service.py:97
Patch osism.tasks.reconciler.run.delay (lazy import).
- Any event object →
reconciler.run.delay() called exactly once with no arguments
Run (container) — container.py:14
Patch osism.commands.container.subprocess.call, osism.commands.container.ensure_known_hosts_file, osism.commands.container.prompt (imported at module top from prompt_toolkit).
- Parser (
container.py:15): host nargs=1 plus command as argparse.REMAINDER → parse_args(["node1", "ps", "-a"]) yields command == ["ps", "-a"]
- Non-interactive: command given → exactly one
subprocess.call whose command string contains /usr/bin/ssh -i /ansible/secrets/id_rsa.operator, -o UserKnownHostsFile=/share/known_hosts (KNOWN_HOSTS_PATH), <OPERATOR_USER>@node1 and docker ps -a; prompt not called
ensure_known_hosts_file returns False → warning logged, SSH call still executed
- Interactive: empty
command → prompt loop; side_effect=["ps", "exit"] → one subprocess.call with docker ps, then loop terminates
- Loop exit keywords:
"Exit", "exit", "EXIT" each break the loop immediately (parametrize); prompt returning the keyword first → zero subprocess.call invocations
- Operator user comes from
settings.OPERATOR_USER — patch osism.commands.container.settings.OPERATOR_USER for a deterministic assertion
Sync (configuration) — configuration.py:11
Patch osism.tasks.ansible.run.delay, osism.tasks.handle_task (both lazily imported inside take_action), osism.commands.configuration.utils.check_task_lock_and_exit.
- Parser (
configuration.py:12): REMAINDER arguments → parse_args(["-e", "foo=1"]) forwards ["-e", "foo=1"]
take_action (configuration.py:19): delay called with ("manager", "configuration", arguments) and auto_release_time=60
handle_task called with the task object, True, and timeout 60; its return value is returned as the command's rc (e.g. 0 and 2)
check_task_lock_and_exit invoked before scheduling
- Quirk to pin down: the third
handle_task argument is the Python builtin format function, not a format string like "log" (rc = handle_task(t, True, format, 60)) — assert current behavior and note the latent bug in a comment
Maintenance / Bootstrap — set.py:9 / set.py:42
Patch osism.tasks.ansible.run.delay (lazy import), osism.commands.set.utils.check_task_lock_and_exit.
Maintenance.take_action (set.py:20): host from nargs=1; delay called with ("generic", "state-maintenance", ["-e status=True", "-l node1"])
Bootstrap.take_action (set.py:53): same but playbook "state-bootstrap"
check_task_lock_and_exit called before scheduling in both
- Parametrize over the two classes — they differ only in the playbook name
NoMaintenance / NoBootstrap — noset.py:9 / noset.py:42
Patch osism.tasks.ansible.run.delay, osism.commands.noset.utils.check_task_lock_and_exit.
NoMaintenance.take_action (noset.py:20): delay called with ("generic", "state-maintenance", ["-e status=False", "-l node1"])
NoBootstrap.take_action (noset.py:53): playbook "state-bootstrap", -e status=False
check_task_lock_and_exit called before scheduling in both
- Mirror the parametrized structure of
test_set.py (status=True vs. status=False is the only difference between the module pairs)
Mocking hints
- Instantiate commands as
Cmd(MagicMock(), MagicMock()) and build args via cmd.get_parser("test").parse_args([...]), matching the existing tests/unit/commands/test_reconciler.py.
utils.check_task_lock_and_exit calls exit(1) when locked — always patch it at the command module (osism.commands.<mod>.utils.check_task_lock_and_exit) and only assert it was invoked; the lock logic itself is covered by the osism.utils tier. Do not test the locked/SystemExit path here.
- Lazy imports inside
take_action mean patching at the source module: osism.tasks.ansible.run.delay, osism.tasks.reconciler.run.delay, osism.tasks.handle_task, celery.Celery. The Celery task objects are imported attributes, so mocker.patch("osism.tasks.ansible.run.delay") works without a broker — tests never execute the task bodies.
delay() return values: a MagicMock() suffices; Sync only reads t.task_id / t.id.
subprocess.Popen / subprocess.call are patched per command module; assert the exact shell command via mock.call_args[0][0] and shell=True where relevant.
- Env-dependent parser defaults (
OSISM_CELERY_CONCURRENCY, OSISM_TASK_TIMEOUT) are evaluated at get_parser() time — set/del env vars with monkeypatch before building the parser. argparse applies type=int to string defaults, so env values arrive as int.
set shadows a builtin module name — import as from osism.commands import set as set_cmd (and noset normally).
- For the watchdog loop, a custom
class _Stop(Exception) as time.sleep side effect keeps the test independent of KeyboardInterrupt handling.
Definition of Done
Dependencies
Background
Follow-up to #2192 (foundation) and PR #2193 (pytest + Zuul infrastructure). Part of Tier 8 (#2199). Combined issue for the nine small task-lifecycle CLI command modules under
osism/commands/(~579 LOC total):task.py(20 LOC, revoke Celery tasks),worker.py(57 LOC, start Celery workers),reconciler.py(71 LOC, deprecated worker runner + inventory sync),lock.py(91 LOC, task-lock management),service.py(100 LOC, service launcher incl. inventory watchdog),container.py(57 LOC, remotedockercommands via SSH),configuration.py(39 LOC, configuration sync),set.pyandnoset.py(72 LOC each, maintenance/bootstrap state toggles). All are thin cliffCommandclasses orchestrating Celery.delay()calls,subprocess, and the Redis task lock — mock-heavy but individually small.Scope
Add
tests/unit/commands/test_task.py,test_worker.py,test_lock.py,test_service.py,test_container.py,test_configuration.py,test_set.py,test_noset.pyand extend the existingtests/unit/commands/test_reconciler.py.Already covered (do not duplicate):
tests/unit/commands/test_reconciler.pyhas one test —Sync.take_actionreturns1whenutils.fetch_task_outputraisesTimeoutError. Only the gap cases listed below are in scope for that module.Follow the established pattern from
test_reconciler.py/test_wait.py:cmd = module.Class(MagicMock(), MagicMock()), thenparsed_args = cmd.get_parser("test").parse_args([...]).This issue may be split further during implementation if it turns out too large (natural sub-chunks:
set/noset,lock,worker/service/reconciler) — Tier 3 issues did the same.Test targets
Revoke—task.py:6Patch
celery.Celeryat the source (lazy import insidetake_action).taskwithnargs=1→parse_args(["abc"])yieldsparsed_args.task == ["abc"]take_action:Celeryconstructed with name"task",config_from_objectcalled withosism.tasks.Config,app.control.revokecalled withterminate=Truetask = parsed_args.taskis the one-element list (nargs=1is never unpacked), socontrol.revokereceives["abc"], not"abc"— assert current behavior and note the latent bug in a commentRun(worker) —worker.py:11Patch
osism.commands.worker.subprocess.Popen,osism.commands.worker.utils.check_task_lock_and_exit; env viamonkeypatch.get_parser()—worker.py:12--number-of-workersismin(cpu_count, 4)whenOSISM_CELERY_CONCURRENCYis unset (usemonkeypatch.delenv(..., raising=False); patchmultiprocessing.cpu_countfor determinism)OSISM_CELERY_CONCURRENCY=7set beforeget_parser()→ default is7(the env var is read at parser-build time)take_action()—worker.py:30utils.check_task_lock_and_exitis called before anything elseopenstack/netbox/conductor→celery -A osism.tasks.<queue> worker -n <queue> ... -Q <queue>osism-kubernetes→ both tasks module and queue becomekuberneteskolla-ansible→-A osism.tasks.kolla ... -Q kolla-ansible;ceph-ansible→osism.tasks.cephosism-ansible→ tasks"osism"triggers the dedicated branch:-A osism.tasks.ansible worker -n osism-ansible ... -Q osism-ansible-c <n>in the command string reflects--number-of-workers 3p.wait()is called on thePopenreturn value"foo") yields"foo"[:-8] == ""→ commandcelery -A osism.tasks. worker ...— assert as-is with a comment (latent bug, no validation)Run(reconciler, deprecated) —reconciler.py:13— gapPatch
osism.commands.reconciler.subprocess.Popen; env viamonkeypatch.take_action(reconciler.py:14) logs the deprecation warning ("deprecated ... Use osism service reconciler")OSISM_CELERY_CONCURRENCYunset →min(cpu_count, 4); set to2→-c 2in the command stringcelery -A osism.tasks.reconciler worker -n reconciler --loglevel=INFO -Q reconciler -c <n>andp.wait()is calledSync(reconciler) —reconciler.py:30— gaps onlyPatch
osism.tasks.reconciler.run.delay(lazy import at the call site),osism.commands.reconciler.utils.fetch_task_output,osism.commands.reconciler.utils.check_task_lock_and_exit. Timeout →1is already covered.delaycalled withpublish=True,fetch_task_outputcalled with the task id andtimeout=300; its return value is returned bytake_action--no-wait:delaycalled withpublish=False,fetch_task_outputnot called, returnsNone, "No more output" logged--task-timeout 60→fetch_task_outputcalled withtimeout=60OSISM_TASK_TIMEOUT=600set beforeget_parser()→ parsed default isint600 (argparse appliestype=intto string defaults)check_task_lock_and_exitinvoked before schedulingLock—lock.py:9Patch
osism.commands.lock.utils.is_task_locked,osism.commands.lock.utils.set_task_lock.{"locked": True, "user": "alice", "timestamp": "..."}) → warning logged, returnsNone,set_task_locknot called; withreasonpresent an additional "Existing reason" warning is loggedNoneor{"locked": False}),set_task_lockreturnsTrue→ returnsNone;--user bob --reason maintenancepassed through toset_task_lock("bob", "maintenance")--useromitted → defaults tosettings.OPERATOR_USER("dragon"unlessOSISM_OPERATOR_USERset — patchosism.commands.lock.settings.OPERATOR_USERfor determinism); note the default is also baked into the--userhelp string at parser-build timeset_task_lockreturnsFalse→ "Failed to set task lock" logged, returns1Unlock—lock.py:51Patch
osism.commands.lock.utils.is_task_locked,osism.commands.lock.utils.remove_task_lock.Noneor{"locked": False}) → "Tasks are not currently locked", returnsNone,remove_task_locknot calledremove_task_lockreturnsTrue→ returnsNone, log mentions previous user/timestamp; previousreasonlogged only when presentremove_task_lockreturnsFalse→ returns1LockStatus—lock.py:78Patch
osism.commands.lock.utils.is_task_locked.user/timestampkeys → falls back to"unknown"Run(service) —service.py:13Patch
osism.commands.service.subprocess.Popen,osism.commands.service.utils.check_task_lock_and_exit; env viamonkeypatch.take_action(service.py:19) callscheck_task_lock_and_exitfirst for every service typeapi→uvicorn osism.api:app --host 0.0.0.0 --port 8000,p.wait()calledlistener→python3 -c 'from osism.services import listener; listener.main()'beat→ exactly sevenPopencalls, one per task module (ansible,ceph,conductor,kolla,netbox,openstack,reconciler), each with--broker=redis://redis beat -s /tmp/celerybeat-schedule-<module>.db;wait()called on every process (inspectcall_args_list)flower→celery --broker=redis://redis flowerreconciler→ same concurrency/env logic and command string asreconciler.py:14"bogus") → noPopencall, returnsNone(silent no-op; assert current behavior)watchdogbranchPatch
watchdog.observers.polling.PollingObserverat the source (lazy import inside the branch) and makeosism.commands.service.time.sleepraise a sentinel exception to escapewhile True./opt/configuration/inventorywithrecursive=True,start()calledfinallyblock callsobserver.stop()andobserver.join()even when the loop exits via the sentinel exception (usepytest.raises)on_any_eventis rebound toself.watchdog_inventory_on_any_eventRun.watchdog_inventory_on_any_event()—service.py:97Patch
osism.tasks.reconciler.run.delay(lazy import).reconciler.run.delay()called exactly once with no argumentsRun(container) —container.py:14Patch
osism.commands.container.subprocess.call,osism.commands.container.ensure_known_hosts_file,osism.commands.container.prompt(imported at module top fromprompt_toolkit).container.py:15):hostnargs=1pluscommandasargparse.REMAINDER→parse_args(["node1", "ps", "-a"])yieldscommand == ["ps", "-a"]subprocess.callwhose command string contains/usr/bin/ssh -i /ansible/secrets/id_rsa.operator,-o UserKnownHostsFile=/share/known_hosts(KNOWN_HOSTS_PATH),<OPERATOR_USER>@node1anddocker ps -a;promptnot calledensure_known_hosts_filereturnsFalse→ warning logged, SSH call still executedcommand→promptloop;side_effect=["ps", "exit"]→ onesubprocess.callwithdocker ps, then loop terminates"Exit","exit","EXIT"each break the loop immediately (parametrize);promptreturning the keyword first → zerosubprocess.callinvocationssettings.OPERATOR_USER— patchosism.commands.container.settings.OPERATOR_USERfor a deterministic assertionSync(configuration) —configuration.py:11Patch
osism.tasks.ansible.run.delay,osism.tasks.handle_task(both lazily imported insidetake_action),osism.commands.configuration.utils.check_task_lock_and_exit.configuration.py:12): REMAINDERarguments→parse_args(["-e", "foo=1"])forwards["-e", "foo=1"]take_action(configuration.py:19):delaycalled with("manager", "configuration", arguments)andauto_release_time=60handle_taskcalled with the task object,True, and timeout60; its return value is returned as the command's rc (e.g.0and2)check_task_lock_and_exitinvoked before schedulinghandle_taskargument is the Python builtinformatfunction, not a format string like"log"(rc = handle_task(t, True, format, 60)) — assert current behavior and note the latent bug in a commentMaintenance/Bootstrap—set.py:9/set.py:42Patch
osism.tasks.ansible.run.delay(lazy import),osism.commands.set.utils.check_task_lock_and_exit.Maintenance.take_action(set.py:20): host fromnargs=1;delaycalled with("generic", "state-maintenance", ["-e status=True", "-l node1"])Bootstrap.take_action(set.py:53): same but playbook"state-bootstrap"check_task_lock_and_exitcalled before scheduling in bothNoMaintenance/NoBootstrap—noset.py:9/noset.py:42Patch
osism.tasks.ansible.run.delay,osism.commands.noset.utils.check_task_lock_and_exit.NoMaintenance.take_action(noset.py:20):delaycalled with("generic", "state-maintenance", ["-e status=False", "-l node1"])NoBootstrap.take_action(noset.py:53): playbook"state-bootstrap",-e status=Falsecheck_task_lock_and_exitcalled before scheduling in bothtest_set.py(status=Truevs.status=Falseis the only difference between the module pairs)Mocking hints
Cmd(MagicMock(), MagicMock())and build args viacmd.get_parser("test").parse_args([...]), matching the existingtests/unit/commands/test_reconciler.py.utils.check_task_lock_and_exitcallsexit(1)when locked — always patch it at the command module (osism.commands.<mod>.utils.check_task_lock_and_exit) and only assert it was invoked; the lock logic itself is covered by theosism.utilstier. Do not test the locked/SystemExitpath here.take_actionmean patching at the source module:osism.tasks.ansible.run.delay,osism.tasks.reconciler.run.delay,osism.tasks.handle_task,celery.Celery. The Celery task objects are imported attributes, somocker.patch("osism.tasks.ansible.run.delay")works without a broker — tests never execute the task bodies.delay()return values: aMagicMock()suffices;Synconly readst.task_id/t.id.subprocess.Popen/subprocess.callare patched per command module; assert the exact shell command viamock.call_args[0][0]andshell=Truewhere relevant.OSISM_CELERY_CONCURRENCY,OSISM_TASK_TIMEOUT) are evaluated atget_parser()time — set/del env vars withmonkeypatchbefore building the parser. argparse appliestype=intto string defaults, so env values arrive asint.setshadows a builtin module name — import asfrom osism.commands import set as set_cmd(andnosetnormally).class _Stop(Exception)astime.sleepside effect keeps the test independent ofKeyboardInterrupthandling.Definition of Done
tests/unit/commands/test_task.py,test_worker.py,test_lock.py,test_service.py,test_container.py,test_configuration.py,test_set.py,test_noset.pycreatedtests/unit/commands/test_reconciler.pyextended with the listed gap cases (no duplication of the existing timeout test)pytest --covshows ≥ 90 % for each module in scopepipenv run pytest tests/unit/commands/passes locallyflake8,mypy,python-blackremain greenpython-osism-unit-testspassesDependencies