Commit 6834094

cexll and SWE-Agent.ai committed
feat: add harness skill with hooks install/uninstall support (#156)
Add multi-session autonomous agent harness with progress checkpointing, failure recovery, task dependencies, and post-completion self-reflection.

- Add harness module to config.json (copy_dir with hooks.json)
- Add 7 hook scripts: stop, sessionstart, teammateidle, subagentstop, claim, renew, self-reflect-stop + shared _harness_common.py
- Fix self-reflect-stop: only triggers when harness was initialized (checks harness-tasks.json existence), not on every session
- Add unmerge_hooks_from_settings() to uninstall.py for clean hook removal
- Add unit tests (57 tests) and E2E test (100 tasks + 5 self-reflect)

Generated with SWE-Agent.ai

Co-Authored-By: SWE-Agent.ai <noreply@swe-agent.ai>
1 parent 62309d1 commit 6834094

14 files changed

Lines changed: 3051 additions & 10 deletions

config.json

Lines changed: 12 additions & 0 deletions
@@ -196,6 +196,18 @@
     }
   ]
 },
+"harness": {
+  "enabled": false,
+  "description": "Multi-session autonomous agent harness with progress checkpointing, failure recovery, task dependencies, and post-completion self-reflection",
+  "operations": [
+    {
+      "type": "copy_dir",
+      "source": "skills/harness",
+      "target": "skills/harness",
+      "description": "Install harness skill with hooks (Stop, SessionStart, TeammateIdle, SubagentStop, self-reflect)"
+    }
+  ]
+},
 "claudekit": {
   "enabled": false,
   "description": "ClaudeKit workflow: skills/do + global hooks (pre-bash, inject-spec, log-prompt)",

skills/harness/SKILL.md

Lines changed: 53 additions & 10 deletions
@@ -26,6 +26,15 @@ Executable protocol enabling any agent task to run continuously across multiple
 /harness add "task description" # Add a task to the list
 ```
 
+## Activation Marker
+
+Hooks only take effect when the `.harness-active` marker file exists in the harness root (the same directory as `harness-tasks.json`).
+Hook registration is configured in `hooks/hooks.json`.
+
+- `/harness init` and `/harness run` MUST create this marker: `touch <project-path>/.harness-active`
+- When all tasks complete (no pending/in_progress/retryable left), remove it: `rm <project-path>/.harness-active`
+- Without this marker, all hooks are no-ops — they exit 0 immediately
+
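The marker rule can be sketched as a guard that a hook runs before doing anything else. This is a hypothetical bash version (the commit's hooks are Python scripts sharing `_harness_common.py`, which is not shown here); `harness_root` and `harness_hook_guard` are illustrative names, and the walk-up search mirrors the lock-key logic in the Concurrency Control hunk.

```bash
#!/usr/bin/env bash
# Hypothetical hook guard, not the commit's implementation.

# Print the nearest ancestor of $1 (default: $PWD) that contains
# harness-tasks.json; return 1 if there is none.
harness_root() {
  local dir="${1:-$PWD}"
  while [ "$dir" != "/" ] && [ ! -f "$dir/harness-tasks.json" ]; do
    dir="$(dirname "$dir")"
  done
  [ -f "$dir/harness-tasks.json" ] || return 1
  printf '%s\n' "$dir"
}

# Succeed only if a harness root exists and holds the .harness-active marker.
harness_hook_guard() {
  local root
  root="$(harness_root "${1:-$PWD}")" || return 1
  [ -f "$root/.harness-active" ]
}

# In a hook script, a missing marker turns the hook into a no-op:
# harness_hook_guard || exit 0
```

Callers exit 0 when the guard fails, so a missing marker disables the hook instead of reporting an error to the host session.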
 ## Progress Persistence (Dual-File System)
 
 Maintain two files in the project working directory:
@@ -54,6 +63,7 @@ Free-text log of all agent actions across sessions. Never truncate.
 "version": 2,
 "created": "2025-07-01T10:00:00Z",
 "session_config": {
+  "concurrency_mode": "exclusive",
   "max_tasks_per_session": 20,
   "max_sessions": 50
 },
@@ -126,6 +136,8 @@ Free-text log of all agent actions across sessions. Never truncate.
 
 Task statuses: `pending` → `in_progress` (transient, set only during active execution) → `completed` or `failed`. A task found as `in_progress` at session start means the previous session was interrupted — handle via Context Window Recovery Protocol.
 
+In concurrent mode (see Concurrency Control), tasks may also carry claim metadata: `claimed_by` and `lease_expires_at` (ISO timestamp).
+
 **Session boundary**: A session starts when the agent begins executing the Session Start protocol and ends when a Stopping Condition is met or the context window resets. Each session gets a unique `SESSION-N` identifier (N = `session_count` after increment).
 
 ## Concurrency Control
@@ -134,7 +146,23 @@ Before modifying `harness-tasks.json`, acquire an exclusive lock using portable
 
 ```bash
 # Acquire lock (fail fast if another agent is running)
-LOCKDIR="/tmp/harness-$(printf '%s' "$(pwd)" | shasum -a 256 2>/dev/null || sha256sum | cut -c1-8).lock"
+# Lock key must be stable even if invoked from a subdirectory.
+ROOT="$PWD"
+SEARCH="$PWD"
+while [ "$SEARCH" != "/" ] && [ ! -f "$SEARCH/harness-tasks.json" ]; do
+  SEARCH="$(dirname "$SEARCH")"
+done
+if [ -f "$SEARCH/harness-tasks.json" ]; then
+  ROOT="$SEARCH"
+fi
+
+PWD_HASH="$(
+  printf '%s' "$ROOT" |
+    (shasum -a 256 2>/dev/null || sha256sum 2>/dev/null) |
+    awk '{print $1}' |
+    cut -c1-16
+)"
+LOCKDIR="/tmp/harness-${PWD_HASH:-unknown}.lock"
 if ! mkdir "$LOCKDIR" 2>/dev/null; then
   # Check if lock holder is still alive
   LOCK_PID=$(cat "$LOCKDIR/pid" 2>/dev/null)
@@ -158,15 +186,24 @@ trap 'rm -rf "$LOCKDIR"' EXIT
 Log lock acquisition: `[timestamp] [SESSION-N] LOCK acquired (pid=<PID>)`
 Log lock release: `[timestamp] [SESSION-N] LOCK released`
 
-The lock is held for the entire session. The `trap EXIT` handler releases it automatically on normal exit, errors, or signals. Never release the lock between tasks within a session.
+Modes:
+
+- **Exclusive (default)**: hold the lock for the entire session (the `trap EXIT` handler releases it automatically). Any second session in the same state root fails fast.
+- **Concurrent (opt-in via `session_config.concurrency_mode: "concurrent"`)**: treat this as a **state transaction lock**. Hold it only while reading/modifying/writing `harness-tasks.json` (including `.bak`/`.tmp`) and appending to `harness-progress.txt`. Release it as soon as the state update is written, before doing the actual task work.
+
+Concurrent mode invariants:
+
+- All workers MUST point at the same state root (the directory that contains `harness-tasks.json`). If you are using separate worktrees/clones, pin it explicitly (e.g., `HARNESS_STATE_ROOT=/abs/path/to/state-root`).
+- Task selection is advisory; the real gate is an **atomic claim** under the lock: set `status="in_progress"`, set `claimed_by` (a stable worker id, e.g., `HARNESS_WORKER_ID`), and set `lease_expires_at`. If the claim fails (the task is already `in_progress` with a valid lease), pick another eligible task and retry.
+- Never run two workers in the same git working directory. Use separate worktrees/clones. Otherwise rollback (`git reset --hard` / `git clean -fd`) will destroy the other workers' in-progress changes.
 
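In concurrent mode the same mkdir lock wraps each state transaction rather than the whole session. A minimal bash sketch, assuming the lock layout from the lock snippet in this file (a directory under `/tmp` with a `pid` file inside); `harness_lockdir`, `with_state_lock`, and `HARNESS_LOCK_RETRIES` are hypothetical names, and waiting with bounded retries instead of failing fast is an assumption for this mode. The stale-lock check on the holder's PID, which the original snippet performs, is left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for concurrent mode; names are illustrative only.

# Same lock key as the session-lock snippet: hash the nearest ancestor that
# contains harness-tasks.json (or the start directory if none does).
harness_lockdir() {
  local start="${1:-$PWD}" search root digest
  search="$start"
  root="$start"
  while [ "$search" != "/" ] && [ ! -f "$search/harness-tasks.json" ]; do
    search="$(dirname "$search")"
  done
  if [ -f "$search/harness-tasks.json" ]; then
    root="$search"
  fi
  digest="$(printf '%s' "$root" |
    (shasum -a 256 2>/dev/null || sha256sum 2>/dev/null) |
    awk '{print $1}' | cut -c1-16)"
  printf '/tmp/harness-%s.lock\n' "${digest:-unknown}"
}

# Run one state transaction ("$@") under the lock, then release it at once.
# Waits up to HARNESS_LOCK_RETRIES x 0.1 s (default 50) instead of failing fast.
with_state_lock() {
  local lockdir="$1" tries=0 status=0
  shift
  until mkdir "$lockdir" 2>/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${HARNESS_LOCK_RETRIES:-50}" ]; then
      echo "LOCK busy: $lockdir" >&2
      return 1
    fi
    sleep 0.1
  done
  printf '%s\n' "$$" > "$lockdir/pid"
  "$@" || status=$?
  rm -rf "$lockdir"
  return "$status"
}
```

The intended flow is `with_state_lock "$(harness_lockdir)" <claim-or-update-command>`, then the task work runs outside the lock, and the lock is taken again for each checkpoint or completion write.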
 ## Infinite Loop Protocol
 
 ### Session Start (Execute Every Time)
 
 1. **Read state**: Read last 200 lines of `harness-progress.txt` + full `harness-tasks.json`. If JSON is unparseable, see JSON corruption recovery in Error Handling.
 2. **Read git**: Run `git log --oneline -20` and `git diff --stat` to detect uncommitted work
-3. **Acquire lock**: Fail if another session is active
+3. **Acquire lock** (mode-dependent): Exclusive mode fails if another session is active. Concurrent mode uses the lock only for state transactions.
 4. **Recover interrupted tasks** (see Context Window Recovery below)
 5. **Health check**: Run `harness-init.sh` if it exists
 6. **Track session**: Increment `session_count` in JSON. Check `session_count` against `max_sessions` — if reached, log STATS and STOP. Initialize per-session task counter to 0.
@@ -189,13 +226,13 @@ Then pick the next task in this priority order:
 
 For each task, execute this exact sequence:
 
-1. **Claim**: Record `started_at_commit` = current HEAD hash. Set status to `in_progress`, log `Starting [<task-id>] <title> (base=<hash>)`
+1. **Claim** (atomic, under lock): Record `started_at_commit` = current HEAD hash. Set status to `in_progress`, set `claimed_by`, set `lease_expires_at`, log `Starting [<task-id>] <title> (base=<hash>)`. If the task is already claimed (`in_progress` with a valid lease), pick another eligible task and retry.
 2. **Execute with checkpoints**: Perform the work. After each significant step, log:
 ```
 [timestamp] [SESSION-N] CHECKPOINT [task-id] step=M/N "description of what was done"
 ```
-Also append to the task's `checkpoints` array: `{ "step": M, "total": N, "description": "...", "timestamp": "ISO" }`
-3. **Validate**: Run the task's `validation.command` wrapped with `timeout`: `timeout <timeout_seconds> <command>`. If no validation command, skip. Before running, verify the command exists (e.g., `command -v <binary>`) — if missing, treat as `ENV_SETUP` error.
+Also append to the task's `checkpoints` array: `{ "step": M, "total": N, "description": "...", "timestamp": "ISO" }`. In concurrent mode, renew the lease at each checkpoint (push `lease_expires_at` forward).
+3. **Validate**: Run the task's `validation.command` with a timeout wrapper (prefer `timeout`; on macOS use `gtimeout` from coreutils). If `validation.command` is empty/null, log `ERROR [<task-id>] [CONFIG] Missing validation.command` and STOP — do not declare completion without an objective check. Before running, verify the command exists (e.g., `command -v <binary>`) — if missing, treat as `ENV_SETUP` error.
 - Command exits 0 → PASS
 - Command exits non-zero → FAIL
 - Command exceeds timeout → TIMEOUT
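The Validate step can be sketched as a small bash function that picks the timeout wrapper, runs the command, and maps the result onto the outcomes listed above. It relies on GNU `timeout` exiting with status 124 when the limit is hit (`gtimeout` is the same tool under Homebrew's coreutils prefix). The function names, treating a missing wrapper as `ENV_SETUP`, and taking the first word of the command as the binary to check are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Validate step; prints PASS, FAIL, TIMEOUT,
# CONFIG or ENV_SETUP on stdout. Validation output goes to stderr.

# Prefer GNU `timeout`; fall back to `gtimeout` (Homebrew coreutils on macOS).
pick_timeout_cmd() {
  if command -v timeout >/dev/null 2>&1; then
    echo timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    echo gtimeout
  else
    return 1
  fi
}

# Usage: run_validation <timeout_seconds> <validation.command>
run_validation() {
  local secs="$1" cmd="$2" wrapper binary status
  # An empty validation.command is a CONFIG error: never complete without a check.
  if [ -z "$cmd" ]; then echo CONFIG; return; fi
  # Assumption: the first word of the command is the binary to check.
  binary="${cmd%% *}"
  if ! command -v "$binary" >/dev/null 2>&1; then echo ENV_SETUP; return; fi
  wrapper="$(pick_timeout_cmd)" || { echo ENV_SETUP; return; }
  "$wrapper" "$secs" bash -c "$cmd" >&2
  status=$?
  case "$status" in
    0) echo PASS ;;
    124) echo TIMEOUT ;;  # GNU timeout's exit status when the limit is hit
    *) echo FAIL ;;
  esac
}
```

One caveat of this design: from the exit status alone, a validation command that exits 124 on its own is indistinguishable from a real timeout.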
@@ -217,6 +254,9 @@ For each task, execute this exact sequence:
 
 When a new session starts and finds a task with `status: "in_progress"`:
 
+- Exclusive mode: treat this as an interrupted previous session and run the Recovery Protocol below.
+- Concurrent mode: only recover a task if either (a) `claimed_by` matches this worker, or (b) `lease_expires_at` is in the past (stale lease). Otherwise, treat it as owned by another worker and do not modify it.
+
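The concurrent-mode recovery rule reduces to two comparisons. A hedged bash sketch, assuming `lease_expires_at` is always written in one fixed UTC format such as `2025-07-01T10:30:00Z` (second precision, trailing `Z`); under that assumption the digits alone order timestamps correctly. `lease_is_stale` and `may_recover` are illustrative names, and treating a missing lease as stale is a choice made here, not something the skill specifies.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes UTC ISO 8601 timestamps like 2025-07-01T10:30:00Z.

utc_now() { date -u +%Y-%m-%dT%H:%M:%SZ; }

# Keep only the digits: 2025-07-01T10:30:00Z -> 20250701103000.
iso_digits() { printf '%s' "$1" | tr -dc '0-9'; }

# lease_is_stale <lease_expires_at> [now]: true if the lease is missing or expired.
lease_is_stale() {
  local lease now
  lease="$(iso_digits "$1")"
  now="$(iso_digits "${2:-$(utc_now)}")"
  [ -z "$lease" ] || [ "$lease" -lt "$now" ]
}

# may_recover <claimed_by> <lease_expires_at> <this_worker_id> [now]
# True if this worker owns the claim, or the owner's lease has expired.
may_recover() {
  [ "$1" = "$3" ] || lease_is_stale "$2" "${4:-}"
}
```

A lease that expires exactly at the current second still counts as valid here, because the check uses `-lt` rather than `-le`.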
 1. **Check git state**:
 ```bash
 git diff --stat # Uncommitted changes?
@@ -243,6 +283,7 @@ Each error category has a default recovery strategy:
 | Category | Default Recovery | Agent Action |
 |----------|-----------------|--------------|
 | `ENV_SETUP` | Re-run init, then STOP if still failing | Run `harness-init.sh` again immediately. If fails twice, log and stop — environment is broken |
+| `CONFIG` | STOP (requires human fix) | Log the config error precisely (file + field), then STOP. Do not guess or auto-mutate task metadata |
 | `TASK_EXEC` | Rollback via `git reset --hard <started_at_commit>`, retry | Verify `started_at_commit` exists (`git cat-file -t <hash>`). If missing, mark failed at max_attempts. Otherwise reset, run `on_failure.cleanup` if defined, retry if attempts < max_attempts |
 | `TEST_FAIL` | Rollback via `git reset --hard <started_at_commit>`, retry | Reset to `started_at_commit`, analyze test output to identify fix, retry with targeted changes |
 | `TIMEOUT` | Kill process, execute cleanup, retry | Wrap validation with `timeout <seconds> <command>`. On timeout, run `on_failure.cleanup`, retry (consider splitting task if repeated) |
@@ -251,7 +292,7 @@ Each error category has a default recovery strategy:
 
 **JSON corruption**: If `harness-tasks.json` cannot be parsed, check for `harness-tasks.json.bak` (written before each modification). If backup exists and is valid, restore from it. If no valid backup, log `ERROR [ENV_SETUP] harness-tasks.json corrupted and unrecoverable` and STOP — task metadata (validation commands, dependencies, cleanup) cannot be reconstructed from logs alone.
 
-**Backup protocol**: Before every write to `harness-tasks.json`, copy the current file to `harness-tasks.json.bak`.
+**Backup protocol**: Before every write to `harness-tasks.json`, copy the current file to `harness-tasks.json.bak`. Write updates atomically: write JSON to `harness-tasks.json.tmp` then `mv` it into place (readers should never see a partial file).
 
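The backup protocol plus the atomic `.tmp`/`mv` step fits in one helper. A sketch under these assumptions: the caller already holds the lock, the new JSON arrives on stdin, and `python3 -m json.tool` (used only if `python3` is installed) is an optional parse check added here so a malformed update never replaces the live file. `write_tasks_json` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper; call it only while holding the harness lock.
# Usage: write_tasks_json path/to/harness-tasks.json < new.json
write_tasks_json() {
  local file="$1" tmp="$1.tmp"
  cat > "$tmp" || return 1
  # Optional sanity check: refuse to install JSON that does not parse.
  if command -v python3 >/dev/null 2>&1 &&
    ! python3 -m json.tool "$tmp" >/dev/null 2>&1; then
    rm -f "$tmp"
    return 1
  fi
  # Backup protocol: keep the last good version next to the live file.
  if [ -f "$file" ]; then
    cp -p "$file" "$file.bak" || return 1
  fi
  # Same-directory rename: readers see the old file or the new one, never a partial write.
  mv -f "$tmp" "$file"
}
```

Keeping `.tmp` in the same directory as the target matters: `mv` is only an atomic rename when source and destination are on the same filesystem.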
 ## Environment Initialization
 
@@ -279,7 +320,7 @@ All log entries use grep-friendly format on a single line:
 
 Types: `INIT`, `Starting`, `Completed`, `ERROR`, `CHECKPOINT`, `ROLLBACK`, `RECOVERY`, `STATS`, `LOCK`, `WARN`
 
-Error categories: `ENV_SETUP`, `TASK_EXEC`, `TEST_FAIL`, `TIMEOUT`, `DEPENDENCY`, `SESSION_TIMEOUT`
+Error categories: `ENV_SETUP`, `CONFIG`, `TASK_EXEC`, `TEST_FAIL`, `TIMEOUT`, `DEPENDENCY`, `SESSION_TIMEOUT`
 
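A hypothetical appender that produces entries in the `[timestamp] [SESSION-N] TYPE ...` format used throughout this file and rejects types outside the documented list. `harness_log` is an illustrative name, and the UTC ISO 8601 timestamp is an assumption, since the skill text only says `[timestamp]`.

```bash
#!/usr/bin/env bash
# Hypothetical appender for harness-progress.txt.
# Usage: harness_log <file> <session-number> <type> <message...>
harness_log() {
  local file="$1" session="$2" type="$3"
  shift 3
  case "$type" in
    INIT|Starting|Completed|ERROR|CHECKPOINT|ROLLBACK|RECOVERY|STATS|LOCK|WARN) ;;
    *) echo "harness_log: unknown type '$type'" >&2; return 1 ;;
  esac
  # One entry per line, so plain grep filters work on the log.
  printf '[%s] [SESSION-%s] %s %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$session" "$type" "$*" >> "$file"
}
```

For example, `harness_log harness-progress.txt 3 ERROR "[task-002] [CONFIG] Missing validation.command"` can later be found with `grep -F '[CONFIG]' harness-progress.txt`.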
 Filtering:
 ```bash
@@ -293,7 +334,7 @@ grep "RECOVERY" harness-progress.txt # All recovery actions
 
 ## Session Statistics
 
-At session end, update `harness-tasks.json`: increment `session_count`, set `last_session` to current timestamp. Then append:
+At session end, update `harness-tasks.json`: set `last_session` to current timestamp. (Do NOT increment `session_count` here — it is incremented at Session Start.) Then append:
 
 ```
 [timestamp] [SESSION-N] STATS tasks_total=10 completed=7 failed=1 pending=2 blocked=0 attempts_total=12 checkpoints=23
@@ -321,9 +362,11 @@ Does NOT acquire the lock (read-only operation).
 
 ## Add Command (`/harness add`)
 
-Append a new task to `harness-tasks.json` with auto-incremented id (`task-NNN`), status `pending`, default `max_attempts: 3`, empty `depends_on`, and no validation command. Prompt user for optional fields: `priority`, `depends_on`, `validation.command`, `timeout_seconds`. Requires lock acquisition (modifies JSON).
+Append a new task to `harness-tasks.json` with auto-incremented id (`task-NNN`), status `pending`, default `max_attempts: 3`, empty `depends_on`, and no validation command (one must be set before the task can be completed). Prompt the user for optional fields: `priority`, `depends_on`, `validation.command`, `timeout_seconds`. Requires lock acquisition (modifies JSON).
 
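Generating the next `task-NNN` id can be sketched with standard text tools, assuming ids always use that zero-padded form. Scanning every quoted `task-NNN` string (including ones inside `depends_on`) is safe because dependencies only reference existing tasks. `next_task_id` is a hypothetical name; as the text says, the append itself must happen under the lock.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the id for the next task appended to $1.
# Assumes ids look like "task-007"; a missing file or no tasks gives task-001.
next_task_id() {
  grep -o '"task-[0-9][0-9]*"' "$1" 2>/dev/null |
    tr -dc '0-9\n' |
    awk '($1 + 0) > max { max = $1 + 0 } END { printf "task-%03d\n", max + 1 }'
}
```

Converting with `$1 + 0` in awk reads `007` as decimal 7, which avoids the octal pitfall of shell arithmetic on zero-padded numbers.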
 ## Tool Dependencies
 
 Requires: Bash, file read/write, git. All harness operations must be executed from the project root directory.
 Does NOT require: specific MCP servers, programming languages, or test frameworks.
+
+Concurrent mode requires isolated working directories (`git worktree` or separate clones). Do not run concurrent workers in the same working tree.
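Isolated working directories for concurrent workers can be created with `git worktree add`. A sketch, assuming one new branch per worker; the `harness-worker-N` branch names, the sibling `worker-N` directories, and `setup_workers` itself are hypothetical. Each worker's session would then be pointed at the shared state root (e.g. via `HARNESS_STATE_ROOT`) and given its own `HARNESS_WORKER_ID`, as the concurrent-mode invariants in this file suggest.

```bash
#!/usr/bin/env bash
# Hypothetical setup: one git worktree (and branch) per concurrent worker,
# created as siblings of the main checkout so no two workers share a tree.
# Usage: setup_workers <path-to-main-checkout> <worker-count>
setup_workers() {
  local repo="$1" count="$2" parent i
  parent="$(cd "$repo/.." && pwd)"
  i=1
  while [ "$i" -le "$count" ]; do
    git -C "$repo" worktree add --quiet -b "harness-worker-$i" "$parent/worker-$i" ||
      return 1
    i=$((i + 1))
  done
}
```

Separate worktrees keep each worker's `git reset --hard` rollback confined to its own directory while all of them share one object store.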
