Add AGENTVIZ session replay integration #494
- Add workflow_dispatch trigger to evaluation.yml
- Add --keep-sessions to skill-validator evaluate step
- Add publish-session-data job (mirrors publish-token-data)
- Add replay link to PR comments (comment-on-pr)
- Add AGENTVIZ SPA build/deploy to deploy-dashboard job
- Add setup-node step to deploy-dashboard
- Add per-plugin Sessions Visualisation links to dashboard.js
- Create build-replay-sessions.ps1 (manifest generation from sessions.db)
- Create purge-replay-sessions.ps1 (7-day retention management)
/evaluate
Skill Validation Results
[1] (Plugin) Quality improved but weighted score is -1.9% due to: completion (✓ → ✗), tokens (13184 → 47663), tool calls (0 → 5)
Model: claude-opus-4.6 | Judge: claude-opus-4.6
🔍 Full Results - additional metrics and failure investigation steps
🎬 Session Replay
Evaluation sessions captured for this PR are available for interactive replay in AGENTVIZ: 3 sessions captured (baseline, isolated, plugin) for dotnet / Test a C# language feature with a script. Each session shows the full agent conversation timeline -- tool calls, reasoning, context reads -- as an interactive replay in which you can scrub through the timeline, inspect events, and compare roles. This link is auto-generated by the publish-session-data + comment-on-pr pipeline jobs after evaluation completes.
Pull request overview
Integrates AGENTVIZ session replay into the evaluation workflow and the GitHub Pages dashboard by preserving eval session artifacts, publishing a session manifest + JSONL data to a dedicated branch, and adding dashboard/PR links to open the replay UI.
Changes:
- Persist eval session artifacts (--keep-sessions) and publish flattened session JSONL + manifest.json to dashboard-session-data.
- Deploy/update the AGENTVIZ SPA under gh-pages/replay/ and add replay links to PR comments.
- Add a per-plugin “Sessions Visualisation” link in the dashboard UI.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| .github/workflows/evaluation.yml | Adds session publishing + AGENTVIZ SPA build/deploy and PR replay links. |
| eng/dashboard/build-replay-sessions.ps1 | Builds AGENTVIZ-compatible session directory structure and manifest from eval artifacts. |
| eng/dashboard/purge-replay-sessions.ps1 | Merges new session data, purges by retention, and regenerates manifest. |
| eng/dashboard/dashboard.js | Adds a per-plugin link to open AGENTVIZ replay with manifest + tag filters. |
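
The dashboard.js row above describes a link that carries a manifest location plus a tag filter. A minimal sketch of how such a link could be assembled follows. It is an illustration only, not the PR's code: the helper name, both example URLs, and the query parameter names (`manifest`, `tag`) are assumptions.

```javascript
// Hypothetical helper: builds a replay link for one plugin.
// Parameter names ("manifest", "tag") and both URLs are assumptions for illustration.
function buildReplayLink(replayBase, manifestUrl, pluginName) {
  const url = new URL(replayBase);
  // URLSearchParams percent-encodes values, so the nested manifest URL
  // cannot be confused with the outer URL's own query string.
  url.searchParams.set("manifest", manifestUrl);
  url.searchParams.set("tag", pluginName);
  return url.toString();
}

const link = buildReplayLink(
  "https://example.invalid/replay/",
  "https://example.invalid/data/manifest.json",
  "dotnet"
);
console.log(link);
```

Encoding the manifest value matters because it is itself a URL; left raw, any `&` or `?` inside it would be read as part of the outer query string.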
- Use [IO.Path]::PathSeparator instead of hardcoded ';' for cross-platform PATH
- Compute cutoff date in UTC for correct retention comparisons
- Precompute ID HashSet before merge loop to avoid O(n^2)
- Pin actions/setup-node to commit SHA (49933ea5...#v4)
- Pin AGENTVIZ clone to commit SHA with verification
- Skip npm ci + build when deployed commit SHA matches pinned SHA
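
The "precompute ID HashSet" item can be illustrated with a short sketch. The PR's merge lives in PowerShell; this JavaScript version uses a `Set` for the same purpose and assumes each manifest entry has an `id` field, which is a guess at the shape rather than the actual schema.

```javascript
// Sketch: merge existing manifest entries into the new ones without a nested scan.
// The entry shape ({ id, ... }) is an assumption for illustration.
function mergeSessions(existing, incoming) {
  // Build the ID set once, so each membership check is O(1)
  // instead of rescanning the incoming list for every existing entry.
  const seen = new Set(incoming.map((s) => s.id));
  const merged = [...incoming];
  for (const s of existing) {
    if (!seen.has(s.id)) {
      seen.add(s.id);
      merged.push(s);
    }
  }
  return merged;
}

const merged = mergeSessions(
  [{ id: "a", run: "old" }, { id: "b", run: "old" }],
  [{ id: "b", run: "new" }, { id: "c", run: "new" }]
);
console.log(merged.map((s) => `${s.id}:${s.run}`).join(","));
```

In this sketch the incoming entry wins when IDs collide, on the assumption that freshly published data should replace older data with the same ID.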
- Resolve AGENTVIZ target SHA via git ls-remote (no clone)
- Read deployed SHA via curl from raw.githubusercontent.com (no clone)
- Cache build output keyed by commit SHA (skip npm ci+build on hit)
- Only clone AGENTVIZ repo on cache miss when SPA needs rebuild
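
Read together, these items describe a three-way decision. The sketch below restates it as a pure function. The function name, the returned labels, and the placeholder SHAs are all hypothetical; the real logic is spread across workflow steps rather than living in one function.

```javascript
// Hypothetical decision helper mirroring the three cases in the commit notes.
function planSpaDeploy({ targetSha, deployedSha, cacheHit }) {
  if (targetSha === deployedSha) return "skip"; // deployed copy is already current
  if (cacheHit) return "deploy-from-cache"; // build output for this SHA exists, no clone needed
  return "clone-and-build"; // cache miss: clone, npm ci, build
}

// Placeholder SHAs, not real commits.
const target = "a".repeat(40);
const stale = "b".repeat(40);

console.log(planSpaDeploy({ targetSha: target, deployedSha: target, cacheHit: false }));
console.log(planSpaDeploy({ targetSha: target, deployedSha: stale, cacheHit: true }));
console.log(planSpaDeploy({ targetSha: target, deployedSha: stale, cacheHit: false }));
```

Ordering the checks from cheapest to most expensive means the clone only happens when both cheaper checks have failed.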
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
eng/dashboard/purge-replay-sessions.ps1:92
- Retention logic for PR sessions relies on $file.LastWriteTimeUtc, but these files are sourced from a git checkout (dashboard-session-data branch) where mtimes are typically set to checkout time, not original creation time. This means old sessions/pr/<number>/... files will likely never be marked expired and will continue to be copied into the merged output (even if they’re filtered out of the manifest later). Consider determining expiry for PR sessions from the existing manifest.json mtime values (and copying only URLs present in the retained manifest), or include a date component in the PR directory structure so expiry can be computed from the path similarly to scheduled runs.
if (Test-Path $existingSessionsDir) {
    # For scheduled runs, the path structure is sessions/scheduled/<date>/...
    # For PR runs, the path structure is sessions/pr/<number>/...
    # We check dated directories for retention, PR dirs are always kept within window (use file mtime)
    $existingFiles = Get-ChildItem -Path $existingSessionsDir -Recurse -File -ErrorAction SilentlyContinue
    foreach ($file in $existingFiles) {
        $relativePath = $file.FullName.Substring($existingSessionsDir.Length).TrimStart([IO.Path]::DirectorySeparatorChar, [IO.Path]::AltDirectorySeparatorChar)
        $destPath = Join-Path $sessionsWorkDir $relativePath
        # Skip if already copied from new data
        if (Test-Path $destPath) { continue }
        # Check retention: try to extract date from path (scheduled/YYYY-MM-DD/...)
        $isExpired = $false
        if ($relativePath -match 'scheduled[/\\](\d{4}-\d{2}-\d{2})[/\\]') {
            $dirDate = [DateTime]::ParseExact($Matches[1], 'yyyy-MM-dd', $null)
            if ($dirDate -lt $cutoffDate) {
                $isExpired = $true
            }
        } elseif ($file.LastWriteTimeUtc -lt $cutoffDate) {
            # For PR sessions without date in path, use file modification time
            $isExpired = $true
        }
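
The reviewer's first suggestion (take expiry from the manifest's mtime values and copy only the URLs that survive) could look roughly like the sketch below. It is written in JavaScript for illustration, not in the script's PowerShell, and the entry fields (`url`, `mtime` as an ISO 8601 string), the example paths, and the dates are all assumptions.

```javascript
// Sketch of the reviewer's suggestion: decide retention from manifest mtimes,
// not filesystem mtimes. Entry fields (url, mtime) are assumed for illustration.
function retainSessions(entries, now, retentionDays = 7) {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  const kept = entries.filter((e) => Date.parse(e.mtime) >= cutoff);
  // Only files whose URL survives in the manifest should be copied forward.
  const urlsToCopy = new Set(kept.map((e) => e.url));
  return { kept, urlsToCopy };
}

// Hypothetical entries: one older than the 7-day window, one inside it.
const result = retainSessions(
  [
    { url: "sessions/pr/101/plugin.jsonl", mtime: "2025-01-01T12:00:00Z" },
    { url: "sessions/pr/102/plugin.jsonl", mtime: "2025-01-09T12:00:00Z" },
  ],
  new Date("2025-01-10T00:00:00Z")
);
console.log([...result.urlsToCopy]);
```

Because the timestamps travel inside manifest.json, they are unaffected by a fresh git checkout resetting file modification times.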
- Compare scheduled dir dates at day granularity (dirDate.Date vs cutoffDate.Date)
- Fix useTempDir null-safe path comparison via GetFullPath + null guard
- Use UTC for dateTag in build-replay-sessions.ps1 (consistent with purge)
- Fix cache/deploy path mismatch: deploy from /tmp/agentviz-dist (cache path)
- Deterministic clone: full clone + git checkout TARGET_SHA (fail on mismatch)
- URL-encode manifest param in PR comment link (jq @uri)
- Derive manifest generated timestamp from newest session mtime to avoid churn
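
The day-granularity item is easiest to see with concrete dates. The sketch below is a JavaScript illustration, not the PR's PowerShell: it truncates both sides to UTC midnight before comparing. The function names and the example dates are hypothetical; the 7-day default follows the retention period named in the PR.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Truncate a Date to midnight UTC, returned as epoch milliseconds.
function utcDay(date) {
  return Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
}

// Hypothetical check: is a scheduled/<YYYY-MM-DD> directory outside the window?
function isScheduledDirExpired(dirName, now, retentionDays = 7) {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(dirName);
  if (!m) return false; // not a dated directory, so this rule does not apply
  const dirDay = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
  const cutoffDay = utcDay(now) - retentionDays * DAY_MS;
  return dirDay < cutoffDay;
}

const runTime = new Date("2025-01-10T15:30:00Z");
console.log(isScheduledDirExpired("2025-01-03", runTime)); // directory on the cutoff day
console.log(isScheduledDirExpired("2025-01-02", runTime)); // directory one day older
```

Without the truncation, a directory dated on the cutoff day parses to midnight and compares as older than a cutoff carrying a time of day, so the result would depend on when the job happened to run.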
/evaluate
Skill Validation Results
[1] (Plugin) Quality unchanged but weighted score is -0.7% due to: tokens (24255 → 31041)
Model: claude-opus-4.6 | Judge: claude-opus-4.6
🔍 Full Results - additional metrics and failure investigation steps
* Add AGENTVIZ session replay integration
- Add workflow_dispatch trigger to evaluation.yml
- Add --keep-sessions to skill-validator evaluate step
- Add publish-session-data job (mirrors publish-token-data)
- Add replay link to PR comments (comment-on-pr)
- Add AGENTVIZ SPA build/deploy to deploy-dashboard job
- Add setup-node step to deploy-dashboard
- Add per-plugin Sessions Visualisation links to dashboard.js
- Create build-replay-sessions.ps1 (manifest generation from sessions.db)
- Create purge-replay-sessions.ps1 (7-day retention management)
* Address code review feedback on PR dotnet#494
- Use [IO.Path]::PathSeparator instead of hardcoded ';' for cross-platform PATH
- Compute cutoff date in UTC for correct retention comparisons
- Precompute ID HashSet before merge loop to avoid O(n^2)
- Pin actions/setup-node to commit SHA (49933ea5...#v4)
- Pin AGENTVIZ clone to commit SHA with verification
- Skip npm ci + build when deployed commit SHA matches pinned SHA
* Remove hardcoded AGENTVIZ SHA; use cache + zero-clone checks
- Resolve AGENTVIZ target SHA via git ls-remote (no clone)
- Read deployed SHA via curl from raw.githubusercontent.com (no clone)
- Cache build output keyed by commit SHA (skip npm ci+build on hit)
- Only clone AGENTVIZ repo on cache miss when SPA needs rebuild
* Address round 2 code review feedback on PR dotnet#494
- Compare scheduled dir dates at day granularity (dirDate.Date vs cutoffDate.Date)
- Fix useTempDir null-safe path comparison via GetFullPath + null guard
- Use UTC for dateTag in build-replay-sessions.ps1 (consistent with purge)
- Fix cache/deploy path mismatch: deploy from /tmp/agentviz-dist (cache path)
- Deterministic clone: full clone + git checkout TARGET_SHA (fail on mismatch)
- URL-encode manifest param in PR comment link (jq @uri)
- Derive manifest generated timestamp from newest session mtime to avoid churn
Integrates AGENTVIZ session replay visualization into the skills evaluation pipeline and dashboard.
Changes
- workflow_dispatch trigger, --keep-sessions flag, publish-session-data job, replay links in PR comments, AGENTVIZ SPA deployment in deploy-dashboard
- sessions.db -- flattens JSONL files and creates AGENTVIZ-compatible manifest
How it works
- The evaluate job now runs with --keep-sessions, preserving native SDK events.jsonl files
- The publish-session-data job flattens JSONL and pushes manifest + sessions to the dashboard-session-data branch
- The AGENTVIZ SPA is deployed to gh-pages/replay/ by deploy-dashboard, with a skip-if-unchanged guard
Prerequisites already deployed
- dashboard-session-data branch created with stub manifest
- gh-pages/replay/
See docs/agentviz-integration-plan.md for full design.
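
As a rough illustration of the manifest step, the sketch below builds a manifest whose timestamp comes from the newest session rather than from the wall clock, in line with the "avoid churn" note in the review follow-up. The field names (`generated`, `sessions`, `url`, `mtime`), the example file names, and the dates are assumptions, not the schema AGENTVIZ actually reads.

```javascript
// Sketch: derive the manifest timestamp from session data instead of the wall clock.
// Field names (generated, sessions, url, mtime) are assumptions, not the PR's schema.
function buildManifest(sessions) {
  // Sort by URL so identical input yields identical output, whatever the input order.
  const sorted = [...sessions].sort((a, b) => (a.url < b.url ? -1 : a.url > b.url ? 1 : 0));
  // Newest session mtime; epoch 0 is a placeholder for an empty (stub) manifest.
  const newest = sorted.reduce((max, s) => Math.max(max, Date.parse(s.mtime)), 0);
  return { generated: new Date(newest).toISOString(), sessions: sorted };
}

// Hypothetical entries following the sessions/pr/<number>/... layout.
const manifest = buildManifest([
  { url: "sessions/pr/494/plugin.jsonl", mtime: "2025-01-02T10:00:00.000Z" },
  { url: "sessions/pr/494/baseline.jsonl", mtime: "2025-01-02T09:00:00.000Z" },
]);
console.log(JSON.stringify(manifest, null, 2));
```

With this approach, rerunning the job on unchanged session data produces a byte-identical manifest, so the data branch does not pick up a new commit on every run.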