Add AGENTVIZ session replay integration by JanKrivanek · Pull Request #494 · dotnet/skills

JanKrivanek · 2026-04-01T15:23:28Z

Integrates AGENTVIZ session replay visualization into the skills evaluation pipeline and dashboard.

Changes

evaluation.yml: Add workflow_dispatch trigger, --keep-sessions flag, publish-session-data job, replay links in PR comments, AGENTVIZ SPA deployment in deploy-dashboard
build-replay-sessions.ps1: Manifest generation from sessions.db -- flattens JSONL files and creates AGENTVIZ-compatible manifest
purge-replay-sessions.ps1: 7-day retention management for session data
dashboard.js: Per-plugin Sessions Visualisation links
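The manifest step in build-replay-sessions.ps1 can be pictured with a short Python sketch. This is not the PR's script. The `SessionRecord` fields, the flat `<skill>__<scenario>__<role>.jsonl` naming, and the manifest keys (`id`, `url`, `role`, `tags`, `source`) are hypothetical choices made for illustration. The real script reads its records from sessions.db rather than taking them as arguments.

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SessionRecord:
    # Hypothetical shape; the real script reads session metadata from sessions.db.
    skill: str
    scenario: str
    role: str          # "baseline", "isolated" or "plugin"
    events_path: str   # nested location of the native events.jsonl file


def flat_name(rec: SessionRecord) -> str:
    """Collapse a nested session into one flat, URL-safe file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", rec.scenario.lower()).strip("-")
    return f"{rec.skill}__{slug}__{rec.role}.jsonl"


def build_manifest(records: Iterable[SessionRecord], base_url: str = "sessions/") -> dict:
    """Return a manifest with one entry per flattened session file.

    Each entry keeps the original events_path ("source") so a caller can
    copy the nested file to its flat destination.
    """
    entries = []
    for rec in sorted(records, key=lambda r: (r.skill, r.scenario, r.role)):
        name = flat_name(rec)
        entries.append({
            "id": name[: -len(".jsonl")],
            "url": base_url + name,
            "role": rec.role,
            "tags": [rec.skill],
            "source": rec.events_path,
        })
    return {"sessions": entries}
```

Sorting the records makes the manifest order stable across runs, which keeps diffs on the data branch small.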

How it works

evaluate job now runs with --keep-sessions, preserving native SDK events.jsonl files
New publish-session-data job flattens JSONL and pushes manifest + sessions to dashboard-session-data branch
PR comments include a replay link pointing to the AGENTVIZ SPA on gh-pages/replay/
AGENTVIZ SPA is built and deployed during deploy-dashboard with skip-if-unchanged guard
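A replay link is the SPA URL plus a `manifest` query parameter, and the manifest value is itself a URL, so it must be percent-encoded. The Python sketch below shows that concern, which the workflow handles with `jq @uri`. `REPLAY_BASE` is a placeholder host, and the `tag` parameter name is hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Placeholder for the gh-pages location of the AGENTVIZ SPA; not the real site URL.
REPLAY_BASE = "https://example.github.io/skills/replay/"


def replay_link(manifest_url: str, tag: Optional[str] = None) -> str:
    """Build a replay URL whose query values are fully percent-encoded."""
    params = {"manifest": manifest_url}
    if tag:
        params["tag"] = tag  # hypothetical filter parameter name
    # quote (not quote_plus) encodes spaces as %20; urlencode passes safe="",
    # so "/" and ":" inside the manifest URL are escaped too, much like jq's @uri.
    return REPLAY_BASE + "?" + urlencode(params, quote_via=quote)
```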

Prerequisites already deployed

dashboard-session-data branch created with stub manifest
AGENTVIZ SPA deployed to gh-pages/replay/

See docs/agentviz-integration-plan.md for full design.

- Add workflow_dispatch trigger to evaluation.yml
- Add --keep-sessions to skill-validator evaluate step
- Add publish-session-data job (mirrors publish-token-data)
- Add replay link to PR comments (comment-on-pr)
- Add AGENTVIZ SPA build/deploy to deploy-dashboard job
- Add setup-node step to deploy-dashboard
- Add per-plugin Sessions Visualisation links to dashboard.js
- Create build-replay-sessions.ps1 (manifest generation from sessions.db)
- Create purge-replay-sessions.ps1 (7-day retention management)

JanKrivanek · 2026-04-01T15:23:36Z

/evaluate

github-actions · 2026-04-01T15:33:40Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
dotnet-maui-doctor	Plan macOS MAUI setup with Xcode	3.0/5 → 5.0/5 🟢	✅ dotnet-maui-doctor; tools: skill, report_intent, view / ✅ dotnet-maui-doctor; tools: report_intent, skill, view	✅ 0.17	✅
dotnet-maui-doctor	Plan Linux MAUI environment for Android	3.0/5 → 5.0/5 🟢	✅ dotnet-maui-doctor; tools: skill, view	✅ 0.17	✅
dotnet-maui-doctor	Guardrail against workload update and repair	1.0/5 → 3.0/5 🟢	✅ dotnet-maui-doctor; tools: report_intent, skill	✅ 0.17	✅
dotnet-maui-doctor	Diagnose non-Microsoft JDK causing build failure	4.0/5 → 5.0/5 🟢	✅ dotnet-maui-doctor; tools: report_intent, skill, view	✅ 0.17	❌ [1]
dotnet-maui-doctor	Plan complete MAUI setup on Windows	4.0/5 → 5.0/5 🟢	✅ dotnet-maui-doctor; tools: report_intent, skill, view	✅ 0.17	❌ [2]
dotnet-maui-doctor	Prevent incorrect JAVA_HOME configuration	2.0/5 → 5.0/5 🟢	✅ dotnet-maui-doctor; tools: report_intent, skill	✅ 0.17	✅
dotnet-maui-doctor	Determine required Android SDK packages for specific .NET version	3.0/5 → 4.0/5 🟢	✅ dotnet-maui-doctor; tools: skill, view	✅ 0.17	✅
dotnet-maui-doctor	Fix stale MAUI workloads after SDK update	2.0/5 → 4.0/5 🟢	✅ dotnet-maui-doctor; tools: report_intent, skill, view	✅ 0.17	✅
optimizing-ef-core-queries	Optimize bulk operations with EF Core 7+ ExecuteUpdate and ExecuteDelete	5.0/5 → 4.0/5 🔴	✅ optimizing-ef-core-queries; tools: skill / ✅ optimizing-ef-core-queries; tools: report_intent, skill	🟡 0.28	❌

[1] (Plugin) Quality improved but weighted score is -1.9% due to: completion (✓ → ✗), tokens (13184 → 47663), tool calls (0 → 5)
[2] (Isolated) Quality improved but weighted score is -5.6% due to: completion (✓ → ✗), tokens (13439 → 56419), tool calls (0 → 9), time (36.7s → 52.6s)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

JanKrivanek · 2026-04-01T15:46:23Z

🎬 Session Replay

Evaluation sessions captured for this PR are available for interactive replay in AGENTVIZ:

▶️ Open Session Replay (PR #494)

3 sessions captured (baseline, isolated, plugin) for dotnet / Test a C# language feature with a script.

Each session shows the full agent conversation timeline -- tool calls, reasoning, context reads -- as an interactive replay you can scrub through, inspect events, and compare roles.

This link is auto-generated by the publish-session-data + comment-on-pr pipeline jobs after evaluation completes.

Copilot

Pull request overview

Integrates AGENTVIZ session replay into the evaluation workflow and the GitHub Pages dashboard by preserving eval session artifacts, publishing a session manifest + JSONL data to a dedicated branch, and adding dashboard/PR links to open the replay UI.

Changes:

Persist eval session artifacts (--keep-sessions) and publish flattened session JSONL + manifest.json to dashboard-session-data.
Deploy/update the AGENTVIZ SPA under gh-pages/replay/ and add replay links to PR comments.
Add a per-plugin “Sessions Visualisation” link in the dashboard UI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
`.github/workflows/evaluation.yml`	Adds session publishing + AGENTVIZ SPA build/deploy and PR replay links.
`eng/dashboard/build-replay-sessions.ps1`	Builds AGENTVIZ-compatible session directory structure and manifest from eval artifacts.
`eng/dashboard/purge-replay-sessions.ps1`	Merges new session data, purges by retention, and regenerates manifest.
`eng/dashboard/dashboard.js`	Adds a per-plugin link to open AGENTVIZ replay with manifest + tag filters.

- Use [IO.Path]::PathSeparator instead of hardcoded ';' for cross-platform PATH
- Compute cutoff date in UTC for correct retention comparisons
- Precompute ID HashSet before merge loop to avoid O(n^2)
- Pin actions/setup-node to commit SHA (49933ea5...#v4)
- Pin AGENTVIZ clone to commit SHA with verification
- Skip npm ci + build when deployed commit SHA matches pinned SHA
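Two of these fixes, the UTC cutoff and the precomputed ID set, can be sketched in Python. The PR does this in PowerShell with a HashSet. Here the entry shape (a dict with `id` and a timezone-aware `mtime`) is an assumption, and new entries win over existing ones that share an ID.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def merge_sessions(existing: list, new: list, retention_days: int = 7,
                   now: Optional[datetime] = None) -> list:
    """Merge new manifest entries over existing ones and drop expired entries.

    Entries are dicts with an "id" and a timezone-aware "mtime" datetime.
    """
    now = now or datetime.now(timezone.utc)       # always compare in UTC
    cutoff = now - timedelta(days=retention_days)
    new_ids = {s["id"] for s in new}              # built once: O(1) membership checks
    merged = list(new)
    for s in existing:
        if s["id"] in new_ids:                    # newer copy already present
            continue
        if s["mtime"] < cutoff:                   # outside the retention window
            continue
        merged.append(s)
    return merged
```

Checking membership against a list of new entries inside the loop would make the merge quadratic. Building the set once keeps it linear in the number of entries.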

- Resolve AGENTVIZ target SHA via git ls-remote (no clone)
- Read deployed SHA via curl from raw.githubusercontent.com (no clone)
- Cache build output keyed by commit SHA (skip npm ci+build on hit)
- Only clone AGENTVIZ repo on cache miss when SPA needs rebuild
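The resulting decision flow can be sketched as a small Python function. It assumes the target is the `refs/heads/main` ref of the AGENTVIZ repository and that the deployed SHA is fetched as plain text. The returned action names are hypothetical labels, not step IDs from evaluation.yml.

```python
from typing import Optional


def parse_ls_remote(output: str, ref: str = "refs/heads/main") -> Optional[str]:
    """Return the SHA for `ref` from `git ls-remote` output (tab-separated "<sha> <ref>" lines)."""
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        if name.strip() == ref:
            return sha.strip()
    return None


def plan_spa_deploy(target_sha: str, deployed_sha: Optional[str], cached_shas: set) -> str:
    """Pick the cheapest action that leaves gh-pages/replay/ at target_sha."""
    if deployed_sha is not None and deployed_sha.strip() == target_sha:
        return "skip"                  # already deployed: nothing to do
    if target_sha in cached_shas:
        return "deploy-from-cache"     # cache hit: no clone, no npm ci/build
    return "clone-and-build"           # cache miss: clone, check out the SHA, build
```

Only the last branch needs a clone, which matches the intent of resolving both SHAs without touching the repository contents.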

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

eng/dashboard/purge-replay-sessions.ps1:92

Retention logic for PR sessions relies on $file.LastWriteTimeUtc, but these files are sourced from a git checkout (dashboard-session-data branch) where mtimes are typically set to checkout time, not original creation time. This means old sessions/pr/<number>/... files will likely never be marked expired and will continue to be copied into the merged output (even if they’re filtered out of the manifest later). Consider determining expiry for PR sessions from the existing manifest.json mtime values (and copying only URLs present in the retained manifest), or include a date component in the PR directory structure so expiry can be computed from the path similarly to scheduled runs.

if (Test-Path $existingSessionsDir) {
    # For scheduled runs, the path structure is sessions/scheduled/<date>/...
    # For PR runs, the path structure is sessions/pr/<number>/...
    # We check dated directories for retention, PR dirs are always kept within window (use file mtime)

    $existingFiles = Get-ChildItem -Path $existingSessionsDir -Recurse -File -ErrorAction SilentlyContinue
    foreach ($file in $existingFiles) {
        $relativePath = $file.FullName.Substring($existingSessionsDir.Length).TrimStart([IO.Path]::DirectorySeparatorChar, [IO.Path]::AltDirectorySeparatorChar)
        $destPath = Join-Path $sessionsWorkDir $relativePath

        # Skip if already copied from new data
        if (Test-Path $destPath) { continue }

        # Check retention: try to extract date from path (scheduled/YYYY-MM-DD/...)
        $isExpired = $false
        if ($relativePath -match 'scheduled[/\\](\d{4}-\d{2}-\d{2})[/\\]') {
            $dirDate = [DateTime]::ParseExact($Matches[1], 'yyyy-MM-dd', $null)
            if ($dirDate -lt $cutoffDate) {
                $isExpired = $true
            }
        } elseif ($file.LastWriteTimeUtc -lt $cutoffDate) {
            # For PR sessions without date in path, use file modification time
            $isExpired = $true
        }

        # ... (remainder of the loop body is not included in the review excerpt)
    }
}
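The reviewer's suggested alternative takes PR-session expiry from manifest.json instead of checkout mtimes. It could look roughly like this Python sketch. The manifest entry shape (`url` plus an ISO-8601 `mtime` with an explicit UTC offset) is an assumption. Scheduled runs keep the path-date check, compared at day granularity.

```python
import re
from datetime import datetime, timedelta, timezone

# Scheduled runs carry their date in the path: sessions/scheduled/<yyyy-MM-dd>/...
SCHEDULED_DATE = re.compile(r"scheduled[/\\](\d{4}-\d{2}-\d{2})[/\\]")


def is_expired(url: str, manifest_mtime: str, cutoff: datetime) -> bool:
    """Expiry from the path date (scheduled runs) or the manifest mtime (PR runs)."""
    m = SCHEDULED_DATE.search(url)
    if m:
        dir_date = datetime.strptime(m.group(1), "%Y-%m-%d").date()
        return dir_date < cutoff.date()           # day granularity
    # PR runs: trust the mtime recorded in manifest.json, not the checkout mtime.
    return datetime.fromisoformat(manifest_mtime) < cutoff


def retained_urls(entries: list, now: datetime, retention_days: int = 7) -> set:
    """URLs of manifest entries still inside the retention window (now must be UTC-aware)."""
    cutoff = now - timedelta(days=retention_days)
    return {e["url"] for e in entries if not is_expired(e["url"], e["mtime"], cutoff)}
```

Only files whose URLs are in the returned set would then be copied into the merged output. Expired PR sessions are dropped even when their checkout mtime is recent.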

- Compare scheduled dir dates at day granularity (dirDate.Date vs cutoffDate.Date)
- Fix useTempDir null-safe path comparison via GetFullPath + null guard
- Use UTC for dateTag in build-replay-sessions.ps1 (consistent with purge)
- Fix cache/deploy path mismatch: deploy from /tmp/agentviz-dist (cache path)
- Deterministic clone: full clone + git checkout TARGET_SHA (fail on mismatch)
- URL-encode manifest param in PR comment link (jq @uri)
- Derive manifest generated timestamp from newest session mtime to avoid churn
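Deriving the manifest's generated timestamp from the newest session matters because the manifest is committed to dashboard-session-data. A timestamp taken from the clock would change on every run and produce a new commit even when no session changed. Below is a Python sketch of the idea, assuming a hypothetical entry shape (`url` plus an ISO-8601 `mtime` with offset) and using sorted, key-sorted JSON so the output is byte-stable.

```python
import json
from datetime import datetime, timezone


def render_manifest(entries: list) -> str:
    """Serialize a manifest deterministically so unchanged inputs give identical bytes."""
    ordered = sorted(entries, key=lambda e: e["url"])
    newest = max((datetime.fromisoformat(e["mtime"]) for e in ordered), default=None)
    manifest = {
        # Derived from the data, not from "now": re-running on the same sessions
        # yields the same timestamp and therefore no new commit.
        "generated": newest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") if newest else None,
        "sessions": ordered,
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```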

JanKrivanek · 2026-04-01T17:14:41Z

/evaluate

github-actions · 2026-04-01T18:29:55Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
mtp-hot-reload	Suggest hot reload for failing test in MTP project (SDK 9)	1.0/5 → 2.0/5 ⏰ 🟢	✅ mtp-hot-reload; tools: skill / ✅ mtp-hot-reload; tools: skill, stop_bash	—	✅
mtp-hot-reload	Suggest hot reload for failing test in MTP project (SDK 10)	1.0/5 → 4.0/5 🟢	✅ mtp-hot-reload; tools: skill, bash, create	—	✅
mtp-hot-reload	Enable hot reload when package already installed	2.0/5 → 5.0/5 🟢	✅ mtp-hot-reload; tools: skill	—	✅
mtp-hot-reload	Suggest launchSettings.json configuration for hot reload	1.0/5 → 4.0/5 🟢	✅ mtp-hot-reload; tools: skill, bash, create	—	✅
mtp-hot-reload	Use dotnet run not dotnet test for hot reload	1.0/5 → 4.0/5 🟢	✅ mtp-hot-reload; tools: skill	—	✅
mtp-hot-reload	Negative: VSTest project cannot use MTP hot reload	1.0/5 → 2.0/5 ⏰ 🟢	✅ mtp-hot-reload; tools: skill, create	—	✅
mtp-hot-reload	Run specific failing test with hot reload filter	1.0/5 → 3.0/5 🟢	✅ mtp-hot-reload; tools: skill	—	✅
migrate-vstest-to-mtp	Migrate MSTest project from VSTest to Microsoft.Testing.Platform	4.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: report_intent, skill	—	✅
migrate-vstest-to-mtp	Migrate NUnit project from VSTest to Microsoft.Testing.Platform	1.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill / ✅ migrate-vstest-to-mtp; tools: report_intent, skill	—	✅
migrate-vstest-to-mtp	Migrate xUnit.net v2 project from VSTest to Microsoft.Testing.Platform	2.0/5 → 4.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill, report_intent, bash / ✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-vstest-to-mtp	Update Azure DevOps pipeline from VSTest task to MTP	2.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-vstest-to-mtp	Migrate MSTest.Sdk project that explicitly uses VSTest	3.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-vstest-to-mtp	Translate dotnet test VSTest arguments to MTP equivalents	3.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-vstest-to-mtp	Handle exit code 8 when migrating from VSTest to MTP	3.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill / ⚠️ NOT ACTIVATED	—	❌
migrate-vstest-to-mtp	Configure dotnet test MTP mode on .NET 10 SDK	2.0/5 → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-vstest-to-mtp	Migrate xUnit.net VSTest filter syntax to MTP	2.0/5 → 4.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-vstest-to-mtp	Full VSTest to MTP migration plan for MSTest solution	1.0/5 ⏰ → 5.0/5 🟢	✅ migrate-vstest-to-mtp; tools: skill	—	✅
migrate-mstest-v1v2-to-v3	Migrate MSTest v1 project with assembly reference	3.0/5 → 5.0/5 🟢	✅ migrate-mstest-v1v2-to-v3; tools: skill, edit, bash	✅ 0.04	✅
migrate-mstest-v1v2-to-v3	Migrate MSTest v2 NuGet project to v3	4.0/5 → 3.0/5 🔴	✅ migrate-mstest-v1v2-to-v3; tools: skill	✅ 0.04	❌
migrate-mstest-v1v2-to-v3	Fix Assert.AreEqual object overload errors after v3 upgrade	3.0/5 → 5.0/5 🟢	✅ migrate-mstest-v1v2-to-v3; tools: skill, edit	✅ 0.04	✅
migrate-mstest-v1v2-to-v3	Migrate from .testsettings to .runsettings	3.0/5 → 4.0/5 🟢	✅ migrate-mstest-v1v2-to-v3; tools: skill, bash / ✅ migrate-mstest-v1v2-to-v3; tools: skill	✅ 0.04	✅
migrate-mstest-v1v2-to-v3	Fix DataRow type mismatch errors after v3 upgrade	4.0/5 → 3.0/5 🔴	✅ migrate-mstest-v1v2-to-v3; tools: skill	✅ 0.04	❌
migrate-mstest-v1v2-to-v3	Migrate to MSTest.Sdk project style	3.0/5 → 5.0/5 🟢	✅ migrate-mstest-v1v2-to-v3; tools: skill, bash	✅ 0.04	✅
migrate-mstest-v1v2-to-v3	Handle dropped target framework during v3 migration	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	✅ 0.04	❌ [1]
migrate-mstest-v1v2-to-v3	Migrate complex MSTest v2 project with testsettings, DataRow issues, and dropped TFM	4.0/5 → 5.0/5 🟢	✅ migrate-mstest-v1v2-to-v3; tools: skill	✅ 0.04	✅
migrate-mstest-v1v2-to-v3	Correctly identify MSTest v1 vs v2 and recommend different migration paths	3.0/5 → 5.0/5 🟢	✅ migrate-mstest-v1v2-to-v3; tools: skill, task, glob, read_agent / ✅ migrate-mstest-v1v2-to-v3; tools: skill, task, glob, read_agent, bash	✅ 0.04	✅
migrate-mstest-v3-to-v4	Migrate custom TestMethodAttribute from Execute to ExecuteAsync	2.0/5 → 3.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Replace ExpectedExceptionAttribute with Assert.ThrowsExactly	1.0/5 ⏰ → 4.0/5 ⏰ 🟢	✅ migrate-mstest-v3-to-v4; tools: skill / ⚠️ NOT ACTIVATED	—	✅
migrate-mstest-v3-to-v4	Fix multiple v4 breaking changes: Assert, ClassCleanup, TestContext, Timeout	3.0/5 ⏰ → 4.0/5 ⏰ 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Handle net6.0 target framework dropped in MSTest v4	3.0/5 → 5.0/5 🟢	⚠️ NOT ACTIVATED	—	✅
migrate-mstest-v3-to-v4	Fix TestMethodAttribute CallerInfo constructor breaking change	3.0/5 → 4.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Understand behavioral changes after MSTest v4 upgrade	3.0/5 → 5.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Handle MSTest.Sdk and MTP changes in v4	2.0/5 → 3.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Full MSTest v3 to v4 migration with multiple breaking changes	3.0/5 → 5.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Migrate MSTest.Sdk v3 project using ManagedType and TestTimeout	3.0/5 → 4.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
migrate-mstest-v3-to-v4	Correctly identify MSTest v3 project and recommend v4 migration	4.0/5 → 5.0/5 🟢	✅ migrate-mstest-v3-to-v4; tools: skill	—	✅
run-tests	Run tests in a VSTest MSTest project	4.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill	—	✅
run-tests	Run tests with trx reporting on MTP project (SDK 9)	4.0/5 → 4.0/5	✅ run-tests; tools: skill	—	✅
run-tests	Run tests with blame-hang on MTP project (SDK 10)	2.0/5 → 2.0/5 ⏰	✅ run-tests; tools: skill / ⚠️ NOT ACTIVATED	—	✅
run-tests	Run tests in a multi-TFM project targeting a specific framework	2.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill, bash, glob	—	✅
run-tests	Filter MSTest tests by category on VSTest	5.0/5 → 5.0/5	✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill	—	❌ [2]
run-tests	Filter NUnit tests by class name on VSTest	4.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill, bash / ⚠️ NOT ACTIVATED	—	✅
run-tests	Filter xUnit v3 tests by class on MTP	1.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill, bash / ✅ run-tests; tools: skill	—	✅
run-tests	Filter xUnit v3 tests by trait on MTP	1.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill, view	—	✅
run-tests	Filter TUnit tests by class using treenode-filter	2.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill, bash / ⚠️ NOT ACTIVATED	—	✅
run-tests	Combine multiple filter criteria on VSTest MSTest	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED / ✅ run-tests; tools: skill	—	❌ [3]
run-tests	MTP project on SDK 9 must use -- separator for args	1.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill / ⚠️ NOT ACTIVATED	—	✅
run-tests	MTP project on SDK 10 passes args directly	2.0/5 → 4.0/5 🟢	✅ run-tests; tools: skill / ✅ run-tests; tools: skill, create	—	✅
run-tests	Detect test platform from Directory.Build.props	1.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill	—	✅
run-tests	Negative test: do not use MTP syntax for a VSTest project	4.0/5 → 5.0/5 🟢	✅ run-tests; tools: skill, view / ⚠️ NOT ACTIVATED	—	❌ [4]
writing-mstest-tests	Write unit tests for a service class	4.0/5 → 4.0/5	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌
writing-mstest-tests	Write data-driven tests for a calculator	5.0/5 → 4.0/5 🔴	✅ writing-mstest-tests; tools: skill	🟡 0.29	❌ [5]
writing-mstest-tests	Write async tests with cancellation	2.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.29	✅
writing-mstest-tests	Fix swapped Assert.AreEqual arguments	5.0/5 → 5.0/5	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	❌ [6]
writing-mstest-tests	Modernize legacy test patterns	5.0/5 → 4.0/5 🔴	✅ writing-mstest-tests; tools: skill	🟡 0.29	✅
writing-mstest-tests	Replace ExpectedException with Assert.Throws	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.29	✅
writing-mstest-tests	Use proper collection assertions	3.0/5 → 3.0/5	✅ writing-mstest-tests; tools: skill	🟡 0.29	❌ [7]
writing-mstest-tests	Use proper type assertions instead of casts	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.29	✅
writing-mstest-tests	Set up test lifecycle correctly	3.0/5 → 4.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.29	✅
writing-mstest-tests	Use DynamicData with ValueTuples over object arrays	1.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.29	✅
crap-score	Calculate CRAP score for a single method with partial coverage	4.0/5 → 5.0/5 🟢	✅ crap-score; tools: skill	✅ 0.13	✅
crap-score	Identify riskiest methods across a file	4.0/5 → 5.0/5 🟢	✅ crap-score; tools: skill, glob / ✅ crap-score; tools: skill	✅ 0.13	✅
crap-score	Generate coverage then compute CRAP score	4.0/5 → 4.0/5	✅ crap-score; tools: skill	✅ 0.13	❌ [8]
code-testing-agent	Generate tests for ContosoUniversity ASP.NET Core MVC app	3.0/5 → 3.0/5	✅ code-testing-agent; tools: skill, grep / ✅ code-testing-agent; tools: skill	✅ 0.02	❌
test-anti-patterns	Detect mixed severity anti-patterns in repository service tests	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	✅ 0.07	❌ [9]
test-anti-patterns	Detect flakiness indicators and test coupling	3.0/5 → 4.0/5 🟢	✅ test-anti-patterns; tools: report_intent, skill	✅ 0.07	✅
test-anti-patterns	Detect duplicated tests and magic values	3.0/5 → 4.0/5 ⏰ 🟢	✅ test-anti-patterns; tools: report_intent, skill	✅ 0.07	✅
test-anti-patterns	Recognize well-written tests without inventing false positives	2.0/5 → 5.0/5 🟢	✅ test-anti-patterns; tools: report_intent, skill	✅ 0.07	✅
directory-build-organization	Organize build infrastructure for a multi-project repo	3.0/5 → 5.0/5 🟢	✅ directory-build-organization; tools: skill, read_agent / ⚠️ NOT ACTIVATED	—	❌
msbuild-antipatterns	Review MSBuild files for anti-patterns and style issues	3.0/5 ⏰ → 1.0/5 ⏰ 🔴	✅ msbuild-antipatterns; tools: skill, glob / ⚠️ NOT ACTIVATED	✅ 0.06	❌
binlog-generation	Build project with /bl flag	1.0/5 → 5.0/5 🟢	✅ binlog-generation; tools: skill	—	❌ [10]
binlog-generation	Build with /bl in PowerShell	4.0/5 → 5.0/5 🟢	✅ binlog-generation; tools: skill	—	✅
binlog-generation	Build multiple configurations with unique binlogs	5.0/5 → 5.0/5	✅ binlog-generation; tools: skill / ⚠️ NOT ACTIVATED	—	❌ [11]
msbuild-server	Recommend MSBuild Server for slow CLI incremental builds	3.0/5 → 5.0/5 🟢	✅ msbuild-server; tools: skill / ✅ msbuild-server; tools: skill, bash	🟡 0.37	✅
binlog-failure-analysis	Diagnose build failures from binlog only (no source files)	1.0/5 ⏰ → 1.0/5 ⏰	✅ binlog-failure-analysis; tools: skill	✅ 0.05	✅
incremental-build	Analyze incremental build issues	2.0/5 ⏰ → 1.0/5 ⏰ 🔴	✅ incremental-build; tools: skill, bash	✅ 0.13	❌
check-bin-obj-clash	Diagnose bin/obj output path clashes	5.0/5 → 5.0/5	✅ check-bin-obj-clash; tools: skill	✅ 0.14	❌ [12]
build-perf-diagnostics	Diagnose slow build for a small project	3.0/5 → 1.0/5 ⏰ 🔴	⚠️ NOT ACTIVATED	✅ 0.16	❌
eval-performance	Analyze MSBuild evaluation performance issues	4.0/5 → 4.0/5	✅ eval-performance; tools: skill, bash / ✅ eval-performance; tools: skill	✅ 0.15	✅
resolve-project-references	Explain misleading ResolveProjectReferences time	3.0/5 → 5.0/5 🟢	✅ resolve-project-references; tools: skill	✅ 0.14	✅
build-perf-baseline	Establish build performance baseline and recommend optimizations	3.0/5 → 4.0/5 🟢	✅ build-perf-baseline; tools: skill, glob / ⚠️ NOT ACTIVATED	🟡 0.26	✅
build-parallelism	Analyze build parallelism bottlenecks	4.0/5 → 1.0/5 ⏰ 🔴	✅ build-parallelism; tools: skill, glob / ⚠️ NOT ACTIVATED	✅ 0.14	❌
msbuild-modernization	Modernize legacy project to SDK-style	5.0/5 → 5.0/5	✅ msbuild-modernization; tools: skill	✅ 0.06	❌ [13]
including-generated-files	Diagnose generated file inclusion failure	3.0/5 → 5.0/5 🟢	⚠️ NOT ACTIVATED / ✅ including-generated-files; tools: skill	🟡 0.26	✅

[1] (Plugin) Quality unchanged but weighted score is -0.7% due to: tokens (24255 → 31041)
[2] (Plugin) Quality unchanged but weighted score is -3.4% due to: time (13.0s → 21.8s), tokens (36760 → 49120)
[3] (Isolated) Quality unchanged but weighted score is -3.5% due to: tokens (24791 → 39453), tool calls (3 → 4)
[4] (Plugin) Quality unchanged but weighted score is -0.5% due to: tokens (24237 → 30041)
[5] (Plugin) Quality unchanged but weighted score is -1.9% due to: tokens (246590 → 511374), tool calls (19 → 29), time (97.3s → 133.6s)
[6] (Plugin) Quality unchanged but weighted score is -0.8% due to: tokens (12068 → 15051)
[7] (Plugin) Quality unchanged but weighted score is -16.3% due to: quality, tokens (12370 → 33755), tool calls (0 → 1), time (11.6s → 15.5s)
[8] (Isolated) Quality unchanged but weighted score is -17.2% due to: judgment, quality, tokens (298616 → 403411)
[9] (Plugin) Quality unchanged but weighted score is -1.1% due to: tokens (13343 → 16287)
[10] (Isolated) Quality improved but weighted score is -2.6% due to: tokens (49026 → 68308), time (87.6s → 109.0s)
[11] (Plugin) Quality unchanged but weighted score is -1.4% due to: tokens (36532 → 47830)
[12] (Plugin) Quality unchanged but weighted score is -10.0% due to: tokens (69215 → 350689), tool calls (8 → 21), time (36.3s → 114.4s)
[13] (Plugin) Quality unchanged but weighted score is -2.7% due to: tokens (73707 → 126120)

⏰ timeout — run(s) hit the (120s, 160s, 180s, 240s, 300s, 360s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

* Add AGENTVIZ session replay integration
  - Add workflow_dispatch trigger to evaluation.yml
  - Add --keep-sessions to skill-validator evaluate step
  - Add publish-session-data job (mirrors publish-token-data)
  - Add replay link to PR comments (comment-on-pr)
  - Add AGENTVIZ SPA build/deploy to deploy-dashboard job
  - Add setup-node step to deploy-dashboard
  - Add per-plugin Sessions Visualisation links to dashboard.js
  - Create build-replay-sessions.ps1 (manifest generation from sessions.db)
  - Create purge-replay-sessions.ps1 (7-day retention management)
* Address code review feedback on PR dotnet#494
  - Use [IO.Path]::PathSeparator instead of hardcoded ';' for cross-platform PATH
  - Compute cutoff date in UTC for correct retention comparisons
  - Precompute ID HashSet before merge loop to avoid O(n^2)
  - Pin actions/setup-node to commit SHA (49933ea5...#v4)
  - Pin AGENTVIZ clone to commit SHA with verification
  - Skip npm ci + build when deployed commit SHA matches pinned SHA
* Remove hardcoded AGENTVIZ SHA; use cache + zero-clone checks
  - Resolve AGENTVIZ target SHA via git ls-remote (no clone)
  - Read deployed SHA via curl from raw.githubusercontent.com (no clone)
  - Cache build output keyed by commit SHA (skip npm ci+build on hit)
  - Only clone AGENTVIZ repo on cache miss when SPA needs rebuild
* Address round 2 code review feedback on PR dotnet#494
  - Compare scheduled dir dates at day granularity (dirDate.Date vs cutoffDate.Date)
  - Fix useTempDir null-safe path comparison via GetFullPath + null guard
  - Use UTC for dateTag in build-replay-sessions.ps1 (consistent with purge)
  - Fix cache/deploy path mismatch: deploy from /tmp/agentviz-dist (cache path)
  - Deterministic clone: full clone + git checkout TARGET_SHA (fail on mismatch)
  - URL-encode manifest param in PR comment link (jq @uri)
  - Derive manifest generated timestamp from newest session mtime to avoid churn

github-actions Bot added a commit that referenced this pull request Apr 1, 2026

Update PR token usage data (PR #494)

0f6736f

JanKrivanek added a commit that referenced this pull request Apr 1, 2026

Publish session data for PR #494 (dotnet skill evaluation)

4c8eb71

JanKrivanek marked this pull request as ready for review April 1, 2026 15:55

JanKrivanek requested a review from ViktorHofer as a code owner April 1, 2026 15:55

Copilot AI review requested due to automatic review settings April 1, 2026 15:55

Copilot started reviewing on behalf of JanKrivanek April 1, 2026 15:56

Copilot AI reviewed Apr 1, 2026

JanKrivanek added 2 commits April 1, 2026 18:05

Copilot AI review requested due to automatic review settings April 1, 2026 16:18

Copilot started reviewing on behalf of JanKrivanek April 1, 2026 16:19

Copilot AI reviewed Apr 1, 2026

JanKrivanek enabled auto-merge (squash) April 1, 2026 17:14

github-actions Bot added a commit that referenced this pull request Apr 1, 2026

Update PR token usage data (PR #494)

e6aa031

JanKrivanek mentioned this pull request Apr 1, 2026

feat: static manifest mode jayparikh/agentviz#39

Merged

ViktorHofer approved these changes Apr 2, 2026

JanKrivanek merged commit d861bbf into main Apr 2, 2026
31 checks passed

JanKrivanek deleted the dev/jankrivanek/agentviz-integration-poc branch April 2, 2026 15:40

github-actions Bot mentioned this pull request Apr 4, 2026

🏥 Repository Health Dashboard #288

Open

3 participants