Enable NativeAOT publishing for SkillValidator #221

Merged
ViktorHofer merged 18 commits into main from vihofer/skill-validator-native-aot on Mar 5, 2026

@ViktorHofer

Summary

Migrate SkillValidator (`eng/skill-validator`) to publish as a NativeAOT binary, producing a single ~15 MB native executable with fast startup and no JIT dependency.

Changes

Source-generated JSON serialization

  • New `SkillValidatorJsonContext` with `[JsonSerializable]` for all 25+ serialized types
  • All `JsonSerializer.Serialize/Deserialize` call sites updated to use context-based overloads
  • Replaced anonymous type in `Reporter.cs` with named `ResultsOutput` record
  • Changed the non-generic `JsonStringEnumConverter` to the generic, AOT-safe `JsonStringEnumConverter<T>`
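
The pattern described in the bullets above could look roughly like the following. This is a minimal sketch, not the PR's actual code: `DemoJsonContext`, `Verdict`, the `Demo` helper, and the shape of `ResultsOutput` are hypothetical stand-ins, and the case-insensitive option mirrors a later commit in this PR. It assumes .NET 8 or newer, where `JsonStringEnumConverter<T>` is available.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical enum; the generic converter avoids the runtime code
// generation that the non-generic JsonStringEnumConverter relies on.
[JsonConverter(typeof(JsonStringEnumConverter<Verdict>))]
public enum Verdict { Pass, Fail }

// Hypothetical named record standing in for the former anonymous type.
public record ResultsOutput(string Skill, Verdict Outcome, double Score);

// The source generator emits serialization metadata for every type
// listed here at compile time, so no reflection is needed at run time.
[JsonSourceGenerationOptions(PropertyNameCaseInsensitive = true)]
[JsonSerializable(typeof(ResultsOutput))]
internal partial class DemoJsonContext : JsonSerializerContext { }

public static class Demo
{
    // Context-based overloads take the generated JsonTypeInfo<T>.
    public static string Write(ResultsOutput value) =>
        JsonSerializer.Serialize(value, DemoJsonContext.Default.ResultsOutput);

    public static ResultsOutput? Read(string json) =>
        JsonSerializer.Deserialize(json, DemoJsonContext.Default.ResultsOutput);
}
```

Passing the generated type info at every call site is what lets the trimmer and AOT compiler see exactly which types are serialized.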

Source-generated YAML deserialization

  • Added `Vecc.YamlDotNet.Analyzers.StaticGenerator` package
  • New `SkillValidatorYamlContext` with `[YamlSerializable]` for eval config and frontmatter types
  • Switched `DeserializerBuilder` to `StaticDeserializerBuilder` in `EvalSchema.cs` and `SkillDiscovery.cs`
  • Added `RawFrontmatter` strongly-typed model for YAML frontmatter (replaces `Dictionary<string, string>`)

AOT-safe AgentEvent data

  • Changed `AgentEvent.Data` from `Dictionary<string, object?>` to `Dictionary<string, JsonNode?>`
  • Updated all construction sites (`AgentRunner.cs`, `Judge.cs`, `PairwiseJudge.cs`) and reading sites (`MetricsCollector.cs`)
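
To illustrate why `JsonNode?` values help here: an `object?` value forces the serializer to discover the runtime type through reflection, whereas `JsonNode` is a closed, known type hierarchy. The sketch below is an assumption-laden example, not the PR's code: the simplified `AgentEvent` shape, `EventJsonContext`, and the `EventDemo` helpers are hypothetical; only the `Dictionary<string, JsonNode?>` type comes from the description above.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Nodes;
using System.Text.Json.Serialization;

// Hypothetical, simplified event shape.
public record AgentEvent(string Type, Dictionary<string, JsonNode?> Data);

[JsonSerializable(typeof(AgentEvent))]
internal partial class EventJsonContext : JsonSerializerContext { }

public static class EventDemo
{
    // Construction site: primitives convert implicitly to JsonNode.
    public static AgentEvent ToolCall(string tool, int tokens) =>
        new("tool_call", new Dictionary<string, JsonNode?>
        {
            ["tool"] = tool,     // string -> JsonNode
            ["tokens"] = tokens, // int -> JsonNode
            ["error"] = null,    // absent value stays null
        });

    // Reading site: extract a typed value instead of casting from object.
    public static int Tokens(AgentEvent e) =>
        e.Data.TryGetValue("tokens", out var node) && node is not null
            ? node.GetValue<int>()
            : 0;

    public static string Serialize(AgentEvent e) =>
        JsonSerializer.Serialize(e, EventJsonContext.Default.AgentEvent);
}
```

Reading sites change from casts such as `(int)data["tokens"]` to explicit `GetValue<T>()` calls, which is the kind of update the second bullet refers to.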

Validation

  • `dotnet publish -c Release` produces a clean NativeAOT binary with zero trim/AOT warnings
  • All 198 existing tests pass

ViktorHofer and others added 16 commits March 5, 2026 18:40
Migrate all serialization to source-generated, AOT-compatible patterns:

- Add PublishAot to csproj and Vecc.YamlDotNet.Analyzers.StaticGenerator package
- Create SkillValidatorJsonContext with source-generated serializers for all types
- Create SkillValidatorYamlContext with static YAML deserialization for eval config
  and frontmatter parsing
- Change AgentEvent.Data from Dictionary<string, object?> to
  Dictionary<string, JsonNode?> for AOT-safe serialization
- Replace anonymous type in Reporter.cs with named ResultsOutput record
- Fix JsonStringEnumConverter to generic AOT-safe JsonStringEnumConverter<T>
- Move ConsolidateData to Models.cs for JSON context accessibility
- Switch EvalSchema and SkillDiscovery to StaticDeserializerBuilder
- Add RawFrontmatter model for strongly-typed YAML frontmatter parsing
- Update all JsonSerializer call sites to use context-based overloads

The published binary is a single 14.6 MB native executable.
All 198 existing tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Single build job with matrix for linux-x64, linux-arm64, win-x64, win-arm64, osx-arm64
- Each matrix entry: build, test, NativeAOT publish, pack NuGet, upload tarball
- collect-packages job: builds RID-agnostic package, merges with per-RID packages
  into a single skill-validator-nupkgs artifact
- Add daily schedule trigger (08:00 UTC) with 90-day retention
- Keep existing path-based triggers for push/PR

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Single build job with matrix for linux-x64, linux-arm64, win-x64, win-arm64, osx-arm64
- Each matrix entry: build, test, NativeAOT publish, pack NuGet, upload binary
- Upload publish directory directly (Actions wraps in zip automatically)
- collect-packages job: builds RID-agnostic package, merges with per-RID packages
  into a single skill-validator-nupkgs artifact
- Add daily schedule trigger (08:00 UTC) with 90-day retention
- Keep existing path-based triggers for push/PR

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Increase totalTimeoutMs from 200ms to 2000ms. The tight budget was
causing the CTS to fire before the retry attempt on slower CI runners
(macOS arm64). The test still validates that the 60s base delay gets
clamped — the 5s wall-time assertion is the real check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add concurrency group so that pushing a new commit to a PR
automatically cancels any still-running workflow for that ref.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The pack step was producing the same agnostic package for every RID.
Adding --use-current-runtime ensures each matrix entry produces a
RID-specific nupkg that gets collected into the combined artifact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
dotnet pack --use-current-runtime already runs publish internally,
producing the native AOT binary in artifacts/publish/. The separate
dotnet publish step was redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Increase baseDelayMs from 50ms to 200ms so that OS scheduling jitter
on slow CI runners (macOS arm64) doesn't dominate the measured gaps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NuGet packages are zip files containing the native binaries, so the
separate skill-validator-<rid> publish folder uploads are redundant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pack the RID-agnostic package from the linux-x64 matrix entry instead
of a separate downstream job, eliminating the bundle step entirely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use precise glob patterns with the stable package name to distinguish
RID-specific from agnostic nupkgs. The agnostic upload uses a negation
pattern to exclude the RID-specific package on linux-x64.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ViktorHofer ViktorHofer marked this pull request as ready for review March 5, 2026 19:03
@ViktorHofer ViktorHofer requested a review from JanKrivanek as a code owner March 5, 2026 19:03
Copilot AI review requested due to automatic review settings March 5, 2026 19:03
On scheduled runs, downloads all nupkg artifacts and publishes them
as a rolling 'skill-validator-nightly' pre-release on the repo.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ViktorHofer ViktorHofer requested review from Copilot and removed request for Copilot March 5, 2026 19:18
The original MCPServerDef deserialization used PropertyNameCaseInsensitive
which was dropped when switching to source-generated context. Add it back
to the JsonSourceGenerationOptions to preserve flexible property casing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ViktorHofer ViktorHofer requested a review from Copilot March 5, 2026 19:46

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions

github-actions Bot commented Mar 5, 2026

Skill Validation Results

| Skill | Scenario | Baseline | With Skill | Δ | Skills Loaded | Overfit | Verdict |
|---|---|---|---|---|---|---|---|
| dotnet-trace-collect | High CPU in Kubernetes on Linux (.NET 8) | 4.0/5 | 4.5/5 | +0.5 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, glob | ✅ 0.14 | |
| dotnet-trace-collect | .NET Framework on Windows without admin privileges | 2.0/5 | 5.0/5 | +3.0 | ✅ dotnet-trace-collect; tools: skill | ✅ 0.14 | |
| dotnet-trace-collect | .NET 10 on Linux with root access and native call stacks | 1.0/5 | 4.0/5 | +3.0 | ✅ dotnet-trace-collect; tools: skill, bash | ✅ 0.14 | |
| dotnet-trace-collect | Memory leak on Linux (.NET 8) | 3.0/5 | 3.0/5 | 0.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, bash | ✅ 0.14 | |
| dotnet-trace-collect | Slow requests on Windows with PerfView | 4.0/5 | 5.0/5 | +1.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, glob | ✅ 0.14 | |
| dotnet-trace-collect | Excessive GC on Linux (.NET 8) | 3.0/5 | 5.0/5 | +2.0 | ✅ dotnet-trace-collect; tools: skill, bash | ✅ 0.14 | |
| dotnet-trace-collect | Hang or deadlock diagnosis on Linux | 3.0/5 | 4.0/5 | +1.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, glob | ✅ 0.14 | |
| dotnet-trace-collect | Windows container high CPU with PerfView | 1.0/5 | 5.0/5 | +4.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view | ✅ 0.14 | |
| dotnet-trace-collect | Long-running intermittent issue with PerfView triggers | 3.0/5 | 4.5/5 ⏰ timeout | +1.5 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, glob | ✅ 0.14 | |
| dotnet-trace-collect | Linux pre-.NET 10 needing native call stacks | 2.0/5 | 4.5/5 | +2.5 | ✅ dotnet-trace-collect; tools: skill | ✅ 0.14 | |
| dotnet-trace-collect | Windows modern .NET with admin high CPU | 2.5/5 | 4.5/5 | +2.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, glob, bash | ✅ 0.14 | |
| dotnet-trace-collect | Memory leak on .NET Framework Windows | 3.0/5 | 4.0/5 | +1.0 | ✅ dotnet-trace-collect; dump-collect; tools: skill, report_intent, view, glob | ✅ 0.14 | |
| dotnet-trace-collect | Kubernetes with console access prefers console tools | 5.0/5 | 5.0/5 | 0.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view | ✅ 0.14 | |
| dotnet-trace-collect | Container installation without .NET SDK | 4.0/5 | 3.5/5 | -0.5 | ✅ dotnet-trace-collect; tools: skill, glob, bash | ✅ 0.14 | |
| dotnet-trace-collect | HTTP 500s from downstream service on Linux (.NET 8) | 4.0/5 | 5.0/5 | +1.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view, glob, bash | ✅ 0.14 | |
| dotnet-trace-collect | Networking timeouts on Windows with admin (.NET 8) | 2.0/5 | 5.0/5 | +3.0 | ✅ dotnet-trace-collect; tools: skill, report_intent, view | ✅ 0.14 | |
| microbenchmarking | Investigate runtime upgrade performance impact | 2.0/5 ⏰ timeout | 3.0/5 ⏰ timeout | +1.0 | ✅ microbenchmarking; tools: skill, glob | ✅ 0.10 | |
| csharp-scripts | Test a C# language feature with a script | 3.0/5 | 4.0/5 | +1.0 | ✅ csharp-scripts; tools: skill, create, edit, view | 🟡 0.32 | |
| clr-activation-debugging | Diagnose unexpected FOD dialog from native build tool | 1.0/5 | 5.0/5 | +4.0 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| clr-activation-debugging | Diagnose FOD suppressed but activation still failing | 1.0/5 | 5.0/5 | +4.0 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| clr-activation-debugging | Explain why same binary behaves differently under different launch methods | 1.0/5 | 5.0/5 | +4.0 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| clr-activation-debugging | Analyze healthy managed EXE activation | 1.5/5 | 5.0/5 | +3.5 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| clr-activation-debugging | Identify multiple activation sequences in a single log | 1.0/5 | 5.0/5 | +4.0 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| clr-activation-debugging | Explain useLegacyV2RuntimeActivationPolicy in activation log | 2.0/5 | 4.0/5 | +2.0 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| clr-activation-debugging | Decline non-CLR-activation issue | 1.0/5 | 5.0/5 | +4.0 | ✅ clr-activation-debugging; tools: skill | ✅ 0.08 | |
| thread-abort-migration | Worker thread with abort-based cancellation | 5.0/5 | 5.0/5 | 0.0 | ✅ thread-abort-migration; tools: skill, report_intent, bash | ✅ 0.10 | |
| thread-abort-migration | Timeout enforcement via Thread.Abort | 4.0/5 | 5.0/5 | +1.0 | ✅ thread-abort-migration; tools: skill | ✅ 0.10 | |
| thread-abort-migration | Blocking WaitHandle with Thread.Interrupt | 3.5/5 | 4.5/5 | +1.0 | ✅ thread-abort-migration; tools: skill | ✅ 0.10 | |
| thread-abort-migration | ASP.NET Response.End and Response.Redirect with Thread.Abort | 4.0/5 | 4.5/5 | +0.5 | ✅ thread-abort-migration; tools: skill, report_intent, bash | ✅ 0.10 | |
| thread-abort-migration | Thread.Join and Thread.Sleep only — should not migrate | 3.5/5 | 5.0/5 | +1.5 | ✅ thread-abort-migration; tools: skill | ✅ 0.10 | |
| migrate-nullable-references | Enable NRT in a small library with mixed nullability | 5.0/5 | 5.0/5 | 0.0 | ✅ migrate-nullable-references; tools: skill, glob | ✅ 0.15 | |
| migrate-nullable-references | File-by-file migration: only modify the targeted file | 5.0/5 | 5.0/5 | 0.0 | ⚠️ NOT ACTIVATED | ✅ 0.15 | |
| migrate-nullable-references | Enable NRT in ASP.NET Core Web API with EF Core | 3.0/5 | 3.0/5 | 0.0 | ✅ migrate-nullable-references; tools: skill | ✅ 0.15 | |
| nuget-trusted-publishing | Set up trusted publishing for a new NuGet library | 3.0/5 | 4.0/5 | +1.0 | ✅ nuget-trusted-publishing; tools: skill, edit | ✅ 0.20 | |
| nuget-trusted-publishing | Set up NuGet publishing without mentioning trusted publishing | 2.0/5 | 4.5/5 | +2.5 | ✅ nuget-trusted-publishing; tools: report_intent, skill, glob, view | ✅ 0.20 | |
| nuget-trusted-publishing | Migrate existing workflow from API key to trusted publishing | 2.5/5 | 5.0/5 | +2.5 | ✅ nuget-trusted-publishing; tools: skill, view, bash | ✅ 0.20 | |
| analyzing-dotnet-performance | Detects compiled regex startup budget and regex chain allocations | 1.0/5 | 4.5/5 ⏰ timeout | +3.5 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Detects CurrentCulture comparer and compiled regex budget in inflection rules | 1.0/5 | 2.0/5 ⏰ timeout | +1.0 | ✅ analyzing-dotnet-performance; tools: skill, grep | ✅ 0.13 | |
| analyzing-dotnet-performance | Finds per-call Dictionary allocation not hoisted to static | 1.0/5 | 5.0/5 ⏰ timeout | +4.0 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Catches compound allocations in recursive number converter with ToLower | 1.0/5 | 4.5/5 | +3.5 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Finds StringComparison.Ordinal missing and FrozenDictionary opportunities | 1.0/5 | 5.0/5 | +4.0 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Detects Aggregate+Replace chain and struct missing IEquatable | 1.0/5 | 4.5/5 ⏰ timeout | +3.5 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Finds branched Replace chain in format string manipulation | 1.0/5 | 4.0/5 | +3.0 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Catches LINQ on hot-path string processing and All(char.IsUpper) | 1.0/5 | 4.5/5 ⏰ timeout | +3.5 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Detects LINQ pipeline in TimeSpan formatting and collection processing | 1.0/5 | 5.0/5 | +4.0 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| analyzing-dotnet-performance | Flags Span inconsistencies and compound method chains in truncation library | 1.0/5 | 5.0/5 | +4.0 | ✅ analyzing-dotnet-performance; tools: skill, task | ✅ 0.13 | |
| analyzing-dotnet-performance | Identifies unsealed leaf classes and locale hierarchy patterns | 1.0/5 | 5.0/5 ⏰ timeout | +4.0 | ✅ analyzing-dotnet-performance; tools: skill | ✅ 0.13 | |
| dotnet-aot-compat | Make Azure.ResourceManager AOT-compatible | 2.0/5 | 3.5/5 ⏰ timeout | +1.5 | ✅ dotnet-aot-compat; tools: skill, read_agent, read_bash, create | ✅ 0.14 | |
| optimizing-ef-core-queries | Optimize bulk operations with EF Core 7+ ExecuteUpdate and ExecuteDelete | 4.5/5 | 5.0/5 | +0.5 | ✅ optimizing-ef-core-queries; tools: skill | 🟡 0.27 | |
| android-tombstone-symbolication | Symbolicate .NET frames in an Android tombstone | 4.0/5 | 5.0/5 | +1.0 | ✅ android-tombstone-symbolication; tools: skill, glob, stop_bash | 🟡 0.21 | |
| android-tombstone-symbolication | Recognize tombstone with no .NET frames | 5.0/5 | 5.0/5 | 0.0 | ✅ android-tombstone-symbolication; tools: skill, bash | 🟡 0.21 | |
| android-tombstone-symbolication | Symbolicate CoreCLR frames in an Android tombstone | 3.5/5 | 4.0/5 | +0.5 | ✅ android-tombstone-symbolication; tools: skill, stop_bash | 🟡 0.21 | |
| android-tombstone-symbolication | Recognize NativeAOT tombstone with app binary and libSystem.Native.so | 3.0/5 | 4.0/5 | +1.0 | ✅ android-tombstone-symbolication; tools: skill, bash, stop_bash | 🟡 0.21 | |
| android-tombstone-symbolication | Symbolicate multi-thread tombstone | 3.5/5 ⏰ timeout | 5.0/5 | +1.5 | ✅ android-tombstone-symbolication; tools: skill, stop_bash | 🟡 0.21 | |
| android-tombstone-symbolication | Handle .NET frames with no BuildId metadata | 5.0/5 | 5.0/5 | 0.0 | ✅ android-tombstone-symbolication; tools: skill, glob, bash, stop_bash | 🟡 0.21 | |
| android-tombstone-symbolication | Symbolicate tombstone with multiple .NET libraries and different BuildIds | 3.0/5 | 4.5/5 | +1.5 | ✅ android-tombstone-symbolication; tools: skill, glob | 🟡 0.21 | |
| android-tombstone-symbolication | Reject iOS crash log as wrong format | 5.0/5 | 5.0/5 | 0.0 | ℹ️ not activated (expected) | 🟡 0.21 | |
| dotnet-pinvoke | Generate LibraryImport declaration from C header (.NET 8+) | 4.5/5 | 5.0/5 | +0.5 | ✅ dotnet-pinvoke; tools: skill | ✅ 0.11 | |
| dotnet-pinvoke | Generate LibraryImport declaration from C header (.NET Framework) | 5.0/5 | 5.0/5 | 0.0 | ✅ dotnet-pinvoke; tools: skill | ✅ 0.11 | |
| dump-collect | Configure automatic crash dumps for CoreCLR app on Linux | 5.0/5 | 4.5/5 | -0.5 | ✅ dump-collect; tools: skill, report_intent, view, glob | 🟡 0.31 | |
| dump-collect | Set up NativeAOT crash dumps with createdump in Kubernetes | 1.5/5 ⏰ timeout | 5.0/5 | +3.5 | ✅ dump-collect; tools: skill | 🟡 0.31 | |
| dump-collect | Recover crash dump from macOS NativeAOT without createdump | 4.0/5 | 4.5/5 | +0.5 | ✅ dump-collect; tools: skill, report_intent, view, glob, bash | 🟡 0.31 | |
| dump-collect | Configure CoreCLR dump collection in Alpine Docker as non-root | 3.5/5 | 4.0/5 | +0.5 | ✅ dump-collect; tools: skill, bash, report_intent, view, glob | 🟡 0.31 | |
| dump-collect | Advisory: macOS NativeAOT crash dump recovery steps | 4.0/5 | 4.0/5 | 0.0 | ✅ dump-collect; tools: skill, bash, glob | 🟡 0.31 | |
| dump-collect | Advisory: CoreCLR Alpine Docker non-root configuration | 4.0/5 | 5.0/5 | +1.0 | ✅ dump-collect; tools: skill | 🟡 0.31 | |
| dump-collect | Advisory: NativeAOT Kubernetes dump collection setup | 2.5/5 | 3.5/5 | +1.0 | ✅ dump-collect; tools: skill | 🟡 0.31 | |
| dump-collect | Detect runtime and configure crash dumps for unknown .NET app on Linux | 3.5/5 | 4.0/5 | +0.5 | ✅ dump-collect; tools: skill | 🟡 0.31 | |
| dump-collect | Decline dump analysis request | 2.0/5 | 5.0/5 | +3.0 | ℹ️ not activated (expected) | 🟡 0.31 | |

timeout — run hit the scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output

Model: claude-opus-4.6 | Judge: claude-opus-4.6

Full results

@ViktorHofer ViktorHofer merged commit 54e6fe8 into main Mar 5, 2026
12 of 13 checks passed
@ViktorHofer ViktorHofer deleted the vihofer/skill-validator-native-aot branch March 5, 2026 20:07