Add Apple crash symbolication skill by kotlarmilos · Pull Request #201 · dotnet/skills

kotlarmilos · 2026-03-04T16:57:57Z

Description

Adds automation, test coverage, and review fixes for the iOS crash symbolication skill.

Automation Script (`Symbolicate-Crash.ps1`)

New 664-line PowerShell script that automates the full .ips crash log symbolication workflow:

Parses two-part .ips JSON format (iOS 15+)
Identifies .NET runtime libraries (libcoreclr, libmonosgen-2.0, libSystem.*) in usedImages
Searches for dSYM debug symbols: user-provided paths → SDK packs → NuGet cache
Verifies UUID match via dwarfdump --uuid
Batch-symbolicates with atos (groups addresses per library for efficiency)
Identifies .NET runtime version by matching UUIDs against local packs
Supports -ParseOnly, -CrashingThreadOnly, -SkipVersionLookup, -DsymSearchPaths

Mirrors the Android sibling's Symbolicate-Tombstone.ps1 for structural parity.

SKILL.md Updates

Added Automation Script and Runtime Version Identification sections
Fixed atos -o to point inside dSYM bundle (Contents/Resources/DWARF/) — not the bundle itself
Added dwarfdump and Symbolicate-Crash.ps1 to INVOKES in frontmatter
Added "MAUI" keyword for trigger matching (matches Android sibling's phrasing)
Wrapped steps under ## Workflow heading for consistency with Android skill
Fixed misleading "rebuild" guidance per @rolfbjarne's review — dSYM mismatch means locating the original build artifacts, not rebuilding

Test Suite (7 scenarios)

Scenario	Tests
Mono crash symbolication	UUID extraction, atos commands, ASI/NullRefException
No .NET frames (pure Swift)	Correctly stops, no false symbolication
CoreCLR crash	Identifies CoreCLR (not Mono), EXC_BAD_ACCESS
NativeAOT crash	Recognizes static linking, libSystem.* BCL libs
Multiple .NET libraries	Distinct UUIDs per library, separate atos calls
ASI field priority	Checks managed exception before native symbolication
Reject Android tombstone	Wrong format detection, suggests Android skill

Validation

Multi-model review (Sonnet 4, GPT-5.1-Codex, Opus 4.5): 4/5 across all 3 models
skill-validator A/B testing: 5/7 scenarios show improvement (NativeAOT +2.0, Android rejection +2.0, ASI priority +1.0, multi-lib +1.0), 2 ties on baseline-strong scenarios
Overfitting score: ✅ 0.12 (low — eval tests outcomes, not skill vocabulary)
All pre-submission checklist items pass: trigger coverage 8/8, stop signals explicit, domain examples present, token budget ~2K (under 4K limit)

Co-author: @steveisok

Co-authored-by: Steve Pfister <steveisok@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds a new ios-crash-symbolication skill under the .NET plugin to guide retrieval and symbolication of iOS .ips crash logs, focused on resolving .NET runtime native frames via dSYMs and atos.

Changes:

Introduces a new skill markdown (SKILL.md) documenting an end-to-end workflow for .ips parsing, runtime image identification, dSYM discovery, and atos invocation.
Adds validation criteria, stop signals, and common pitfalls specific to iOS crash logs and .NET runtime components.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…ation - Add Symbolicate-Crash.ps1 (664 lines): parses .ips JSON, searches local dSYMs (SDK packs, NuGet cache, user paths), verifies UUIDs via dwarfdump, batch-symbolicates with atos, identifies runtime version - Update SKILL.md: add Automation Script and Runtime Version Identification sections, fix atos -o to point inside dSYM bundle, add dwarfdump and MAUI to frontmatter, wrap steps in Workflow heading for consistency with Android sibling, fix misleading 'rebuild' guidance per review feedback (Rolf) - Add eval.yaml with 7 test scenarios: Mono crash, CoreCLR crash, no .NET frames, NativeAOT, multi-library UUIDs, ASI field priority, and Android tombstone rejection - Add 5 .ips test fixture files (two-part JSON format) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…los/skills into feature/ios-crash-symbolication

Address PR feedback to support all Apple platforms (tvOS, Mac Catalyst, macOS) not just iOS: - Add $appleRids array covering ios, tvos, maccatalyst, and osx RIDs - Refactor Find-Dsym and Find-RuntimeVersion to search all platform packs - Rewrite SKILL.md for orchestration focus and reduced token budget - Extract domain knowledge to references/ips-crash-format.md - Tune eval.yaml: outcome-based rubrics, broad assertions, overfit 0.26 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…cess - Simplify base address cast to [uint64] without manual hex prefix stripping - Use PSObject.Properties check before accessing lastExceptionBacktrace Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add rubric items testing for skill-specific domain knowledge (SDK pack paths, NuGet cache directories) that baseline agents cannot provide. This creates the quality delta needed to pass the 10% improvement threshold while keeping overfit at 0.12 (Low). 3-run validation: 30.1% improvement, 5/7 scenarios positive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tion' into feature/ios-crash-symbolication

ViktorHofer · 2026-03-30T11:24:15Z

@kotlarmilos looks like this needs a bit more work

The apple-crash-symbolication skill's stop signal for wrong file formats was naming specific Android tools (ndk-stack, addr2line) and the android-tombstone-symbolication skill. This caused models to learn about Android symbolication from the Apple skill and then actually execute it, resulting in 4-5x token bloat and completion regression on the Android rejection eval scenario. Remove tool suggestions from the stop signal — just say 'stop, don't symbolicate.' The Android skill handles its own routing when loaded. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

steveisok · 2026-03-30T12:12:41Z

@kotlarmilos looks like this needs a bit more work

I think the scoring for this type of skill is tough.

For the MAUI scenario:

Quality:
0.0 — baseline 2.3/5, skilled 3/5... but the pairwise judge said tie (position-swap inconsistent → defaulted to 0). So the 0.7 quality improvement gets zeroed out.
Pairwise:
0.0 — same reason, inconsistent swap → tie
The only non-zero terms are time (-0.54, penalizing the skill for 75s→116s) and tool calls (-0.29)

So the skill gets zero credit for quality improvement because the pairwise judge couldn't make up its mind when the response positions were swapped. Then the small overhead penalties push it negative.

This is an eval robustness issue — the pairwise judge is position-swap-sensitive, and when it defaults to "tie," the skill gets no quality credit despite the rubric judge scoring it 0.7 points higher. The outputs are genuinely different in quality (the skilled version actually symbolicates frames),
but the pairwise comparison is noisy enough that swapping A/B changes the winner.

Summary: The skill isn't performing poorly — the scoring is fragile for scenarios where the improvement is "more of the same kind of work but better." The skill's clear wins (symbol server, .NET library identification, actual symbolication) get discounted by a noisy pairwise judge, and then the
overhead from doing that valuable work pushes the score negative.

ViktorHofer · 2026-03-30T12:26:04Z

If you have suggestions how to improve the scoring, please let us know and submit a PR. We need to trust and rely on the scoring of skill-validator. We will eventually also make as positive required for merging.

What about the "not activated" scenario that also shows a negative verdict?

steveisok · 2026-03-30T12:31:07Z

What about the "not activated" scenario that also shows a negative verdict?

I fixed that to be really basic. The problem before was that it was "too good". The wording caused it to execute the actual android-symbolication skill instead of skip the apple skill outright lol.

ViktorHofer · 2026-03-30T13:06:20Z

/evaluate

github-actions · 2026-03-30T13:17:57Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
apple-crash-symbolication	Parse .NET frames and locate dSYMs from an iOS crash log	3.0/5 → 3.7/5 🟢	✅ apple-crash-symbolication; tools: skill	✅ 0.19	✅
apple-crash-symbolication	Investigate root cause of a .NET MAUI iOS crash	2.7/5 → 3.3/5 🟢	✅ apple-crash-symbolication; tools: skill	✅ 0.19	✅
apple-crash-symbolication	Reject Android tombstone passed as iOS crash log	5.0/5 → 4.7/5 🔴	ℹ️ not activated (expected)	✅ 0.19	❌

⏰ timeout — run(s) hit the (300s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.

🔍 Full Results - additional metrics and failure investigation steps

ViktorHofer · 2026-03-30T13:26:03Z

@steveisok looks like in this run (this whole process is non-deterministic as we know) the scenario that we talked about now has a winning verdict but the non-activate scenario is still failing.

steveisok · 2026-03-30T13:49:00Z

@steveisok looks like in this run (this whole process is non-deterministic as we know) the scenario that we talked about now has a winning verdict but the non-activate scenario is still failing.

Interesting analysis:

Baseline (no skills, 2 tool calls): Gives a perfect textual analysis — "here's how to symbolicate" — instructions only, no actual work. Score: 5.0/5.

Isolated (apple skill loaded, 8 tool calls): Apple skill correctly doesn't activate. But the model goes further — it actually downloads symbols with dotnet-symbol, runs addr2line, verifies BuildIds with readelf. Real symbolication work! But the captured output is just "Cleaned up temporary symbol
files." — the model cleaned up after itself and the final message is useless. Judge gave 4.7/5 noting the analysis was excellent mid-conversation.

Plugin (all dotnet-diag skills, 10 tool calls): The android-tombstone-symbolication skill correctly activates (skillEventCount: 3). The model runs its PowerShell symbolication script, which times out at 300s. Final output: "Let me run the symbolication script" — incomplete.

The irony: The skilled runs are actually doing better work (real symbolication vs. just instructions), but:

The baseline gets 5/5 for telling you what to do
The skilled runs get penalized for actually trying to do it (more tokens/tools/time)
Isolated mode's final captured output is a cleanup message
Plugin mode's real work timed out

Root causes:

Plugin mode: Cross-skill contamination — the android skill correctly activates for an Android tombstone, but its overhead is charged to the apple skill's eval. This is working as designed (user would have both skills), but the eval can't distinguish "good activation of sibling skill" from "bad
overhead."
Isolated mode: Even without activation, the model does 4x more work with the skill loaded. It's going from "give instructions" to "actually symbolicate" — arguably better behavior, but the eval penalizes the overhead.

What I'd recommend: The expectActivation: false scoring path needs special handling — when a skill correctly doesn't activate, efficiency penalties should be heavily dampened. The skill did its job (stayed out of the way).

steveisok · 2026-03-30T14:47:10Z

@ViktorHofer one additional point. The validator runs on linux, but this is by and large a mac-based skill. For example, atos does not exist on linux and (seemingly) no matter how much you try, the model will try to execute and fail. That is extra time and tokens spent on a dead-end. The relative scores are also higher on the mac as a result.

ViktorHofer · 2026-03-30T14:48:29Z

If the skill only works on Mac, should it require such an environment in the description or later in the skill content?

steveisok · 2026-03-30T14:57:14Z

If the skill only works on Mac, should it require such an environment in the description or later in the skill content?

You would think, but even the most aggressive stop signals have a tendency to be 'suggestions'.

ViktorHofer · 2026-03-30T15:03:14Z

Sorry for the naive question but did you try system prompt terms like CRITICAL: ..., etc in the description? We previously had that in our msbuild skills and that worked well.

danmoseley · 2026-03-30T15:03:14Z

You would think, but even the most aggressive stop signals have a tendency to be 'suggestions'.

That's fine it's just an optimization it's ok if it's not perfect

steveisok · 2026-03-30T15:12:54Z

Sorry for the naive question but did you try system prompt terms like CRITICAL: ..., etc in the description? We previously had that in our msbuild skills and that worked well.

Early commits in this PR showed it didn't really matter. Worth another try though. I'll push something up.

steveisok · 2026-03-30T19:36:50Z

Sorry for the naive question but did you try system prompt terms like CRITICAL: ..., etc in the description? We previously had that in our msbuild skills and that worked well.

Early commits in this PR showed it didn't really matter. Worth another try though. I'll push something up.

It doesn't address the actual failure. The Android rejection scenario fails because:

Isolated: The apple skill isn't activated — the model just does more work with bash (8 vs 2 tool calls). No amount of "CRITICAL" in the apple skill description helps when the skill isn't even invoked.
Plugin: The android skill activates and times out. Stronger language in the apple skill can't prevent a sibling skill from activating.

We already learned this anti-pattern. The original stop signal said "use ndk-stack/addr2line instead" — teaching the model about Android tools caused it to use them. Adding "CRITICAL: DO NOT execute atos on Linux" would similarly draw attention to atos on a platform where it doesn't exist. The model
discovers atos doesn't exist in ~1 tool call; the "CRITICAL" preamble wouldn't save much.

It pollutes the skill for real users. This skill targets macOS developers. Adding Linux-avoidance language to help CI pass is the tail wagging the dog.

Where "CRITICAL" does work (and why Viktor saw success with msbuild): when the problem is the model misapplying the skill — doing the wrong thing when the skill IS activated. That's a routing/behavior problem. Here, the skill correctly doesn't activate. The problem is environmental (Linux CI) and
scoring (expectActivation: false penalties).

steveisok · 2026-03-30T19:42:31Z

Isolated: The apple skill isn't activated — the model just does more work with bash (8 vs 2 tool calls). No amount of "CRITICAL" in the apple skill description helps when the skill isn't even invoked.

Plugin: The android skill activates and times out. Stronger language in the apple skill can't prevent a sibling skill from activating.

I'm translating this to mean that the skill is getting punished for correctly not activating the skill to be tested, but instead activating the android-symbolication skill and getting hung up there. Nothing we can do in the skill itself will prevent sibling skills from activating.

Replace 4 rubric items (including 2 'Did NOT' items) with 2 items that test positive knowledge the skill provides. The 'Did NOT' items gave credit to baseline responses that also don't attempt iOS workflow (because they don't know about it), minimizing the quality delta. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ns.txt Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Main changed known-domains.txt to use path-scoped entries (nuget.org/account/trustedpublishing instead of bare nuget.org). Switch the symbols download URL to the v3 flatcontainer endpoint on api.nuget.org which is in the allowlist. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

JanKrivanek · 2026-04-10T12:03:12Z

/evaluate

github-actions · 2026-04-10T12:11:29Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
apple-crash-symbolication	Parse .NET frames and locate dSYMs from an iOS crash log	3.0/5 → 3.0/5	✅ apple-crash-symbolication; tools: skill	✅ 0.19	❌ [1]
apple-crash-symbolication	Investigate root cause of a .NET MAUI iOS crash	3.0/5 → 4.0/5 🟢	✅ apple-crash-symbolication; tools: skill, bash	✅ 0.19	✅ [2]
apple-crash-symbolication	Reject Android tombstone passed as iOS crash log	3.7/5 → 4.3/5 🟢	ℹ️ not activated (expected)	✅ 0.19	❌ [3]

[1] ⚠️ High run-to-run variance (CV=22.67) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -16.9% due to: judgment, quality
[2] ⚠️ High run-to-run variance (CV=0.64) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=2.67) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -14.2% due to: completion (✓ → ✗), tokens (27515 → 70333), tool calls (2 → 7), time (33.0s → 44.7s)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

▶ Sessions Visualisation -- interactive replay of all evaluation sessions

JanKrivanek

The skils themselves seems solid.
The actual scenarios or rubrics might need some tunning - but that can be done as on optional followup (as of now - the eval scores are informational)

* Add SKILL.md for iOS crash symbolication process Co-authored-by: Steve Pfister <steveisok@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update plugins/dotnet/skills/ios-crash-symbolication/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Add automation script, tests, and review fixes for ios-crash-symbolication - Add Symbolicate-Crash.ps1 (664 lines): parses .ips JSON, searches local dSYMs (SDK packs, NuGet cache, user paths), verifies UUIDs via dwarfdump, batch-symbolicates with atos, identifies runtime version - Update SKILL.md: add Automation Script and Runtime Version Identification sections, fix atos -o to point inside dSYM bundle, add dwarfdump and MAUI to frontmatter, wrap steps in Workflow heading for consistency with Android sibling, fix misleading 'rebuild' guidance per review feedback (Rolf) - Add eval.yaml with 7 test scenarios: Mono crash, CoreCLR crash, no .NET frames, NativeAOT, multi-library UUIDs, ASI field priority, and Android tombstone rejection - Add 5 .ips test fixture files (two-part JSON format) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add iOS crash log and evaluation scenarios for symbolication tests * Remove unused file * Add CODEOWNERS entry for iOS crash symbolication * Rename ios-crash-symbolication to apple-crash-symbolication Address PR feedback to support all Apple platforms (tvOS, Mac Catalyst, macOS) not just iOS: - Add $appleRids array covering ios, tvos, maccatalyst, and osx RIDs - Refactor Find-Dsym and Find-RuntimeVersion to search all platform packs - Rewrite SKILL.md for orchestration focus and reduced token budget - Extract domain knowledge to references/ips-crash-format.md - Tune eval.yaml: outcome-based rubrics, broad assertions, overfit 0.26 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix script robustness: base address parsing and null-safe property access - Simplify base address cast to [uint64] without manual hex prefix stripping - Use PSObject.Properties check before accessing lastExceptionBacktrace Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add dSYM-path rubric items to CoreCLR and ASI scenarios Add rubric items testing for skill-specific domain knowledge (SDK pack paths, NuGet cache directories) that baseline agents cannot provide. This creates the quality delta needed to pass the 10% improvement threshold while keeping overfit at 0.12 (Low). 3-run validation: 30.1% improvement, 5/7 scenarios positive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refactor code structure for improved readability and maintainability * Fix crash symbolication script for real .ips format and improve eval scenarios - Fix Symbolicate-Crash.ps1 strict-mode bugs with real .ips crash logs: safe property access for image name/path (sentinel entries), thread name (most threads unnamed), and single-element array unwrapping (.Count) - Add SKILL.md efficiency guidance: resolve script path from skill directory (no find /), start with -ParseOnly, don't run broad filesystem searches - Reframe eval scenarios for platform-independent evaluation (parse/analyze instead of requiring macOS-only atos/dwarfdump), tighten rubric to test skill-specific knowledge (NuGet package name, all .NET binary images) Validation results (3 runs, claude-opus-4.6): Scenario 1 (parse frames): 3.3 → 4.0 (+0.7) ✅ Scenario 2 (investigate crash): 3.3 → 4.0 (+0.7) ✅ Scenario 3 (reject Android): 3.3 → 4.0 (+0.7) ✅ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Auto-fallback to parse-only output when atos is unavailable On Linux (CI), atos and xcrun don't exist. Previously the script would error out after completing all parsing, losing the results. Now it detects the missing tool and falls back to ParseOnly output automatically, ensuring the agent always gets structured parse data regardless of platform. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Move apple-crash-symbolication to dotnet-diag plugin Relocate skill from plugins/dotnet to plugins/dotnet-diag and tests from tests/dotnet to tests/dotnet-diag. Update CODEOWNERS accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Improve apple-crash-symbolication: macOS symbols, bug fixes, training log - Fix JSON case-conflict parsing (vmRegionInfo/vmregioninfo duplicate keys) - Fix strict-mode safe access for optional asi field - Expand Step 4 with macOS-specific symbol package guidance (.symbols NuGet) - Add .dwarf to .dSYM conversion instructions - Add src/coreclr/ to validation paths - Soften stop signals to allow crash analysis and deeper investigation - Add macOS Symbol Packages and JSON Parsing Gotchas to reference doc - Create training log documenting session findings - Add .github/skills/ project-local copy for CLI testing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * apple-crash-symbolication: automated version extraction and symbol acquisition Script improvements: - Preserve full image paths from crash log (previously discarded by GetFileName) - Add Get-RuntimeVersionFromPath: extracts .NET version from image paths (e.g., .../shared/Microsoft.NETCore.App/10.0.4/libcoreclr.dylib) - Add Get-RidFromPath: infers RID from path or crash metadata (OS/CPU) - Path-based version detection as fast primary method, UUID matching as fallback - Emit copy-pasteable symbol acquisition commands when dSYMs are missing - Show .NET version in ParseOnly library listing SKILL.md updates: - Step 2: document automated version detection and acquisition commands - Step 4: script now prints ready-to-run download/conversion commands Training log: record session 2 findings (5 issues, script + SKILL.md changes) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * apple-crash-symbolication: add symbol server anti-pattern dotnet-symbol and msdl.microsoft.com do not serve macOS dSYM/DWARF symbols — only Windows PDBs and Linux ELF debug info. NuGet packages are the only public source. Added anti-pattern to SKILL.md (both copies) and reference doc to prevent wasted tool calls. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * apple-crash-symbolication: fix symbol server guidance — dotnet-symbol works for macOS dotnet-symbol --symbols <binary> successfully downloads .dwarf debug symbols for macOS Mach-O binaries from msdl.microsoft.com. Previous commit incorrectly claimed this didn't work. Replaced anti-pattern with positive guidance in SKILL.md (both copies) and reference doc. Added training log entry documenting the correction. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Promote dotnet-symbol as preferred macOS symbol acquisition method - Script: macOS acquisition block now shows Option A (dotnet-symbol, preferred) and Option B (.symbols NuGet, fallback). Fixed .dwarf filename doubling bug in cp command. - SKILL.md (both copies): Step 2 updated for dotnet-symbol preference, Step 4 reordered with dotnet-symbol as dotnet#2 and NuGet symbols as dotnet#3. - Reference doc: Restructured macOS Symbol Packages section with Preferred/Fallback subsections and shared .dwarf→.dSYM conversion. - Training log: Session 4 entry documenting the promotion. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add automated symbol server download for macOS crash symbolication - Add Get-DebugSymbols function: downloads .dwarf files from Microsoft symbol server using Mach-O UUID (mirrors android tombstone approach) - Add Convert-DwarfToDsym function: creates .dSYM bundle from .dwarf with UUID verification via dwarfdump - New params: -SymbolCacheDir, -SymbolServerUrl, -SkipSymbolDownload - Wire download+conversion into main flow after local dSYM search - Refactor manual acquisition guidance as fallback-only - Update both SKILL.md copies: frontmatter, Step 4, new flags - Update ips-crash-format.md: automated download as primary method - Add training log session 5 URL pattern: https://msdl.microsoft.com/download/symbols/_.dwarf/mach-uuid-sym-{UUID}/_.dwarf Verified end-to-end: 391/391 .NET frames symbolicated with clean cache. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * apple-crash-symbolication: add triage order guidance to Step 3 Step 3 now instructs the agent to explain the faulting mechanism (frames #0-dotnet#1) before examining cross-thread context. This addresses a misdiagnosis where GC activity on neighboring threads was mistaken for causation when the actual root cause (_sigtramp -> NULL signal handler) was visible in the crashing thread's first two frames. Training log updated with session retrospective and corrected the original crash description from GC race to NULL signal handler. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Remove stale .github/skills/apple-crash-symbolication remnant Skill lives in plugins/dotnet-diag/skills/apple-crash-symbolication/ since the plugin restructuring. The old .github/skills/ copy was a leftover. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add libimobiledevice.org to known-domains.txt The reference scanner CI check fails because the apple-crash-symbolication skill references https://libimobiledevice.org/ which is not in the allowed domains list. This domain hosts the libimobiledevice project, a community library for communicating with iOS devices. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Delete eng/reference-scanner/known-domains.txt * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Add libimobiledevice.org to known-domains allowlist The apple-crash-symbolication SKILL.md references libimobiledevice.org for the idevicecrashreport tool. Add the domain to the known-domains file to fix the skill-check CI reference validation failure. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Restructure apple-crash-symbolication for analysis-first approach The skill was getting ❌ verdicts on all eval scenarios because it directed the LLM to 'run the script' instead of teaching crash analysis domain knowledge. This mirrors the problem where quality decreased from 3.0 to 2.7 in the parse scenario. Restructure to follow the Android sibling's proven pattern: - Lead with parsing (.ips two-part JSON format, key fields) - Teach .NET library identification (inline library table) - Teach crash interpretation (asi, faulting thread, exception) - Teach atos command construction (concrete examples) - Teach dSYM search paths (ordered list with commands) - Move automation script to optional section at the end - Move crash log retrieval to a separate section (not Step 1) Trim the reference doc to avoid duplication, keeping only macOS symbol distribution differences and supported RID list. Token budget: ~2,100 tokens (within 800-2,500 optimal range). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refine prompts and assertions in eval.yaml for iOS crash symbolication scenarios * Add format verification guard and fix overfitting in reject scenario - SKILL.md: Add explicit format check at start of Step 1 to verify .ips JSON before proceeding; add 'Wrong file format' stop signal - eval.yaml: Remove direct skill reference from scenario 3 prompt to reduce overfitting (was triggering skill activation in plugin mode) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix scenario 3: provide actual Android tombstone file for rejection test - eval.yaml: Replace invalid 'extra_files' with correct 'files' syntax (extra_files was silently ignored, so crash_android.txt was never copied) - eval.yaml: Remove copy_test_files for scenario 3 to avoid copying ios_crash.ips which tempts the agent in plugin mode - Add tombstone_sample.txt to test directory (copy from android sibling) - SKILL.md: Mention ndk-stack/addr2line in wrong-format stop signal Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update tests/dotnet-diag/apple-crash-symbolication/tombstone_sample.txt Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update eng/known-domains.txt Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Address review feedback: simulator RIDs, perf, safety, accuracy fixes - Add simulator RIDs (iossimulator, tvossimulator) to script search list - Build hashtable for O(1) image lookups in Get-ThreadFrames - Remove -UseBasicParsing (unnecessary in pwsh 7+) - Include UUID in Convert-DwarfToDsym cache path; sanitize library name - Make version regex greedy to capture pre-release suffixes - Improve format detection error message for non-.ips files - Guard xcrun fallback with Get-Command check - Add .dwarf-to-.dSYM conversion guidance for macOS manual fallback - Escape .ips in eval.yaml regex assertion - Fix UUID note in reference doc (normalize, not assume lowercase) - Fix symbol server example to use crash-log image name Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address Copilot review round 2: case-sensitive replace, dedup, path fix, redact PII - Use -creplace with guard for vmregioninfo duplicate key handling - Use Sort-Object -Unique for proper library deduplication - Fix macOS fallback to create .dSYM bundles under symbols-out/ - Redact device identifiers in test fixture .ips file Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Prevent dSYM cache poisoning: validate UUID on cache hit, remove on mismatch - Convert-DwarfToDsym now verifies cached dSYM UUID before reusing - On UUID mismatch during download, remove the bad cached bundle so subsequent runs can retry cleanly Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix cache validation: normalize UUID comparison, clean up dwarf + bundle on mismatch - Use Format-Uuid to normalize dwarfdump output before comparing to already-normalized \ in Convert-DwarfToDsym cache check - On UUID mismatch after download, remove the entire .dSYM bundle directory and the cached .dwarf file to prevent repeat failures Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Resolve-Frames null handling: use List[object] to preserve null entries PowerShell array += \ silently drops the element, breaking index alignment between results and input addresses. Switch to List[object].Add() which correctly preserves null entries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix dSYM bundle root traversal, qualify version-from-path note - Walk up directory tree until *.dSYM is found instead of going up only 2 levels (which lands at Contents/Resources, not the bundle) - SKILL.md: note that version-in-path only works on macOS shared framework installs; iOS paths don't embed the runtime version Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Remove incorrect .github/skills path reference from training log The .github/skills/ directory doesn't exist in this repo. The file was already listed under its actual plugins/dotnet-diag/ path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Detect simulator for RID inference in manual fallback guidance Detect CoreSimulator in image paths to use iossimulator-/tvossimulator- RIDs for simulator crashes, avoiding UUID mismatches from wrong runtime packs. Also handle arm64e CPU type. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Resolve-Frames return type and scope issue in simulator detection - Return \.ToArray() instead of ,\ to avoid wrapping the list in a single-element array - Use \.usedImages instead of undefined \ in RID inference block Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Clarify ParseOnly .NET Libraries section shows only frame-relevant images Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix YAML escape in assertion pattern (use single quotes for regex backslash) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Tighten stop signal to prevent Android symbolication spillover The apple-crash-symbolication skill's stop signal for wrong file formats was naming specific Android tools (ndk-stack, addr2line) and the android-tombstone-symbolication skill. This caused models to learn about Android symbolication from the Apple skill and then actually execute it, resulting in 4-5x token bloat and completion regression on the Android rejection eval scenario. Remove tool suggestions from the stop signal — just say 'stop, don't symbolicate.' The Android skill handles its own routing when loaded. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Simplify scenario 3 rubric to focus on positive skill knowledge Replace 4 rubric items (including 2 'Did NOT' items) with 2 items that test positive knowledge the skill provides. The 'Did NOT' items gave credit to baseline responses that also don't attempt iOS workflow (because they don't know about it), minimizing the quality delta. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix nuget.org domain reference: drop www. prefix to match known-domains.txt Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Merge upstream main and use api.nuget.org v3 endpoint Main changed known-domains.txt to use path-scoped entries (nuget.org/account/trustedpublishing instead of bare nuget.org). Switch the symbols download URL to the v3 flatcontainer endpoint on api.nuget.org which is in the allowlist. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix .gitignore merge conflict Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Steve Pfister <steveisok@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Steve Pfister <stpfiste@microsoft.com> Co-authored-by: Viktor Hofer <viktor.hofer@microsoft.com> Co-authored-by: Viktor Hofer <7412651+ViktorHofer@users.noreply.github.com> Co-authored-by: Dan Moseley <danmose@microsoft.com>

Add SKILL.md for iOS crash symbolication process

3db425e

Co-authored-by: Steve Pfister <steveisok@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kotlarmilos requested review from dbreshears and timheuer as code owners March 4, 2026 16:57

kotlarmilos requested review from Copilot and steveisok March 4, 2026 16:57

kotlarmilos assigned kotlarmilos and steveisok Mar 4, 2026

Copilot started reviewing on behalf of kotlarmilos March 4, 2026 16:58 View session

Copilot AI reviewed Mar 4, 2026

View reviewed changes

Comment thread plugins/dotnet/skills/ios-crash-symbolication/SKILL.md Outdated

Update plugins/dotnet/skills/ios-crash-symbolication/SKILL.md

c75673c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

kotlarmilos temporarily deployed to evaluation March 4, 2026 17:24 — with GitHub Actions Inactive

steveisok requested review from Redth and rolfbjarne March 4, 2026 17:58

rolfbjarne reviewed Mar 4, 2026

View reviewed changes

Comment thread plugins/dotnet/skills/ios-crash-symbolication/SKILL.md Outdated

dotnet deleted a comment from github-actions Bot Mar 5, 2026

kotlarmilos added 4 commits March 5, 2026 10:45

Add iOS crash log and evaluation scenarios for symbolication tests

7a1c949

Merge branch 'feature/ios-crash-symbolication' of github.com:kotlarmi…

d00fce7

…los/skills into feature/ios-crash-symbolication

Remove unused file

ca09a9d

Add CODEOWNERS entry for iOS crash symbolication

34bdfa4

rolfbjarne reviewed Mar 5, 2026

View reviewed changes

Comment thread plugins/dotnet/skills/ios-crash-symbolication/scripts/Symbolicate-Crash.ps1 Outdated

Comment thread tests/dotnet/apple-crash-symbolication/crash_nativeaot.ips Outdated

kotlarmilos commented Mar 5, 2026

View reviewed changes

Comment thread plugins/dotnet/skills/ios-crash-symbolication/SKILL.md Outdated

Comment thread tests/dotnet/apple-crash-symbolication/crash_coreclr.ips Outdated

steveisok and others added 4 commits March 5, 2026 11:41

Merge remote-tracking branch 'kotlarmilos/feature/ios-crash-symbolica…

96763f2

…tion' into feature/ios-crash-symbolication

rolfbjarne reviewed Mar 5, 2026

View reviewed changes

Comment thread plugins/dotnet/skills/apple-crash-symbolication/references/ips-crash-format.md Outdated

Refactor code structure for improved readability and maintainability

fa570e1

kotlarmilos requested review from a team and rolfbjarne March 6, 2026 16:18

steveisok mentioned this pull request Mar 30, 2026

Fall back to rubric scores when pairwise judge is position-swap-inconsistent #475

Closed

Merge branch 'main' into feature/ios-crash-symbolication

bf20737

kotlarmilos and others added 5 commits April 8, 2026 11:17

Fix nuget.org domain reference: drop www. prefix to match known-domai…

19e754a

…ns.txt Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge branch 'upstream-main' into feature/ios-crash-symbolication

49cedea

Fix .gitignore merge conflict

1cc4479

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot added a commit that referenced this pull request Apr 10, 2026

Update session data (PR #201)

31bbe6e

JanKrivanek approved these changes Apr 10, 2026

View reviewed changes

steveisok merged commit 038dd4f into dotnet:main Apr 10, 2026
32 checks passed

Conversation

kotlarmilos commented Mar 4, 2026 • edited by steveisok Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Automation Script (Symbolicate-Crash.ps1)

SKILL.md Updates

Test Suite (7 scenarios)

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ViktorHofer commented Mar 30, 2026

Uh oh!

steveisok commented Mar 30, 2026

Uh oh!

ViktorHofer commented Mar 30, 2026

Uh oh!

steveisok commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ViktorHofer commented Mar 30, 2026

Uh oh!

github-actions Bot commented Mar 30, 2026

Skill Validation Results

Uh oh!

ViktorHofer commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steveisok commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steveisok commented Mar 30, 2026

Uh oh!

ViktorHofer commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steveisok commented Mar 30, 2026

Uh oh!

ViktorHofer commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danmoseley commented Mar 30, 2026

Uh oh!

steveisok commented Mar 30, 2026

Uh oh!

steveisok commented Mar 30, 2026

Uh oh!

steveisok commented Mar 30, 2026

Uh oh!

JanKrivanek commented Apr 10, 2026

Uh oh!

github-actions Bot commented Apr 10, 2026

Skill Validation Results

Uh oh!

JanKrivanek left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

kotlarmilos commented Mar 4, 2026 •

edited by steveisok

Loading

Automation Script (`Symbolicate-Crash.ps1`)

steveisok commented Mar 30, 2026 •

edited

Loading

ViktorHofer commented Mar 30, 2026 •

edited

Loading

steveisok commented Mar 30, 2026 •

edited

Loading

ViktorHofer commented Mar 30, 2026 •

edited

Loading

ViktorHofer commented Mar 30, 2026 •

edited

Loading