Technical reference for YAML frontmatter parsing, error classification, and auto-fix algorithms.
This document describes the implementation details of docmgr's YAML frontmatter validation and auto-fix system. It covers parsing algorithms, error classification, fix heuristics, and the integration points between parsing, diagnostics, and the validation CLI verb. Use this reference when debugging parsing issues, extending fix heuristics, or understanding tradeoffs in the design.
For usage instructions, examples, and troubleshooting guidance, see:
docmgr help yaml-frontmatter-validation
The frontmatter validation system consists of four main components:
- Core parsing (internal/documents/frontmatter.go): Extracts frontmatter blocks, preprocesses YAML, decodes with position tracking, and wraps failures as diagnostics taxonomies.
- Preprocessing (pkg/frontmatter/frontmatter.go): Quoting helpers that identify and quote risky scalar values before YAML decoding to reduce parse failures.
- Validation CLI (pkg/commands/validate_frontmatter.go): Command that reads files, generates fix suggestions, applies auto-fix with backups, and re-parses to verify success.
- Diagnostics (pkg/diagnostics/docmgrctx/frontmatter.go, pkg/diagnostics/docmgrrules/frontmatter_rules.go): Taxonomy context types and rule renderers that surface parse errors with line numbers, snippets, and fix suggestions.

The design separates parsing (which can fail) from fix generation (which operates on raw bytes), allowing the validation verb to attempt repairs even when parsing fails completely.
The extractFrontmatter function in internal/documents/frontmatter.go manually scans for --- delimiters to separate frontmatter from body content.
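A minimal sketch of this delimiter scan may help when reading the steps in this section. It is not the docmgr implementation: the name extractSketch, its signature, and its return values are assumptions made for illustration; only the error text and the start + 2 offset come from this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// extractSketch is a hypothetical stand-in for extractFrontmatter; the real
// signature and return values may differ.
func extractSketch(content string) (fm, body string, fmStartLine int, err error) {
	lines := strings.Split(content, "\n")
	start, end := -1, -1
	for i, line := range lines {
		if strings.TrimSpace(line) != "---" {
			continue
		}
		if start == -1 {
			start = i // first delimiter
		} else {
			end = i // second delimiter closes the block
			break
		}
	}
	// The opening delimiter must be the first line and a closing one must follow.
	if start != 0 || end <= start {
		return "", "", 0, errors.New("frontmatter delimiters '---' not found")
	}
	fm = strings.Join(lines[start+1:end], "\n")
	body = strings.Join(lines[end+1:], "\n")
	// start is 0-based: +1 makes it 1-based, +1 skips the opening delimiter.
	return fm, body, start + 2, nil
}

func main() {
	fm, body, line, err := extractSketch("---\nTitle: Demo\n---\nBody text\n")
	fmt.Printf("frontmatter=%q body=%q fmStartLine=%d err=%v\n", fm, body, line, err)
}
```

Because the scan requires start to be 0, a file with leading blank lines or text before the first delimiter is rejected rather than repaired; repair is left to the fix heuristics.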
Algorithm:
1. Find the first line that equals --- (trimmed) → start
2. Find the next line after start that equals --- (trimmed) → end
3. If start != 0 or end <= start, return error: "frontmatter delimiters '---' not found"
4. Frontmatter: lines[start+1:end]
5. Body: lines[end+1:]
6. fmStartLine = start + 2 (1-based, accounting for opening delimiter)
Tradeoffs:
- Requires an exact --- match (no tolerance for ---- or --- with trailing spaces)

Before decoding, PreprocessYAML in pkg/frontmatter/frontmatter.go walks top-level key-value lines and quotes risky scalars.
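The preprocessing pass can be sketched as follows. This is an illustration under stated assumptions, not the docmgr code: the function names are hypothetical, the special characters are treated as leading characters only, and indented lines are treated as nested content to skip.

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuotingSketch approximates the checks listed for NeedsQuoting.
// Treating the special characters as leading characters is an assumption.
func needsQuotingSketch(v string) bool {
	if v == "" {
		return false
	}
	if strings.ContainsRune("@`#&*!|>%?", rune(v[0])) {
		return true
	}
	return strings.Contains(v, ": ") || strings.HasSuffix(v, ":") ||
		strings.Contains(v, " #") || strings.Contains(v, "\t") ||
		strings.Contains(v, "{{") || strings.Contains(v, "}}")
}

// quoteValueSketch wraps v in single quotes and doubles embedded single
// quotes, which is how YAML escapes them inside single-quoted scalars.
func quoteValueSketch(v string) string {
	return "'" + strings.ReplaceAll(v, "'", "''") + "'"
}

// preprocessSketch quotes risky values on top-level "key: value" lines only.
func preprocessSketch(fm string) string {
	lines := strings.Split(fm, "\n")
	for i, line := range lines {
		trimmed := strings.TrimSpace(line)
		// Skip blanks, comments, list items, and indented (assumed nested) lines.
		if trimmed == "" || strings.HasPrefix(trimmed, "#") ||
			strings.HasPrefix(trimmed, "-") || line != strings.TrimLeft(line, " \t") {
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		if value == "" || strings.ContainsRune("\"'[{|>", rune(value[0])) {
			continue // empty, already quoted, or a complex value
		}
		if needsQuotingSketch(value) {
			lines[i] = key + ": " + quoteValueSketch(value)
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "Title: Fix: parser bug\nOwner: @alice\nTopics:\n  - a: b\nSummary: plain"
	fmt.Println(preprocessSketch(in))
}
```

Working line by line keeps the pass usable on input that a YAML parser would reject, at the cost of ignoring nested values.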
Algorithm:
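1. Skip empty lines and comments (#)
2. Skip lines that start with - after trimming
3. Split on the first : to get key and value
4. If the value is already quoted (" or ') or complex ([, {, |, >), skip
5. If NeedsQuoting(value), replace value with QuoteValue(value)
What NeedsQuoting detects: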
- Special characters: @, `, #, &, *, !, |, >, %, ?
- ": " (colon-space) or a trailing :
- " #" (space-hash)
- \t (tab)
- {{ or }}
QuoteValue behavior:
- Wraps the value in single quotes: 'value'
- Doubles embedded single quotes: 'value' → ''value''
Tradeoffs:
The parser uses yaml.Decoder with yaml.Node to preserve position information, then decodes the node into models.Document.
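When decoding fails, the line number reported by the YAML library is relative to the frontmatter block and has to be mapped back to the file before a snippet can be shown. The sketch below illustrates that mapping and a three-line snippet using only the standard library. The function names, the sample error message, the treatment of column 0 as "no caret", and the left padding of the caret line are assumptions; the regex and the fmStartLine + yamlLine - 1 formula come from this document.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var yamlLineRe = regexp.MustCompile(`line ([0-9]+)`)

// absoluteLine maps the line number found in a YAML error message to a
// 1-based file line. It returns 0 when the message carries no line number.
func absoluteLine(errMsg string, fmStartLine int) int {
	m := yamlLineRe.FindStringSubmatch(errMsg)
	if m == nil {
		return 0
	}
	yamlLine, err := strconv.Atoi(m[1])
	if err != nil {
		return 0
	}
	return fmStartLine + yamlLine - 1
}

// snippetSketch renders the lines around a 1-based line number. A caret is
// added under column col (1-based); col 0 means the column is unknown.
func snippetSketch(lines []string, line, col int) string {
	if line < 1 || line > len(lines) {
		return ""
	}
	from, to := line-1, line+1
	if from < 1 {
		from = 1
	}
	if to > len(lines) {
		to = len(lines)
	}
	var b strings.Builder
	for n := from; n <= to; n++ {
		fmt.Fprintf(&b, "%4d | %s\n", n, lines[n-1])
		if n == line && col > 0 {
			fmt.Fprintf(&b, "     | %s^\n", strings.Repeat(" ", col-1))
		}
	}
	return b.String()
}

func main() {
	file := []string{"---", "Title: a: b", "Owner: x", "---", "body"}
	// Sample message in the style of a YAML decoder error.
	msg := "yaml: line 1: mapping values are not allowed in this context"
	line := absoluteLine(msg, 2) // fmStartLine is 2 when the file opens with ---
	fmt.Print(snippetSketch(file, line, 0))
}
```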
Error handling:
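- Extracts the line number from the YAML error message with the regex line ([0-9]+)
- Computes absoluteLine = fmStartLine + yamlLine - 1
- Wraps the error as a FrontmatterParseTaxonomy with file, line, col, snippet, problem
Line number extraction: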
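- Regex: yamlLineRe = regexp.MustCompile(`line ([0-9]+)`)
- The YAML line number is relative to the frontmatter block (the content after the opening ---)
- Absolute line: fmStartLine + yamlLine - 1
Snippet building: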
- Shows lines max(1, line-1) to min(len(lines), line+1)
- Formats each line as %4d | %s\n with optional caret: | %s^\n (spaces for column position)
Error classification:
The generateFixes function in pkg/commands/validate_frontmatter.go applies multiple heuristics in sequence to repair common issues.
normalizeDelimiters handles missing or malformed delimiters.
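A rough sketch of delimiter normalization, following the steps described in this section. It is not the docmgr implementation: normalizeSketch and isDelimiterLike are hypothetical names, "delimiter-like" is assumed to mean three or more dashes after trimming, and trailing newlines are trimmed before splitting to keep the example simple.

```go
package main

import (
	"fmt"
	"strings"
)

// isDelimiterLike reports whether a line is three or more dashes after
// trimming. What counts as delimiter-like is an assumption of this sketch.
func isDelimiterLike(line string) bool {
	t := strings.TrimSpace(line)
	return len(t) >= 3 && strings.Trim(t, "-") == ""
}

// normalizeSketch is a hypothetical stand-in for normalizeDelimiters.
func normalizeSketch(content string) string {
	lines := strings.Split(strings.TrimRight(content, "\n"), "\n")
	start := -1
	for i, l := range lines {
		if strings.TrimSpace(l) == "---" {
			start = i
			break
		}
	}
	if start == -1 {
		// No opening delimiter: wrap the whole file as frontmatter.
		return "---\n" + strings.Join(lines, "\n") + "\n---\n"
	}
	end := -1
	for i := start + 1; i < len(lines); i++ {
		if isDelimiterLike(lines[i]) {
			end = i
			break
		}
	}
	if end == -1 {
		// No closing delimiter: frontmatter runs to the first blank line or EOF.
		end = len(lines)
		for i := start + 1; i < len(lines); i++ {
			if strings.TrimSpace(lines[i]) == "" {
				end = i
				break
			}
		}
	}
	fm := strings.Join(lines[start+1:end], "\n")
	body := ""
	if end < len(lines) {
		body = strings.Join(lines[end+1:], "\n")
	}
	return "---\n" + fm + "\n---\n" + body
}

func main() {
	fmt.Println(normalizeSketch("---\nTitle: x\n----\nBody\n"))
}
```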
Algorithm:
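1. Find the first --- line → start
2. Find the next delimiter-like (---) line after start → end
3. If there is no start, treat the entire file as frontmatter (wrap it)
4. If there is no end, scan from start+1 to the first blank line or EOF → end
5. Frontmatter: lines[start+1:end]
6. Body: lines[end+1:]
7. … :)
8. Rebuild: ---\n + frontmatter + \n---\n + body
Tradeoffs: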
scrubStrayDelimiters removes lines that look like delimiters but appear inside frontmatter content.
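As a small illustration, the filter could look like the sketch below. The name scrubSketch and the choice to compare against the trimmed line are assumptions, not confirmed details of scrubStrayDelimiters.

```go
package main

import (
	"fmt"
	"strings"
)

// scrubSketch drops every frontmatter line that is exactly --- after trimming.
// It is a hypothetical stand-in for scrubStrayDelimiters.
func scrubSketch(fmLines []string) []string {
	kept := make([]string, 0, len(fmLines))
	for _, l := range fmLines {
		if strings.TrimSpace(l) == "---" {
			continue // stray delimiter inside the block
		}
		kept = append(kept, l)
	}
	return kept
}

func main() {
	fmt.Println(scrubSketch([]string{"Title: x", "---", "Owner: y"}))
}
```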
Algorithm:
1. If a frontmatter line is ---, skip it
Tradeoffs:
- Removes any line that is ---, even if it's valid content

peelTrailingNonKeyLines moves plain text lines (without :) from the end of frontmatter into the body.
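The peeling step can be sketched as a backwards scan from the end of the frontmatter. The name peelSketch and its two-slice return shape are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// peelSketch splits frontmatter lines into the part to keep and the trailing
// run of lines without a colon, which is returned for prepending to the body.
// It is a hypothetical stand-in for peelTrailingNonKeyLines.
func peelSketch(fmLines []string) (kept, peeled []string) {
	end := len(fmLines)
	for end > 0 && !strings.Contains(fmLines[end-1], ":") {
		end--
	}
	return fmLines[:end], fmLines[end:]
}

func main() {
	kept, peeled := peelSketch([]string{"Title: x", "Owner: y", "Some prose", "More prose"})
	fmt.Println(kept, peeled)
}
```

Only the trailing run is moved: a colon-free line that sits between two key lines stays in the frontmatter.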
Algorithm:
1. If a trailing frontmatter line contains no :, move it to body
Tradeoffs:
- Assumes lines without : are body content, not YAML

The quoting step reuses PreprocessYAML to quote risky scalars (same logic as read-path preprocessing).
Tradeoffs:
generateFixes chains heuristics:
1. SplitFrontmatter (normal extraction)
2. normalizeDelimiters (fallback)
3. scrubStrayDelimiters
4. peelTrailingNonKeyLines
5. PreprocessYAML (quoting)
6. Rebuild: ---\n + fixed frontmatter + \n---\n + body (with peeled lines prepended)
Fix descriptions:
The applyAutoFix function writes a backup (.bak) and rewrites the file.
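A minimal sketch of the backup-then-rewrite sequence, using only the standard library. The names applyFixSketch and demo, the 0o644 file mode, and the error wrapping are assumptions; the real applyAutoFix may differ in all of them.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// applyFixSketch copies the current file content to path+".bak", then
// overwrites path with the fixed content. Hypothetical stand-in for applyAutoFix.
func applyFixSketch(path string, fixed []byte) error {
	original, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read original: %w", err)
	}
	// The backup is written first, so a failure here leaves the original untouched.
	if err := os.WriteFile(path+".bak", original, 0o644); err != nil {
		return fmt.Errorf("write backup: %w", err)
	}
	if err := os.WriteFile(path, fixed, 0o644); err != nil {
		return fmt.Errorf("write fixed file: %w", err)
	}
	return nil
}

// demo runs the sketch in a temporary directory and returns the backup and
// the rewritten file contents.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "autofix-sketch")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "doc.md")
	if err := os.WriteFile(path, []byte("Title: a: b\n"), 0o644); err != nil {
		return "", "", err
	}
	if err := applyFixSketch(path, []byte("---\nTitle: 'a: b'\n---\n")); err != nil {
		return "", "", err
	}
	backup, err := os.ReadFile(path + ".bak")
	if err != nil {
		return "", "", err
	}
	current, err := os.ReadFile(path)
	if err != nil {
		return "", "", err
	}
	return string(backup), string(current), nil
}

func main() {
	backup, current, err := demo()
	fmt.Printf("backup=%q current=%q err=%v\n", backup, current, err)
}
```

Note that writing to a fixed path+".bak" name replaces whatever backup was there before, so a second run of this sketch would back up the already-fixed file.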
Algorithm:
1. Write the original content to path + ".bak"
2. Write the fixed content to path
Tradeoffs:
FrontmatterParseContext in pkg/diagnostics/docmgrctx/frontmatter.go carries:
- File: Document path
- Line: Absolute line number (1-based)
- Column: Column number (usually 0, not reliably extracted)
- Snippet: Code snippet with line numbers
- Problem: User-friendly problem description
- Fixes: Array of suggested fix descriptions (populated by validation verb)
Constructor:
- NewFrontmatterParse: Creates taxonomy with StageFrontmatterParse, SymptomYAMLSyntax
- Wraps the underlying error with core.WrapWithCause

FrontmatterSyntaxRule in pkg/diagnostics/docmgrrules/frontmatter_rules.go renders taxonomies:
Output format:
YAML/frontmatter syntax error
File: <path>
Line: <line> Col: <col>
Problem: <problem>
Snippet:
<line numbers and code>
Suggested fixes:
1. <fix description>
2. <fix description>
Actions:
- Validate frontmatter: docmgr validate frontmatter --doc <path>
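To make the layout concrete, the sketch below renders a context value into the output format shown above. It is only an illustration: parseContext mirrors the fields listed for FrontmatterParseContext but its types are guesses, renderSketch is a hypothetical name, and omitting the "Suggested fixes" section when no fixes are attached is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// parseContext mirrors the fields listed for FrontmatterParseContext; the
// real struct may use different types or carry extra fields.
type parseContext struct {
	File    string
	Line    int
	Column  int
	Snippet string
	Problem string
	Fixes   []string
}

// renderSketch formats a context in the layout shown in the output format.
func renderSketch(c parseContext) string {
	var b strings.Builder
	b.WriteString("YAML/frontmatter syntax error\n")
	fmt.Fprintf(&b, "File: %s\n", c.File)
	fmt.Fprintf(&b, "Line: %d Col: %d\n", c.Line, c.Column)
	fmt.Fprintf(&b, "Problem: %s\n", c.Problem)
	fmt.Fprintf(&b, "Snippet:\n%s", c.Snippet)
	if c.Snippet != "" && !strings.HasSuffix(c.Snippet, "\n") {
		b.WriteString("\n")
	}
	// Assumption: the fixes section appears only when fixes were attached.
	if len(c.Fixes) > 0 {
		b.WriteString("Suggested fixes:\n")
		for i, f := range c.Fixes {
			fmt.Fprintf(&b, "%d. %s\n", i+1, f)
		}
	}
	b.WriteString("Actions:\n")
	fmt.Fprintf(&b, "- Validate frontmatter: docmgr validate frontmatter --doc %s\n", c.File)
	return b.String()
}

func main() {
	fmt.Print(renderSketch(parseContext{
		File:    "a.md",
		Line:    2,
		Snippet: "   2 | Title: a: b\n",
		Problem: "mapping values are not allowed", // sample text for illustration
		Fixes:   []string{"Quote the value"},
	}))
}
```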
When fixes are present:
- Suggested fixes are listed (attached by tryAttachFixes)
- The fixes can be applied automatically (--auto-fix)

Frontmatter validation is fully integrated with docmgr's diagnostics taxonomy system. Parse errors are wrapped as FrontmatterParseTaxonomy objects that flow through the same rendering pipeline as other diagnostics (vocabulary warnings, missing files, stale docs, etc.). This means frontmatter errors appear consistently in docmgr doctor, docmgr list docs, docmgr doc search, and other commands that emit diagnostics.
The validation verb (docmgr validate frontmatter) can attach fix suggestions to the taxonomy context, which the rule renderer then surfaces to users. This design allows the same error taxonomy to be used both for reporting (via doctor/list/search) and for interactive fixing (via the validation verb with --suggest-fixes or --auto-fix).
For details on how to extend the diagnostics system, add new rules, or understand the taxonomy architecture, see:
docmgr help diagnostics-taxonomy-and-rules
All docmgr commands that write frontmatter (doc add, meta update, create_ticket, doc_move, rename_ticket, ticket_close, import) use WriteDocumentWithFrontmatter in internal/documents/frontmatter.go, which:
1. Encodes models.Document via yaml.Encoder
2. Runs PreprocessYAML to quote risky scalars
3. Writes ---\n + preprocessed frontmatter + \n---\n\n + body
Tradeoffs:
Test coverage:
- pkg/commands/validate_frontmatter_test.go: Tests fix heuristics (delimiter normalization, stray delimiter cleanup)
- pkg/frontmatter/frontmatter_test.go: Tests quoting helpers (NeedsQuoting, QuoteValue, PreprocessYAML)
- internal/documents/frontmatter_test.go: Tests parsing and error extraction
- test-scenarios/testing-doc-manager/18-validate-frontmatter-smoke.sh: Exercises validation verb (fail → suggest → auto-fix → success, verifies .bak creation)
- test-scenarios/testing-doc-manager/15-diagnostics-smoke.sh: Exercises frontmatter parse taxonomy via doctor/list/search

Known limitations:
- PreprocessYAML only processes top-level key-value pairs. Nested structures are skipped, so colons in nested values may still cause errors.
- scrubStrayDelimiters removes any line that is ---, even if it's valid content (rare edge case).
- peelTrailingNonKeyLines assumes lines without : are body content (may misidentify valid YAML values).
- Auto-fix overwrites existing .bak files if auto-fix runs multiple times.

Extension points:
- Add the new heuristic function to pkg/commands/validate_frontmatter.go
- Call it from generateFixes (order matters: delimiters → cleanup → quoting)
- Append a description to the fixes array
- Add tests to validate_frontmatter_test.go
- Extend classifyYAMLError in internal/documents/frontmatter.go
- Extend NeedsQuoting to detect new risky patterns
- Extend PreprocessYAML to handle nested structures (requires YAML parsing)

Core parsing:
- internal/documents/frontmatter.go: ReadDocumentWithFrontmatter, extractFrontmatter, buildSnippet, classifyYAMLError, WriteDocumentWithFrontmatter, SplitFrontmatter
Preprocessing:
- pkg/frontmatter/frontmatter.go: NeedsQuoting, QuoteValue, PreprocessYAML
Validation CLI:
- pkg/commands/validate_frontmatter.go: ValidateFrontmatterCommand, generateFixes, normalizeDelimiters, scrubStrayDelimiters, peelTrailingNonKeyLines, applyAutoFix
Diagnostics:
- pkg/diagnostics/docmgrctx/frontmatter.go: FrontmatterParseContext, NewFrontmatterParseTaxonomy
- pkg/diagnostics/docmgrrules/frontmatter_rules.go: FrontmatterSyntaxRule
Tests:
- pkg/commands/validate_frontmatter_test.go: Fix heuristic unit tests
- test-scenarios/testing-doc-manager/18-validate-frontmatter-smoke.sh: Validation verb smoke test