Guide for creating, structuring, and maintaining skill documents that teach LLMs disciplined workflows.
Skills are structured markdown documents that teach LLMs (and humans) to follow disciplined workflows. Unlike general documentation that describes what exists, skills prescribe how to work—they're executable playbooks that enforce best practices like test-driven development, systematic debugging, and structured design processes. A well-written skill transforms "maybe I should write tests" into "you MUST write a failing test first, watch it fail, then implement." This document teaches you how to create effective skills that LLMs will follow reliably.
Skills in docmgr are plan files (skill.yaml) that live under ttmp/skills/ or a ticket’s skills/ folder. The plan’s skill metadata includes what_for (what the skill accomplishes) and when_to_use (when to apply it). These fields, combined with clear structure and strong enforcement language, ensure skills are discoverable and consistently applied.
What you'll learn:
Create a skill when you have a workflow that:
Good candidates:
Poor candidates:
Skills in docmgr use a skill.yaml plan file with explicit metadata and sources:
```yaml
skill:
  name: test-driven-development
  title: Test-Driven Development
  description: Enforces the RED-GREEN-REFACTOR cycle.
  what_for: Ensure every function has a failing test before implementation.
  when_to_use: Use when implementing features or refactoring behavior.
  topics: [testing, tdd, quality]
  license: Proprietary
  compatibility: Requires go test tooling.

sources:
  - type: file
    path: backend/testing/framework.md
    output: references/testing-framework.md
    strip-frontmatter: true
    append_to_body: false
  - type: binary-help
    binary: glaze
    topic: help-system
    output: references/glaze-help-system.md
    wrap: markdown

output:
  skill_dir_name: test-driven-development
  skill_md:
    include_index: true
    index_title: References
```
Key fields explained:
- **`skill.what_for`**: Explains what the skill accomplishes. Keep it concise (2-3 sentences). Focus on outcomes and benefits, not process steps.
- **`skill.when_to_use`**: The trigger condition that helps LLMs (and humans) decide when to apply this skill. Use clear "use when" language: "Use when implementing any feature", "Use when encountering any bug", "Use when starting creative work".
- **`skill.topics`**: Enables filtering with `docmgr skill list --topics testing`. Choose topics that match how developers think about the domain.
- **`sources`**: Declares explicit files or binary help output that should be packaged into the skill. `docmgr skill list --file` and `--dir` filter against file sources.
- **`sources[].append_to_body`**: When true, the resolved content is appended into the main SKILL.md body (in order) before the references index, and the source output file is not written. When any append-to-body content exists, the auto-generated intro/WhatFor/WhenToUse sections are suppressed to avoid duplicate headers. If the appended content already starts with a `# Title`, the exporter also skips the generated title to prevent duplication.
- **`output`**: Controls export naming and how SKILL.md is generated.
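The `append_to_body` rules interact in a way that is easier to see in code. The following is a minimal sketch of the behavior as described above, not docmgr's implementation: the `Source` and `Layout` types and the `PlanLayout` function are hypothetical names, and it assumes the title check applies to the first appended source.

```go
package main

import (
	"fmt"
	"strings"
)

// Source is a hypothetical, simplified view of one resolved sources[] entry.
type Source struct {
	Output       string // target file, e.g. references/testing-framework.md
	Content      string // resolved content of the source
	AppendToBody bool   // mirrors sources[].append_to_body
}

// Layout summarizes what the described rules imply for one export.
type Layout struct {
	GenerateTitle bool     // emit the auto-generated title
	GenerateIntro bool     // emit the intro/WhatFor/WhenToUse sections
	Body          []string // appended content, in source order
	Written       []string // reference files that are written out
}

// PlanLayout applies the rules: appended sources go into the body and are
// not written as files; any appended content suppresses the generated intro;
// a leading "# " title in the appended content also suppresses the title.
func PlanLayout(sources []Source) Layout {
	layout := Layout{GenerateTitle: true, GenerateIntro: true}
	for _, s := range sources {
		if s.AppendToBody {
			layout.Body = append(layout.Body, s.Content)
			continue
		}
		layout.Written = append(layout.Written, s.Output)
	}
	if len(layout.Body) > 0 {
		layout.GenerateIntro = false
		// Assumption: only the first appended source is checked for a title.
		if strings.HasPrefix(strings.TrimSpace(layout.Body[0]), "# ") {
			layout.GenerateTitle = false
		}
	}
	return layout
}

func main() {
	layout := PlanLayout([]Source{
		{Output: "references/a.md", Content: "# My Skill\n\nBody text.", AppendToBody: true},
		{Output: "references/b.md", Content: "Reference text."},
	})
	// prints: false false 1 [references/b.md]
	fmt.Println(layout.GenerateTitle, layout.GenerateIntro, len(layout.Body), layout.Written)
}
```

In practice this means that once any source sets `append_to_body: true`, the appended content should carry its own title and introductory sections.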
The document body follows a consistent structure that makes skills easy to understand and apply. Each section serves a specific purpose in guiding the LLM's behavior.
```markdown
# [Skill Name]

## Overview
[2-3 sentences: what this skill does and why it matters]

## When to Use
[Clear trigger conditions with examples]

## The Iron Law (or Core Principle)
[The non-negotiable rule this skill enforces]

## The Process
[Step-by-step workflow with concrete actions]

## Red Flags
[Common rationalizations and why they don't hold]

## Verification Checklist
[Items that must be checked before marking complete]

## Integration
[Other skills this requires or references]

## Examples
[Good vs bad examples showing the pattern]
```
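A skill body can be checked against this template mechanically. The sketch below is one hedged way to do it: `MissingSections` and the `required` list are hypothetical, and treating Integration and Examples as optional is an assumption rather than a docmgr rule. Headings match by prefix, so `## The Process: Red-Green-Refactor` counts as `The Process`.

```go
package main

import (
	"fmt"
	"strings"
)

// required lists the template's "##" sections; each entry holds the accepted
// heading prefixes. Treating exactly these as required is an assumption.
var required = [][]string{
	{"Overview"},
	{"When to Use"},
	{"The Iron Law", "Core Principle"},
	{"The Process"},
	{"Red Flags"},
	{"Verification Checklist"},
}

// MissingSections returns the first accepted name of every required section
// that has no matching "## " heading outside fenced code blocks.
func MissingSections(markdown string) []string {
	fence := strings.Repeat("`", 3)
	var headings []string
	inFence := false
	for _, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, fence) {
			inFence = !inFence
			continue
		}
		if !inFence && strings.HasPrefix(trimmed, "## ") {
			headings = append(headings, strings.ToLower(strings.TrimSpace(trimmed[3:])))
		}
	}
	var missing []string
	for _, names := range required {
		if !hasAny(headings, names) {
			missing = append(missing, names[0])
		}
	}
	return missing
}

// hasAny reports whether any heading starts with any of the accepted names.
func hasAny(headings, names []string) bool {
	for _, h := range headings {
		for _, n := range names {
			if strings.HasPrefix(h, strings.ToLower(n)) {
				return true
			}
		}
	}
	return false
}

func main() {
	doc := strings.Join([]string{
		"# Skill: Example",
		"## Overview",
		"## When to Use",
		"## The Process: Red-Green-Refactor",
		"## Red Flags",
	}, "\n")
	// prints: ["The Iron Law" "Verification Checklist"]
	fmt.Printf("%q\n", MissingSections(doc))
}
```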
Why this structure:
Skills need strong enforcement language to prevent LLMs from skipping steps. These techniques come directly from analyzing Superpowers' most effective skills.

### `<EXTREMELY_IMPORTANT>` Tags

Wrap critical rules in XML-style tags that signal high importance:

```markdown
<EXTREMELY_IMPORTANT>
If you write production code before writing a failing test, DELETE the code
and start over. No exceptions. Don't keep it as "reference."
</EXTREMELY_IMPORTANT>
```
Lead with a non-negotiable rule in a prominent code block:
````markdown
## The Iron Law

```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```

Write code before the test? Delete it. Start over.
````
Anticipate common rationalizations and provide counter-arguments:
```markdown
## Red Flags

These thoughts mean STOP: you're rationalizing.

| Thought | Reality |
|---------|---------|
| "This is too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after to verify it works" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
```
Create checklists that must be completed before marking work done:
```markdown
## Verification Checklist

Before marking work complete:

- [ ] Every new function/method has a test
- [ ] Watched each test fail before implementing
- [ ] Each test failed for expected reason
- [ ] All tests pass
- [ ] Output pristine (no errors, warnings)
```
**Important:** When you include a checklist, note in the skill that the LLM should convert checklist items to docmgr tasks:

```markdown
**Checklist discipline:** Convert each checklist item into a docmgr task:
- `docmgr task add --ticket <TICKET> --text "<checklist item>"`
```
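The conversion from checklist items to tasks can be scripted. The following sketch extracts unchecked `- [ ]` items and builds the arguments for the `docmgr task add` command shown above. `ChecklistItems`, `TaskArgs`, and the ticket ID `EXAMPLE-001` are hypothetical, and skipping already checked items is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ChecklistItems returns the text of every unchecked "- [ ]" item.
// Checked items ("- [x]") are skipped, which is an assumption of this sketch.
func ChecklistItems(markdown string) []string {
	var items []string
	for _, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimSpace(line)
		if !strings.HasPrefix(trimmed, "- [ ]") {
			continue
		}
		if text := strings.TrimSpace(strings.TrimPrefix(trimmed, "- [ ]")); text != "" {
			items = append(items, text)
		}
	}
	return items
}

// TaskArgs builds the argument list for one "docmgr task add" call.
// Returning a slice avoids shell-quoting problems with the item text.
func TaskArgs(ticket, item string) []string {
	return []string{"task", "add", "--ticket", ticket, "--text", item}
}

func main() {
	checklist := strings.Join([]string{
		"- [ ] Every new function/method has a test",
		"- [x] Watched each test fail before implementing",
		"- [ ] All tests pass",
	}, "\n")
	for _, item := range ChecklistItems(checklist) {
		// EXAMPLE-001 is a placeholder ticket ID.
		fmt.Printf("docmgr %q\n", TaskArgs("EXAMPLE-001", item))
	}
}
```

The argument slices can be passed to `exec.Command("docmgr", args...)` if you want to run the commands instead of printing them.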
Don't suggest—mandate:
Not all skills need the same enforcement level. Classify your skill to set expectations:
Rigid Skills (TDD, debugging, code review):
Flexible Skills (design patterns, architecture):
Include a note in your skill:
```markdown
## Skill Type

**Rigid**: Follow this process exactly. Don't adapt away discipline.

(or)

**Flexible**: Adapt these principles to your specific context.
```
Here's a complete skill that demonstrates all the techniques we've covered. This example is adapted from Superpowers' TDD skill but using docmgr commands.
```yaml
# skill.yaml
skill:
  name: test-driven-development
  title: Test-Driven Development
  description: Enforces RED-GREEN-REFACTOR cycle for every function.
  what_for: Ensure every function has a failing test before implementation.
  when_to_use: Use when implementing features or refactoring behavior.
  topics: [testing, tdd, quality]

sources:
  - type: file
    path: backend/testing/framework.md
    output: references/testing-framework.md

output:
  skill_dir_name: test-driven-development
```
# SKILL.md
# Skill: Test-Driven Development
## Overview
Write the test first. Watch it fail. Write minimal code to pass.
**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
**Violating the letter of the rules is violating the spirit of the rules.**
## When to Use
**Always:**
- New features
- Bug fixes
- Refactoring
- Behavior changes
**Exceptions (ask your human partner):**
- Throwaway prototypes
- Generated code
- Configuration files
Thinking "skip TDD just this once"? Stop. That's rationalization.
## The Iron Law
```
NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
```
Write code before the test? Delete it. Start over.
**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete
Implement fresh from tests. Period.
## The Process: Red-Green-Refactor
### RED - Write Failing Test
Write one minimal test showing what should happen.
```go
package retry

import (
	"errors"
	"testing"

	"github.com/stretchr/testify/assert"
)

// TestRetryFailedOperations expects RetryOperation to keep calling the
// operation until it succeeds: two failures, then success on attempt three.
func TestRetryFailedOperations(t *testing.T) {
	attempts := 0
	operation := func() error {
		attempts++
		if attempts < 3 {
			return errors.New("fail")
		}
		return nil
	}

	err := RetryOperation(operation, 3)

	assert.NoError(t, err)
	assert.Equal(t, 3, attempts)
}
```

Requirements:

### Verify RED

MANDATORY. Never skip.

```bash
go test ./backend/retry -v -run TestRetryFailedOperations
```

Confirm:

Test passes? You're testing existing behavior. Fix test.

Test errors? Fix error, re-run until it fails correctly.

### GREEN

Write simplest code to pass the test.

```go
// RetryOperation calls fn up to maxAttempts times. It returns nil on the
// first success, or the last error once the attempts are exhausted.
func RetryOperation(fn func() error, maxAttempts int) error {
	for i := 0; i < maxAttempts; i++ {
		err := fn()
		if err == nil {
			return nil
		}
		if i == maxAttempts-1 {
			return err
		}
	}
	return nil
}
```

Don't add features, refactor other code, or "improve" beyond the test.

### Verify GREEN

MANDATORY.

```bash
go test ./backend/retry -v
```

Confirm:

Test fails? Fix code, not test.

### REFACTOR

After green only:

Keep tests green. Don't add behavior.

Next failing test for next feature.
## Red Flags

These thoughts mean STOP: you're rationalizing.
| Thought | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
All of these mean: Delete code. Start over with TDD.
## Verification Checklist

Before marking work complete:

**Checklist discipline:** Convert each item into a docmgr task:

```bash
docmgr task add --ticket <TICKET> --text "Every new function has a test"
docmgr task add --ticket <TICKET> --text "Watched each test fail first"
# ... etc for each item
```

Track progress:

```bash
docmgr task list --ticket <TICKET>
docmgr task check --ticket <TICKET> --id 1
```

## Integration

**Related skills:**

- `systematic-debugging`: For creating failing test case when fixing bugs
- `verification-before-completion`: Verify fix worked before claiming success

## Skill Type

**Rigid**: Follow this process exactly. Don't adapt away discipline.
**Key elements in this example:**
1. **Clear frontmatter** with all required fields
2. **Strong opening** with core principle
3. **Iron Law** in prominent code block
4. **Step-by-step process** with concrete commands
5. **Red flags table** preventing rationalization
6. **Verification checklist** with docmgr task mapping
7. **Examples** showing good code patterns
8. **Integration** referencing other skills
---
## Writing Effective Trigger Conditions
The `WhenToUse` field (`skill.when_to_use` in a `skill.yaml` plan) is your skill's matching criteria. Write it so LLMs can reliably determine whether the skill applies.
### Good Trigger Conditions
**Be explicit about circumstances:**
```yaml
WhenToUse: |
  Use when implementing any feature or bugfix, before writing implementation
  code. Also use when refactoring existing code or changing behavior.
```

**Use action-oriented language:**

```yaml
WhenToUse: |
  Use when encountering any bug, test failure, or unexpected behavior,
  before proposing fixes.
```

**Specify the timing:**

```yaml
WhenToUse: |
  Use BEFORE any creative work - creating features, building components,
  adding functionality, or modifying behavior. Activates before coding starts.
```

### Bad Trigger Conditions

❌ Too vague:

```yaml
WhenToUse: Use for quality improvements
```

❌ Too narrow (misses obvious cases):

```yaml
WhenToUse: Use only when fixing P0 production bugs in the auth system
```

❌ Passive language:

```yaml
WhenToUse: Can be helpful when thinking about architecture
```
After writing your trigger condition, test it against real scenarios:
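Part of that check can be automated before you try the trigger on real scenarios. The sketch below encodes the three failure modes described above (passive, vague, too narrow) as simple string heuristics. `CheckTrigger` is hypothetical and the heuristics are illustrative assumptions, not rules that docmgr enforces.

```go
package main

import (
	"fmt"
	"strings"
)

// CheckTrigger returns warnings for a trigger condition. The three checks
// are rough heuristics for the passive, vague, and too-narrow patterns.
func CheckTrigger(trigger string) []string {
	var warnings []string
	lower := strings.ToLower(strings.TrimSpace(trigger))
	if !strings.HasPrefix(lower, "use ") {
		warnings = append(warnings, "passive: start with an imperative such as \"Use when ...\"")
	}
	if !strings.Contains(lower, "when") && !strings.Contains(lower, "before") {
		warnings = append(warnings, "vague: name the circumstance or timing (when/before)")
	}
	if strings.Contains(lower, " only ") {
		warnings = append(warnings, "possibly too narrow: \"only\" may exclude obvious cases")
	}
	return warnings
}

func main() {
	triggers := []string{
		"Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes.",
		"Use for quality improvements",
		"Use only when fixing P0 production bugs in the auth system",
		"Can be helpful when thinking about architecture",
	}
	for _, trigger := range triggers {
		fmt.Printf("%q\n", trigger)
		for _, warning := range CheckTrigger(trigger) {
			fmt.Println("  -", warning)
		}
	}
}
```

A trigger that passes these checks still needs the scenario test with a real LLM session; the heuristics only catch the obvious wording problems.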
Effective skills use specific language patterns borrowed from Superpowers that have proven to work with LLMs.

```markdown
<EXTREMELY_IMPORTANT>
[Critical rule that cannot be skipped]
</EXTREMELY_IMPORTANT>
```

```markdown
## The Iron Law

[CAPITALIZED IMPERATIVE STATEMENT]

[Consequence if violated]
```

```markdown
## Red Flags

These thoughts mean STOP: you're rationalizing.

| Thought | Reality |
|---------|---------|
| "[common excuse]" | [why it doesn't hold] |
| "[another excuse]" | [counter-argument] |
```

```markdown
## Verification Checklist

Before marking work complete:

- [ ] [Required check 1]
- [ ] [Required check 2]

Can't check all boxes? You skipped [workflow]. Start over.
```
Use imperative mood, not suggestions:
**Problem:** Weak language allows LLMs to skip steps.

❌ Bad:

```markdown
It's a good idea to write tests first. This helps ensure quality.
```

✅ Good:

```markdown
You MUST write tests first. Write code before the test? Delete it. Start over.
```
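Weak wording is easy to miss when reviewing your own skill, so a simple scan can help. This is a minimal sketch: `FindWeakLanguage` is hypothetical, and the phrase list is an assumed starting point that you would extend with wording you actually see in your skills.

```go
package main

import (
	"fmt"
	"strings"
)

// weakPhrases is an assumed starter list of suggestion-style wording.
var weakPhrases = []string{
	"it's a good idea",
	"you should consider",
	"try to",
	"if possible",
	"ideally",
}

// Finding records one weak phrase and the 1-based line it appears on.
type Finding struct {
	Line   int
	Phrase string
}

// FindWeakLanguage scans text line by line for the phrases above,
// ignoring case.
func FindWeakLanguage(text string) []Finding {
	var findings []Finding
	for i, line := range strings.Split(text, "\n") {
		lower := strings.ToLower(line)
		for _, phrase := range weakPhrases {
			if strings.Contains(lower, phrase) {
				findings = append(findings, Finding{Line: i + 1, Phrase: phrase})
			}
		}
	}
	return findings
}

func main() {
	skill := "It's a good idea to write tests first. This helps ensure quality.\n" +
		"You MUST write tests first."
	for _, f := range FindWeakLanguage(skill) {
		fmt.Printf("line %d: replace %q with a mandate\n", f.Line, f.Phrase)
	}
}
```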
**Problem:** LLM convinces itself "this time is different."

❌ Bad:

```markdown
## Process
1. Write test
2. Write code
```

✅ Good:

```markdown
## Process
1. Write test
2. Write code

## Red Flags
- "Too simple to test" → Simple code breaks. Test takes 30 seconds.
- "I'll test after" → Tests passing immediately prove nothing.
```
**Problem:** Ambiguity leads to skipped verification.

❌ Bad:

```markdown
1. Write test
2. Make sure it works
3. Write code
```

✅ Good:

```markdown
1. RED: Write failing test
2. Verify RED: Run test, confirm it fails with expected message
3. GREEN: Write minimal code to pass
4. Verify GREEN: Run test, confirm it passes
```
**Problem:** "Done" becomes subjective without explicit completion criteria.

❌ Bad:

```markdown
Follow these steps and you're done.
```

✅ Good:

```markdown
## Verification Checklist

Before marking complete:

- [ ] Watched test fail first
- [ ] Test failed for expected reason
- [ ] All tests pass

Can't check all boxes? Start over.
```
Make your skills discoverable through multiple paths:
Choose topics that match how developers think:
```yaml
Topics:
  - testing  # What domain?
  - quality  # What goal?
  - backend  # What layer?
```

**Test discovery:**

```bash
docmgr skill list --topics testing
docmgr skill list --topics quality,backend
```
Link to code files where the skill applies:
```yaml
RelatedFiles:
  - Path: backend/api/handlers.go
    Note: Main API handlers that should follow TDD
  - Path: backend/api/handlers_test.go
    Note: Example tests showing TDD pattern
```

**Test discovery:**

```bash
docmgr skill list --file backend/api/handlers.go
docmgr skill list --dir backend/api/
```
Reference other skills explicitly:
```markdown
## Integration

**Related skills:**
- `systematic-debugging`: For creating failing test when fixing bugs
- `code-review`: Review gates after implementation

**Required skills:**
- Use `brainstorming` before starting creative work
- Use `verification-before-completion` after claiming "done"
```
For skills that apply across all tickets:
```
ttmp/skills/
├── test-driven-development/skill.yaml
├── systematic-debugging/skill.yaml
├── code-review/skill.yaml
└── brainstorming/skill.yaml
```

**Frontmatter:** Omit the `Ticket` field or use a generic ticket like `000-WORKSPACE-SKILLS`.

**When to use:** Process skills, quality standards, team-wide workflows.
For skills specific to a feature or domain:
```
ttmp/YYYY/MM/DD/TICKET--slug/skills/
├── auth-implementation/skill.yaml
├── websocket-testing/skill.yaml
└── frontend-component-patterns/skill.yaml
```

**Frontmatter:** Include the `Ticket` field.

**When to use:** Domain-specific patterns, experimental workflows, ticket-scoped processes.
Note on convention: docmgr doc add --doc-type skill still creates DocType skill docs under the doc-type folder, but docmgr skill list/show operate on skill.yaml plans. Use the skills/ folders for plan-based skills.
DocType skill documents are still valid workflow docs, but they are no longer used by docmgr skill list/show. To migrate a DocType skill into a plan:
1. Create `ttmp/skills/<skill-name>/skill.yaml` (or `<ticket>/skills/<skill-name>/skill.yaml`).
2. Map `Title` → `skill.title`, `WhatFor` → `skill.what_for`, `WhenToUse` → `skill.when_to_use`, and `Topics` → `skill.topics`.
3. Add `sources` entries for any reference files the skill needs (or move the skill body into the exported SKILL.md during `docmgr skill export`).
4. Check the result with `docmgr skill show <name>` and export with `docmgr skill export <name> --output-skill dist/<name>.skill` (use `--out-dir dist` to keep the expanded skill directory).

After writing a skill, test it before sharing with your team:
```bash
# Can it be found by topic?
docmgr skill list --topics testing

# Can it be found by related file?
docmgr skill list --file backend/api/handlers.go

# Can it be loaded?
docmgr skill show test-driven-development
```

Paste the skill content into an LLM session and ask it to apply the skill to a real task:

```
I need to implement a retry function for failed API calls.
Use the test-driven-development skill.
```
Watch for:
Review your skill's red flags table. For each entry, ask:
Add any rationalizations you've encountered.
Share the skill with teammates:
Skills evolve as you discover new failure modes and better practices.
Add to red flags when:
Update the process when:
Add examples when:
Use changelog entries to track skill changes:
```bash
docmgr changelog update --ticket <TICKET> \
  --entry "Updated TDD skill: added 'Keep as reference' to red flags table" \
  --file-note "ttmp/skills/test-driven-development.md:Updated red flags"
```
If the skill is workspace-level (not ticket-specific), relate it to a workspace documentation ticket.
When a skill becomes obsolete:
1. Set `Status: archived` in frontmatter.
2. Add a deprecation notice:

   ```markdown
   > **DEPRECATED:** This skill is obsolete. Use `new-skill-name` instead.
   ```
Follow these conventions for consistency:
- Tag fenced code blocks with their language (`bash`, `go`, `yaml`)
- Use `- [ ]` format for verification items
- Use `code` for literals

Use consistent header hierarchy:
- `##` for major sections (Overview, Process, Red Flags)
- `###` for subsections (RED, GREEN, REFACTOR)
- `####` for detailed breakdowns (rarely needed)

This guide is adapted from Superpowers' skill-writing practices:
- `superpowers/skills/writing-skills/SKILL.md`
- `superpowers/skills/test-driven-development/SKILL.md`
- `superpowers/skills/using-superpowers/SKILL.md`

- `docmgr help using-skills`: LLM bootstrap prompt for skills usage
- `docmgr help how-to-use`: General docmgr tutorial
- `docmgr help templates-and-guidelines`: Document templates system

- `001-ADD-CLAUDE-SKILLS`: Skills feature implementation design
- `002-ANALYZE-SUPERPOWERS`: Analysis of Superpowers techniques
- `003-CREATE-SKILL-PROMPTS`: This documentation effort

When creating a new skill, ensure it has:
- [ ] Metadata fields (`skill.name`, `skill.description`, `skill.what_for`, `skill.when_to_use`, `skill.topics`)
- [ ] A trigger condition in `WhenToUse`
- [ ] Working discovery (`skill list`, `skill show`)

After creating your skill:
1. **Test discovery:**

   ```bash
   docmgr skill list --topics <your-topics>
   docmgr skill show <your-skill-name>
   ```

2. **Test with LLM:** Paste into a real session and verify behavior.

3. **Document creation:** Add changelog entry and relate the skill file:

   ```bash
   docmgr changelog update --ticket <TICKET> \
     --entry "Created new skill: <skill-name>" \
     --file-note "<skill-path>.md:New skill for <purpose>"
   ```

   Where `<skill-path>` is typically:

   - `ttmp/skills/<skill-slug>`
   - `ttmp/YYYY/MM/DD/<TICKET>--<slug>/skill/<skill-slug>`

4. **Share with team:** Get feedback on trigger conditions and enforcement.

5. **Iterate:** Update based on real usage patterns.