Extractor scripts
Generated Markdown for references/fact_extractor_anatomy.md.
Open book page Back to the skill graph
# Extractor scripts
Extractors are single-file Python scripts run with `uv run`: a `# /// script` inline-metadata header declares dependencies, so any source of truth — a database, workflow YAML, cargo metadata, an HTTP API — is one dependency line away. Python is the extractor language on purpose: agents write it well and the library ecosystem covers everything. Each script reads one source of truth and fully overwrites one WCL file under `data/generated/`.
```python
#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.11"
# dependencies = [] # e.g. ["pyyaml>=6"]
# ///
"""Extract <what> from <source> into data/generated/<name>.wcl."""
from pathlib import Path
WAD_ROOT = Path(__file__).resolve().parents[1]
OUT = WAD_ROOT / "data" / "generated" / "<name>.wcl"
def wcl_str(s: str) -> str: # the ~5-line emit helper every script copies
out = s.replace("\\", "\\\\").replace('"', '\\"')
out = out.replace("\n", "\\n").replace("\t", "\\t").replace("\r", "\\r")
return f'"{out}"'
# 1. read the source of truth
# 2. map records onto WAD blocks (in-script tables for human naming)
# 3. render lines, sorted, no timestamps
OUT.write_text("\n".join(lines))
```
The eight rules that make extraction safe: uv single-file; **one script, one output file**; full overwrite, never append; generated banner first; deterministic output (sorted, no timestamps — unchanged sources are git-quiet); stable ids derived from source names; an empty result still writes the banner so imports never dangle; output passes `wcl check`. Exit non-zero on failure so `just extract` stops.
When the data doesn't fit the base blocks, declare a **typed extension block** in `schema/extensions.wcl` and have the extractor emit that, with a matching render in the book template — extraction isn't limited to the built-in families. The bundled `extractor_template.py` (skill scripts folder) is the skeleton ready to copy.
## Related
- [Generated vs hand-authored data](../references/concept_generated_vs_hand.md)
- [Write an extractor script](../references/process_writing_extractor.md)
[← Back to SKILL.md](../SKILL.md)