Codebase scan checklist (per view)

The codebase scan's probes, listed in scan order (cheap, high-signal sources first). Each row gives what to read → what to emit → what to ask the user when the code can't answer. (The rows are data: each one is a scan_probe block.)

| View | Look at | Emit | Ask if missing |
|---|---|---|---|
| overview | README, docs/, ADR folders, manifests (Cargo.toml / package.json / pyproject / go.mod), top-level directory names | wad summary draft, candidate systems/containers, repository (+ structure), existing decisions → adr | Which repositories are in scope, and is there prior architecture documentation anywhere? |
| build_deploy | .github/workflows / .gitlab-ci.yml / Jenkinsfile, Dockerfiles, compose files, k8s manifests, IaC (terraform/, ansible/) | pipeline (+ stages from jobs and needs-edges), environment candidates, infra_node candidates (ideal first extractor) | Who approves promotion between environments, and are there environments not represented in the repo? |
| systems | main functions / entry points, service definitions, listen ports, Procfile / systemd units / serverless configs | container per runnable unit (kind, technology), component per major module | Are any of these deployed together as one unit, or is anything here dead code? |
| externals | config files, .env.example, SDK/client dependencies, outbound URLs and hostnames in code | external_system stubs + relation edges (protocol inferred from the client library). Classify first: anything that hosts, builds, or ships the system (cloud, CI, CDN, git host, package registry) is an infra_node, not an external | For each external: the vendor contract owner, support contact, and security posture, none of which can be inferred from code. |
| domain | migrations/, ORM models, schema.sql, protobuf/GraphQL schemas | domain_object (+ fields, relations) and/or a code_item :db_schema; write an extractor rather than hand-copying | Which of these tables/types are core domain vs plumbing? |
| infrastructure | IaC state, cloud config, DNS zones, deployment target names in CI, secret-store references | environment, infra_node (nested via parent), deploy_target rows | What infrastructure was provisioned outside the repo (consoles, ClickOps), and what is shared across environments? |
| personas | auth/RBAC code (role enums, permission tables), issue templates, CODEOWNERS, on-call/rota docs | persona per distinct role + access_grant, raci_entry candidates | Who are the real user populations behind these roles, and how is access actually granted? |
| standards | linter/formatter configs, CONTRIBUTING.md, style docs, CI gate steps | standard (+ rule rows with enforced_by from the CI step that runs it) | Are there unwritten rules reviewers enforce that no tool checks? |
| sysadmin | runbooks/, ops/ docs, Makefile/justfile targets, cron definitions, alerting configs | op per procedure (automation = the target/script that runs it) | What do you actually do when X breaks? Walk me through the last incident. |
| systems | route tables, page/component directories in UI code, navigation configs | screen per surface (route, nav_to from links); wireframes sketched from the real layout | Which screens matter enough to document, and who uses each? |
| documentation | docs/ folders, static-site configs, README links and badges, publishing workflows (Pages / Netlify / Read the Docs) | doc_item per published artefact (source, built_by from the rendering container, hosted_on from the deploy workflow, url) | Is documentation published anywhere outside the repo (wikis, hosted API docs, internal portals), and who maintains it? |
| overview | everything still empty after the probes above | targeted mini-interview using the interview question bank, restricted to the gaps; unanswered items become open questions in an article, never guesses | Why is the system the way it is? (Back-fill the load-bearing decisions as ADRs.) |
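Because the probes are data, a scan driver can walk them in table order and collect the questions for probes whose sources turned up nothing. This is a minimal sketch under a few assumptions. ScanProbe and run_scan are hypothetical names, with fields that mirror the table columns; the real scan_probe block format may differ. run_probe is a caller-supplied function. The sketch also simplifies by asking a probe's question only when the probe emitted nothing, although some questions (vendor contract owners, for instance) need asking regardless.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScanProbe:
    """One checklist row. Field names are hypothetical, mirroring the columns."""
    view: str
    look_at: tuple[str, ...]
    emit: tuple[str, ...]
    ask_if_missing: str


def run_scan(
    probes: list[ScanProbe],
    run_probe: Callable[[ScanProbe], list[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Run probes in list (= scan) order. Returns the items emitted per view
    and the questions for probes whose sources yielded nothing."""
    emitted: dict[str, list[str]] = {}
    questions: list[str] = []
    for probe in probes:
        items = run_probe(probe)
        # Views can repeat (e.g. two overview rows), so results accumulate.
        emitted.setdefault(probe.view, []).extend(items)
        if not items:
            questions.append(probe.ask_if_missing)
    return emitted, questions


# Usage with a stubbed run_probe standing in for real file readers.
probes = [
    ScanProbe("overview", ("README", "docs/"), ("wad summary draft",),
              "Which repositories are in scope?"),
    ScanProbe("standards", ("linter/formatter configs",), ("standard",),
              "Are there unwritten rules reviewers enforce that no tool checks?"),
]
stub_results = {"overview": ["wad summary draft"], "standards": []}
emitted, questions = run_scan(probes, lambda p: stub_results[p.view])
print(emitted)    # {'overview': ['wad summary draft'], 'standards': []}
print(questions)  # ['Are there unwritten rules reviewers enforce that no tool checks?']
```

Keeping the probes as plain records means the scan order lives in one list, and adding a probe is a data change rather than a code change.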
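The build_deploy row derives pipeline stages from jobs and their needs-edges. One way to do that is to place each job one stage after the deepest job it needs. The sketch below does this for a GitHub Actions workflow that has already been parsed from YAML into a dict; parsing is left out to keep it dependency-free. The function name and the stage-numbering scheme are assumptions, not part of the checklist.

```python
def stages_from_needs(workflow: dict) -> list[list[str]]:
    """Group workflow jobs into stages: a job with no `needs` is stage 0;
    any other job sits one stage after the latest job it depends on."""
    deps: dict[str, list[str]] = {}
    for job_id, job in workflow.get("jobs", {}).items():
        needs = job.get("needs", [])
        # GitHub Actions accepts `needs` as a single job id or a list of ids.
        deps[job_id] = [needs] if isinstance(needs, str) else list(needs)

    stage: dict[str, int] = {}
    visiting: set[str] = set()

    def depth(job_id: str) -> int:
        if job_id in stage:
            return stage[job_id]
        if job_id not in deps:
            raise ValueError(f"needs refers to unknown job {job_id!r}")
        if job_id in visiting:
            raise ValueError(f"dependency cycle through job {job_id!r}")
        visiting.add(job_id)
        stage[job_id] = 1 + max((depth(d) for d in deps[job_id]), default=-1)
        visiting.discard(job_id)
        return stage[job_id]

    for job_id in deps:
        depth(job_id)

    stages: list[list[str]] = [[] for _ in range(max(stage.values(), default=-1) + 1)]
    for job_id in deps:  # dicts preserve insertion order, so input order is kept per stage
        stages[stage[job_id]].append(job_id)
    return stages


workflow = {
    "jobs": {
        "lint": {},
        "test": {},
        "build": {"needs": ["lint", "test"]},
        "deploy_staging": {"needs": "build"},
        "deploy_prod": {"needs": "deploy_staging"},
    }
}
print(stages_from_needs(workflow))
# [['lint', 'test'], ['build'], ['deploy_staging'], ['deploy_prod']]
```

Each stage's jobs become the pipeline's stage rows, and the needs-edges themselves can still be emitted separately as ordering relations.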
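The externals row asks for classification before anything is emitted. A hostname that hosts, builds, or ships the system belongs to an infra_node, and everything else becomes an external_system stub. The sketch below shows that triage over outbound URLs found in code. It assumes a hand-maintained list of infrastructure domain suffixes, and the list is illustrative rather than exhaustive. is_infra and classify_urls are hypothetical helpers.

```python
from urllib.parse import urlparse

# Illustrative suffixes for services that host/build/ship the system
# (cloud, CI, CDN, git host, package registry). Extend per project.
INFRA_SUFFIXES = (
    "amazonaws.com",
    "cloudfront.net",
    "github.com",
    "gitlab.com",
    "registry.npmjs.org",
    "pypi.org",
)


def is_infra(host: str) -> bool:
    """True if host equals an infra suffix or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in INFRA_SUFFIXES)


def classify_urls(urls: list[str]) -> dict[str, list[str]]:
    """Split outbound URLs into infra_node and external_system hostnames."""
    out: dict[str, list[str]] = {"infra_node": [], "external_system": []}
    for url in urls:
        host = urlparse(url).hostname
        if host is None:
            continue  # not an absolute URL; leave for manual review
        kind = "infra_node" if is_infra(host) else "external_system"
        if host not in out[kind]:
            out[kind].append(host)
    return out


found = [
    "https://api.stripe.com/v1/charges",
    "https://my-bucket.s3.amazonaws.com/reports/",
    "https://github.com/example/repo",
    "https://api.stripe.com/v1/refunds",
]
print(classify_urls(found))
# {'infra_node': ['my-bucket.s3.amazonaws.com', 'github.com'], 'external_system': ['api.stripe.com']}
```

Matching on whole-label suffixes (".github.com", not "github.com" as a substring) avoids misfiling look-alike hosts. Anything the list misses lands in external_system, where the ask-if-missing questions will surface it for review.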
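The domain row recommends writing an extractor rather than hand-copying schemas. Below is a deliberately small sketch for plain SQL DDL, assuming a schema.sql made of simple CREATE TABLE statements. It yields one record per table, with fields and relations in the shape the domain_object entry describes; the dict layout and function names are hypothetical. A real extractor would more likely read ORM models or use a proper SQL parser. Regexes are used here only to keep the sketch dependency-free.

```python
import re

# Plain CREATE TABLE statements only; quoted or schema-qualified names are
# out of scope for this sketch.
CREATE_TABLE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)\s*\((.*?)\)\s*;",
    re.IGNORECASE | re.DOTALL,
)
COLUMN = re.compile(r"(\w+)\s+(\w+(?:\s*\([^)]*\))?)")
REFERENCES = re.compile(r"REFERENCES\s+(\w+)", re.IGNORECASE)
TABLE_CONSTRAINTS = {"CONSTRAINT", "PRIMARY", "FOREIGN", "UNIQUE", "CHECK"}


def split_top_level(body: str) -> list[str]:
    """Split a table body on commas that are not inside parentheses."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def extract_domain_objects(sql: str) -> list[dict]:
    objects = []
    for table, body in CREATE_TABLE.findall(sql):
        fields, relations = [], []
        for part in split_top_level(body):
            if not part:
                continue
            ref = REFERENCES.search(part)
            if ref:
                relations.append(ref.group(1))  # inline or table-level FK
            if part.split()[0].upper() in TABLE_CONSTRAINTS:
                continue  # table-level constraint, not a field
            col = COLUMN.match(part)
            if col:
                fields.append({"name": col.group(1), "type": col.group(2)})
        objects.append({"name": table, "fields": fields, "relations": relations})
    return objects


schema = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total NUMERIC(10, 2),
    CONSTRAINT positive_total CHECK (total >= 0)
);
"""
for obj in extract_domain_objects(schema):
    print(obj["name"], [f["name"] for f in obj["fields"]], obj["relations"])
# customers ['id', 'email'] []
# orders ['id', 'customer_id', 'total'] ['customers']
```

Deciding which of these records are core domain and which are plumbing is still left to the row's question for the user.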