SOX Plugin

A Claude Code plugin for SOX 404 internal-controls testing. Generates control matrices, sample selections, and testing workpapers — with evidence helpers, deterministic Python, an isolated grader, and a builder that turns finished workpapers into portable replay skills.

How It Works

The SOX Plugin packages six user-facing skills, an auto-loaded methodology reference, and three leaf subagents into a single Claude Code plugin focused on SOX 404 internal-controls testing. The main /sox-testing skill walks through control identification, sample sizing, sample selection, and workpaper generation for a control area and period — delegating deterministic procedures to /sox-python, image evidence to /sox-annotate-xlsx, and recorded walkthroughs to /sox-from-video.

Every workpaper is annotated for review: red-bordered narrative regions sit on top, and the original screenshots sit below with red-rectangle Excel shapes overlaying the specific attributes tested. Pixels are never burned in, so reviewers can move or delete the shapes. Deterministic procedures capture full source code, output, runtime, exit code, and a SHA-256 hash for tamper-evidence. Once a control area is tested, /sox-replay-build packages the finished workpaper into a portable .skill that replays the same test next period without rewriting the deterministic core.

Privacy by design: All testing runs locally through Claude Code. Your control data, evidence screenshots, transcripts, and generated workpapers stay on your machine. The plugin works without any MCP server; Cowork Canvas is supported when present and degrades gracefully when not.

Key Features

End-to-end SOX 404 flow — control matrix, sample sizing, sample selection, workpaper generation, deficiency framework, and a finished-workpaper rubric grade in a single skill chain
Deterministic, reproducible procedures — sample draws, three-way matches, recomputations, and threshold checks run via /sox-python so the source code, output, runtime, exit code, and SHA-256 hash all land on a per-procedure tab
Evidence annotation that survives review — red-rectangle Excel shapes anchored over the original screenshots; reviewers can move, resize, or delete them without touching the underlying image
Walkthrough video support — mp4 + transcript turn into per-sample-per-test detail tabs with annotated frames; gaps in transcript coverage are surfaced explicitly
Bring-your-own template — /sox-from-template auto-detects the shape of a firm xlsx (per-sample tabs, master matrix, per-test tabs, single tab) and writes back into the user's bespoke layout
Portable next-period replays — /sox-replay-build packages a finished workpaper into a standalone .skill with deterministic scripts preserved verbatim and SHA-verified
Isolated grader for trustworthy verdicts — sox-workpaper-grader runs in a fresh context window with no exposure to the orchestrator's reasoning, scoring against a rubric of required and recommended criteria
Methodology in one place — the auto-loaded audit-support reference holds sample-size buckets, evidence sufficiency standards, deficiency classification, and the full control-type taxonomy; other skills consult it rather than duplicating

The Skills

Main Flow

/sox-testing — Plan and Execute a Control Area

Plans a control area's testing for a period: builds the control matrix, sizes and draws samples, scaffolds the workpaper, and coordinates evidence annotation and deterministic procedures. Calls /sox-python for any deterministic step, /sox-annotate-xlsx when evidence is xlsx with embedded screenshots, and /sox-from-video when evidence is a recorded auditor walkthrough. Dispatches the sox-workpaper-grader agent at the end for an independent verdict.

Evidence Helpers

/sox-python — Deterministic Procedures with Tamper-Evidence

Use this for any procedure expressible as code: random sample draw, three-way match, reconciliation tie-out, threshold check, exception roll-up. It generates a Python script, runs it, captures stdout/stderr, and appends a per-sample-per-test detail tab with the full source code, output, runtime, exit code, and a SHA-256 hash of the script.
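The capture step described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name run_procedure and the dictionary keys are hypothetical, and writing the record to a workbook tab is left out.

```python
import hashlib
import subprocess
import sys
import time
from pathlib import Path


def run_procedure(script_path):
    """Run a script and collect the fields a detail tab is described as holding.

    Hypothetical helper: the name and the key names are assumptions.
    """
    source = Path(script_path).read_bytes()
    started = time.perf_counter()
    result = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,  # keep stdout and stderr instead of printing them
        text=True,
    )
    runtime = time.perf_counter() - started
    return {
        "source": source.decode("utf-8"),
        "stdout": result.stdout,
        "stderr": result.stderr,
        "exit_code": result.returncode,
        "runtime_seconds": round(runtime, 3),
        # Hash the exact bytes that were executed, so any later edit to the
        # script produces a different digest.
        "sha256": hashlib.sha256(source).hexdigest(),
    }
```

Hashing the bytes read before execution means the digest describes the script as it ran, which is what lets a reviewer confirm that the recorded source has not changed since.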

/sox-annotate-xlsx — Red-Rectangle Excel Annotations

Takes an xlsx whose sheets contain embedded evidence screenshots, identifies pixel bounding boxes for each requested attribute via Opus 4.7 vision, and writes back red-outline rectangle Excel shapes anchored over the original images. Pixels are never burned in; reviewers can move or delete the shapes.
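To place a shape over an image, a pixel bounding box has to be expressed in the units Excel drawings use. Office Open XML positions drawing objects in English Metric Units (914,400 per inch). The sketch below shows that conversion only; the Box type, the box_to_emu name, the output keys, and the 96 DPI figure are assumptions for illustration, not the plugin's interface.

```python
from dataclasses import dataclass

EMU_PER_INCH = 914_400  # fixed by the Office Open XML drawing format
ASSUMED_DPI = 96        # assumption: screenshots are laid out at 96 DPI
EMU_PER_PIXEL = EMU_PER_INCH // ASSUMED_DPI  # 9525


@dataclass(frozen=True)
class Box:
    """Bounding box in source-image pixels (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int


def box_to_emu(box, image_origin_px=(0, 0), scale=1.0):
    """Convert a pixel box to an EMU offset and extent on the sheet.

    image_origin_px is where the image's top-left corner sits on the sheet,
    and scale is displayed size divided by native size.
    """
    origin_x, origin_y = image_origin_px

    def to_emu(pixels):
        return round(pixels * EMU_PER_PIXEL)

    return {
        "x": to_emu(origin_x + box.left * scale),
        "y": to_emu(origin_y + box.top * scale),
        "cx": to_emu(box.width * scale),
        "cy": to_emu(box.height * scale),
    }
```

Because the result is a separate drawing object positioned over the image rather than a change to the image itself, the screenshot bytes stay untouched and the shape remains movable.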

/sox-from-video — Walkthrough Recordings as Evidence

Takes an mp4 walkthrough plus a transcript. Parses the transcript for moments where the auditor reviews each in-scope attribute, extracts the corresponding video frames, identifies bounding boxes via vision, and writes per-sample-per-test detail tabs with annotated frames. Self-contained — vendors its own opencv-based frame extractor.

Adaptation & Reuse

/sox-from-template — Adapt to Your Firm's Layout

Pre-processor that takes a firm-supplied xlsx template (typically with one sample completed as a guide), auto-detects its shape (per-sample tabs, master matrix, per-test tabs, or single tab), maps every placeholder cell to a semantic field, and scaffolds copies of the exemplar for the rest of the population. Emits a template-profile.json that the evidence helpers consume via --template-profile, writing into the firm's layout instead of the canonical one when the firm template is the right fit.

/sox-replay-build — Portable Next-Period Replays

Reads a completed workpaper and builds a portable .skill ZIP that replays the same test next period. Copies the deterministic scripts verbatim (SHA-verified for tamper-evidence), captures the locked attribute lists for evidence tests, and bakes methodology and the test plan into a generated SKILL.md. The builder writes no new Python — the deterministic core is preserved exactly as it ran the first time.
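The verify-then-copy step could look something like the sketch below. It assumes the builder already has each script's path and the SHA-256 recorded in the workpaper; the package_scripts name, the scripts/ folder inside the archive, and the input mapping are hypothetical.

```python
import hashlib
import zipfile
from pathlib import Path


def package_scripts(scripts, archive_path):
    """Copy scripts into a ZIP only if every one matches its recorded hash.

    scripts maps a script path to the SHA-256 recorded when it first ran.
    Hypothetical helper: names and archive layout are assumptions.
    """
    verified = []
    for path, recorded in scripts.items():
        data = Path(path).read_bytes()
        actual = hashlib.sha256(data).hexdigest()
        if actual != recorded:
            raise ValueError(f"{path}: SHA-256 {actual} does not match recorded {recorded}")
        verified.append((Path(path).name, data))

    # Nothing is written until every script has been verified.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in verified:
            archive.writestr(f"scripts/{name}", data)  # bytes copied verbatim
    return [name for name, _ in verified]
```

Checking every hash before opening the archive means a single altered script stops the build without leaving a partial .skill behind.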

Reference

audit-support — SOX 404 Methodology (Auto-Loaded)

Auto-loaded reference skill (user-invocable: false). Holds sample-size buckets by risk level, the four sample-selection methods, evidence sufficiency standards, deficiency classification with indicators, deficiency aggregation rules, and the full control-type taxonomy — ITGC, manual, automated, IT-dependent manual, and entity-level. Other skills consult this one rather than duplicating methodology.

Agents

The orchestrator skills dispatch three leaf subagents to keep large or sensitive inputs (image bytes, transcript text, finished-workpaper introspection) out of the orchestrator's context window. Each agent has a tightly scoped tool list, and image fan-out runs in parallel so a 30-image workpaper takes one round trip, not 30.

sox-evidence-boxer — Per-Image Bounding Boxes

Dispatched by /sox-annotate-xlsx and /sox-from-video once per image. Views a single screenshot or video frame, identifies pixel bounding boxes for each requested attribute, writes a boxes JSON, and returns a small summary. The orchestrator never sees the image bytes; it only collects per-agent JSON. The agents run in parallel: the orchestrator issues a single message containing multiple Agent calls.

sox-walkthrough-parser — Transcript to Timestamps

Dispatched by /sox-from-video once per recording. Reads a transcript (.vtt, .srt, .txt, .docx), identifies the moments where the auditor reviewed each in-scope attribute, and emits a timestamps.json. Transcript text never returns to the orchestrator — only a small summary with timestamp count, samples covered, tests covered, and any gaps.
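As a rough illustration of the output shape, the sketch below reads cue timings from a .vtt or .srt transcript and records where each attribute is mentioned. A plain keyword match stands in for the agent's reading of the transcript, and the function name and JSON keys are assumptions rather than the real timestamps.json schema.

```python
import re

# Matches the start time of a cue line such as "00:01:05.000 --> 00:01:09.000".
# The hours field is optional, and .srt files use a comma before milliseconds.
CUE = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->")


def to_seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def find_attribute_moments(transcript, attributes):
    """Return cue start times where each attribute is mentioned, plus gaps."""
    moments = []
    start = None
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            start = None  # a blank line ends the current cue
            continue
        match = CUE.match(line)
        if match:
            start = to_seconds(*match.groups())
            continue
        if start is None:
            continue  # header text or cue index outside any cue
        for attribute in attributes:
            if attribute.lower() in line.lower():
                moments.append(
                    {"attribute": attribute, "start_seconds": start, "text": line}
                )
    covered = {moment["attribute"] for moment in moments}
    gaps = [attribute for attribute in attributes if attribute not in covered]
    return {"timestamps": moments, "gaps": gaps}
```

Returning the uncovered attributes alongside the matches mirrors the summary described above, where gaps are reported rather than silently dropped.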

sox-workpaper-grader — Independent Rubric Verdict

Dispatched by /sox-testing after the workpaper is fully assembled. Reads the workpaper rubric, introspects the xlsx via openpyxl, scores each criterion (Required + Recommended), and writes a verdict JSON with overall: pass | fail plus any blocking failures. Runs in a fresh context window with no exposure to the orchestrator's reasoning — that isolation is what makes the verdict meaningful.
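One plausible way to roll criterion scores up into the verdict is sketched below. It assumes that only failed Required criteria block a pass and that Recommended misses are reported without changing the outcome; apart from overall, every field name here is hypothetical.

```python
import json


def build_verdict(results):
    """Roll per-criterion results up into a verdict dictionary.

    results is a list of dicts with 'id', 'level' ('required' or
    'recommended'), and 'passed'. Assumption: only required criteria block.
    """
    blocking = [
        r["id"] for r in results if r["level"] == "required" and not r["passed"]
    ]
    advisory = [
        r["id"] for r in results if r["level"] == "recommended" and not r["passed"]
    ]
    return {
        "overall": "fail" if blocking else "pass",
        "blocking_failures": blocking,
        "recommended_misses": advisory,  # reported, but does not change overall
    }


def verdict_json(results):
    """Serialize the verdict as indented JSON text."""
    return json.dumps(build_verdict(results), indent=2)
```

Under this assumption a workpaper with every Required criterion met passes even when Recommended items are missing, and those items still appear in the output for the reviewer.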

Workpaper Layout

By default, every per-sample tab the plugin produces follows the same two-region layout:

Top: a red-bordered narrative region with Observation, Procedures Performed, and Conclusion sections describing the test in prose. Every Summary row also carries a 1–3 sentence reasoning string per test.
Below: the original screenshots, with red-rectangle Excel shapes overlaying the specific attributes that were tested. Shapes are anchored, not burned in.

When /sox-from-template is run first, the canonical layout is replaced by the firm's template layout — narrative text, results, reasoning, and image annotations all land in the cells the template profile identifies, with the firm's branding, column structure, fonts, and merges preserved.

Getting Started

Install the plugin

Download the plugin below and install it in Claude Code — the .claude-plugin/plugin.json manifest sits at the root of the zip. The plugin registers six user-facing slash-command skills, an auto-loaded methodology reference, and three subagents the orchestrator dispatches automatically.

Run /sox-testing <area> <period>

Open any directory you want to use as a SOX testing workspace and run /sox-testing procure-to-pay 2024-Q4 (or your own control area and period). The skill walks through control identification, sample sizing, and sample selection — delegating to /sox-python so the random seed and selection logic are captured in a tamper-evident detail tab.
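A seeded draw of the kind described above might look like this minimal sketch. The draw_sample name and its parameters are hypothetical, and the script the plugin actually generates may differ.

```python
import random


def draw_sample(population_ids, sample_size, seed):
    """Draw a reproducible random sample without replacement.

    Hypothetical helper: the same population, size, and seed always
    yield the same selection.
    """
    # Sort first so the result does not depend on the order of the input.
    ordered = sorted(population_ids)
    if sample_size > len(ordered):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)  # local generator, global random state untouched
    return sorted(rng.sample(ordered, sample_size))
```

Recording the seed next to the script is what makes the draw re-performable: anyone with the same population can rerun the selection and arrive at the same items.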

(Optional) Adapt to your template

If your firm has a bespoke xlsx layout, run /sox-from-template <template.xlsx> first. The skill profiles the template, scaffolds copies of the exemplar tab for the remaining samples, and emits a profile JSON the evidence helpers consume to write into your layout instead of the canonical one.

Annotate evidence

For screenshot evidence, run /sox-annotate-xlsx <evidence.xlsx> "Approver, Amount Threshold, Self-Approval Checkbox". For walkthrough recordings, run /sox-from-video <walkthrough.mp4> <transcript>. Both fan out to the boxer agent in parallel and write annotated detail tabs back to the workpaper.

Build a replay skill for next period

Once the workpaper is signed off, run /sox-replay-build <workpaper.xlsx> to package the deterministic scripts, locked attribute lists, methodology, and test plan into a portable .skill. Drop the .skill into next period's workspace and replay the same test without rewriting code.

Download Plugin

Open source under the MIT License. Free to use, modify, and distribute.