SOX Plugin
A Claude Code plugin for SOX 404 internal-controls testing. Generates control matrices, sample selections, and testing workpapers — with evidence helpers, deterministic Python, an isolated grader, and a builder that turns finished workpapers into portable replay skills.
How It Works
The SOX Plugin packages six user-facing skills, an auto-loaded methodology
reference, and three leaf subagents into a single Claude Code plugin
focused on SOX 404 internal-controls testing. The main
/sox-testing skill walks through control identification,
sample sizing, sample selection, and workpaper generation for a control
area and period — delegating deterministic procedures to
/sox-python, image evidence to
/sox-annotate-xlsx, and recorded walkthroughs to
/sox-from-video.
Every workpaper is annotated for review: a red-bordered narrative region
on top, and the original screenshots below with red-rectangle Excel shapes
overlaying the specific attributes tested. The shapes are never burned
into the pixels, so reviewers can move or delete them. Deterministic procedures
capture full source code, output, runtime, exit code, and a SHA-256 hash
for tamper-evidence. Once a control area is tested, /sox-replay-build
packages the finished workpaper into a portable .skill that
replays the same test next period without rewriting the deterministic core.
Key Features
- End-to-end SOX 404 flow: control matrix, sample sizing, sample selection, workpaper generation, deficiency framework, and a finished-workpaper rubric grade in a single skill chain
- Deterministic, reproducible procedures: sample draws, three-way matches, recomputations, and threshold checks run via /sox-python so the source code, output, runtime, exit code, and SHA-256 hash all land on a per-procedure tab
- Evidence annotation that survives review: red-rectangle Excel shapes anchored over the original screenshots; reviewers can move, resize, or delete them without touching the underlying image
- Walkthrough video support: mp4 + transcript turn into per-sample-per-test detail tabs with annotated frames; gaps in transcript coverage are surfaced explicitly
- Bring-your-own template: /sox-from-template auto-detects the shape of a firm xlsx (per-sample tabs, master matrix, per-test tabs, single tab) and writes back into the user's bespoke layout
- Portable next-period replays: /sox-replay-build packages a finished workpaper into a standalone .skill with deterministic scripts preserved verbatim and SHA-verified
- Isolated grader for trustworthy verdicts: sox-workpaper-grader runs in a fresh context window with no exposure to the orchestrator's reasoning, scoring against a rubric of required and recommended criteria
- Methodology in one place: the auto-loaded audit-support reference holds sample-size buckets, evidence sufficiency standards, deficiency classification, and the full control-type taxonomy; other skills consult it rather than duplicating
The Skills
Main Flow
/sox-testing plans a control area's testing for a period: it builds the control
matrix, sizes and draws samples, scaffolds the workpaper, and
coordinates evidence annotation and deterministic procedures. It calls
/sox-python for any deterministic step,
/sox-annotate-xlsx when the evidence is an xlsx with embedded
screenshots, and /sox-from-video when the evidence is a
recorded auditor walkthrough. At the end it dispatches the
sox-workpaper-grader agent for an
independent verdict.
Evidence Helpers
/sox-python handles any procedure expressible as code: random sample draw, three-way match, reconciliation tie-out, threshold check, exception roll-up. It generates a Python script, runs it, captures stdout/stderr, and appends a per-sample-per-test detail tab with the full source code, output, runtime, exit code, and a SHA-256 hash of the script.
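The capture step described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the plugin's actual implementation: the function name `run_and_capture` and the keys of the returned record are hypothetical, and writing the record to a detail tab is left out.

```python
import hashlib
import subprocess
import sys
import time
from pathlib import Path


def run_and_capture(script_path: Path) -> dict:
    """Run a script and collect the facts a detail tab would record.

    Hypothetical helper: the record layout is an assumption, not the
    plugin's real schema.
    """
    raw = script_path.read_bytes()
    started = time.perf_counter()
    proc = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,  # keep stdout and stderr instead of printing them
        text=True,
    )
    runtime = time.perf_counter() - started
    return {
        "source": raw.decode("utf-8"),
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "runtime_seconds": round(runtime, 3),
        # Hash the exact bytes on disk so any later edit changes the digest.
        "sha256": hashlib.sha256(raw).hexdigest(),
    }
```

Hashing the raw bytes rather than the decoded text means that even a whitespace or line-ending change to the script produces a different digest.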
/sox-annotate-xlsx takes an xlsx whose sheets contain embedded evidence screenshots, identifies pixel bounding boxes for each requested attribute via Opus 4.7 vision, and writes back red-outline rectangle Excel shapes anchored over the original images. The shapes are never burned into the pixels; reviewers can move or delete them.
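To make "anchored over the image" concrete, here is a rough sketch of how a pixel bounding box could be converted into the position and size of an overlay shape. Spreadsheet drawings measure positions in EMU (914,400 per inch), so at 96 DPI one pixel is 9,525 EMU. The `PixelBox` fields, the `shape_geometry` helper, and the assumption that the image's sheet offset is known in pixels are all hypothetical; how the plugin actually writes its shapes is not described here.

```python
from dataclasses import dataclass

EMU_PER_INCH = 914_400  # DrawingML length unit


@dataclass(frozen=True)
class PixelBox:
    """Bounding box in image pixel coordinates (hypothetical schema)."""
    label: str
    x: int
    y: int
    width: int
    height: int


def to_emu(pixels: float, dpi: int = 96) -> int:
    """Convert a pixel length to EMU at the given DPI."""
    return round(pixels * EMU_PER_INCH / dpi)


def shape_geometry(box: PixelBox, image_left_px: int, image_top_px: int,
                   scale: float = 1.0, dpi: int = 96) -> dict:
    """Position and size of a rectangle drawn over the image on the sheet.

    scale is displayed size divided by native size, for images that were
    resized when embedded (an assumption of this sketch).
    """
    return {
        "label": box.label,
        "x_emu": to_emu(image_left_px + box.x * scale, dpi),
        "y_emu": to_emu(image_top_px + box.y * scale, dpi),
        "cx_emu": to_emu(box.width * scale, dpi),
        "cy_emu": to_emu(box.height * scale, dpi),
    }
```

Because the rectangle is a separate drawing object positioned by these numbers, the underlying screenshot stays byte-for-byte unchanged.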
/sox-from-video takes an mp4 walkthrough plus a transcript. It parses the transcript for moments where the auditor reviews each in-scope attribute, extracts the corresponding video frames, identifies bounding boxes via vision, and writes per-sample-per-test detail tabs with annotated frames. It is self-contained and vendors its own OpenCV-based frame extractor.
Adaptation & Reuse
/sox-from-template is a pre-processor that takes a firm-supplied xlsx template (typically
with one sample completed as a guide), auto-detects its shape
(per-sample tabs, master matrix, per-test tabs, or single tab),
profiles every placeholder cell to a semantic field, and scaffolds
copies of the exemplar for the rest of the population. Emits a
template-profile.json that the evidence helpers
consume via --template-profile — bypassing the
canonical layout when the firm template is the right fit.
/sox-replay-build reads a completed workpaper and builds a portable .skill
ZIP that replays the same test next period. Copies the deterministic
scripts verbatim (SHA-verified for tamper-evidence), captures the
locked attribute lists for evidence tests, and bakes methodology
and the test plan into a generated SKILL.md. The builder
writes no new Python — the deterministic core is preserved
exactly as it ran the first time.
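A minimal sketch of the "copy verbatim, SHA-verified" idea, assuming the workpaper has already yielded a mapping from each script file to its recorded SHA-256. The archive layout (`scripts/`, `manifest.json`) and the function names are hypothetical; only the presence of a generated SKILL.md comes from the description above.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file's exact bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_skill(scripts: dict[Path, str], skill_md: str, out_path: Path) -> Path:
    """Package scripts into a .skill ZIP after checking recorded hashes.

    scripts maps each script path to the SHA-256 recorded when it first
    ran (hypothetical input shape).
    """
    # Verify everything first so a mismatch never leaves a partial archive.
    for path, recorded in scripts.items():
        if sha256_of(path) != recorded:
            raise ValueError(f"{path.name}: hash mismatch, refusing to package")

    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in scripts:
            # Bytes are copied as-is; nothing is regenerated or reformatted.
            zf.write(path, arcname=f"scripts/{path.name}")
        zf.writestr("SKILL.md", skill_md)
        manifest = {path.name: digest for path, digest in scripts.items()}
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return out_path
```

Shipping the digests alongside the scripts would let next period's replay repeat the same check before running anything.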
Reference
audit-support is the auto-loaded reference skill (user-invocable: false).
It holds sample-size buckets by risk level, the four sample-selection
methods, evidence sufficiency standards, deficiency classification
with indicators, deficiency aggregation rules, and the full
control-type taxonomy — ITGC, manual, automated, IT-dependent
manual, and entity-level. Other skills consult this one rather
than duplicating methodology.
Agents
Three leaf subagents the orchestrator skills dispatch to keep large or sensitive inputs — image bytes, transcript text, finished-workpaper introspection — out of the orchestrator's context window. Each agent has a tightly scoped tool list, and image fan-out runs in parallel so a 30-image workpaper takes one round trip, not 30.
The boxer agent is dispatched by /sox-annotate-xlsx and
/sox-from-video once per image. It views a single
screenshot or video frame, identifies pixel bounding boxes for each
requested attribute, writes a boxes JSON, and returns a small
summary. The orchestrator never sees the image bytes; it
only collects per-agent JSON. The agents run in parallel: the
orchestrator sends a single message containing multiple Agent calls.
The transcript-parsing agent is dispatched by /sox-from-video once per recording.
Reads a transcript (.vtt, .srt,
.txt, .docx), identifies the moments
where the auditor reviewed each in-scope attribute, and emits a
timestamps.json. Transcript text never returns to the
orchestrator — only a small summary with timestamp count,
samples covered, tests covered, and any gaps.
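As a rough illustration of the transcript step, the sketch below scans a WebVTT transcript for cues that mention each attribute and reports which attributes were never mentioned. It is a naive keyword-matching stand-in for whatever the agent actually does, and the output shape (`moments`, `gaps`) is a hypothetical schema, not the real timestamps.json format.

```python
import re

# A cue: "start --> end", optional cue settings, then the text up to a blank line.
CUE = re.compile(
    r"((?:\d{2,}:)?\d{2}:\d{2}\.\d{3}) --> ((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})[^\n]*\n"
    r"(.+?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def to_seconds(stamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def find_moments(vtt_text: str, attributes: list[str]) -> dict:
    """Find cues mentioning each attribute and list attributes never mentioned."""
    moments = []
    for start, end, text in CUE.findall(vtt_text):
        for attribute in attributes:
            if attribute.lower() in text.lower():
                moments.append({
                    "attribute": attribute,
                    "start": to_seconds(start),
                    "end": to_seconds(end),
                })
    covered = {m["attribute"] for m in moments}
    gaps = [a for a in attributes if a not in covered]
    return {"moments": moments, "gaps": gaps}
```

Returning the gaps explicitly mirrors the behaviour described above: an attribute with no matching moment is surfaced rather than silently skipped.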
The sox-workpaper-grader agent is dispatched by /sox-testing after the workpaper is
fully assembled. Reads the workpaper rubric, introspects the xlsx
via openpyxl, scores each criterion (Required + Recommended), and
writes a verdict JSON with overall: pass | fail plus
any blocking failures. Runs in a fresh context window with no
exposure to the orchestrator's reasoning — that isolation
is what makes the verdict meaningful.
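The verdict logic can be sketched as follows, under the assumption that only failed Required criteria block a pass while failed Recommended criteria are reported without changing the outcome. That rule, the `Criterion` type, and the key names other than `overall` are assumptions made for illustration; the actual rubric and JSON schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One scored rubric line (hypothetical shape)."""
    name: str
    level: str    # "required" or "recommended"
    passed: bool


def build_verdict(results: list[Criterion]) -> dict:
    """Fold per-criterion results into an overall pass/fail verdict."""
    blocking = [c.name for c in results if c.level == "required" and not c.passed]
    advisory = [c.name for c in results if c.level == "recommended" and not c.passed]
    return {
        # Assumed rule: any failed required criterion fails the workpaper.
        "overall": "fail" if blocking else "pass",
        "blocking_failures": blocking,
        "recommended_misses": advisory,
    }
```

For example, a workpaper that meets every required criterion but misses one recommended criterion would come back as `pass` with that criterion listed under `recommended_misses`.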
Workpaper Layout
By default, every per-sample tab the plugin produces follows the same two-region layout:
- Top: a red-bordered narrative region with Observation, Procedures Performed, and Conclusion sections describing the test in prose. Every Summary row also carries a 1–3 sentence reasoning string per test.
- Below: the original screenshots, with red-rectangle Excel shapes overlaying the specific attributes that were tested. Shapes are anchored, not burned in.
When /sox-from-template is run first, the canonical layout
is replaced by the firm's template layout — narrative text, results,
reasoning, and image annotations all land in the cells the template
profile identifies, with the firm's branding, column structure, fonts,
and merges preserved.
Getting Started
Download the plugin below and install it in Claude Code — the
.claude-plugin/plugin.json manifest sits at the root
of the zip. The plugin registers six user-facing slash-command
skills, an auto-loaded methodology reference, and three subagents
the orchestrator dispatches automatically.
Open any directory you want to use as a SOX testing workspace and
run /sox-testing procure-to-pay 2024-Q4 (or your own
control area and period). The skill walks through control
identification, sample sizing, and sample selection —
delegating to /sox-python so the random seed and
selection logic are captured in a tamper-evident detail tab.
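To show why capturing the seed matters, here is a small sketch of a reproducible random draw. The function name and arguments are hypothetical, and it shows only simple random selection, not the plugin's full set of selection methods.

```python
import random


def draw_sample(population_ids: list[str], sample_size: int, seed: int) -> list[str]:
    """Draw a reproducible simple random sample of item IDs."""
    if sample_size > len(population_ids):
        raise ValueError("sample size exceeds population size")
    # A private generator keeps the draw independent of global random state.
    rng = random.Random(seed)
    # Sorting first makes the draw independent of the input ordering.
    return sorted(rng.sample(sorted(population_ids), sample_size))
```

Given the same population, sample size, and seed, a reviewer rerunning the script gets the identical sample, which is what makes the recorded seed useful as evidence.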
If your firm has a bespoke xlsx layout, run
/sox-from-template <template.xlsx> first. The
skill profiles the template, scaffolds copies of the exemplar tab
for the remaining samples, and emits a profile JSON the evidence
helpers consume to write into your layout instead of the canonical
one.
For screenshot evidence, run
/sox-annotate-xlsx <evidence.xlsx> "Approver, Amount Threshold, Self-Approval Checkbox".
For walkthrough recordings, run
/sox-from-video <walkthrough.mp4> <transcript>.
Both fan out to the boxer agent in parallel and write annotated
detail tabs back to the workpaper.
Once the workpaper is signed off, run
/sox-replay-build <workpaper.xlsx> to package the
deterministic scripts, locked attribute lists, methodology, and
test plan into a portable .skill. Drop the
.skill into next period's workspace and replay the
same test without rewriting code.