Model card

Models

a11yequitas-gr v1.1.1 — all variants
Model	Version	Size	Exam score	Training rows
a11yequitas-gr:3b	1.1.1	~2.1 GB	96 / 100	~1,890
a11yequitas-gr:rhel10	1.1.1	~2.0 GB	94 / 100	~2,310
a11yequitas-gr:rhel9	1.1.1	~2.0 GB	98 / 100	~2,030

Released: June 2026
Base: IBM Granite 4.1 3B Instruct (Apache-2.0)
Format: Q4_K_M GGUF
Distribution: Ollama · Hugging Face

Intended users

Content editors and accessibility coordinators at US local, state, and federal agencies and nonprofit organizations who must remediate axe-core violations in a CMS (WordPress, Drupal). The model is built for the non-developer who needs to know what to change in the page editor, what to leave alone, and when to route a problem to a developer.

The gr:rhel10 and gr:rhel9 variants additionally serve system administrators running accessibility tooling on Red Hat Enterprise Linux infrastructure.

Model specializations

gr:3b — WCAG 2.2 AA remediation core. CMS routing (WordPress, Drupal), axe-core violation explanation, editor-vs-developer routing. Use when infrastructure context is not needed.
gr:rhel10 — gr:3b capabilities plus RHEL 10 system administration. Uses DNF5, ansible.builtin.dnf5, and Podman 5 quadlet files. Use for RHEL 10 / Rocky Linux 10 deployments.
gr:rhel9 — gr:3b capabilities plus RHEL 9 system administration. Uses DNF (not yum or dnf5), ansible.builtin.dnf, and podman generate systemd. Use for RHEL 9 / Rocky Linux 9 deployments.
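Choosing among the three variants can be scripted on the target host. This is a minimal sketch. It assumes the host provides a standard /etc/os-release file and that only the rhel and rocky distribution IDs should map to the RHEL variants. parse_os_release and pick_variant are hypothetical helpers, not shipped tooling.

```python
from pathlib import Path

# Distribution IDs treated as RHEL-compatible in this sketch (assumption).
RHEL_FAMILY_IDS = {"rhel", "rocky"}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release KEY=value lines into a dict (hypothetical helper)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # os-release values may be wrapped in single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields


def pick_variant(os_release_text: str) -> str:
    """Return the model tag suited to this host (hypothetical helper)."""
    fields = parse_os_release(os_release_text)
    if fields.get("ID") not in RHEL_FAMILY_IDS:
        return "a11yequitas-gr:3b"
    # VERSION_ID looks like "9.4" or "10.0"; only the major version matters here.
    major = fields.get("VERSION_ID", "").split(".")[0]
    if major == "10":
        return "a11yequitas-gr:rhel10"
    if major == "9":
        return "a11yequitas-gr:rhel9"
    return "a11yequitas-gr:3b"


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    text = os_release.read_text() if os_release.exists() else ""
    print(pick_variant(text))
```

On a Rocky Linux 9 host the script would print a11yequitas-gr:rhel9. Unknown or non-RHEL hosts fall back to gr:3b, which needs no infrastructure context.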

Scope (v1.0)

v1.0 focuses on WCAG 2.2 Level AA remediation of axe-core violations. AAA is opt-in. The training set covers 14 tiers: WCAG core, AA-vs-AAA distinction, ARIA patterns, axe-core rule mapping, plain-language explanation, CMS-vs-developer routing, Drupal admin UI, WordPress admin UI, Rocky Linux 10 bare-metal, Ubuntu bare-metal, web server config, screen-reader behavior, audit reporting, and companion-agent usage patterns.

Out of scope (v1.0)

WCAG 2.2 AAA beyond opt-in upgrade-path framing
Playwright automated testing — deferred to v1.1
Containers (Podman, Docker) — bare-metal + systemd only in v1.0; containers in v1.1
Rich-text editors (CKEditor 5, Gutenberg, classic editor) — deferred to v1.1
Long legal documents — the 3B parameter footprint limits multi-document legal reasoning; users should treat legal references as directional and verify against authoritative sources

Why IBM Granite 4.1 3B

Local-first deployment. At 3B parameters (~2.0–2.1 GB quantized), the model fits on a laptop, a Raspberry Pi-class device, or a low-cost server. No data leaves the deploying organization.
Granite native chat format. Training used IBM’s native chat template. Matching the training format to the deployment format prevents the silent accuracy loss that a ChatML template mismatch causes.
Apache-2.0 base. Permissive licensing allows government and nonprofit deployment without legal friction.
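Because inference runs locally, a client only needs to reach the Ollama server on the same machine. This is a minimal sketch that uses only the Python standard library. It assumes Ollama's default port (11434) and its /api/generate endpoint. It also assumes the tag a11yequitas-gr:3b as it appears in this card, although the published pull name may include a namespace.

```python
import json
import urllib.request

# Ollama's default local endpoint; the request never leaves this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Tag as listed in this model card; the exact pull name may differ (assumption).
MODEL_TAG = "a11yequitas-gr:3b"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With stream=False, Ollama returns a single JSON object whose "response" holds the text.
    return body["response"]
```

Pull the model first and keep ollama serve running. A call such as ask("axe-core flags image-alt on a banner image. What do I change in WordPress?") then returns the answer as one string. Disabling streaming gives up incremental output in exchange for a simpler, single JSON response.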

Training pipeline

Teacher model: Qwen3-Coder-Next running locally on a Mac with an M5 Max chip. Draft answers were generated locally; no cloud APIs touched the training data.
Auto-filter: Every teacher-generated row passed through a heuristic gate. The gate rejected wrong-audience routing (telling an editor to edit CSS), over-remediation, and decorative-vs-informative image confusion. It also rejected answers that would still fail the axe-core check instead of fixing the violation.
Final datasets: ~1,890 rows (gr:3b), ~2,310 rows (gr:rhel10), ~2,030 rows (gr:rhel9) — covering 50 axe-core violation types plus infrastructure-specific topics. Training datasets themselves are not publicly distributed.
Fine-tuning: LoRA (rank 32), learning rate 5×10⁻⁵, 3 epochs, sequence length 4,096, batch size 2 with 4-step gradient accumulation. Unsloth + Hugging Face Transformers on an NVIDIA RTX 3090 Ti.
Packaging: LoRA adapter merged back into base weights, exported to GGUF, quantized to Q4_K_M (~2.0–2.1 GB). Runs via Ollama at ~50 tokens/sec on an Apple Silicon M5 Max.
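The batch settings above imply an effective batch of 8 examples per optimizer step. The sketch below stores the listed hyperparameters in a plain dataclass and estimates optimizer steps for each variant. FineTuneConfig is a hypothetical container, not the Unsloth or Transformers API, and the row counts are the approximate figures from the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in this model card (hypothetical container)."""
    lora_rank: int = 32
    learning_rate: float = 5e-5
    epochs: int = 3
    max_seq_length: int = 4096
    per_device_batch_size: int = 2
    grad_accum_steps: int = 4

    @property
    def effective_batch_size(self) -> int:
        # Gradients from 4 micro-batches of 2 examples accumulate before each optimizer step.
        return self.per_device_batch_size * self.grad_accum_steps

    def optimizer_steps(self, rows: int) -> int:
        """Approximate optimizer steps on one GPU, counting a final partial batch as a step."""
        steps_per_epoch = math.ceil(rows / self.effective_batch_size)
        return steps_per_epoch * self.epochs


cfg = FineTuneConfig()
for variant, rows in {"gr:3b": 1890, "gr:rhel10": 2310, "gr:rhel9": 2030}.items():
    print(variant, cfg.effective_batch_size, cfg.optimizer_steps(rows))
```

The estimates are roughly 711, 867, and 762 steps for gr:3b, gr:rhel10, and gr:rhel9. The trainer's actual count can differ slightly depending on how it handles the final partial batch.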

Evaluation

Scored against a 50-question axe-core Fix-It Exam by an independent Qwen3-Coder judge. The judge comes from a different model family than the Granite base, which avoids same-family bias. Each answer must address all five parts (problem, CMS-fixable yes/no, fix, what-not-to-change, verification). It must also clear an automatic-failure check covering decorative-vs-informative confusion, developer-only routing, unsafe code, and false-positive fixing.
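The pass/fail rule above can be expressed as a small scoring function. This is a hypothetical sketch and the field names are invented. It assumes each of the 50 questions is worth 2 points and earns credit only when all five parts are present and no automatic-failure flag is raised. The card does not describe the exam's actual point allocation. The 85 / 100 gate comes from the results table.

```python
from dataclasses import dataclass, field

# The five required parts named in the exam description.
REQUIRED_PARTS = ("problem", "cms_fixable", "fix", "what_not_to_change", "verification")
# Automatic-failure categories named in the exam description.
AUTO_FAIL_FLAGS = frozenset({
    "decorative_vs_informative",
    "developer_only_routing",
    "unsafe_code",
    "false_positive_fixing",
})
PASS_GATE = 85  # out of 100
POINTS_PER_QUESTION = 2  # assumption: 50 questions x 2 points = 100


@dataclass
class JudgedAnswer:
    """One judge verdict (hypothetical schema)."""
    parts_present: set[str]
    auto_fail: set[str] = field(default_factory=set)


def answer_points(ans: JudgedAnswer) -> int:
    # Any recognized automatic-failure flag zeroes the answer regardless of completeness.
    if ans.auto_fail & AUTO_FAIL_FLAGS:
        return 0
    # In this sketch all five parts are required for any credit.
    if all(part in ans.parts_present for part in REQUIRED_PARTS):
        return POINTS_PER_QUESTION
    return 0


def exam_score(answers: list[JudgedAnswer]) -> tuple[int, bool]:
    """Return (score out of 100, whether the gate was cleared)."""
    score = sum(answer_points(a) for a in answers)
    return score, score >= PASS_GATE
```

All-or-nothing credit is the simplest reading of "must address all five parts". A partial-credit rubric would change the arithmetic but not the structure.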

v1.1.1 exam results — gate is 85 / 100
Model	Score	Status
a11yequitas-gr:3b	96 / 100	Pass
a11yequitas-gr:rhel10	94 / 100	Pass
a11yequitas-gr:rhel9	98 / 100	Pass

Companion agent (planned)

Tier 14 of the v1.0 dataset trains usage patterns for a11y-public-agent, a separate companion product. The agent will wrap a11yequitas-gr with retrieval-augmented generation against curated WCAG and axe-core source material to reduce hallucinated rule numbers. The agent is not part of the v1.0 model release.
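One way to picture the planned retrieval step: look up the axe-core rule ID in a curated table, then place the retrieved WCAG reference in the prompt. The model can then quote that reference instead of recalling a number. This is a minimal sketch under stated assumptions. The three-entry table is a hand-picked excerpt, the prompt wording is hypothetical, and a real agent would retrieve from the full curated corpus. It is not the a11y-public-agent design.

```python
from typing import Optional

# Hand-picked excerpt of a curated axe-core rule -> WCAG 2.2 mapping (assumption:
# a real corpus would be built from the axe-core rule docs and w3.org/TR/WCAG22).
CURATED = {
    "image-alt": ("1.1.1", "Non-text Content", "A"),
    "color-contrast": ("1.4.3", "Contrast (Minimum)", "AA"),
    "html-has-lang": ("3.1.1", "Language of Page", "A"),
}


def retrieve(rule_id: str) -> Optional[str]:
    """Return a grounding passage for an axe-core rule ID, or None if not curated."""
    entry = CURATED.get(rule_id)
    if entry is None:
        return None
    number, title, level = entry
    return f"axe-core rule {rule_id} maps to WCAG 2.2 SC {number} {title} (Level {level})."


def build_prompt(rule_id: str, question: str) -> str:
    """Prepend retrieved context so the model cites it instead of recalling a number."""
    context = retrieve(rule_id)
    if context is None:
        # No curated source: instruct the model not to guess a criterion number.
        context = "No curated WCAG reference found; do not cite a success criterion number."
    return f"Reference:\n{context}\n\nQuestion:\n{question}"
```

An exact-key lookup is the simplest possible retriever. It suits axe-core output because every violation already carries a stable rule ID.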

Known limitations

The model may generate plausible-looking WCAG criterion numbers. Verify against w3.org/TR/WCAG22.
Statutes, compliance deadlines, and dollar amounts change. The model is instructed to point users to authoritative sources (ada.gov, 28 CFR Part 35) rather than quote figures from training.
Code examples are illustrative. Test in your actual stack before deploying.
Container-based deployment guidance (Podman, Docker), rich-text editor remediation (CKEditor 5, Gutenberg), and Playwright test generation are out of scope for v1.0 and queued for v1.1.
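A simple post-generation check can catch the first limitation. It extracts every dotted x.y.z number from an answer and flags any that is not a WCAG 2.2 success criterion. This is a minimal sketch. The per-guideline counts below reflect WCAG 2.2 as published (86 success criteria, with 4.1.1 Parsing marked obsolete and removed), so confirm them against w3.org/TR/WCAG22 before relying on them. suspicious_criteria is a hypothetical helper.

```python
import re

# Number of success criteria per WCAG 2.2 guideline.
SC_COUNTS = {
    "1.1": 1, "1.2": 9, "1.3": 6, "1.4": 13,
    "2.1": 4, "2.2": 6, "2.3": 3, "2.4": 13, "2.5": 8,
    "3.1": 6, "3.2": 6, "3.3": 9,
    "4.1": 3,
}
VALID_SC = {f"{g}.{n}" for g, count in SC_COUNTS.items() for n in range(1, count + 1)}
# 4.1.1 Parsing is obsolete and removed in WCAG 2.2, so a citation of it is likely stale.
VALID_SC.discard("4.1.1")

# Dotted x.y.z numbers not embedded in a longer token such as "v1.1.1" or "1.4.3.2".
SC_PATTERN = re.compile(r"(?<![\w.])(\d\.\d{1,2}\.\d{1,2})(?!\.?\d)")


def suspicious_criteria(answer: str) -> list[str]:
    """Return cited x.y.z numbers that are not WCAG 2.2 success criteria."""
    cited = SC_PATTERN.findall(answer)
    return sorted({c for c in cited if c not in VALID_SC})
```

A flagged number is not necessarily wrong in context, but it should trigger the manual check against w3.org/TR/WCAG22 described above.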

License

MIT + Transparency & Procurement Disclosure Addendum. Vendors deploying this model to government or nonprofit clients must disclose that it is open-source and free to run. Base model (IBM Granite 4.1 3B) is Apache-2.0; the NOTICE file carries attribution.

Contact

A11y Equitas