Model card

Models

a11yequitas-gr v1.1.1 — all variants
ModelVersionSizeExam scoreTraining rows
a11yequitas-gr:3b1.1.1~2.1 GB96 / 100~1,890
a11yequitas-gr:rhel101.1.1~2.0 GB94 / 100~2,310
a11yequitas-gr:rhel91.1.1~2.0 GB98 / 100~2,030

Intended users

Content editors and accessibility coordinators at US local, state, and federal agencies and nonprofit organizations who must remediate axe-core violations in a CMS (WordPress, Drupal). The model is built for the non-developer who needs to know what to change in the page editor, what to leave alone, and when to route a problem to a developer.

The gr:rhel10 and gr:rhel9 variants additionally serve system administrators running accessibility tooling on Red Hat Enterprise Linux infrastructure.

Model specializations

  • gr:3b — WCAG 2.2 AA remediation core. CMS routing (WordPress, Drupal), axe-core violation explanation, editor-vs-developer routing. Use when infrastructure context is not needed.
  • gr:rhel10 — gr:3b capabilities plus RHEL 10 system administration. Uses DNF5, ansible.builtin.dnf5, and Podman 5 quadlet files. Use for RHEL 10 / Rocky Linux 10 deployments.
  • gr:rhel9 — gr:3b capabilities plus RHEL 9 system administration. Uses DNF (not yum or dnf5), ansible.builtin.dnf, and podman generate systemd. Use for RHEL 9 / Rocky Linux 9 deployments.

Scope (v1.0)

v1.0 focuses on WCAG 2.2 Level AA remediation of axe-core violations. AAA is opt-in. The training set covers 14 tiers: WCAG core, AA-vs-AAA distinction, ARIA patterns, axe-core rule mapping, plain-language explanation, CMS-vs-developer routing, Drupal admin UI, WordPress admin UI, Rocky Linux 10 bare-metal, Ubuntu bare-metal, web server config, screen-reader behavior, audit reporting, and companion-agent usage patterns.

Out of scope (v1.0)

  • WCAG 2.2 AAA beyond opt-in upgrade-path framing
  • Playwright automated testing — deferred to v1.1
  • Containers (Podman, Docker) — bare-metal + systemd only in v1.0; containers in v1.1
  • Rich-text editors (CKEditor 5, Gutenberg, classic editor) — deferred to v1.1
  • Long legal documents — the 3B parameter footprint limits multi-document legal reasoning; users should treat legal references as directional and verify against authoritative sources

Why IBM Granite 4.1 3B

  • Local-first deployment. 3B parameters fit on a laptop, a Raspberry Pi-class device, or a low-cost server. No data leaves the deploying organization.
  • Granite native chat format.Training used IBM’s native chat template — matching training format to deployment format prevents the silent accuracy loss that comes from ChatML mismatch.
  • Apache-2.0 base. Permissive licensing allows government and nonprofit deployment without legal friction.

Training pipeline

  • Teacher model: Qwen3-Coder-Next running locally on a Mac with an M5 Max chip. Draft answers were generated locally; no cloud APIs touched the training data.
  • Auto-filter: Every teacher-generated row passed through a heuristic gate that rejected wrong-audience routing (telling an editor to edit CSS), over-remediation, decorative-vs-informative image confusion, and answers that would fail the axe-core check instead of fixing it.
  • Final datasets: ~1,890 rows (gr:3b), ~2,310 rows (gr:rhel10), ~2,030 rows (gr:rhel9) — covering 50 axe-core violation types plus infrastructure-specific topics. Training datasets themselves are not publicly distributed.
  • Fine-tuning: LoRA (rank 32), learning rate 5×10⁻⁵, 3 epochs, sequence length 4,096, batch size 2 with 4-step gradient accumulation. Unsloth + Hugging Face Transformers on an NVIDIA RTX 3090 Ti.
  • Packaging: LoRA adapter merged back into base weights, exported to GGUF, quantized to Q4_K_M (~2.0–2.1 GB). Runs via Ollama on Apple Silicon at ~50 tokens/sec on an M5 Max.

Evaluation

Scored against a 50-question axe-core Fix-It Exam with an independent Qwen3-Coder judge to avoid same-family bias. Each answer must hit five parts (problem, CMS-fixable yes/no, fix, what-not-to-change, verification) and clear an automatic-failure check (decorative-vs-informative, developer-only routing, unsafe code, false-positive fixing).

v1.1.1 exam results — gate is 85 / 100
ModelScoreStatus
a11yequitas-gr:3b96 / 100Pass
a11yequitas-gr:rhel1094 / 100Pass
a11yequitas-gr:rhel998 / 100Pass

Companion agent (planned)

Tier 14 of the v1.0 dataset trains usage patterns for a11y-public-agent, a separate companion product. The agent will wrap a11yequitas-gr with retrieval-augmented generation against curated WCAG and axe-core source material to reduce hallucinated rule numbers. The agent is not part of the v1.0 model release.

Known limitations

  • The model may generate plausible-looking WCAG criterion numbers. Verify against w3.org/TR/WCAG22.
  • Statutes, compliance deadlines, and dollar amounts change. The model is instructed to point users to authoritative sources (ada.gov, 28 CFR Part 35) rather than quote figures from training.
  • Code examples are illustrative. Test in your actual stack before deploying.
  • Container-based deployment guidance (Podman, Docker), rich-text editor remediation (CKEditor 5, Gutenberg), and Playwright test generation are out of scope for v1.0 and queued for v1.1.

License

MIT + Transparency & Procurement Disclosure Addendum. Vendors deploying this model to government or nonprofit clients must disclose that it is open-source and free to run. Base model (IBM Granite 4.1 3B) is Apache-2.0; the NOTICE file carries attribution.

Contact

A11y Equitas