Models
The a11yequitas-gr family is a set of three fine-tuned models built on IBM Granite 4.1 3B. Each model shares the same WCAG 2.2 AA remediation core and adds specialized knowledge for a particular infrastructure stack.
a11yequitas-gr:3b
- Version: 1.1.1
- Released: June 2026
- Base: IBM Granite 4.1 3B (Apache-2.0)
- Size: ~2.1 GB quantized (Q4_K_M GGUF)
- Score: 96 / 100 on the axe-core Fix-It Exam (gate: 85)
- Training data: ~1,890 question-and-answer pairs (WCAG 2.2 AA)
Download: Ollama (:3b) · Hugging Face
What this model does
a11yequitas-gr:3b helps content editors fix web accessibility problems. When an accessibility scanner (like axe-core) finds a problem on a webpage, this model explains what is wrong, who it hurts, and how to fix it — without using technical jargon.
The model is designed for people who manage content in a CMS (like WordPress or Drupal), not developers. It knows the difference between problems an editor can fix themselves and problems that need a developer. This is the foundation model — WCAG AA remediation only, no infrastructure-specific guidance.
Limitations (gr:3b)
- Does not cover RHEL 9 or RHEL 10 system administration
- Does not cover Ansible playbooks for accessibility tooling
- WCAG 2.2 AAA guidance is voluntary opt-in only (not evaluated in the gate exam)
a11yequitas-gr:rhel10
- Version: 1.1.1
- Released: June 2026
- Base: IBM Granite 4.1 3B (Apache-2.0)
- Size: ~2.0 GB quantized (Q4_K_M GGUF)
- Score: 94 / 100 on the axe-core Fix-It Exam (gate: 85)
- Training data: ~2,310 question-and-answer pairs (WCAG 2.2 AA + RHEL 10 stack)
Download: Ollama (:rhel10) · Hugging Face
What this model does
a11yequitas-gr:rhel10 combines WCAG 2.2 AA remediation with Red Hat Enterprise Linux 10 system administration. It helps both content editors fixing accessibility violations and system administrators running accessibility tooling on RHEL 10 infrastructure.
For RHEL 10 tasks, the model uses DNF5 (not the legacy dnf4 or yum), ansible.builtin.dnf5 for Ansible playbooks, and Podman 5 quadlet files for container systemd integration — the toolchain native to RHEL 10.
Limitations (gr:rhel10)
- RHEL 10 only — do not use for RHEL 9 administration (different dnf / Podman toolchain)
- Bare-metal and systemd only in v1.0; container guidance beyond Podman 5 quadlets deferred to v1.1
- WCAG 2.2 AAA guidance is voluntary opt-in only
a11yequitas-gr:rhel9
- Version: 1.1.1
- Released: June 2026
- Base: IBM Granite 4.1 3B (Apache-2.0)
- Size: ~2.0 GB quantized (Q4_K_M GGUF)
- Score: 98 / 100 on the axe-core Fix-It Exam (gate: 85)
- Training data: ~2,030 question-and-answer pairs (WCAG 2.2 AA + RHEL 9 stack)
Download: Ollama (:rhel9) · Hugging Face
What this model does
a11yequitas-gr:rhel9 combines WCAG 2.2 AA remediation with Red Hat Enterprise Linux 9 system administration. It is the highest-scoring model in the family at 98 / 100 on the axe-core Fix-It Exam.
For RHEL 9 tasks, the model uses DNF (not yum or dnf5), ansible.builtin.dnf for Ansible playbooks, and podman generate systemd for container systemd integration — the toolchain native to RHEL 9.
Limitations (gr:rhel9)
- RHEL 9 only — do not use for RHEL 10 administration (different dnf5 / Podman 5 toolchain)
- Bare-metal and systemd only in v1.0; Podman and Docker container guidance deferred to v1.1
- WCAG 2.2 AAA guidance is voluntary opt-in only
How all three models were built
The base model
All three models start from IBM Granite 4.1 3B, a small open-source language model released under Apache 2.0. At 3 billion parameters, it runs on a laptop or a low-cost server without sending data to cloud services.
Training used IBM Granite’s native chat format — matching training format to deployment format prevents the silent accuracy loss that comes from ChatML mismatch.
Training data
Each question-and-answer pair presents a real axe-core violation snippet. The answer follows a five-part structure: explain the problem, say whether it can be fixed in the CMS, describe the fix, say what NOT to change, and explain how to verify the fix is done.
After the teacher wrote each draft, we filtered and reviewed the answers to remove:
- Wrong audience routing (telling an editor to edit CSS or raw HTML)
- Over-remediation (changing things that don’t need to change)
- Confusing decorative images with informative ones
- Answers that would fail the axe-core check instead of fixing it
Teacher model
We used Qwen3-Coder-Next as the teacher model to generate draft answers. The teacher ran locally on a Mac with an M5 Max chip. No cloud APIs touched the training data.
Fine-tuning
We used LoRA (Low-Rank Adaptation) — a technique that adds a small number of extra parameters (~30 million) and trains only those, instead of retraining all 3 billion. This makes training much faster and uses less memory.
- Hardware: NVIDIA RTX 3090 Ti GPU on a Rocky Linux server
- Framework: Unsloth + Hugging Face Transformers
- LoRA rank: 32
- Learning rate: 5×10⁻⁵ (cosine schedule)
- Epochs: 3
- Sequence length: up to 4,096 tokens per example
- Batch size: 2 examples at a time, 4 steps of gradient accumulation
Training took roughly 30–45 minutes per model on the RTX 3090 Ti.
Evaluation
Each model was tested on a 50-question axe-core Fix-It Exam. A separate AI model (Qwen3-Coder, a different model family) acts as judge to avoid same-family bias. The judge checks for automatic failures first (decorative-vs-informative image routing, developer-only fixes, unsafe code, false-positive fixing), then scores each part 0–2 points. Maximum is 100 points; gate is 85.
| Model | Score | Gate | Status |
|---|---|---|---|
| a11yequitas-gr:3b | 96 / 100 | 85 | Pass |
| a11yequitas-gr:rhel10 | 94 / 100 | 85 | Pass |
| a11yequitas-gr:rhel9 | 98 / 100 | 85 | Pass |
Packaging
After training, we merged the LoRA adapter back into the base model weights. We then converted the merged model to GGUF format and quantized it to Q4_K_M — a 4-bit compression format that reduces the model from about 6 GB to about 2 GB while keeping most of the accuracy.
The quantized GGUF runs via Ollama on Apple Silicon (Metal GPU acceleration), which gives about 50 tokens per second on an M5 Max.
What these models do not do
- Do not test AAA (highest-level) accessibility rules as a requirement — only WCAG 2.2 AA
- Do not interpret screen reader output or browser developer tools directly
- Do not know about Playwright automated testing (planned for v1.1)
- Do not cover rich-text editors like CKEditor 5 or Gutenberg (planned for v1.1)
Attribution
- Base model: IBM Granite 4.1 3B — Apache 2.0 license
- Accessibility rules: axe-core by Deque Systems — MPL 2.0 license
- Training framework: Unsloth — Apache 2.0 license
- WCAG source: W3C Web Content Accessibility Guidelines 2.2 (public specification)
- Violation text: Paraphrased from axe-core rule descriptions for IP cleanliness. Not Deque’s verbatim strings.
