Models

The a11yequitas-gr family is a set of three fine-tuned models built on IBM Granite 4.1 3B. Each model shares the same WCAG 2.2 AA remediation core and adds specialized knowledge for a particular infrastructure stack.

a11yequitas-gr:3b

Version: 1.1.1
Released: June 2026
Base: IBM Granite 4.1 3B (Apache-2.0)
Size: ~2.1 GB quantized (Q4_K_M GGUF)
Score: 96 / 100 on the axe-core Fix-It Exam (gate: 85)
Training data: ~1,890 question-and-answer pairs (WCAG 2.2 AA)

Download: Ollama (:3b) · Hugging Face

What this model does

a11yequitas-gr:3b helps content editors fix web accessibility problems. When an accessibility scanner (like axe-core) finds a problem on a webpage, this model explains what is wrong, who it hurts, and how to fix it — without using technical jargon.

The model is designed for people who manage content in a CMS (like WordPress or Drupal), not for developers. It distinguishes problems an editor can fix themselves from problems that need a developer. This is the foundation model: WCAG 2.2 AA remediation only, with no infrastructure-specific guidance.
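As a rough illustration of the input side, the sketch below turns one scanner finding into a plain-language question. The field names (`id`, `impact`, `help`, `nodes[].html`) follow the shape of axe-core's `violations` results; the prompt wording, the `build_prompt` helper, and the example violation are hypothetical and not the format the models were trained on.

```python
from typing import Any


def build_prompt(violation: dict[str, Any]) -> str:
    """Turn one axe-core violation into a plain-language question for the model."""
    # Each violation carries a rule id, an impact level, a short help string,
    # and the failing nodes (each with the HTML snippet that triggered the rule).
    node = violation["nodes"][0]
    return (
        f"axe-core rule: {violation['id']} (impact: {violation['impact']})\n"
        f"What the scanner says: {violation['help']}\n"
        f"Failing HTML: {node['html']}\n"
        "I am a content editor, not a developer. "
        "What is wrong, who does it affect, and how do I fix it?"
    )


# Hypothetical violation, shaped like one entry in axe-core's `violations` array.
example = {
    "id": "image-alt",
    "impact": "critical",
    "help": "Images must have alternative text",
    "nodes": [{"html": '<img src="team-photo.jpg">'}],
}

prompt = build_prompt(example)
print(prompt)
```

Only the first failing node is used here to keep the prompt short; a real integration would probably loop over all nodes.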

Limitations (gr:3b)

Does not cover RHEL 9 or RHEL 10 system administration
Does not cover Ansible playbooks for accessibility tooling
WCAG 2.2 AAA guidance is voluntary opt-in only (not evaluated in the gate exam)

a11yequitas-gr:rhel10

Version: 1.1.1
Released: June 2026
Base: IBM Granite 4.1 3B (Apache-2.0)
Size: ~2.0 GB quantized (Q4_K_M GGUF)
Score: 94 / 100 on the axe-core Fix-It Exam (gate: 85)
Training data: ~2,310 question-and-answer pairs (WCAG 2.2 AA + RHEL 10 stack)

Download: Ollama (:rhel10) · Hugging Face

What this model does

a11yequitas-gr:rhel10 combines WCAG 2.2 AA remediation with Red Hat Enterprise Linux 10 system administration. It helps both content editors fixing accessibility violations and system administrators running accessibility tooling on RHEL 10 infrastructure.

For RHEL 10 tasks, the model uses DNF5 (not the legacy dnf4 or yum), ansible.builtin.dnf5 for Ansible playbooks, and Podman 5 quadlet files for container systemd integration — the toolchain native to RHEL 10.

Limitations (gr:rhel10)

RHEL 10 only — do not use for RHEL 9 administration (different dnf / Podman toolchain)
Bare-metal and systemd only in v1.0; container guidance beyond Podman 5 quadlets deferred to v1.1
WCAG 2.2 AAA guidance is voluntary opt-in only

a11yequitas-gr:rhel9

Version: 1.1.1
Released: June 2026
Base: IBM Granite 4.1 3B (Apache-2.0)
Size: ~2.0 GB quantized (Q4_K_M GGUF)
Score: 98 / 100 on the axe-core Fix-It Exam (gate: 85)
Training data: ~2,030 question-and-answer pairs (WCAG 2.2 AA + RHEL 9 stack)

Download: Ollama (:rhel9) · Hugging Face

What this model does

a11yequitas-gr:rhel9 combines WCAG 2.2 AA remediation with Red Hat Enterprise Linux 9 system administration. It is the highest-scoring model in the family at 98 / 100 on the axe-core Fix-It Exam.

For RHEL 9 tasks, the model uses DNF (not yum or dnf5), ansible.builtin.dnf for Ansible playbooks, and podman generate systemd for container systemd integration — the toolchain native to RHEL 9.

Limitations (gr:rhel9)

RHEL 9 only — do not use for RHEL 10 administration (different dnf5 / Podman 5 toolchain)
Bare-metal and systemd only in v1.0; Podman and Docker container guidance deferred to v1.1
WCAG 2.2 AAA guidance is voluntary opt-in only

How all three models were built

The base model

All three models start from IBM Granite 4.1 3B, a small open-source language model released under Apache 2.0. At 3 billion parameters, it runs on a laptop or a low-cost server without sending data to cloud services.

Training used IBM Granite’s native chat format. Matching the training format to the deployment format prevents the silent accuracy loss caused by a chat-template mismatch, such as training on ChatML and then serving with Granite’s own template.

Training data

Each question-and-answer pair presents a real axe-core violation snippet. The answer follows a five-part structure: explain the problem, say whether it can be fixed in the CMS, describe the fix, say what NOT to change, and explain how to verify the fix is done.
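The five-part answer structure could be represented roughly as follows. This is a minimal sketch: the class name `FixItAnswer`, its field names, and the sample text are all hypothetical, chosen only to mirror the five parts listed above.

```python
from dataclasses import dataclass


@dataclass
class FixItAnswer:
    problem: str          # 1. explain the problem
    cms_fixable: bool     # 2. can it be fixed in the CMS?
    fix: str              # 3. describe the fix
    do_not_change: str    # 4. what NOT to change
    verify: str           # 5. how to verify the fix is done

    def render(self) -> str:
        """Join the five parts into one answer, one paragraph per part."""
        where = (
            "Yes, you can fix this in the CMS."
            if self.cms_fixable
            else "No, this needs a developer."
        )
        parts = [self.problem, where, self.fix, self.do_not_change, self.verify]
        return "\n\n".join(parts)


# Hypothetical sample answer for a missing-alt-text violation.
answer = FixItAnswer(
    problem="The team photo has no text alternative, so screen reader users cannot tell what it shows.",
    cms_fixable=True,
    fix="Open the image in the media library and fill in the alternative text field.",
    do_not_change="Leave the caption and the image file itself as they are.",
    verify="Run the scanner again and confirm the image-alt violation is gone.",
)
rendered = answer.render()
print(rendered)
```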

After the teacher model wrote each draft answer, we filtered and reviewed the drafts to remove:

Wrong audience routing (telling an editor to edit CSS or raw HTML)
Over-remediation (changing things that don’t need to change)
Confusing decorative images with informative ones
Answers that would fail the axe-core check instead of fixing it
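The first of these checks, wrong audience routing, can be approximated with a simple keyword screen. The sketch below is an assumption about how such a filter might look, not the filter actually used: the term list and the function name `has_wrong_audience_routing` are hypothetical, and a keyword match would only be a first pass before human review.

```python
import re

# Hypothetical markers of developer-only work; a real filter would need a richer list.
DEVELOPER_TERMS = re.compile(
    r"\b(css|stylesheet|raw html|source code|template file)\b",
    re.IGNORECASE,
)


def has_wrong_audience_routing(cms_fixable: bool, fix_text: str) -> bool:
    """Flag a draft that says an editor can fix the issue but describes developer work."""
    return cms_fixable and bool(DEVELOPER_TERMS.search(fix_text))


# Each draft is (cms_fixable, fix_text). All three are made-up examples.
drafts = [
    (True, "Open the image block and fill in the alternative text field."),
    (True, "Edit the theme's CSS to raise the text contrast."),
    (False, "Ask a developer to update the stylesheet."),
]

# Keep only drafts that route the fix to the right audience.
kept = [d for d in drafts if not has_wrong_audience_routing(*d)]
print(len(kept))
```

The third draft is kept because it already routes the work to a developer; only the second draft, which hands CSS editing to an editor, is dropped.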

Teacher model

We used Qwen3-Coder-Next as the teacher model to generate draft answers. The teacher ran locally on a Mac with an M5 Max chip. No cloud APIs touched the training data.

Fine-tuning

We used LoRA (Low-Rank Adaptation) — a technique that adds a small number of extra parameters (~30 million) and trains only those, instead of retraining all 3 billion. This makes training much faster and uses less memory.
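To see why the added parameter count is so small, consider a single weight matrix. LoRA freezes the original matrix and trains two thin matrices whose product has the same shape, so the trainable count is rank × (inputs + outputs). The sketch below uses rank 32 and a hypothetical 2048 × 2048 layer; that shape is an illustrative assumption, not Granite's actual dimensions.

```python
def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_in x d_out matrix: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the original (frozen) weight matrix."""
    return d_in * d_out


# Hypothetical 2048 x 2048 projection layer; not Granite's actual shape.
d_in, d_out, rank = 2048, 2048, 32

added = lora_added_params(d_in, d_out, rank)   # 32 * 4096 = 131072
frozen = full_params(d_in, d_out)              # 2048 * 2048 = 4194304
print(f"trainable share for this layer: {added / frozen:.1%}")
```

Summed over every adapted layer, counts of this size are how an adapter can reach tens of millions of parameters while the base model's 3 billion stay frozen.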

Hardware: NVIDIA RTX 3090 Ti GPU on a Rocky Linux server
Framework: Unsloth + Hugging Face Transformers
LoRA rank: 32
Learning rate: 5×10⁻⁵ (cosine schedule)
Epochs: 3
Sequence length: up to 4,096 tokens per example
Batch size: 2 examples at a time, 4 steps of gradient accumulation

Training took roughly 30–45 minutes per model on the RTX 3090 Ti.
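The hyperparameters above imply an effective batch size of 8 (2 examples × 4 accumulation steps). The sketch below records the listed values in a plain dataclass and estimates the number of optimizer steps. The field names are hypothetical and are not Unsloth or Transformers argument names, and the step count is approximate because the dataset sizes are given only as rounded figures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    lora_rank: int = 32
    learning_rate: float = 5e-5
    lr_schedule: str = "cosine"
    epochs: int = 3
    max_seq_len: int = 4096
    per_device_batch_size: int = 2
    grad_accum_steps: int = 4

    @property
    def effective_batch_size(self) -> int:
        """Examples contributing to each optimizer update."""
        return self.per_device_batch_size * self.grad_accum_steps

    def optimizer_steps(self, num_examples: int) -> int:
        """Approximate optimizer steps over all epochs on a single GPU."""
        return math.ceil(num_examples / self.effective_batch_size) * self.epochs


cfg = TrainConfig()
steps_3b = cfg.optimizer_steps(1890)  # ~1,890 pairs for a11yequitas-gr:3b
print(cfg.effective_batch_size, steps_3b)
```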

Evaluation

Each model was tested on a 50-question axe-core Fix-It Exam. A separate AI model (Qwen3-Coder, a different model family) acted as judge to avoid same-family bias. The judge first checked for automatic failures (decorative-vs-informative image routing, developer-only fixes, unsafe code, fixing false positives), then scored each part 0–2 points. The maximum is 100 points; the gate is 85.
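The gate logic can be sketched as below. This rests on two assumptions that the description does not spell out: that each of the 50 questions is worth 0 to 2 points (which gives the 100-point maximum), and that an automatic failure earns zero for that question. The class and function names, and the sample run, are hypothetical.

```python
from dataclasses import dataclass

GATE = 85
MAX_POINTS_PER_QUESTION = 2  # assumption: 50 questions x 2 points = 100


@dataclass
class JudgedAnswer:
    auto_fail: bool   # judge found an automatic failure
    points: int       # judge's 0-2 score; ignored when auto_fail is True


def exam_score(answers: list[JudgedAnswer]) -> int:
    """Sum the judge's points, giving zero for automatic failures (assumed rule)."""
    total = 0
    for a in answers:
        if a.auto_fail:
            continue  # automatic failures earn nothing
        if not 0 <= a.points <= MAX_POINTS_PER_QUESTION:
            raise ValueError(f"points out of range: {a.points}")
        total += a.points
    return total


# Hypothetical run: 47 full-credit answers, 2 partial, 1 automatic failure.
answers = (
    [JudgedAnswer(False, 2)] * 47
    + [JudgedAnswer(False, 1)] * 2
    + [JudgedAnswer(True, 2)]
)
score = exam_score(answers)
passed = score >= GATE
print(score, passed)
```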

v1.1.1 exam results
Model	Score	Gate	Status
a11yequitas-gr:3b	96 / 100	85	Pass
a11yequitas-gr:rhel10	94 / 100	85	Pass
a11yequitas-gr:rhel9	98 / 100	85	Pass

Packaging

After training, we merged the LoRA adapter back into the base model weights. We then converted the merged model to GGUF format and quantized it to Q4_K_M — a 4-bit compression format that reduces the model from about 6 GB to about 2 GB while keeping most of the accuracy.
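The size reduction follows from simple arithmetic on bits per weight. The sketch below assumes a nominal 3 billion parameters and an average of about 4.85 bits per weight for Q4_K_M; both figures are approximations introduced for illustration, since Q4_K_M mixes several block types and the exact parameter count is not stated here.

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GB (10^9 bytes), ignoring metadata overhead."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 3e9      # nominal "3B"; the exact count is an assumption
FP16_BITS = 16        # unquantized 16-bit weights
Q4_K_M_BITS = 4.85    # assumption: rough average for Q4_K_M

fp16_gb = model_size_gb(NUM_PARAMS, FP16_BITS)
q4_gb = model_size_gb(NUM_PARAMS, Q4_K_M_BITS)
print(f"16-bit: ~{fp16_gb:.1f} GB, Q4_K_M: ~{q4_gb:.1f} GB")
```

The estimate lands a little under the listed file sizes, which is plausible if some tensors are stored at higher precision than the average assumed here.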

The quantized GGUF runs via Ollama on Apple Silicon (Metal GPU acceleration), which gives about 50 tokens per second on an M5 Max.
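A local call through Ollama's HTTP API might look like the sketch below. The `/api/generate` endpoint and its `model`, `prompt`, and `stream` fields are part of Ollama's API; the model name is an assumption based on the tag shown above (the published name may include a namespace), and the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "a11yequitas-gr:3b"  # assumption: the published tag may carry a namespace prefix


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to a running Ollama server and return the model's reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


req = build_request(
    "axe-core flagged an image with no alt text. How do I fix it in WordPress?"
)
# Call ask(...) with the same prompt once Ollama is running and the model is pulled.
```

Because the request goes to localhost, the page content being discussed stays on the machine.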

What these models do not do

Do not test AAA (highest-level) accessibility rules as a requirement — only WCAG 2.2 AA
Do not interpret screen reader output or browser developer tools directly
Do not know about Playwright automated testing (planned for v1.1)
Do not cover rich-text editors like CKEditor 5 or Gutenberg (planned for v1.1)

Attribution

Base model: IBM Granite 4.1 3B — Apache 2.0 license
Accessibility rules: axe-core by Deque Systems — MPL 2.0 license
Training framework: Unsloth — Apache 2.0 license
WCAG source: W3C Web Content Accessibility Guidelines 2.2 (public specification)
Violation text: Paraphrased from axe-core rule descriptions for IP cleanliness. Not Deque’s verbatim strings.