
Explainability — “why this task”

Rule #1 for teacher trust:

If I don’t understand why the system picks problem 147 for Ivan — I won’t rely on it.

Explainability matters for external reviewers too: without a crisp “why this task,” the pitch collapses into generic model talk. Explainability isn’t polish — it’s core product.

We don’t ask LLMs (Claude / GPT) to narrate numeric facts.

Models sometimes hallucinate:

  • wrong micro-skills named;
  • confuse P(L) with P(solve);
  • invent “reasons” unsupported by data.

When explanations must be exact, that fails. So facts assemble deterministically via templates.

Templates stitch sentences from BKT state with simple rules:

  1. Identify the task’s weakest skill — primary reason for recommendation.
  2. Identify the strongest skill — guarantees the student won’t drown.
  3. Surface P_joint as a ZPD indicator.
  4. Mention rare-skill emphasis when rareSkillBonus applies.
Example output (Estonian):

  Ülesanne T-147 Ivanile
  Põhjus: kõige nõrgem mikrooskus on "sulgude avamine" (P=0.41).
  Tugevaim — "aritmeetika märkidega" (P=0.82). Ülesanne treenib
  just nõrka kohta, kuid ei jää aritmeetika peale kinni.
  Lahenduse tõenäosus ≈ 0.55 — see on parajalt keeruline.

English gloss:

“Weakest micro-skill — expanding parentheses (P = 0.41). Strongest — arithmetic with signs (P = 0.82). The exercise trains the weak spot without trapping them in arithmetic. P(solve) ≈ 0.55 — appropriately challenging.”

In web/lib/explain.ts (stretch goal — stable baseline):

export function explainRecommendation(
  scored: ScoredTask,
  microskills: Record<MicroSkillId, MicroSkill>,
  lang: 'et' | 'ru' | 'en' = 'et'
): string {
  const sorted = Object.entries(scored.perSkillPL)
    .sort(([, a], [, b]) => a - b);
  const [weakest, weakP] = sorted[0];
  const [strongest, strongP] = sorted[sorted.length - 1];
  const targetSkill = microskills[weakest].title_et;
  const supportSkill = microskills[strongest].title_et;
  return T[lang]({
    targetSkill,
    weakP: weakP.toFixed(2),
    supportSkill,
    strongP: strongP.toFixed(2),
    pSolve: scored.pSolve.toFixed(2),
  });
}

const T = {
  et: ({ targetSkill, weakP, supportSkill, strongP, pSolve }) =>
    `Põhjus: nõrgim — "${targetSkill}" (P=${weakP}). Tugevaim — ` +
    `"${supportSkill}" (P=${strongP}). Lahenduse tõenäosus ≈ ${pSolve}.`,
  ru: ...,
  en: ...,
};
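A self-contained usage sketch of the same pattern — the `ScoredTask` and `MicroSkill` shapes below are assumptions (minimal mock versions, not the project's real types), and only the Estonian template is filled in:

```typescript
// Minimal mock types — assumptions standing in for the real project types.
type MicroSkillId = string;
interface MicroSkill { id: MicroSkillId; title_et: string }
interface ScoredTask { taskId: string; pSolve: number; perSkillPL: Record<MicroSkillId, number> }

// Inline 'et' template, mirroring the T map above.
const T = {
  et: (v: { targetSkill: string; weakP: string; supportSkill: string; strongP: string; pSolve: string }) =>
    `Põhjus: nõrgim — "${v.targetSkill}" (P=${v.weakP}). Tugevaim — ` +
    `"${v.supportSkill}" (P=${v.strongP}). Lahenduse tõenäosus ≈ ${v.pSolve}.`,
};

function explainRecommendation(scored: ScoredTask, microskills: Record<MicroSkillId, MicroSkill>): string {
  // Sort per-skill P(L) ascending: first entry is the weakest, last the strongest.
  const sorted = Object.entries(scored.perSkillPL).sort(([, a], [, b]) => a - b);
  const [weakest, weakP] = sorted[0];
  const [strongest, strongP] = sorted[sorted.length - 1];
  return T.et({
    targetSkill: microskills[weakest].title_et,
    weakP: weakP.toFixed(2),
    supportSkill: microskills[strongest].title_et,
    strongP: strongP.toFixed(2),
    pSolve: scored.pSolve.toFixed(2),
  });
}

const reason = explainRecommendation(
  { taskId: 'T-147', pSolve: 0.55, perSkillPL: { brackets: 0.41, signs: 0.82 } },
  { brackets: { id: 'brackets', title_et: 'sulgude avamine' },
    signs:    { id: 'signs',    title_et: 'aritmeetika märkidega' } },
);
// reason → 'Põhjus: nõrgim — "sulgude avamine" (P=0.41). Tugevaim — "aritmeetika märkidega" (P=0.82). Lahenduse tõenäosus ≈ 0.55.'
```

Note that the function never consults an LLM: the same BKT state always produces the same sentence, which is what makes the numbers auditable.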

You may pass the filled template through Claude only for tone:

You are a MATx assistant. Rewrite the following for a teacher in 1–2 friendly
Estonian sentences. Do not change numbers or skill names.
[template]

Guardrails:

  • numbers and skills remain fixed — Claude reads but shouldn’t alter facts;
  • hallucination risk stays low because facts are provided;
  • tone feels human, not database dump.

Hybrid recipe: facts from us, wording polish optional.
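The "Claude shouldn't alter facts" guardrail can be enforced mechanically rather than trusted. One possible check (an assumption, not the project's actual code): extract every number and quoted skill name from the deterministic template and verify each survives verbatim in the rewritten text, falling back to the raw template on any mismatch.

```typescript
// Hypothetical post-LLM fact check: returns true only if every number and
// every quoted skill name from the template appears verbatim in the rewrite.
function factsPreserved(template: string, rewritten: string): boolean {
  const numbers = template.match(/\d+(?:\.\d+)?/g) ?? [];
  const skillNames = template.match(/"[^"]+"/g) ?? [];
  return [...numbers, ...skillNames].every((fact) => rewritten.includes(fact));
}

// Callers would fall back to the raw template whenever this returns false.
```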

Field                 | Source                | Purpose
----------------------|-----------------------|----------------------
Task name             | task.id               | identification
Top-2 skills with P   | mastery vector        | target vs support
Student P(solve)      | scoreTaskForStudent   | ZPD indicator
1–2 prose sentences   | explainRecommendation | human-readable reason
Alternatives (top-3)  | recommend()[1..2]     | backup choices
Edge cases:

  • Single-skill task → no “strongest contrast.” Template: “Target skill X (P=Y). P(solve) ≈ Z.”
  • All skills strong (P > 0.85) → why recommend? Template admits reinforcement: “Skills mostly mastered — consolidation.”
  • All skills weak (P < 0.3) → frustration zone. Template warns: “Risk of overload — consider an easier alternative.”
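The three edge cases above reduce to a small branch before template selection. A sketch (the 0.85 and 0.3 thresholds come from the text; the function shape and template names are assumptions):

```typescript
// Pick which explanation template applies, per the edge cases above.
function pickTemplate(perSkillPL: number[]): 'single' | 'consolidation' | 'overload' | 'contrast' {
  if (perSkillPL.length === 1) return 'single';                   // no weakest/strongest contrast
  if (perSkillPL.every((p) => p > 0.85)) return 'consolidation';  // all skills already strong
  if (perSkillPL.every((p) => p < 0.3)) return 'overload';        // frustration zone — warn
  return 'contrast';                                              // normal two-skill template
}
```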

“Explanations are generated deterministically from BKT state — digits stay trustworthy with zero LLM hallucination risk in math. Optional stylistic pass-through keeps tone friendly without touching facts.”

Next: not only “what’s wrong” but “how to practice”


Today the template answers “why this task” and “where the student is weakest.” The next step — add one more line: how to practice. Not only “Ivan’s P(brackets) = 0.41,” but also “try this: five expansion drills, then one with a minus in front of the parentheses.” The teacher screen turns from a diagnosis into a diagnosis + prescription, on the same page.
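Mechanically this could stay just as deterministic as the rest of the template: a lookup from micro-skill to a canned prescription, appended after the reason. A sketch — the `practiceTips` map and its drill text are purely illustrative, not real product content:

```typescript
// Hypothetical skill → prescription lookup; entries here are made-up examples.
const practiceTips: Record<string, string> = {
  brackets: 'try five expansion drills, then one with a minus before the parentheses',
};

// Append a "how to practice" line when a tip exists for the weakest skill.
function withPrescription(reason: string, weakestSkillId: string): string {
  const tip = practiceTips[weakestSkillId];
  return tip ? `${reason} How to practice: ${tip}.` : reason;
}
```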

When several hints exist for the same mistake, it’s useful to know which of them helps. Plain version: after hint A the student solved the next task 60% of the time; after hint B — 75%. So B is better — show it more often. No formulas, just a counter of “how often it helped.” In the larger product this becomes automatic selection of the best hint.
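The “no formulas, just a counter” version fits in a few lines. A sketch under assumed names (`recordHintOutcome`, `bestHint` are not from the source):

```typescript
// Per-hint counter: how often was the hint shown, and how often was the
// next task solved afterwards. Success rate = nextSolved / shown.
interface HintStats { shown: number; nextSolved: number }

const stats = new Map<string, HintStats>();

function recordHintOutcome(hintId: string, solvedNext: boolean): void {
  const s = stats.get(hintId) ?? { shown: 0, nextSolved: 0 };
  s.shown += 1;
  if (solvedNext) s.nextSolved += 1;
  stats.set(hintId, s);
}

// Among candidate hints for a mistake, prefer the highest observed rate.
function bestHint(hintIds: string[]): string {
  const rate = (id: string) => {
    const s = stats.get(id);
    return s && s.shown > 0 ? s.nextSolved / s.shown : 0;
  };
  return hintIds.reduce((a, b) => (rate(b) > rate(a) ? b : a));
}
```

With the 60% vs 75% example from the text, `bestHint(['A', 'B'])` picks B. The obvious refinement later is handling low sample counts (a hint shown twice shouldn't beat one shown two hundred times), which is where the larger product's automatic selection would come in.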