
Knowledge as probability

A classic mistake is trying to directly decide whether the student “knows” the topic. That’s impossible: skill lives in the head; we only see outcomes (correct / incorrect).

In BKT we give up on certainty and say:

Student state is a hidden variable. We don’t know it exactly. But we can estimate the probability that the student has mastered it.

That probability is the central quantity:

$$P(L) = \mathbb{P}\big(\text{student has mastered the skill}\big), \qquad P(L) \in [0, 1]$$

L stands for learned. It’s just a number between 0 and 1.

  • P(L) = 0.0 — we’re confident they don’t know.
  • P(L) = 0.5 — “no idea.”
  • P(L) = 1.0 — confident they know.

In practice P(L) is rarely exactly 0 or 1 — the model keeps doubt on purpose. That’s correct: one correct answer isn’t proof; one mistake isn’t a verdict.

When a student first meets a skill we set a prior P(L_0) — our guess before any observations.

The literature default is P(L_0) = 0.2:

“Probably doesn’t know yet, but might have heard something.”

If other skills look strong, you can raise it. If the topic is brand new, leave it at 0.2. This parameter can later be fit from real data (see Notebook 3 — EM fitting).

In plain words:

  1. Correct answer → P(L) increases.
  2. Incorrect answer → P(L) decreases.
  3. Not by “whole steps” — the shift depends on prior confidence and the slip/guess parameters (see chapter 4).

The point: P(L) moves smoothly. That saves the model from two common failures:

  • Panic on one mistake (“they know nothing!”).
  • Euphoria on one correct answer (“genius!”).
Model confidence scale:

```
0 ────────●────────────────────────────────── 1
doesn't know    P(L) = 0.2 (start)       knows

After one correct task:
0 ─────────────────────●───────────────────── 1
                  P(L) = 0.58

After two correct:
0 ─────────────────────────────────●───────── 1
                              P(L) = 0.87
```

The numbers come from real BKT updates with default parameters. The Numerical example walks through the details.
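A minimal sketch of that update, assuming the common literature defaults P(T) = 0.1 (learn), P(G) = 0.2 (guess), P(S) = 0.1 (slip) — values consistent with the trajectory above, but not fitted to any real data:

```python
def bkt_update(p_l, correct, p_t=0.1, p_g=0.2, p_s=0.1):
    """One BKT step: Bayes update on the observation, then a learning transition.

    p_t, p_g, p_s are the learn/guess/slip parameters (literature defaults
    here, not fitted values).
    """
    if correct:
        # A correct answer is strong evidence, not proof: it may be a guess.
        posterior = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
    else:
        # An error counts against mastery, but it may be a slip.
        posterior = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
    # Learning transition: the student may master the skill on this attempt.
    return posterior + (1 - posterior) * p_t

p_l = 0.2                    # prior P(L0)
p_l = bkt_update(p_l, True)  # ≈ 0.58 after one correct answer
p_l = bkt_update(p_l, True)  # ≈ 0.87 after two correct answers
```

Note the asymmetry: the same function moves P(L) up or down by different amounts depending on where it currently stands.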

Important nuance: we don’t store one P(L) per student. We store a vector:

```
Ivan:
  expand_brackets: 0.42
  distributive_law: 0.81
  signs: 0.66
  move_across_equals: 0.55
  ...
```

So each skill has its own trajectory. Ivan can be strong in arithmetic and weak on parentheses — and the model sees it.
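One simple way to keep that per-skill vector (a sketch; the skill names and numbers come from the example above, and using a defaultdict for the prior is just one implementation choice):

```python
from collections import defaultdict

P_L0 = 0.2  # prior for any skill the student hasn't met yet

# One student = one mapping skill -> current P(L)
ivan = defaultdict(lambda: P_L0)
ivan.update({
    "expand_brackets": 0.42,
    "distributive_law": 0.81,
    "signs": 0.66,
    "move_across_equals": 0.55,
})

# Each skill has its own trajectory: updating one leaves the others alone.
ivan["expand_brackets"] = 0.50  # illustrative value after one more update
```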

Below is a real BKT simulator. Press ✓ or ✗ and watch P(L) climb on correct answers or fall on mistakes; P(solve) follows automatically.

[Interactive BKT simulator — initial state: step 0, P(L) = 0.200, P(solve) = 0.340, ZPD: ✗ no]

Notice:

  • P(L) never hits exactly 0 or 1 — BKT always keeps residual uncertainty;
  • after a correct answer P(L) jumps up, after an error it drops, but not symmetrically;
  • P(solve) is always “tighter” — between P(G) and 1 − P(S), because it folds in guess and slip noise.
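The last point can be checked directly: P(solve) is a mixture of “mastered but may slip” and “not mastered but may guess,” weighted by P(L), so it never leaves the interval [P(G), 1 − P(S)]. A sketch with illustrative parameters G = 0.2, S = 0.1 (matching the simulator’s 0.340 starting value):

```python
def p_solve(p_l, p_g=0.2, p_s=0.1):
    # Mastered (prob p_l): correct unless they slip -> 1 - p_s.
    # Not mastered (prob 1 - p_l): correct only by guessing -> p_g.
    return p_l * (1 - p_s) + (1 - p_l) * p_g

p_solve(0.0)  # = P(G) = 0.2: lower bound, reached only at P(L) = 0
p_solve(1.0)  # = 1 - P(S) = 0.9: upper bound, reached only at P(L) = 1
p_solve(0.2)  # = 0.34: the simulator's starting value
```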

Why we store a vector of P(L) per skill, and why one overall “math level” is a bad idea — next chapter.