
Knowledge as probability

A classic mistake is trying to directly decide whether the student “knows” the topic. That’s impossible: skill lives in the head; we only see outcomes (correct / incorrect).

In BKT we give up on certainty and say:

Student state is a hidden variable. We don’t know it exactly. But we can estimate the probability that the student has mastered it.

That probability is the central quantity:

$$P(L) = \mathbb{P}\big(\text{student has mastered the skill}\big), \qquad P(L) \in [0, 1]$$

L stands for learned. It’s just a number between 0 and 1.

  • P(L) = 0.0 — we’re confident they don’t know.
  • P(L) = 0.5 — “no idea.”
  • P(L) = 1.0 — confident they know.

In practice P(L) is rarely exactly 0 or 1 — the model keeps doubt on purpose. That’s correct: one correct answer isn’t proof; one mistake isn’t a verdict.

When a student first meets a skill we set a prior P(L_0) — our guess before any observations.

The literature default is P(L_0) = 0.2:

“Probably doesn’t know yet, but might have heard something.”

If other skills look strong, you can raise it. If the topic is brand new, leave it at 0.2. This parameter can later be fit from real data (see Notebook 3 — EM fitting).

In plain words:

  1. Correct answer → P(L) increases.
  2. Incorrect answer → P(L) decreases.
  3. Not by “whole steps” — the shift depends on prior confidence and the slip/guess parameters (see chapter 4).

The point: P(L) moves smoothly. That saves the model from two common failures:

  • Panic on one mistake (“they know nothing!”).
  • Euphoria on one correct answer (“genius!”).
Model confidence scale:

```
0 ────────●────────────────────────────────── 1
doesn't know    P(L) = 0.2 (start)       knows

After one correct task:
0 ─────────────────────●───────────────────── 1
                  P(L) = 0.58

After two correct:
0 ─────────────────────────────────●───────── 1
                              P(L) = 0.87
```

The numbers come from real BKT updates with default parameters. The Numerical example walks through the details.
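A minimal sketch of that update, assuming the common literature defaults P(T) = 0.1 (learn), P(G) = 0.2 (guess), P(S) = 0.1 (slip) — values consistent with the trajectory above, but not fitted to any real data:

```python
def bkt_update(p_l, correct, p_t=0.1, p_g=0.2, p_s=0.1):
    """One BKT step: Bayes update on the observation, then a learning transition.

    p_t, p_g, p_s are the learn/guess/slip parameters (literature defaults
    here, not fitted values).
    """
    if correct:
        # A correct answer is strong evidence, not proof: it may be a guess.
        posterior = p_l * (1 - p_s) / (p_l * (1 - p_s) + (1 - p_l) * p_g)
    else:
        # An error counts against mastery, but it may be a slip.
        posterior = p_l * p_s / (p_l * p_s + (1 - p_l) * (1 - p_g))
    # Learning transition: the student may master the skill on this attempt.
    return posterior + (1 - posterior) * p_t

p_l = 0.2                    # prior P(L0)
p_l = bkt_update(p_l, True)  # ≈ 0.58 after one correct answer
p_l = bkt_update(p_l, True)  # ≈ 0.87 after two correct answers
```

Note the asymmetry: the same function moves P(L) up or down by different amounts depending on where it currently stands.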

Important nuance: we don’t store one P(L) per student. We store a vector:

```
Ivan:
  expand_brackets: 0.42
  distributive_law: 0.81
  signs: 0.66
  move_across_equals: 0.55
  ...
```

So each skill has its own trajectory. Ivan can be strong in arithmetic and weak on parentheses — and the model sees it.
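One simple way to keep that per-skill vector (a sketch; the skill names and numbers come from the example above, and using a defaultdict for the prior is just one implementation choice):

```python
from collections import defaultdict

P_L0 = 0.2  # prior for any skill the student hasn't met yet

# One student = one mapping skill -> current P(L)
ivan = defaultdict(lambda: P_L0)
ivan.update({
    "expand_brackets": 0.42,
    "distributive_law": 0.81,
    "signs": 0.66,
    "move_across_equals": 0.55,
})

# Each skill has its own trajectory: updating one leaves the others alone.
ivan["expand_brackets"] = 0.50  # illustrative value after one more update
```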

Below is a real BKT simulator. Press ✓ or ✗ and watch P(L) climb on correct answers or fall on mistakes; P(solve) follows automatically.

[Interactive BKT simulator — initial state: step 0, P(L) = 0.200, P(solve) = 0.340, ZPD: ✗ no]

Notice:

  • P(L) never hits exactly 0 or 1 — BKT always keeps residual uncertainty;
  • after a correct answer P(L) jumps up, after an error it drops, but not symmetrically;
  • P(solve) is always “tighter” — between P(G) and 1 − P(S), because it folds in guess and slip noise.
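The last point can be checked directly: P(solve) is a mixture of “mastered but may slip” and “not mastered but may guess,” weighted by P(L), so it never leaves the interval [P(G), 1 − P(S)]. A sketch with illustrative parameters G = 0.2, S = 0.1 (matching the simulator’s 0.340 starting value):

```python
def p_solve(p_l, p_g=0.2, p_s=0.1):
    # Mastered (prob p_l): correct unless they slip -> 1 - p_s.
    # Not mastered (prob 1 - p_l): correct only by guessing -> p_g.
    return p_l * (1 - p_s) + (1 - p_l) * p_g

p_solve(0.0)  # = P(G) = 0.2: lower bound, reached only at P(L) = 0
p_solve(1.0)  # = 1 - P(S) = 0.9: upper bound, reached only at P(L) = 1
p_solve(0.2)  # = 0.34: the simulator's starting value
```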

Why we store a vector of P(L) per skill, and why one overall “math level” is a bad idea — next chapter.