Numerical example — Ivan and parentheses
Goal: track how $P(L)$, the estimated probability that Ivan has mastered the skill “expand parentheses”, moves after each task.
Starting state
Ivan opens his first parentheses problem. He hasn’t seen this topic before.
- $P(L_0) = 0.20$: prior (“probably doesn’t know”).
- $P(T)$, $P(S)$, $P(G)$ (learn, slip, guess): literature defaults.
Task 1 — correct
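Posterior for correct, with slip rate $P(S)$ and guess rate $P(G)$:

$$P(L \mid \text{correct}) = \frac{P(L)\,(1 - P(S))}{P(L)\,(1 - P(S)) + (1 - P(L))\,P(G)} \approx 0.529$$

Learning step, with learn rate $P(T)$:

$$P(L_{\text{after}}) = P(L \mid \text{correct}) + \bigl(1 - P(L \mid \text{correct})\bigr)\,P(T) \approx 0.576$$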
0.20 → 0.58. Confidence nearly triples after one task — appropriate: we knew almost nothing, now we have a strong positive signal.
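The Task 1 arithmetic can be replayed with a minimal sketch of the standard BKT update. The function name `bkt_update` and the values `slip=0.10`, `guess=0.20`, `learn=0.10` are assumptions for illustration, not quoted from this page; they are used here because they reproduce the 0.20 → 0.58 move.

```python
def bkt_update(p_known: float, correct: bool,
               slip: float = 0.10, guess: float = 0.20,
               learn: float = 0.10) -> tuple[float, float]:
    """One BKT step: Bayes posterior from the answer, then the learning step.

    slip, guess and learn are assumed illustrative values.
    Returns (posterior, p_known_after).
    """
    if correct:
        # Knew the skill and did not slip, versus did not know and guessed.
        num = p_known * (1 - slip)
        den = num + (1 - p_known) * guess
    else:
        # Knew the skill but slipped, versus did not know and failed to guess.
        num = p_known * slip
        den = num + (1 - p_known) * (1 - guess)
    posterior = num / den
    # The student may also learn the skill while doing the task.
    after = posterior + (1 - posterior) * learn
    return posterior, after


posterior, after = bkt_update(0.20, correct=True)
print(f"posterior={posterior:.3f} after={after:.3f}")  # posterior=0.529 after=0.576
```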
Task 2 — correct
Now $P(L) = 0.58$.
0.58 → 0.87. Another correct answer — the model is nearly convinced.
Task 3 — incorrect
Section titled “Task 3 — incorrect”. Wrong-answer posterior:
0.87 → 0.49. A noticeable drop — not to zero. At high confidence an error reads partly as slip:
Was at 0.87 — could be slip. Down to 0.49; waiting for more data.
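Why a miss never drags the estimate to zero can be seen from the wrong-answer posterior alone. The sketch below uses the standard BKT form with assumed values `slip=0.10` and `guess=0.20`; these are hypothetical, so its outputs need not match the figures on this page, and `posterior_after_wrong` is an illustrative name.

```python
def posterior_after_wrong(p_known: float, slip: float = 0.10,
                          guess: float = 0.20) -> float:
    """P(known | wrong answer) under standard BKT; slip and guess are assumed."""
    slipped = p_known * slip                     # knew the skill, but slipped
    did_not_know = (1 - p_known) * (1 - guess)   # did not know, and did not guess right
    return slipped / (slipped + did_not_know)


# Compare how much belief survives a miss at low, medium and high confidence.
for prior in (0.20, 0.50, 0.87):
    post = posterior_after_wrong(prior)
    print(f"prior={prior:.2f} -> posterior after a miss={post:.3f}")
```

The posterior is exactly the share of the miss that the model attributes to slip, so it stays above zero for any nonzero prior and grows with the prior.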
Task 4 — correct
$P(L) = 0.49$.
0.49 → 0.83. Recovery.
Task 5 — incorrect (second mistake after rebound)
$P(L) = 0.83$.
0.83 → 0.42. Down again.
Task 6 — incorrect (third mistake in six tries)
$P(L) = 0.42$.
0.42 → 0.17. Three errors — the model confidently says the skill isn’t mastered; slip explains one miss, not three.
That’s “random slip vs real gap.”
All six steps in one table
| Step | Answer | $P(L)$ before | Posterior | $P(L)$ after |
|---|---|---|---|---|
| 0 | — | — | — | 0.200 |
| 1 | ✓ | 0.200 | 0.529 | 0.576 |
| 2 | ✓ | 0.576 | 0.859 | 0.873 |
| 3 | ✗ | 0.873 | 0.433 | 0.490 |
| 4 | ✓ | 0.490 | 0.812 | 0.831 |
| 5 | ✗ | 0.831 | 0.353 | 0.418 |
| 6 | ✗ | 0.418 | 0.074 | 0.166 |
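One quick consistency check on the table: each "after" value should equal the posterior plus a learning bump. The sketch below assumes the standard BKT learning step and a learn rate of 0.10; that rate is inferred from the rows themselves, not stated on this page, and `learning_step` is an illustrative name.

```python
LEARN = 0.10  # assumed: backed out of the table rows, not quoted from the page

# (step, posterior, after) copied from the table above
rows = [
    (1, 0.529, 0.576),
    (2, 0.859, 0.873),
    (3, 0.433, 0.490),
    (4, 0.812, 0.831),
    (5, 0.353, 0.418),
    (6, 0.074, 0.166),
]


def learning_step(posterior: float, learn: float = LEARN) -> float:
    """P(known) after the task: already known, or learned during the task."""
    return posterior + (1 - posterior) * learn


for step, posterior, after in rows:
    predicted = learning_step(posterior)
    # Table values are rounded to three decimals, so allow a small tolerance.
    status = "ok" if abs(predicted - after) < 1e-3 else "MISMATCH"
    print(f"step {step}: predicted {predicted:.3f}, table {after:.3f} ({status})")
```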
Step chart
[Chart: $P(L)$ on the vertical axis (0.0 to 1.0) against step on the horizontal axis (0 to 6), one point per step.]
Takeaways
- Runs of correct answers (1–2) lift confidence quickly.
- Single mistake at high confidence → partly slip — no meltdown.
- Runs of mistakes (5–6) pull estimates down decisively — slip can’t explain everything.
- Volatility around $P(L) \approx 0.5$ is expected: uncertainty is real.
In chapter 8 we’ll choose Ivan’s next task from this history — likely something simpler on the same skill to consolidate, not advance blindly.
Want to verify yourself?
These numbers match web/lib/bkt.ts.
Full Python replay with plots — Notebook 1 — BKT from scratch.