NB-4 — IRT vs BKT
Pitch question: “Why not IRT? Isn’t that the testing standard?”
Headline contrast
| | IRT | BKT |
|---|---|---|
| Models | Student ability + item difficulty | Skill state evolving over time |
| Time | Snapshot (single administration) | Dynamics (sequence of tasks) |
| Latent variable | θ — scalar ability | L — binary mastery bit |
| Typical use | SAT, GRE, diagnostics | Adaptive tutoring |
| Delivers | “Ivan scored 78 on the scale” | “Ivan’s parentheses mastery P(L) = 0.42” |
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
```

IRT model (3PL)
Three-parameter logistic:

P(correct) = c + (1 − c) / (1 + exp(−a · (θ − b)))
- θ — ability;
- a — discrimination;
- b — difficulty (curve inflection);
- c — guessing asymptote.
```python
def irt_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))
```
```python
# three typical items
fig, ax = plt.subplots(figsize=(8, 4))
xs = np.linspace(-3, 3, 200)
for a, b, c, label in [
    (1.0, -1.0, 0.1, 'easy (b=−1)'),
    (1.5, 0.0, 0.15, 'medium (b=0)'),
    (1.0, 1.5, 0.05, 'hard (b=+1.5)'),
]:
    ax.plot(xs, irt_3pl(xs, a, b, c), label=label)
ax.set_xlabel('θ — ability'); ax.set_ylabel('P(correct)')
ax.legend(); ax.grid(alpha=0.3)
plt.title('IRT 3PL — item curves')
plt.show()
```

A student of below-average ability reads off the curves roughly:
- easy item ~0.65 correct;
- medium ~0.50;
- hard ~0.13.
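As a sanity check, the ordering of those probabilities follows directly from the 3PL curves. A self-contained sketch, assuming an illustrative below-average ability of θ = −0.5 (the exact value is our assumption, not from the text):

```python
import numpy as np

def irt_3pl(theta, a, b, c):
    # three-parameter logistic: guessing floor c, slope a, difficulty b
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = -0.5  # illustrative below-average ability (assumed)
p_easy = irt_3pl(theta, 1.0, -1.0, 0.10)
p_medium = irt_3pl(theta, 1.5, 0.0, 0.15)
p_hard = irt_3pl(theta, 1.0, 1.5, 0.05)
print(p_easy, p_medium, p_hard)  # easy > medium > hard
```

Note the hard item never drops to zero: its guessing asymptote c = 0.05 keeps a floor under the curve.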
Where IRT shines
- 30-minute test: ~20 items, single snapshot → a precise estimate of θ.
- Standardization: comparable scores across cohorts.
- Item calibration: pre-tag difficulties.
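Pre-tagging difficulties is itself an estimation problem. A minimal sketch of how calibration could look — 2PL for brevity, with `calibrate_b` a hypothetical helper and student abilities assumed known from a previous administration:

```python
import numpy as np
from scipy.optimize import minimize

def irt_2pl(theta, a, b):
    # 2PL: the 3PL with the guessing floor c dropped
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def calibrate_b(thetas, responses, a=1.0):
    """MLE for item difficulty b, given known student abilities."""
    thetas = np.asarray(thetas, float)
    responses = np.asarray(responses, float)
    def nll(b):
        p = np.clip(irt_2pl(thetas, a, b[0]), 1e-9, 1 - 1e-9)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize(nll, x0=[0.0], bounds=[(-4, 4)]).x[0]

rng = np.random.default_rng(0)
thetas = rng.normal(0.0, 1.0, 500)   # simulated cohort
true_b = 0.8                          # ground-truth difficulty
responses = (rng.random(500) < irt_2pl(thetas, 1.0, true_b)).astype(int)
b_hat = calibrate_b(thetas, responses)  # should land near true_b
```

With a few hundred responses the estimate typically lands within ~0.1 of the true difficulty, which is why large-scale programs can pre-tag item banks reliably.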
Where IRT struggles for adaptive tutoring
- No temporal updates. θ is fixed for the duration of the form — ten correct responses don’t “teach” IRT; they just refine the estimate of the same construct.
- No explicit skill vectors. Items differ by a scalar difficulty b, not by bundles of micro-skills like “parentheses vs signs.”
- Learning blind. Practice-induced growth shows up only as “θ drift.” BKT spells it out: “P(L) for parentheses rose 0.4→0.8.”
- Teacher opaque. A raw θ score means little to a teacher; “P(L) parentheses = 0.42, signs = 0.83” is actionable.
Demo — why BKT matches dynamics
Simulate a learner improving via repeated success.
```python
def bkt_update(pL, correct, p={'pT': 0.1, 'pS': 0.1, 'pG': 0.2}):
    # Bayesian posterior on the observation, then the learning transition
    if correct:
        post = (pL * (1 - p['pS'])) / (pL * (1 - p['pS']) + (1 - pL) * p['pG'])
    else:
        post = (pL * p['pS']) / (pL * p['pS'] + (1 - pL) * (1 - p['pG']))
    return post + (1 - post) * p['pT']
```
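One update step can be traced by hand. A self-contained restatement of the update with the same default parameters (pT = 0.1, pS = 0.1, pG = 0.2):

```python
def bkt_update(pL, correct, pT=0.1, pS=0.1, pG=0.2):
    # Bayes posterior on the evidence, then the learning transition
    if correct:
        post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
    else:
        post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
    return post + (1 - post) * pT

# from pL = 0.2, one correct answer:
#   posterior = 0.2*0.9 / (0.2*0.9 + 0.8*0.2) = 0.18/0.34 ≈ 0.529
#   transit:  0.529 + 0.471*0.1 ≈ 0.576
pL_next = bkt_update(0.2, True)
print(round(pL_next, 3))  # → 0.576
```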
```python
def fit_irt_theta(answers, b=0.0, a=1.0, c=0.1):
    """MLE estimate of theta with item parameters held fixed."""
    def neg_log_lik(theta):
        p = irt_3pl(theta, a, b, c)
        return -sum(np.log(p if x else 1 - p) for x in answers)
    res = minimize(neg_log_lik, x0=0.0, bounds=[(-4, 4)])
    return res.x[0]
```
```python
answers = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]  # the student catches up

# BKT — P(L) state at every step
pL_traj = [0.2]
pL = 0.2
for ans in answers:
    pL = bkt_update(pL, ans)
    pL_traj.append(pL)

# IRT — theta estimate after 1, 2, ... observations
theta_traj = []
for k in range(1, len(answers) + 1):
    theta_traj.append(fit_irt_theta(answers[:k]))
```
```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
ax1.plot(pL_traj, marker='o', color='#9333ea', linewidth=2)
ax1.set_title('BKT — P(L) grows over time')
ax1.set_xlabel('step'); ax1.set_ylabel('P(L)')
ax1.grid(alpha=0.3); ax1.set_ylim(0, 1)

ax2.plot(theta_traj, marker='s', color='#0ea5e9', linewidth=2)
ax2.set_title('IRT — θ estimate hovers at one level')
ax2.set_xlabel('after k observations'); ax2.set_ylabel('θ MLE')
ax2.grid(alpha=0.3)
plt.tight_layout(); plt.show()
```

BKT climbs from 0.2 toward 1.0 as the successes accumulate — the learning is visible. IRT’s estimate wiggles around a plateau — the same-ability assumption at work.
When to combine
Research systems (e.g., CMU Cognitive Tutor) sometimes hybridize:
- IRT for onboarding diagnostics;
- BKT for ongoing mastery tracking.
For an MVP the overhead isn’t worth it — a fixed pInit = 0.2 plus plain BKT suffices.
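For reference, the glue in such a hybrid can be a single mapping: squash the onboarding θ̂ through a logistic to seed the BKT prior. Everything below — the function name `theta_to_pinit`, the midpoint/scale/clamp choices — is a hypothetical illustration, not part of the MVP:

```python
import math

def theta_to_pinit(theta_hat, midpoint=0.0, scale=1.0, lo=0.05, hi=0.95):
    # hypothetical mapping: logistic squash of an IRT ability estimate
    # into a BKT prior, clamped away from 0 and 1 so later evidence
    # can still move the state in either direction
    p = 1.0 / (1.0 + math.exp(-scale * (theta_hat - midpoint)))
    return min(hi, max(lo, p))

print(round(theta_to_pinit(-1.0), 2))  # → 0.27 (weak diagnostic, low prior)
print(round(theta_to_pinit(2.0), 2))   # → 0.88 (strong diagnostic, high prior)
```

A midpoint of 0 just means “average ability maps to pInit = 0.5”; any calibration against real data would tune both knobs.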
Talking points: IRT vs BKT
“IRT snapshots exams; BKT tracks growth. We ship adaptive learning, not accreditation. Plus BKT speaks micro-skills — teachers see concrete gaps, not abstract ability scalars.”
Related
- NB-3 EM fitting — recovering BKT parameters.
- Chapter 8 — P(solve) & ZPD — why temporal state matters for selection.