
NB-4 — IRT vs BKT

Pitch question: “Why not IRT? Isn’t that the testing standard?”

|  | IRT | BKT |
| --- | --- | --- |
| Models | Student ability + item difficulty | Skill state evolving over time |
| Time | Snapshot (single administration) | Dynamics (sequence of tasks) |
| Latent variable | $\theta$ — scalar ability | $L_t$ — mastery bit |
| Typical use | SAT, GRE, diagnostics | Adaptive tutoring |
| Delivers | “Ivan scored 78 on scale” | “Ivan’s parentheses mastery P = 0.42” |
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
```

Three-parameter logistic:

$$P(\text{correct} \mid \theta, a, b, c) = c + (1 - c)\,\sigma\!\big(a(\theta - b)\big)$$

  • $\theta$ — ability;
  • $a$ — discrimination;
  • $b$ — difficulty (inflection point);
  • $c$ — guessing asymptote.
```python
def irt_3pl(theta, a, b, c):
    """3PL item response curve: guessing floor c plus a scaled logistic."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# three typical items
fig, ax = plt.subplots(figsize=(8, 4))
xs = np.linspace(-3, 3, 200)
for a, b, c, label in [
    (1.0, -1.0, 0.1, 'easy (b=−1)'),
    (1.5, 0.0, 0.15, 'medium (b=0)'),
    (1.0, 1.5, 0.05, 'hard (b=+1.5)'),
]:
    ax.plot(xs, irt_3pl(xs, a, b, c), label=label)
ax.set_xlabel('θ — ability'); ax.set_ylabel('P(correct)')
ax.legend(); ax.grid(alpha=0.3)
plt.title('IRT 3PL — item curves')
plt.show()
```

A student at $\theta = 0$, given the item parameters above:

  • easy item: P(correct) ~0.76;
  • medium: ~0.58;
  • hard: ~0.22.
Why IRT dominates standardized exams:

  • A 30-minute test: ~20 items, one snapshot → a precise $\theta$.
  • Standardization: comparable scores across cohorts.
  • Item calibration: difficulties pre-tagged from pilot data.
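These spot-check probabilities follow directly from the item parameters used in the plot. A standalone snippet (re-defining the curve so it runs on its own):

```python
import numpy as np

def irt_3pl(theta, a, b, c):
    # 3PL: guessing floor c plus scaled logistic in (theta - b)
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

# the three items from the plot, evaluated for a student at theta = 0
items = {'easy': (1.0, -1.0, 0.10), 'medium': (1.5, 0.0, 0.15), 'hard': (1.0, 1.5, 0.05)}
for name, (a, b, c) in items.items():
    print(f"{name}: P(correct | theta=0) = {irt_3pl(0.0, a, b, c):.3f}")
```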
Why it falls short for tutoring:

  1. No temporal updates. $\theta$ is fixed during the form — ten correct responses don’t “teach” IRT; they just refine the estimate of the same construct.
  2. No explicit skill vectors. Items differ by a scalar difficulty $b$, not bundles of micro-skills like “parentheses vs. signs.”
  3. Learning-blind. Practice-induced growth shows up only as “$\theta$ drift.” BKT spells it out: “$P(L)$ for parentheses rose 0.4 → 0.8.”
  4. Teacher-opaque. “$\theta = 0.3$” means little; “$P(L)$ parentheses = 0.42, signs = 0.83” is actionable.

Simulate a learner improving via repeated success.
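For reference, the update this simulation applies at every step is standard BKT: a Bayesian posterior on the observed answer, then a learning transit. With slip $p_S$, guess $p_G$, transit $p_T$:

$$P(L \mid \text{correct}) = \frac{P(L)\,(1 - p_S)}{P(L)\,(1 - p_S) + (1 - P(L))\,p_G}$$

$$P(L \mid \text{wrong}) = \frac{P(L)\,p_S}{P(L)\,p_S + (1 - P(L))\,(1 - p_G)}$$

$$P(L)_{\text{new}} = P(L \mid \text{obs}) + \big(1 - P(L \mid \text{obs})\big)\,p_T$$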

```python
def bkt_update(pL, correct, p={'pT': 0.1, 'pS': 0.1, 'pG': 0.2}):
    """One BKT step: Bayesian posterior on the evidence, then learning transit."""
    if correct:
        post = (pL * (1 - p['pS'])) / (pL * (1 - p['pS']) + (1 - pL) * p['pG'])
    else:
        post = (pL * p['pS']) / (pL * p['pS'] + (1 - pL) * (1 - p['pG']))
    return post + (1 - post) * p['pT']

def fit_irt_theta(answers, b=0.0, a=1.0, c=0.1):
    """MLE of theta with the item parameters held fixed."""
    def neg_log_lik(theta):
        pr = irt_3pl(theta[0], a, b, c)  # minimize passes theta as a 1-element array
        return -sum(np.log(pr if x else 1 - pr) for x in answers)
    res = minimize(neg_log_lik, x0=np.array([0.0]), bounds=[(-4, 4)])
    return res.x[0]
```
```python
answers = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]  # the student picks it up

# BKT — the P(L) state at every step
pL_traj = [0.2]
pL = 0.2
for ans in answers:
    pL = bkt_update(pL, ans)
    pL_traj.append(pL)

# IRT — theta estimate after 1, 2, ... observations
theta_traj = [fit_irt_theta(answers[:k]) for k in range(1, len(answers) + 1)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))
ax1.plot(pL_traj, marker='o', color='#9333ea', linewidth=2)
ax1.set_title('BKT — P(L) grows over time')
ax1.set_xlabel('step'); ax1.set_ylabel('P(L)')
ax1.grid(alpha=0.3); ax1.set_ylim(0, 1)
ax2.plot(theta_traj, marker='s', color='#0ea5e9', linewidth=2)
ax2.set_title('IRT — running θ estimate')
ax2.set_xlabel('after k observations'); ax2.set_ylabel('θ MLE')
ax2.grid(alpha=0.3)
plt.tight_layout(); plt.show()
```

BKT climbs from ≈0.2 toward ≈1.0 over the ten steps: learning is explicit. The IRT $\theta$ estimate also drifts upward as correct answers accumulate, but the model can only read this as a sharper estimate of one fixed ability; it has no vocabulary for “the student learned.”

Research systems (e.g., CMU Cognitive Tutor) sometimes hybridize:

  • IRT for onboarding diagnostics;
  • BKT for ongoing mastery tracking.
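A minimal sketch of what such a handoff could look like. The `theta_to_p_init` mapping and its parameters are purely illustrative assumptions, not a published calibration:

```python
import numpy as np

def theta_to_p_init(theta, midpoint=0.0, scale=1.0):
    """Hypothetical handoff: squash a diagnostic theta into (0, 1)
    and use it as BKT's starting P(L). The logistic mapping and its
    parameters are illustrative, not a calibrated procedure."""
    return 1.0 / (1.0 + np.exp(-scale * (theta - midpoint)))

# a weak onboarding diagnostic (theta = -0.8) seeds BKT below the midline
print(f"pInit from diagnostic: {theta_to_p_init(-0.8):.2f}")
```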

For the MVP that overhead isn’t worth it: a fixed pInit = 0.2 plus plain BKT suffices.

“IRT snapshots exams; BKT tracks growth. We ship adaptive learning, not accreditation. Plus BKT speaks micro-skills — teachers see concrete gaps, not abstract ability scalars.”