6.2 days
Avg speed to IRR
↓ 26% vs Q1 benchmark
1,240
Tasks completed / week
— at target (1,200)
94%
Task effectiveness rate
↑ 3 pts since rubric v2.1
2.1%
Arbitration rate
↓ below 3% target
Task | Volume | Throughput | κ live | Status | ETA
Synopsis quality | 3,200 | 87% | 0.74 | On track | Apr 30
Search relevance | 5,800 | 91% | 0.71 | On track | May 3
Rec explanation | 2,100 | 72% | 0.63 | Blocked | May 10 ⚠
Metadata tagging | 8,400 | 95% | 0.82 | On track | Apr 28
Safety review | 1,100 | 84% | 0.68 | Monitoring | May 6
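The κ column above can be checked against a target programmatically. The following is a minimal sketch, not the dashboard's actual logic: it assumes the 0.70 target stated for Rec explanation applies to every task, and the names `KAPPA_TARGET`, `live_kappa`, and `below_target` are hypothetical.

```python
# Assumption: the 0.70 target stated for Rec explanation applies to every task.
KAPPA_TARGET = 0.70

# Live kappa values copied from the task table.
live_kappa = {
    "Synopsis quality": 0.74,
    "Search relevance": 0.71,
    "Rec explanation": 0.63,
    "Metadata tagging": 0.82,
    "Safety review": 0.68,
}


def below_target(kappas, target=KAPPA_TARGET):
    """Return (task, gap) pairs for tasks under the target, largest gap first."""
    gaps = [(task, round(target - k, 2)) for task, k in kappas.items() if k < target]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


flagged = below_target(live_kappa)
print(flagged)  # [('Rec explanation', 0.07), ('Safety review', 0.02)]
```

Under that assumption, the two flagged tasks are the same two that the Status column marks as Blocked and Monitoring.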
!
Rec explanation — κ stuck below 0.70
Three raters show systematic leniency on the "helpfulness" dimension. κ has not cleared 0.70 after four calibration sessions, so delivery is at risk.
View escalation plan ↗
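For readers unfamiliar with the metric, here is a minimal sketch of how an agreement score like κ and a leniency offset could be computed. It assumes κ means Cohen's kappa between two raters, which the dashboard does not state. The function `cohen_kappa` and the sample scores are hypothetical illustrations, not project data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical helpfulness scores (1-3) on ten items; not real project data.
reference = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
lenient = [2, 2, 3, 3, 2, 2, 3, 3, 1, 3]

kappa = cohen_kappa(reference, lenient)
# A positive mean offset means this rater scores higher than the reference.
mean_offset = sum(l - r for l, r in zip(lenient, reference)) / len(reference)

print(f"kappa = {kappa:.2f}, mean offset = {mean_offset:+.1f}")
# kappa = 0.39, mean offset = +0.4
```

In this toy data every disagreement goes in the same direction (the second rater scores one point higher), which is the pattern a systematic leniency bias would produce. Random disagreement would leave the mean offset near zero.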
~
Safety task — rubric gap on comedic violence
Raters are splitting on items involving comedic violence in adult animated content. The rubric has no anchor example for this edge case, which is causing ~30% of safety task arbitrations.
Draft anchor examples ↗
~
APAC vendor onboarding — localization behind
12 new raters are expected May 1. Training materials are not yet localized for Korean and Japanese cultural context. At the current pace, the cohort won't be calibrated until May 10.
Review mitigation ↗
Sprint delivery tracker — Q2 2025
Milestone | Owner | Due | Progress | Status
Rubric v2.2 — edge case expansion: Comedy violence anchors + helpfulness examples | K. Lovelace | Apr 30 | Draft → Review → Publish (70%) | In review
APAC cohort — 12 raters calibrated: Korean (8) + Japanese (4) | Vendor ops | May 5 | 0 of 12 calibrated (10%) | At risk
Rec explanation — achieve κ ≥ 0.70: Leniency bias intervention required | Eval ops | May 10 | κ = 0.63 → target 0.70 (40%) | Blocked
Metadata tagging — 8,400 tasks delivered: κ = 0.82 · On time | Eval ops | Apr 28 | 8,200 / 8,400 complete (98%) | On track
Onboarding protocol v2 — published: 5-phase protocol + calibration debrief template | K. Lovelace | Apr 25 | Published to all vendors (100%) | Complete
Weekly throughput — past 8 weeks
Weekly task throughput ranged from 980 to 1,240, with steady improvement as the cohort scaled.