Artifacts

Three focused proof points for Netflix: accountable evaluation operations, repeatable scoring design, and rater calibration at scale.

Role fit map

A fast way for reviewers to connect portfolio evidence to the core responsibilities in human evaluation and data operations.

Lead execution end-to-end: Intake, scope, blockers, milestones, delivery status.
Evidence shown: Task health, sprint tracker, active blockers, owner/due-date model.

Develop rubrics and guidelines: Consistent scoring protocols for subjective AI output quality.
Where to look: Scoring Rubric
Evidence shown: Weighted dimensions, threshold logic, arbitration trigger, safety hard block.

Own rater calibration and QA: Onboarding, gold sets, IRR gates, ongoing quality monitoring.
Evidence shown: Five-phase onboarding protocol, rolling κ, sentinel rate, drift remediation.