Ceiling Effect / Floor Effect


ceiling-floor-effect.md · updated 2026-07-16 · 2391 words


Social Sciences · Psychology


When every participant scores perfectly (ceiling) or fails completely (floor), the measuring instrument saturates: differences between participants vanish and the test item carries no information for comparing them.

Why it matters

Ceiling item: all respondents answer correctly → no differentiation possible.
Floor item: all respondents fail → same problem, opposite end.
Both scenarios make the item useless for ranking or comparing participants.

Detection rules of thumb

p ≈ 1.0 → ceiling effect (item too easy); flag for exclusion.
p ≈ 0.0 → floor effect (item too hard); flag for exclusion.
p = 0.4–0.6: maximum information, because the variance of a right/wrong item, p(1 − p), peaks at p = 0.5; keep these items as the main metric.
IRT (Item Response Theory) adds the discrimination parameter 'a' for finer filtering.
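
The rules of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the `eps = 0.05` cutoff for "p ≈ 0 or 1", and the toy response data are all assumptions made for the example. The last function uses the standard two-parameter logistic (2PL) item information, a² · P(θ) · (1 − P(θ)), to show how the discrimination parameter 'a' enters.

```python
from math import exp

def proportion_correct(responses):
    """Share of correct answers (1 = correct, 0 = wrong) for one item."""
    return sum(responses) / len(responses)

def item_variance(p):
    """Variance of a right/wrong item; largest at p = 0.5, zero at p = 0 or 1."""
    return p * (1 - p)

def classify_item(p, eps=0.05, lo=0.4, hi=0.6):
    """Label an item by its proportion correct. eps is an assumed cutoff."""
    if p >= 1 - eps:
        return "ceiling"   # too easy: flag for exclusion
    if p <= eps:
        return "floor"     # too hard: flag for exclusion
    if lo <= p <= hi:
        return "core"      # most informative band
    return "usable"

def info_2pl(theta, a, b):
    """2PL item information at ability theta (a: discrimination, b: difficulty)."""
    prob = 1 / (1 + exp(-a * (theta - b)))
    return a * a * prob * (1 - prob)

# Hypothetical responses from ten respondents to four items.
items = {
    "q1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "q2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "q3": [1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    "q4": [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
}
report = {name: classify_item(proportion_correct(r)) for name, r in items.items()}
print(report)
```

With this toy data, q1 and q2 are flagged as ceiling and floor items, q3 falls in the 0.4–0.6 band, and q4 stays usable without being a core item.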

AI search evaluation context

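Applies to accuracy scoring across query sets with many factual questions; inspect the per-item distribution of proportion correct first.
Items stuck at p ≈ 1 or p ≈ 0 must be flagged and excluded from aggregation, and the question redesigned where needed.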
Example (77 factual-static questions): 12 ceiling + 3 floor = 15 invalid → valid set 62.
Removing invalid items requires redoing the sample-size calculation.
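
The arithmetic in the 77-question example, and why the sample-size calculation has to be redone, can be illustrated as follows. The note does not say which sample-size calculation is meant; this sketch assumes a worst-case 95% margin of error for a proportion, z · sqrt(0.25 / n), with items treated as independent. Both function names are hypothetical.

```python
from math import sqrt

def valid_item_count(total, ceiling, floor):
    """Items left after dropping ceiling and floor items."""
    return total - (ceiling + floor)

def worst_case_margin(n, z=1.96):
    """Half-width of a 95% interval for a proportion at p = 0.5 (assumed model)."""
    return z * sqrt(0.25 / n)

total, ceiling, floor = 77, 12, 3
valid = valid_item_count(total, ceiling, floor)

print(valid)  # 62
print(round(worst_case_margin(total), 3), round(worst_case_margin(valid), 3))
```

Under these assumptions the margin widens from roughly 0.11 to roughly 0.12 when the set shrinks from 77 to 62 items, so a precision target set for the full set no longer holds for the valid subset.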

Related concepts

Cohen's d: effect-size measure. Ceiling and floor effects shrink score variance, and once the variance reaches zero d is undefined.
Nomological network: construct validity framework; difficulty design sits one level below it.
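
The point about Cohen's d can be made concrete with a small sketch. It assumes two groups scored on the same item or scale and uses the usual pooled-standard-deviation form of d; the function name and the sample scores are made up for illustration. When every score sits at the ceiling, the pooled variance is zero and the function returns `None` instead of dividing by zero.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d with pooled SD; None when the pooled variance is zero."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    if pooled_var == 0:
        return None  # ceiling or floor: no variance, d is undefined
    return (mean(a) - mean(b)) / sqrt(pooled_var)

# Hypothetical scores: both groups at the ceiling vs. groups with spread.
ceiling_a = [1.0, 1.0, 1.0, 1.0, 1.0]
ceiling_b = [1.0, 1.0, 1.0, 1.0, 1.0]
spread_a = [1.0, 1.0, 1.0, 0.0, 1.0]
spread_b = [0.0, 1.0, 0.0, 0.0, 1.0]

print(cohens_d(ceiling_a, ceiling_b))  # None
print(cohens_d(spread_a, spread_b))
```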

→ Always check the proportion correct before aggregating: items at p ≈ 0 or 1 add no signal and must be excluded.

