Cohen's d: Measuring Effect Size by How Much Two Groups Actually Differ in Practice

cohens-d-effect-size.md · updated 2026-07-16 · 2061 words

Social Sciences · Psychology

Cohen's d — Measuring Effect Size

Cohen's d expresses the difference between two group means in standard-deviation units: the mean difference divided by the pooled standard deviation. Because it is dimensionless, it answers 'how large?' independently of the p-value.

Why a p-value alone is not enough

Statistical significance ≠ practical significance.
Large samples can make trivial differences reach p < 0.05.
Cohen's d reports magnitude on a scale that does not grow with sample size; a larger sample only makes the estimate of d more precise.
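
The contrast above can be sketched numerically. The snippet below is a minimal illustration, not part of the original note: it assumes a fixed true difference of 0.1 SD, equal group sizes, and a normal (z-test) approximation for the two-sided p-value. The helper name `two_sample_z_p` and all the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a mean difference (normal approximation, equal n)."""
    se = sd * sqrt(2 / n_per_group)   # standard error of the difference in means
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Assumed illustration values: a trivial difference of 0.1 SD.
mean_diff, sd = 0.1, 1.0
d = mean_diff / sd                    # Cohen's d does not depend on n

for n in (50, 500, 5000):
    p = two_sample_z_p(mean_diff, sd, n)
    print(f"n per group = {n:>4}  d = {d:.2f}  p = {p:.4f}")
```

Under these assumptions d stays at 0.10 in every row, while the p-value moves from clearly non-significant at n = 50 to far below 0.05 at n = 5000.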

Formula

d = (M₁ − M₂) ÷ SD_pooled
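SD_pooled is the pooled standard deviation: the square root of the two within-group variances averaged with weights given by their degrees of freedom, SD_pooled = √[((n₁ − 1)·s₁² + (n₂ − 1)·s₂²) ÷ (n₁ + n₂ − 2)]. It is not the standard deviation of the two groups merged into one sample.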
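
As a rough sketch of the formula, the function below computes d from two raw samples. It assumes independent groups and the common pooling that weights each sample variance by its degrees of freedom (n − 1); the function name `cohens_d` and the toy data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (divides by n - 1).
    var_a, var_b = variance(group_a), variance(group_b)
    # Pool the within-group variances, weighting each by its degrees of freedom.
    sd_pooled = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / sd_pooled

# Toy data: means 4 and 3, both sample variances 4, so SD_pooled = 2.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # → 0.5
```

The sign of d follows the order of the arguments: swapping the groups flips it, so report which group is M₁.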

Conventional benchmarks (Cohen 1988)

d = 0.2 → small effect
d = 0.5 → medium effect
d = 0.8 → large effect
These thresholds are conventions, not rules, and they vary by field; in LLM evaluation even d = 0.5 may be too weak to justify a decision.
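
For reporting, the benchmarks can be turned into a small lookup. This is only a sketch: the function name `label_effect` and the "below small" label for |d| < 0.2 are assumptions added here, and the cut-offs should be replaced with whatever a given field considers meaningful.

```python
def label_effect(d):
    """Map Cohen's d onto the conventional benchmark labels (Cohen 1988)."""
    size = abs(d)  # the sign only encodes direction, not magnitude
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # hypothetical label; the benchmarks do not name this range

for d in (0.1, 0.3, -0.6, 1.0):
    print(d, label_effect(d))
```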

Practical example

Engine A mean = 8.0, Engine B mean = 5.0, pooled SD = 3.0 → d = (8.0 − 5.0) ÷ 3.0 = 1.0 (large): Engine A scores clearly higher.
In the AI-search evaluation project, Cohen's d was used to pre-calculate the sample size needed to detect d = 0.5 at power 0.80.
Engine comparisons on all 12 metrics are reported with Cohen's d (Phase B-2).
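
The sample-size step can be approximated as follows. The note does not state the significance level or the test used, so this sketch assumes a two-sided α = 0.05, two independent groups of equal size, and the normal approximation n ≈ 2·((z₁₋α/₂ + z_power) ÷ d)² per group; the function name `n_per_group` is hypothetical. An exact t-test calculation gives a slightly larger n.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size to detect effect size d (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Worked example from the text: (8.0 - 5.0) / 3.0
print((8.0 - 5.0) / 3.0)   # → 1.0

# Sample size for d = 0.5 at power 0.80, with alpha = 0.05 assumed.
print(n_per_group(0.5))    # → 63
```

Because n scales with 1/d², halving the target effect size roughly quadruples the required sample.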

→ Always pair a p-value with Cohen's d — significance tells you 'is there a difference?', d tells you 'does it matter?'
