\(\;\) | asthma | no asthma |
---|---|---|
male | 30 | 769 |
female | 49 | 781 |
Inference for measures of association in 2x2 contingency tables
epitools in R
Consider estimating the difference in proportions:
\(\;\) | asthma | no asthma |
---|---|---|
male | 30 | 769 |
female | 49 | 781 |
2-sample test for equality of proportions with continuity correction
data: table(asthma$sex, asthma$asthma)
X-squared = 3.6217, df = 1, p-value = 0.05703
alternative hypothesis: two.sided
90 percent confidence interval:
-0.040137075 -0.002841347
sample estimates:
prop 1 prop 2
0.03754693 0.05903614
With 90% confidence, asthma prevalence is estimated to be between 0.28 and 4.01 percentage points higher among women than among men.
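The test output above can be reproduced from the cell counts alone. The sketch below is a minimal version that assumes only the counts in the table (30 of 799 men and 49 of 830 women with asthma), since the original asthma data frame is not shown; the object names cases, totals, and diff.out are illustrative.

```r
# counts from the 2x2 table: asthma cases and group totals (men, women)
cases  <- c(male = 30, female = 49)
totals <- c(male = 30 + 769, female = 49 + 781)

# two-sample test of equal proportions with continuity correction,
# reporting a 90% confidence interval for p_male - p_female
diff.out <- prop.test(x = cases, n = totals, conf.level = 0.9, correct = TRUE)

diff.out$estimate   # estimated proportions for men and women
diff.out$conf.int   # interval for the difference (men minus women)
diff.out$p.value    # two-sided p-value
```

Because the interval is for men minus women, both endpoints are negative; flipping the signs gives the "higher among women" reading used in the interpretation.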
Is a difference of up to 4 percentage points practically meaningful? Well, it depends:
If \(p_F, p_M\) are the (population) proportions of women and men with asthma, then the relative risk of asthma among women compared with men is defined as:
\[ RR = \frac{p_F}{p_M} \qquad \left(\frac{\text{risk among women}}{\text{risk among men}}\right) \]
An estimate of the relative risk is simply the ratio of estimated proportions. For the asthma data, an estimate is: \[ \widehat{RR} = \frac{\hat{p}_F}{\hat{p}_M} = \frac{0.059}{0.038} = 1.57 \]
It is estimated that the risk of asthma among women is 1.57 times greater than among men.
A normal model can be used to approximate the sampling distribution of \(\log(\widehat{RR})\) and construct a confidence interval. If \(\hat{p}_1\) and \(\hat{p}_2\) are the two estimated proportions, based on samples of size \(n_1\) and \(n_2\), and \(c\) is the critical value from the normal model:
\[\log\left(\widehat{RR}\right) \pm c \times SE\left(\log\left(\widehat{RR}\right)\right) \quad\text{where}\quad SE\left(\log\left(\widehat{RR}\right)\right) = \sqrt{\frac{1 - \hat{p}_1}{\hat{p}_1n_1} + \frac{1 - \hat{p}_2}{\hat{p}_2n_2}}\]
Exponentiate endpoints to obtain an interval for relative risk.
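As a rough check on the formula, the interval can be computed by hand in base R. This is a sketch that assumes the asthma counts from the table and a 90% confidence level; the variable names are illustrative and no package is needed.

```r
# counts from the asthma table
n.f <- 49 + 781; p.f <- 49 / n.f   # women: sample size, estimated risk
n.m <- 30 + 769; p.m <- 30 / n.m   # men: sample size, estimated risk

# point estimate and standard error on the log scale
rr     <- p.f / p.m
se.log <- sqrt((1 - p.f) / (p.f * n.f) + (1 - p.m) / (p.m * n.m))

# 90% interval: normal critical value, then exponentiate the endpoints
crit  <- qnorm(0.95)
rr.ci <- exp(log(rr) + c(-1, 1) * crit * se.log)

round(c(estimate = rr, lower = rr.ci[1], upper = rr.ci[2]), 4)
```

The interval is symmetric around the estimate on the log scale only; after exponentiating, the upper endpoint sits farther from the estimate than the lower one.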
risk ratio with 90% C.I.
Predictor estimate lower upper
male 1.000000 NA NA
female 1.572329 1.083353 2.282007
With 90% confidence, the risk of asthma is estimated to be between 1.08 and 2.28 times greater for women than for men.
library(epitools)

# relative risk of asthma for women (second row) compared with men
table(asthma$sex, asthma$asthma) |>
  riskratio(rev = 'columns',
            method = 'wald',
            conf.level = 0.9,
            correction = TRUE)
$data
no asthma asthma Total
male 769 30 799
female 781 49 830
Total 1550 79 1629
$measure
risk ratio with 90% C.I.
estimate lower upper
male 1.000000 NA NA
female 1.572329 1.083353 2.282007
$p.value
two-sided
midp.exact fisher.exact chi.square
male NA NA NA
female 0.04412095 0.04961711 0.05703135
$correction
[1] TRUE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
riskratio is picky about input format:
outcome of interest should be second column
group of interest should be second row
It will return the relative risk
\[
RR = \frac{n_{22}/n_2}{n_{12}/n_1}
\]
where \(n_1\) and \(n_2\) are the row totals. The data table can be reoriented using rev:
rev = 'neither' keeps original orientation
rev = 'rows' reverses order of rows
rev = 'columns' reverses order of columns
rev = 'both' reverses both
$data
no asthma asthma Total
male 769 30 799
female 781 49 830
Total 1550 79 1629
$measure
risk ratio with 90% C.I.
estimate lower upper
male 1.000000 NA NA
female 1.572329 1.083353 2.282007
$p.value
two-sided
midp.exact fisher.exact chi.square
male NA NA NA
female 0.04412095 0.04961711 0.05703135
$correction
[1] TRUE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
Conventional style:
The data provide evidence of an association between asthma and sex (\(\chi^2\) = 3.62 on 1 degree of freedom, p = 0.057). With 90% confidence, the risk of asthma is estimated to be between 1.08 and 2.28 times greater for women than for men, with a point estimate of 1.57.
In a randomized trial for a malaria vaccine, 20 individuals were randomly allocated to receive a dose of the vaccine or a placebo.
Vaccine trials often estimate relative reduction in risk or “efficacy”:
\[ \underbrace{\frac{\hat{p}_\text{ctrl} - \hat{p}_\text{trt}}{\hat{p}_\text{ctrl}}}_\text{efficacy} = 1 - \widehat{RR} \]
\(\;\) | no infection | infection |
---|---|---|
placebo | 0 | 6 |
vaccine | 9 | 5 |
# relative risk of infection for vaccine (second row) compared with placebo
rr.out <- table(malaria$treatment, malaria$outcome) |>
  riskratio(method = 'wald', correction = TRUE)
rr.out$measure
risk ratio with 95% C.I.
estimate lower upper
placebo 1.0000000 NA NA
vaccine 0.3571429 0.1768593 0.7212006
Subtracting these values from 1 gives the estimated efficacy. The subtraction reverses the order of the endpoints, so in the output below the column labeled lower holds the upper limit and vice versa, and the risk ratio header is carried over unchanged:
risk ratio with 95% C.I.
estimate lower upper
placebo 0.0000000 NA NA
vaccine 0.6428571 0.8231407 0.2787994
With 95% confidence, the vaccine is estimated to reduce the risk of infection by between 27.9% and 82.3%, with a point estimate of 64.3%.
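The efficacy interval can also be worked out directly from the counts. The following is a sketch in base R that assumes the malaria counts from the table (6 of 6 placebo recipients and 5 of 14 vaccine recipients infected) and a 95% Wald interval for the relative risk; the variable names are illustrative.

```r
# estimated infection risks from the malaria table
p.ctrl <- 6 / 6     # placebo: all 6 infected
p.trt  <- 5 / 14    # vaccine: 5 of 14 infected

rr       <- p.trt / p.ctrl
efficacy <- 1 - rr

# Wald interval for the relative risk on the log scale (95%)
se.log <- sqrt((1 - p.trt) / (p.trt * 14) + (1 - p.ctrl) / (p.ctrl * 6))
rr.ci  <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se.log)

# 1 - RR is decreasing in RR, so the endpoints trade places:
# the upper RR limit gives the lower efficacy limit, and vice versa
eff.ci <- rev(1 - rr.ci)

round(c(efficacy = efficacy, lower = eff.ci[1], upper = eff.ci[2]), 3)
```

Reversing the endpoints explicitly keeps the lower limit below the upper limit, which is easier to read than subtracting the whole table of estimates from 1.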
\(\;\) | Smokers | NonSmokers | total |
---|---|---|---|
Cancer | 83 | 3 | 86 |
Control | 72 | 14 | 86 |
Recall: not possible to estimate the case rate (cancer prevalence) due to the study design.
The data provide evidence of an association between smoking and lung cancer (\(\chi^2 = 6.53\) on 1 degree of freedom, \(p = 0.0106\)).
How do we measure the association, considering we can’t estimate case rates?
If \(p\) is the true cancer prevalence (a population proportion), then the odds of cancer are:
\[ \text{odds} = \frac{p}{1 - p} \]
The odds represent the relative likelihood of an outcome.
Let \(a, b, c, d\) denote population proportions.
\(\;\) | outcome 1 (O1) | outcome 2 (O2) |
---|---|---|
group 1 (G1) | a | b |
group 2 (G2) | c | d |
The odds of outcome 1 (O1) in each group are:
\[ \text{odds}_{G1}(O1) = \frac{a}{b} \qquad \text{odds}_{G2}(O1) = \frac{c}{d} \]
The odds ratio or “relative odds” is:
\[ \omega = \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} = \frac{\textcolor{red}{a}/\textcolor{blue}{b}}{\textcolor{orange}{c}/\textcolor{purple}{d}} = \frac{\textcolor{red}{a}\textcolor{purple}{d}}{\textcolor{blue}{b}\textcolor{orange}{c}} \]
A surprising algebraic fact is that:
\[ \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} =\frac{\text{odds}_{O1}(G1)}{\text{odds}_{O2}(G1)} \]
relative odds of cancer given smoking status = relative odds of smoking given cancer status
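The identity can be checked in one line from the table of proportions. Conditioning on the outcome instead of the group, the odds of belonging to group 1 are \(a/c\) among those with outcome 1 and \(b/d\) among those with outcome 2, so:

\[ \frac{\text{odds}_{O1}(G1)}{\text{odds}_{O2}(G1)} = \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{a/b}{c/d} = \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} \]

Both ratios reduce to the same cross-product \(ad/bc\), which is why the odds ratio remains estimable when sampling is based on the outcome.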
The estimate is the same calculation as on the previous slide, but with sample counts.
\(\;\) | Smoker (O1) | NonSmoker (O2) |
---|---|---|
Case (G1) | 83 | 3 |
Control (G2) | 72 | 14 |
Estimate of \(\omega\): \[ \hat{\omega} = \frac{\textcolor{red}{83}\times\textcolor{purple}{14}}{\textcolor{blue}{3}\times\textcolor{orange}{72}} = 5.38 \]
Interpretation:
It is estimated that the relative odds of lung cancer are 5.38 times greater for smokers compared with nonsmokers.
The sampling distribution of the log odds ratio can be approximated by a normal model.
\[ \log\left(\hat{\omega}\right) \pm c \times SE\left(\log\left(\hat{\omega}\right)\right) \quad\text{where}\quad SE\left(\log\left(\hat{\omega}\right)\right) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} \]
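As a rough check on the formula, the interval can be computed by hand in base R. This sketch assumes the four cell counts from the case-control table and a 95% confidence level; the variable names are illustrative.

```r
# cell counts from the case-control table
n11 <- 83; n12 <- 3    # cases:    smokers, nonsmokers
n21 <- 72; n22 <- 14   # controls: smokers, nonsmokers

# point estimate and standard error on the log scale
or.hat <- (n11 * n22) / (n12 * n21)
se.log <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

# 95% interval on the log scale, back-transformed by exponentiating
or.ci <- exp(log(or.hat) + c(-1, 1) * qnorm(0.975) * se.log)

round(c(estimate = or.hat, lower = or.ci[1], upper = or.ci[2]), 3)
```

The smallest cell (3 nonsmoking cases) contributes most of the standard error, which is why the interval is so wide.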
The oddsratio(...) function in the epitools package will compute and back-transform the interval for you.
odds ratio with 95% C.I.
estimate lower upper
Control 1.00000 NA NA
Cancer 5.37963 1.486376 19.47045
With 95% confidence, the relative odds of lung cancer are estimated to be between 1.49 and 19.47 times greater for smokers compared with nonsmokers.
library(epitools)

# odds ratio of cancer (second row) for smokers (second column)
table(smoking$group, smoking$smoking) |>
  oddsratio(rev = 'both',
            method = 'wald',
            conf.level = 0.95,
            correction = TRUE)
$data
NonSmokers Smokers Total
Control 14 72 86
Cancer 3 83 86
Total 17 155 172
$measure
odds ratio with 95% C.I.
estimate lower upper
Control 1.00000 NA NA
Cancer 5.37963 1.486376 19.47045
$p.value
two-sided
midp.exact fisher.exact chi.square
Control NA NA NA
Cancer 0.005116319 0.008822805 0.01062183
$correction
[1] TRUE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
oddsratio is picky about data inputs:
outcome of interest should be second column
group of interest should be second row
It will return the odds ratio
\[ \frac{\text{odds}_\text{R2} (\text{C2})}{\text{odds}_\text{R1}(\text{C2})} = \frac{n_{22}/n_{21}}{n_{12}/n_{11}} = \frac{n_{22} n_{11}}{n_{12}n_{21}} = \frac{da}{cb} \]
The data table can be reoriented using rev:
rev = 'neither' keeps original orientation
rev = 'rows' reverses order of rows
rev = 'columns' reverses order of columns
rev = 'both' reverses both
$data
NonSmokers Smokers Total
Control 14 72 86
Cancer 3 83 86
Total 17 155 172
$measure
odds ratio with 95% C.I.
estimate lower upper
Control 1.00000 NA NA
Cancer 5.37963 1.486376 19.47045
$p.value
two-sided
midp.exact fisher.exact chi.square
Control NA NA NA
Cancer 0.005116319 0.008822805 0.01062183
$correction
[1] TRUE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
First report the test result, then the measure of association:
The data provide evidence of an association between smoking and lung cancer (\(\chi^2 = 6.53\) on 1 degree of freedom, \(p = 0.0106\)). With 95% confidence, the relative odds of cancer are estimated to be between 1.49 and 19.47 times greater among smokers compared with nonsmokers, with a point estimate of 5.38.
Be sure to include the point estimate, since the interval estimate is asymmetric.
If a study design employs outcome-based sampling, outcome proportions are not estimable, so the odds ratio should be used to measure the association.
Otherwise, analysis may use any measure of association.
An outbreak of cyclosporiasis was detected among residents of New Jersey. In a case-control study, investigators found that 21 of 30 cases and 4 of 60 controls had eaten raspberries.
Outcome-based sampling means…
odds ratio with 95% C.I.
Predictor estimate lower upper
no raspberries 1.00000 NA NA
raspberries 32.66667 9.081425 117.5048
\(\;\) | no raspberries | raspberries |
---|---|---|
case | 9 | 21 |
control | 56 | 4 |
The data provide evidence of an association between raspberry consumption and illness (\(\chi^2\) = 36.89 on 1 degree of freedom, p < 0.0001). With 95% confidence, the odds of illness are estimated to be between 9.08 and 117.5 times higher among those who consumed raspberries during the outbreak than among those who did not, with a point estimate of 32.67.
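The reported test statistic and interval can be reproduced from the table with base R. This is a sketch that assumes the four counts above; the matrix name rasp and its dimension labels are illustrative.

```r
# rows: case, control; columns: no raspberries, raspberries
rasp <- matrix(c(9, 21,
                 56, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(status = c("case", "control"),
                               exposure = c("no raspberries", "raspberries")))

# chi-square test of association (continuity correction is the default for 2x2)
test.out <- chisq.test(rasp)

# odds ratio for illness (raspberries vs. none) and 95% Wald interval
or.hat <- (rasp["case", "raspberries"] * rasp["control", "no raspberries"]) /
  (rasp["case", "no raspberries"] * rasp["control", "raspberries"])
or.ci  <- exp(log(or.hat) + c(-1, 1) * qnorm(0.975) * sqrt(sum(1 / rasp)))

test.out$statistic
round(c(estimate = or.hat, lower = or.ci[1], upper = or.ci[2]), 2)
```

Only the odds ratio is computed here because cases and controls were sampled separately, so the proportion of ill people in each exposure group is not estimable.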
A cohort study of 3,000 smokers and 5,000 nonsmokers investigated the link between smoking and the development of coronary heart disease (CHD) over 1 year.
Two independent samples but not outcome-based sampling…
risk ratio with 95% C.I.
Predictor estimate lower upper
nonsmoker 1.000000 NA NA
smoker 1.609195 1.196452 2.164325
\(\;\) | CHD | no CHD |
---|---|---|
nonsmoker | 87 | 4913 |
smoker | 84 | 2916 |
The data provide evidence that smoking is associated with coronary heart disease (\(\chi^2\) = 9.57 on 1 degree of freedom, p = 0.00198). With 95% confidence, the risk of CHD is estimated to be between 1.20 and 2.16 times greater among smokers compared with nonsmokers, with a point estimate of 1.61.
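Because the cohort design fixes the group sizes rather than the outcome, the risks themselves are estimable and the relative risk can be computed directly. The sketch below assumes the counts in the table and uses base R only; the variable names are illustrative.

```r
# CHD cases and cohort sizes
chd  <- c(nonsmoker = 87, smoker = 84)
n    <- c(nonsmoker = 5000, smoker = 3000)
risk <- chd / n

# relative risk for smokers vs. nonsmokers with a 95% Wald interval
rr.hat <- unname(risk["smoker"] / risk["nonsmoker"])
se.log <- sqrt(sum((1 - risk) / (risk * n)))
rr.ci  <- exp(log(rr.hat) + c(-1, 1) * qnorm(0.975) * se.log)

# chi-square test with continuity correction on the 2x2 table of counts
test.out <- chisq.test(cbind(CHD = chd, noCHD = n - chd))

test.out$statistic
round(c(estimate = rr.hat, lower = rr.ci[1], upper = rr.ci[2]), 3)
```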
STAT218