library(epitools)
load('data/asthma.RData')
load('data/chd.RData')
load('data/smoking.RData')
Lab: Relative risk and relative odds
With solutions
The objective of this lab is to learn how to implement inference for relative risk and odds ratios using the epitools package in R. Examples use datasets from lecture.
Relative risk
In many contexts (primarily clinical studies) researchers are interested in how much the likelihood of a particular outcome increases or decreases relative to a control or baseline, or more generally, between two groups. For this, it is common to estimate the relative risk:
\[ RR = \frac{Pr(\text{outcome}\;|\;\text{group 1})}{Pr(\text{outcome}\;|\;\text{group 2})} \]
If \(p_1\) denotes the outcome probability (or population proportion) in group 1, and \(p_2\) denotes the same in group 2, relative risk is estimated as the ratio of sample proportions:
\[ \hat{RR} = \frac{\hat{p}_1}{\hat{p}_2} \]
For example, consider the asthma data from the NHANES survey. The proportions of men and women with asthma are, respectively:
# sample proportions
table(asthma$sex, asthma$asthma) |> prop.table(margin = 1)
asthma no asthma
male 0.03754693 0.96245307
female 0.05903614 0.94096386
We might wish to estimate the relative risk of asthma among women compared with men. A point estimate is:
# point estimate of rr
0.05903/0.03754
[1] 1.572456
It is estimated that the risk of asthma among women is 1.57 times greater than among men.
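The point estimate above is computed from rounded proportions. As a minimal sketch, the same estimate can be computed from the raw counts, which avoids rounding error. The counts are those shown in the asthma contingency table in this lab (30 of 799 men and 49 of 830 women with asthma); the variable names are illustrative only.

```r
# asthma cases and group sizes, entered by hand from the contingency table
cases  <- c(male = 30, female = 49)
totals <- c(male = 799, female = 830)

# sample proportion with asthma in each group
p_hat <- cases / totals

# relative risk of asthma, women compared with men
rr <- unname(p_hat["female"] / p_hat["male"])
rr
```

The unrounded estimate is about 1.5723, which differs from the rounded calculation only in the fourth decimal place.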
Using the chd data, compute a point estimate of the relative risk of coronary heart disease among smokers compared with nonsmokers.
# sample proportions
table(chd$smoking, chd$chd) |> prop.table(margin = 1)
CHD no CHD
nonsmoker 0.0174 0.9826
smoker 0.0280 0.9720
# point estimate of rr
0.0280/0.0174
[1] 1.609195
It is estimated that the risk of CHD is 1.61 times greater among smokers compared with nonsmokers.
For inference on the relative risk (i.e., a confidence interval), the epitools package has a function riskratio(...) that is structured identically to oddsratio(...) in terms of inputs and outputs:
# inference for rr
table(asthma$sex, asthma$asthma) |>
riskratio(rev = 'columns')
$data
no asthma asthma Total
male 769 30 799
female 781 49 830
Total 1550 79 1629
$measure
risk ratio with 95% C.I.
estimate lower upper
male 1.000000 NA NA
female 1.572329 1.008738 2.450803
$p.value
two-sided
midp.exact fisher.exact chi.square
male NA NA NA
female 0.04412095 0.04961711 0.04354632
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
With 95% confidence, the risk of asthma among women is estimated to be between 1.01 and 2.45 times greater than among men.
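The output labels the interval as a normal approximation (Wald) interval. The sketch below shows the usual textbook version of that calculation, assuming the standard large-sample standard error for the log relative risk. It is an illustration with made-up variable names, not a transcription of the epitools source.

```r
# counts from the $data table above
x1 <- 49; n1 <- 830   # women: asthma cases, group size
x2 <- 30; n2 <- 799   # men: asthma cases, group size

# point estimate of the relative risk (women compared with men)
rr <- (x1 / n1) / (x2 / n2)

# large-sample standard error of log(rr)
se_log_rr <- sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)

# 95% interval on the log scale, back-transformed to the ratio scale
z  <- qnorm(0.975)
ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)
round(c(estimate = rr, lower = ci[1], upper = ci[2]), 3)
```

The rounded results (1.572, 1.009, 2.451) agree with the interval reported by riskratio(...) above.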
Construct a 95% confidence interval for the relative risk of CHD among smokers compared with nonsmokers.
# inference for rr
table(chd$smoking, chd$chd) |>
riskratio(rev = 'columns')
$data
no CHD CHD Total
nonsmoker 4913 87 5000
smoker 2916 84 3000
Total 7829 171 8000
$measure
risk ratio with 95% C.I.
estimate lower upper
nonsmoker 1.000000 NA NA
smoker 1.609195 1.196452 2.164325
$p.value
two-sided
midp.exact fisher.exact chi.square
nonsmoker NA NA NA
smoker 0.001799736 0.001800482 0.001505872
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
With 95% confidence, the risk of CHD among smokers is estimated to be between 1.20 and 2.16 times greater than among nonsmokers.
The confidence level for the interval is easily changed by adding a conf.level = ... argument. For example, below is a 99% interval:
# adjust confidence level
table(asthma$sex, asthma$asthma) |>
riskratio(rev = 'columns',
conf.level = 0.99)
$data
no asthma asthma Total
male 769 30 799
female 781 49 830
Total 1550 79 1629
$measure
risk ratio with 99% C.I.
estimate lower upper
male 1.000000 NA NA
female 1.572329 0.8774198 2.817602
$p.value
two-sided
midp.exact fisher.exact chi.square
male NA NA NA
female 0.04412095 0.04961711 0.04354632
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
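Changing the confidence level leaves the point estimate and p-values unchanged. For a normal-approximation interval, only the critical value that multiplies the standard error changes. A brief sketch of how that critical value depends on the confidence level (the helper name crit_value is hypothetical):

```r
# critical value for a two-sided normal-approximation interval
crit_value <- function(conf_level) {
  qnorm(1 - (1 - conf_level) / 2)
}

# critical values for 90%, 95%, and 99% intervals
round(sapply(c(0.90, 0.95, 0.99), crit_value), 3)
```

A larger critical value gives a wider interval, which is why the 99% interval above extends below 1 while the 95% interval does not.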
Construct a 99% confidence interval for the relative risk of CHD among smokers compared with nonsmokers.
# inference for rr
table(chd$smoking, chd$chd) |>
riskratio(rev = 'columns',
conf.level = 0.99)
$data
no CHD CHD Total
nonsmoker 4913 87 5000
smoker 2916 84 3000
Total 7829 171 8000
$measure
risk ratio with 99% C.I.
estimate lower upper
nonsmoker 1.000000 NA NA
smoker 1.609195 1.09006 2.375566
$p.value
two-sided
midp.exact fisher.exact chi.square
nonsmoker NA NA NA
smoker 0.001799736 0.001800482 0.001505872
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
With 99% confidence, the risk of CHD among smokers is estimated to be between 1.09 and 2.38 times greater than among nonsmokers.
Odds ratios
The odds of an event are the probability that it occurs relative to the probability that it does not; for instance, if the odds of winning a bet are 3, that means that you’re three times as likely to win as to lose, i.e., in terms of probabilities, \(\frac{Pr(\text{win})}{Pr(\text{lose})} = 3\).
An odds ratio is a multiplicative comparison of odds under two circumstances. For example, if the odds of developing cancer among smokers are 2 (twice as likely to get cancer as not), and the odds of developing cancer among nonsmokers are 0.5 (half as likely to get cancer as not), then the odds ratio is \(\frac{2}{0.5} = 4\). This would mean that the odds of developing cancer are four times higher among smokers compared with nonsmokers.
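As a small illustration of these definitions, the sketch below converts probabilities to odds and forms an odds ratio. The helper name odds is hypothetical, and the probabilities are chosen to reproduce the numbers in the text: a probability of 0.75 corresponds to odds of 3, 2/3 to odds of 2, and 1/3 to odds of 0.5.

```r
# odds of an event that has probability p
odds <- function(p) p / (1 - p)

# odds of winning the bet when Pr(win) = 0.75
odds(0.75)

# odds ratio of cancer, smokers (odds 2) compared with nonsmokers (odds 0.5)
odds(2 / 3) / odds(1 / 3)
```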
Odds ratios can be estimated directly from a contingency table. For example, the odds that a person is a smoker are about 5.4 times higher among cancer patients than among healthy individuals:
# contingency table
table(smoking$group, smoking$smoking)
Smokers NonSmokers
Cancer 83 3
Control 72 14
# odds ratio (cancer/control) of smoking
(83/3)/(72/14)
[1] 5.37963
Somewhat miraculously, the odds ratio computed along one orientation is the same as that computed along the opposite orientation. That is, if we had a random sample rather than a case-control study, the odds ratio of cancer among smokers compared with nonsmokers would be:
# hypothetically, odds ratio (smokers/nonsmokers) of cancer
(83/72)/(3/14)
[1] 5.37963
This is exactly the same!
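The agreement is algebraic rather than a coincidence. Using notation introduced here only for illustration, write the table counts as \(a, b\) in the first row and \(c, d\) in the second. Both orientations then reduce to the same cross-product ratio:

\[ \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{a/c}{b/d} \]

For the smoking table, \(a = 83\), \(b = 3\), \(c = 72\), and \(d = 14\), so both calculations equal \(\frac{83 \times 14}{3 \times 72} \approx 5.38\).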
Using the asthma data…
- compute the odds ratio of asthma among women compared with men
- compute the odds ratio of being a woman among asthmatics compared with non-asthmatics
You should find that they are the same!
# contingency table
table(asthma$sex, asthma$asthma)
asthma no asthma
male 30 769
female 49 781
# odds ratio (women/men) of asthma
(49/781)/(30/769)
[1] 1.608237
# odds ratio (asthma/no asthma) of being a woman
(49/30)/(781/769)
[1] 1.608237
Both calculations yield an estimate of 1.608:
- the odds of asthma are estimated to be 1.61 times greater for women than for men
- the odds that a randomly chosen person is female are estimated to be 1.61 times higher among those with asthma than among those without
If you invert the order of comparison or compute the odds of the complementary event, you will get different results. For the smoking data, here is an exhaustive list of all of the odds ratios we could compute:
# contingency table
table(smoking$group, smoking$smoking)
Smokers NonSmokers
Cancer 83 3
Control 72 14
# odds of cancer (smokers/nonsmokers)
(83/72)/(3/14)
[1] 5.37963
# odds of cancer (nonsmokers/smokers)
(3/14)/(83/72)
[1] 0.1858864
# odds of not getting cancer (nonsmokers/smokers)
(14/3)/(72/83)
[1] 5.37963
# odds of not getting cancer (smokers/nonsmokers)
(72/83)/(14/3)
[1] 0.1858864
# odds of smoking (cancer/control)
(83/3)/(72/14)
[1] 5.37963
# odds of smoking (control/cancer)
(72/14)/(83/3)
[1] 0.1858864
# odds of not smoking (control/cancer)
(14/72)/(3/83)
[1] 5.37963
# odds of not smoking (cancer/control)
(3/83)/(14/72)
[1] 0.1858864
You will notice that there are only two algebraically distinct odds ratios, and that they are reciprocals of one another. However, the eight odds ratios listed above are all conceptually distinct!
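A short sketch of why only two values arise: every odds ratio from a 2×2 table is a cross-product ratio, and rearranging the table either leaves that ratio unchanged or inverts it. The helper cross_ratio is a hypothetical name, and the table is re-entered by hand from the counts above.

```r
# smoking table entered by hand (rows: group, columns: smoking status)
tab <- matrix(c(83, 72, 3, 14), nrow = 2,
              dimnames = list(group = c("Cancer", "Control"),
                              smoking = c("Smokers", "NonSmokers")))

# cross-product ratio of a 2x2 table
cross_ratio <- function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1])

cross_ratio(tab)             # original orientation
cross_ratio(t(tab))          # swap the roles of rows and columns: unchanged
cross_ratio(tab[2:1, ])      # reverse the rows only: reciprocal
cross_ratio(tab[2:1, 2:1])   # reverse both rows and columns: unchanged
```

Reversing only the rows or only the columns inverts the ratio, while transposing the table or reversing both leaves it unchanged.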
The epitools package has a function oddsratio(...) which takes the two variables as input (here, as a contingency table) and returns an estimated odds ratio, a confidence interval, and tests of association:
# inference for odds ratio
table(smoking$smoking, smoking$group) |>
oddsratio(method = 'wald')
$data
Cancer Control Total
Smokers 83 72 155
NonSmokers 3 14 17
Total 86 86 172
$measure
odds ratio with 95% C.I.
estimate lower upper
Smokers 1.00000 NA NA
NonSmokers 5.37963 1.486376 19.47045
$p.value
two-sided
midp.exact fisher.exact chi.square
Smokers NA NA NA
NonSmokers 0.005116319 0.008822805 0.004948149
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
By default it will produce the odds ratio for the second outcome (column) in the second group (row) compared with the first group. So the above estimates the odds ratio of not having cancer among nonsmokers compared with smokers. The rev = ... argument allows you to reverse either rows, columns, or both.
To orient the table correctly, we need to change the order of both rows and columns.
# inference for odds ratio
table(smoking$smoking, smoking$group) |>
oddsratio(method = 'wald',
rev = 'both')
$data
Control Cancer Total
NonSmokers 14 3 17
Smokers 72 83 155
Total 86 86 172
$measure
odds ratio with 95% C.I.
estimate lower upper
NonSmokers 1.00000 NA NA
Smokers 5.37963 1.486376 19.47045
$p.value
two-sided
midp.exact fisher.exact chi.square
NonSmokers NA NA NA
Smokers 0.005116319 0.008822805 0.004948149
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
The result is interpreted as follows:
With 95% confidence, the odds of developing lung cancer are estimated to be between 1.49 and 19.47 times higher among smokers as compared with nonsmokers.
The result is the same as what we started with, but that’s just because we got lucky and the odds ratio we wanted happened to be algebraically identical to the one we got by default. There is no such guarantee in general.
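For reference, the Wald interval for an odds ratio can be reproduced by hand. The sketch below assumes the standard large-sample standard error of the log odds ratio (the square root of the sum of reciprocal cell counts). The variable names are illustrative, and the code is not taken from the epitools source.

```r
# cell counts from the $data table above
s_ca  <- 83; s_co  <- 72   # smokers: cancer, control
ns_ca <- 3;  ns_co <- 14   # nonsmokers: cancer, control

# odds ratio of cancer, smokers compared with nonsmokers
or_hat <- (s_ca / s_co) / (ns_ca / ns_co)

# large-sample standard error of log(or_hat)
se_log_or <- sqrt(1 / s_ca + 1 / s_co + 1 / ns_ca + 1 / ns_co)

# 95% interval on the log scale, back-transformed to the ratio scale
ci <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)
round(c(estimate = or_hat, lower = ci[1], upper = ci[2]), 2)
```

The rounded results (5.38, 1.49, 19.47) agree with the oddsratio(...) output above. The interval is wide because one cell contains only 3 observations, which dominates the standard error.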
Estimate the odds ratio of asthma among women compared with men.
# inference for odds ratio
table(asthma$sex, asthma$asthma) |>
oddsratio(method = 'wald',
rev = 'columns')
$data
no asthma asthma Total
male 769 30 799
female 781 49 830
Total 1550 79 1629
$measure
odds ratio with 95% C.I.
estimate lower upper
male 1.000000 NA NA
female 1.608237 1.010044 2.560708
$p.value
two-sided
midp.exact fisher.exact chi.square
male NA NA NA
female 0.04412095 0.04961711 0.04354632
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
With 95% confidence, the odds of asthma are estimated to be between 1.01 and 2.56 times higher among women as compared with men.
Lastly, the confidence level can be adjusted using the conf.level = ... argument:
# adjust confidence level
table(asthma$sex, asthma$asthma) |>
oddsratio(method = 'wald',
rev = 'columns',
conf.level = 0.9)
$data
no asthma asthma Total
male 769 30 799
female 781 49 830
Total 1550 79 1629
$measure
odds ratio with 90% C.I.
estimate lower upper
male 1.000000 NA NA
female 1.608237 1.088474 2.376195
$p.value
two-sided
midp.exact fisher.exact chi.square
male NA NA NA
female 0.04412095 0.04961711 0.04354632
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"