Lab: Relative risk and relative odds

With solutions

The objective of this lab is to learn how to implement inference for relative risk and odds ratios using the epitools package in R. Examples use datasets from lecture.

library(epitools)
load('data/asthma.RData')
load('data/chd.RData')
load('data/smoking.RData')

Relative risk

In many contexts (primarily clinical studies) researchers are interested in how much the likelihood of a particular outcome increases or decreases relative to a control or baseline, or more generally, between two groups. For this, it is common to estimate the relative risk:

\[ RR = \frac{Pr(\text{outcome}\;|\;\text{group 1})}{Pr(\text{outcome}\;|\;\text{group 2})} \]

If \(p_1\) denotes the outcome probability (or population proportion) in group 1, and \(p_2\) denotes the same in group 2, then \(RR = p_1/p_2\), and the relative risk is estimated as the ratio of sample proportions:

\[ \hat{RR} = \frac{\hat{p}_1}{\hat{p}_2} \]

For example, consider the asthma data from the NHANES survey. The proportions of men and women with asthma are, respectively:

# sample proportions
table(asthma$sex, asthma$asthma) |> prop.table(margin = 1)
        
             asthma  no asthma
  male   0.03754693 0.96245307
  female 0.05903614 0.94096386

We might wish to estimate the relative risk of asthma among women compared with men. A point estimate, using the rounded proportions above, is:

# point estimate of rr
0.05903/0.03754
[1] 1.572456

It is estimated that the risk of asthma among women is 1.57 times greater than among men.
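The hand calculation above divides rounded proportions, so it differs slightly in the fourth decimal from the estimate reported by riskratio(...) later in the lab. The following minimal sketch works from the counts instead. The counts are typed in by hand from the asthma table (male: 30 with asthma, 769 without; female: 49 with, 781 without). The object names are illustrative only.

```r
# Counts from table(asthma$sex, asthma$asthma), typed in by hand
asthma_tab <- matrix(c(30, 49, 769, 781), nrow = 2,
                     dimnames = list(sex = c("male", "female"),
                                     asthma = c("asthma", "no asthma")))

# Row-wise sample proportions: estimated Pr(asthma | sex)
p_hat <- asthma_tab[, "asthma"] / rowSums(asthma_tab)

# Point estimate of the relative risk of asthma, women vs. men
rr_hat <- p_hat[["female"]] / p_hat[["male"]]
rr_hat
```

Working from counts gives about 1.572329, the same value riskratio(...) prints.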

Your turn

Using the chd data, compute a point estimate of the relative risk of coronary heart disease among smokers compared with nonsmokers.

# sample proportions
table(chd$smoking, chd$chd) |> prop.table(margin = 1)
           
               CHD no CHD
  nonsmoker 0.0174 0.9826
  smoker    0.0280 0.9720
# point estimate of rr
0.0280/0.0174
[1] 1.609195

It is estimated that the risk of CHD is 1.61 times greater among smokers compared with nonsmokers.

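For inference on the relative risk (i.e., a confidence interval), the epitools package has a function riskratio(...) that is structured like oddsratio(...) (introduced below) in terms of inputs and outputs. It expects the groups in rows, with the reference group first, and the outcome in columns, with the outcome of interest second. Because table(...) lists "asthma" before "no asthma", rev = 'columns' swaps the two columns: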

# inference for rr
table(asthma$sex, asthma$asthma) |>
  riskratio(rev = 'columns')
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        risk ratio with 95% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.572329 1.008738 2.450803

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.04354632

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

With 95% confidence, the risk of asthma among women is estimated to be between 1.01 and 2.45 times greater than among men.
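The method attribute says this is a Wald (normal-approximation) interval. The sketch below assumes it is the textbook Wald interval on the log scale, with standard error \(\sqrt{1/a - 1/n_1 + 1/c - 1/n_0}\). Here \(a\) and \(c\) are the cases in the comparison and reference groups, and \(n_1\) and \(n_0\) are the group totals. Matching the printed bounds is the check. The variable names are illustrative.

```r
# Asthma counts: cases and row totals, typed in from the table above
a  <- 49; n1 <- 830   # women: asthma cases, group total
c0 <- 30; n0 <- 799   # men (reference): asthma cases, group total

# Log of the estimated relative risk
log_rr <- log((a / n1) / (c0 / n0))

# Delta-method standard error of log(RR)
se_log_rr <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n0)

# 95% Wald interval on the log scale, transformed back with exp()
z  <- qnorm(0.975)
ci <- exp(log_rr + c(-1, 1) * z * se_log_rr)
round(ci, 6)
```

Building the interval on the log scale and exponentiating keeps it positive, and it is asymmetric around the point estimate, as in the riskratio(...) output.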

Your turn

Construct a 95% confidence interval for the relative risk of CHD among smokers compared with nonsmokers.

# inference for rr
table(chd$smoking, chd$chd) |> 
  riskratio(rev = 'columns')
$data
           
            no CHD CHD Total
  nonsmoker   4913  87  5000
  smoker      2916  84  3000
  Total       7829 171  8000

$measure
           risk ratio with 95% C.I.
            estimate    lower    upper
  nonsmoker 1.000000       NA       NA
  smoker    1.609195 1.196452 2.164325

$p.value
           two-sided
             midp.exact fisher.exact  chi.square
  nonsmoker          NA           NA          NA
  smoker    0.001799736  0.001800482 0.001505872

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

With 95% confidence, the risk of CHD among smokers is estimated to be between 1.20 and 2.16 times greater than among nonsmokers.

The confidence level for the interval is easily changed by adding a conf.level = ... argument. For example, below is a 99% interval:

# adjust confidence level
table(asthma$sex, asthma$asthma) |>
  riskratio(rev = 'columns',
            conf.level = 0.99)
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        risk ratio with 99% C.I.
         estimate     lower    upper
  male   1.000000        NA       NA
  female 1.572329 0.8774198 2.817602

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.04354632

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
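
With 99% confidence, the risk of asthma among women is estimated to be between 0.88 and 2.82 times that among men. Unlike the 95% interval, this interval contains 1, so at the 99% level the data do not rule out equal risk.

Your turn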

Construct a 99% confidence interval for the relative risk of CHD among smokers compared with nonsmokers.

# inference for rr
table(chd$smoking, chd$chd) |> 
  riskratio(rev = 'columns',
            conf.level = 0.99)
$data
           
            no CHD CHD Total
  nonsmoker   4913  87  5000
  smoker      2916  84  3000
  Total       7829 171  8000

$measure
           risk ratio with 99% C.I.
            estimate   lower    upper
  nonsmoker 1.000000      NA       NA
  smoker    1.609195 1.09006 2.375566

$p.value
           two-sided
             midp.exact fisher.exact  chi.square
  nonsmoker          NA           NA          NA
  smoker    0.001799736  0.001800482 0.001505872

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

With 99% confidence, the risk of CHD among smokers is estimated to be between 1.09 and 2.38 times greater than among nonsmokers.
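Changing the confidence level only changes the normal quantile that multiplies the standard error, so higher levels give wider intervals. The sketch below uses the CHD counts from the table above. It assumes the textbook Wald interval for the log relative risk, which reproduces the printed bounds. wald_rr_ci is a hypothetical helper, not an epitools function.

```r
# CHD counts, typed in from the table above
a  <- 84; n1 <- 3000   # smokers: CHD cases, group total
c0 <- 87; n0 <- 5000   # nonsmokers (reference): CHD cases, group total

log_rr    <- log((a / n1) / (c0 / n0))
se_log_rr <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n0)

# Hypothetical helper: Wald interval for RR at a given confidence level.
# Only the normal quantile z depends on the confidence level.
wald_rr_ci <- function(conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  exp(log_rr + c(-1, 1) * z * se_log_rr)
}

conf_levels <- c(0.90, 0.95, 0.99)
ci_table <- t(sapply(conf_levels, wald_rr_ci))
dimnames(ci_table) <- list(c("90%", "95%", "99%"), c("lower", "upper"))
ci_table
```

The 95% and 99% rows should match the riskratio(...) output for the CHD data, and each row should be wider than the one before it.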

Odds ratios

The odds of an event are its probability divided by the probability that it does not occur; for instance, if the odds of winning a bet are 3, that means that you’re three times as likely to win as to lose, i.e., in terms of probabilities, \(\frac{Pr(\text{win})}{Pr(\text{lose})} = 3\).

An odds ratio is a multiplicative comparison of odds under two circumstances. For example, if the odds of developing cancer among smokers are 2 (twice as likely to get cancer as not), and the odds of developing cancer among nonsmokers are 0.5 (half as likely to get cancer as not), then the odds ratio is \(\frac{2}{0.5} = 4\). This would mean that the odds of developing cancer are four times higher among smokers compared with nonsmokers.
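The example can also be checked starting from probabilities. In this short sketch, odds() is a hypothetical helper. The probabilities 2/3 and 1/3 are the ones implied by odds of 2 and 0.5, since \(p = \text{odds}/(1 + \text{odds})\).

```r
# Hypothetical helper: convert a probability to odds, Pr(event) / Pr(no event)
odds <- function(p) p / (1 - p)

p_smokers    <- 2 / 3   # odds of 2 correspond to Pr(cancer) = 2/3
p_nonsmokers <- 1 / 3   # odds of 0.5 correspond to Pr(cancer) = 1/3

# Odds ratio: smokers vs. nonsmokers
odds_ratio <- odds(p_smokers) / odds(p_nonsmokers)
odds_ratio
```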

Odds ratios can be estimated directly from a contingency table. For example, the odds that a person is a smoker are about 5.4 times higher among cancer patients than among healthy individuals:

# contingency table
table(smoking$group, smoking$smoking)
         
          Smokers NonSmokers
  Cancer       83          3
  Control      72         14
# odds ratio (cancer/control) of smoking
(83/3)/(72/14)
[1] 5.37963

Somewhat miraculously, the odds ratio does not depend on which variable is treated as the outcome: swapping the roles of the two variables gives the same value. That is, if we had a random sample rather than a case-control study, the odds ratio of cancer among smokers compared with nonsmokers would be:

# hypothetically, odds ratio (smokers/nonsmokers) of cancer
(83/72)/(3/14)
[1] 5.37963

This is exactly the same!
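The reason is algebraic. For illustration, label the cells of the smoking table as \(a = 83\) (cancer, smokers), \(b = 3\) (cancer, nonsmokers), \(c = 72\) (control, smokers), and \(d = 14\) (control, nonsmokers). Both orientations reduce to the same cross-product ratio:

\[ \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{a/c}{b/d}, \qquad \frac{83 \times 14}{3 \times 72} = \frac{1162}{216} \approx 5.38 \]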

Your turn

Using the asthma data…

  1. compute the odds ratio of asthma among women compared with men
  2. compute the odds ratio of being a woman among asthmatics compared with non-asthmatics

You should find that they are the same!

# contingency table
table(asthma$sex, asthma$asthma)
        
         asthma no asthma
  male       30       769
  female     49       781
# odds ratio (women/men) of asthma
(49/781)/(30/769)
[1] 1.608237
# odds ratio (asthma/no asthma) of being a woman
(49/30)/(781/769)
[1] 1.608237

Both calculations yield an estimate of 1.608:

  • the odds of asthma are estimated to be 1.61 times higher for women than for men
  • the odds that a randomly chosen person is female are estimated to be 1.61 times higher among those with asthma than among those without

If you invert the order of comparison or compute the odds of the complementary event, you get the reciprocal of the original odds ratio; doing both returns the original value. For the smoking data, here is an exhaustive list of all of the odds ratios we could compute:

# contingency table
table(smoking$group, smoking$smoking)
         
          Smokers NonSmokers
  Cancer       83          3
  Control      72         14
# odds of cancer (smokers/nonsmokers)
(83/72)/(3/14)
[1] 5.37963
# odds of cancer (nonsmokers/smokers)
(3/14)/(83/72)
[1] 0.1858864
# odds of not getting cancer (nonsmokers/smokers)
(14/3)/(72/83)
[1] 5.37963
# odds of not getting cancer (smokers/nonsmokers)
(72/83)/(14/3)
[1] 0.1858864
# odds of smoking (cancer/control)
(83/3)/(72/14)
[1] 5.37963
# odds of smoking (control/cancer)
(72/14)/(83/3)
[1] 0.1858864
# odds of not smoking (control/cancer)
(14/72)/(3/83)
[1] 5.37963
# odds of not smoking (cancer/control)
(3/83)/(14/72)
[1] 0.1858864

You will notice that only two algebraically distinct values appear, and they are reciprocals of one another. However, the list contains eight conceptually distinct odds ratios, each answering a different question!
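To see why only two values can appear, the sketch below enumerates every orientation of the smoking table. Each orientation makes three choices:

- which variable is treated as the outcome (transpose or not);
- which outcome level counts as the event (reverse columns or not);
- which group is in the numerator (reverse rows or not).

That gives 2 × 2 × 2 = 8 tables, one per line of the list above. The counts are typed in from the table, and cross_ratio is a hypothetical helper.

```r
# Smoking counts: rows = group, columns = smoking status
smoke_tab <- matrix(c(83, 72, 3, 14), nrow = 2,
                    dimnames = list(group = c("Cancer", "Control"),
                                    smoking = c("Smokers", "NonSmokers")))

# Hypothetical helper: odds of the column-2 outcome in row 2 relative to row 1
cross_ratio <- function(tab) (tab[2, 2] / tab[2, 1]) / (tab[1, 2] / tab[1, 1])

# All eight orientations of the table
opts <- expand.grid(transpose = c(FALSE, TRUE),
                    rev_rows  = c(FALSE, TRUE),
                    rev_cols  = c(FALSE, TRUE))

ors <- mapply(function(transpose, rev_rows, rev_cols) {
  tab <- if (transpose) t(smoke_tab) else smoke_tab
  if (rev_rows) tab <- tab[2:1, ]
  if (rev_cols) tab <- tab[, 2:1]
  cross_ratio(tab)
}, opts$transpose, opts$rev_rows, opts$rev_cols)

length(ors)                   # eight orientations
sort(unique(round(ors, 6)))   # only two distinct values
```

Transposing never changes the value. Reversing rows or columns alone inverts it, and reversing both restores it.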

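The epitools package has a function oddsratio(...) which takes a contingency table (as below) or the two variables as input and returns the estimated odds ratio, a confidence interval, and p-values for tests of association. Setting method = 'wald' requests the unconditional maximum-likelihood estimate with a Wald (normal-approximation) interval; the function's default method is 'midp':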

# inference for odds ratio
table(smoking$smoking, smoking$group) |> 
  oddsratio(method = 'wald')
$data
            
             Cancer Control Total
  Smokers        83      72   155
  NonSmokers      3      14    17
  Total          86      86   172

$measure
            odds ratio with 95% C.I.
             estimate    lower    upper
  Smokers     1.00000       NA       NA
  NonSmokers  5.37963 1.486376 19.47045

$p.value
            two-sided
              midp.exact fisher.exact  chi.square
  Smokers             NA           NA          NA
  NonSmokers 0.005116319  0.008822805 0.004948149

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

By default, oddsratio(...) treats the first row as the reference group and the second column as the outcome of interest, so it returns the odds of the second-column outcome in the second row compared with the first row. The above therefore estimates the odds of not having cancer (being a control) among nonsmokers compared with smokers. The rev = ... argument reverses the order of the rows, the columns, or both.
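The sketch below shows what the rev argument does to the point estimate. It assumes, as the outputs in this lab suggest, that reversing only reorders rows or columns before the odds of the second-column outcome in row 2 are compared with row 1. or_point is a hypothetical helper, not part of epitools, and it ignores the confidence interval.

```r
# Smoking counts in the orientation passed to oddsratio():
# rows = smoking status, columns = group
tab <- matrix(c(83, 3, 72, 14), nrow = 2,
              dimnames = list(smoking = c("Smokers", "NonSmokers"),
                              group = c("Cancer", "Control")))

# Hypothetical helper: reorder rows/columns like rev, then compute
# the odds of the column-2 outcome in row 2 relative to row 1
or_point <- function(tab, rev = c("neither", "rows", "columns", "both")) {
  rev <- match.arg(rev)
  if (rev %in% c("rows", "both"))    tab <- tab[2:1, ]
  if (rev %in% c("columns", "both")) tab <- tab[, 2:1]
  (tab[2, 2] / tab[2, 1]) / (tab[1, 2] / tab[1, 1])
}

ors_by_rev <- sapply(c("neither", "rows", "columns", "both"),
                     function(r) or_point(tab, r))
ors_by_rev
```

Only 'neither' and 'both' give 5.38. In this example they coincide numerically but describe different comparisons.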

To estimate the odds of cancer among smokers compared with nonsmokers, we need to reverse the order of both rows and columns.

# inference for odds ratio
table(smoking$smoking, smoking$group) |> 
  oddsratio(method = 'wald', 
            rev = 'both')
$data
            
             Control Cancer Total
  NonSmokers      14      3    17
  Smokers         72     83   155
  Total           86     86   172

$measure
            odds ratio with 95% C.I.
             estimate    lower    upper
  NonSmokers  1.00000       NA       NA
  Smokers     5.37963 1.486376 19.47045

$p.value
            two-sided
              midp.exact fisher.exact  chi.square
  NonSmokers          NA           NA          NA
  Smokers    0.005116319  0.008822805 0.004948149

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

The result is interpreted as follows:

With 95% confidence, the odds of developing lung cancer are estimated to be between 1.49 and 19.47 times higher among smokers than among nonsmokers.

The result is the same as what we started with, but that’s just because we got lucky and the odds ratio we wanted happened to be algebraically identical to the one we got by default. There is no such guarantee in general.
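The printed interval can be reproduced by hand. This minimal sketch assumes the Wald interval is the standard one on the log scale, with the Woolf standard error \(\sqrt{1/a + 1/b + 1/c + 1/d}\); this reproduces the printed bounds. The counts follow the rev = 'both' output, and the variable names are illustrative.

```r
# Smoking counts, oriented as in the rev = 'both' output
a  <- 83; b <- 72   # smokers: cancer, control
c0 <- 3;  d <- 14   # nonsmokers (reference): cancer, control

# Log odds ratio of cancer, smokers vs. nonsmokers
log_or <- log((a / b) / (c0 / d))

# Woolf (delta-method) standard error of log(OR)
se_log_or <- sqrt(1 / a + 1 / b + 1 / c0 + 1 / d)

# 95% Wald interval on the log scale, transformed back with exp()
z  <- qnorm(0.975)
ci <- exp(log_or + c(-1, 1) * z * se_log_or)
or_ci <- c(estimate = exp(log_or), lower = ci[1], upper = ci[2])
or_ci
```

The small nonsmoker counts (3 and 14) dominate the standard error, which is why this interval is so wide.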

Your turn

Estimate the odds ratio of asthma among women compared with men, with a 95% confidence interval.

# inference for odds ratio
table(asthma$sex, asthma$asthma) |>
  oddsratio(method = 'wald',
            rev = 'columns')
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        odds ratio with 95% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.608237 1.010044 2.560708

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.04354632

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

With 95% confidence, the odds of asthma are estimated to be between 1.01 and 2.56 times higher among women than among men.

Lastly, the confidence level can be adjusted using the conf.level = ... argument:

# adjust confidence level
table(asthma$sex, asthma$asthma) |>
  oddsratio(method = 'wald', 
            rev = 'columns',
            conf.level = 0.9)
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        odds ratio with 90% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.608237 1.088474 2.376195

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.04354632

$correction
[1] FALSE

attr(,"method")
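[1] "Unconditional MLE & normal approximation (Wald) CI"

With 90% confidence, the odds of asthma are estimated to be between 1.09 and 2.38 times higher among women than among men.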