Relative risk and odds ratios

Inference for measures of association in 2x2 contingency tables

Today’s agenda

  1. [lecture] relative risk and odds ratios in 2x2 tables
  2. [lab] relative risk and odds ratios with epitools in R
  3. [miscellany] preparing for the final

Relative risk

Asthma data

Consider estimating the difference in proportions:

         asthma  no asthma
male         30        769
female       49        781

table(asthma$sex, asthma$asthma) |>
  prop.test(conf.level = 0.9)

    2-sample test for equality of proportions with continuity correction

data:  table(asthma$sex, asthma$asthma)
X-squared = 3.6217, df = 1, p-value = 0.05703
alternative hypothesis: two.sided
90 percent confidence interval:
 -0.040137075 -0.002841347
sample estimates:
    prop 1     prop 2 
0.03754693 0.05903614 

With 90% confidence, asthma prevalence is estimated to be between 0.28 and 4.01 percentage points higher among women than among men.

Is a difference of up to 4 percentage points practically meaningful? Well, it depends:

  • yes if prevalence is very low
  • no if prevalence is very high
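
To make the point concrete, here is a small sketch with made-up prevalences (they are illustrative assumptions, not values from the asthma data). Both pairs of groups differ by 4 percentage points, yet the ratio of the two proportions tells a very different story in each case.

```r
# Hypothetical prevalences, chosen only for illustration (not from the asthma data).
low  <- c(group1 = 0.02, group2 = 0.06)   # rare condition
high <- c(group1 = 0.90, group2 = 0.94)   # common condition

# Both pairs differ by the same 4 percentage points ...
diff_low  <- low[["group2"]]  - low[["group1"]]
diff_high <- high[["group2"]] - high[["group1"]]

# ... but the ratio of the two proportions is very different.
ratio_low  <- low[["group2"]]  / low[["group1"]]
ratio_high <- high[["group2"]] / high[["group1"]]

round(c(diff_low = diff_low, diff_high = diff_high), 2)
round(c(ratio_low = ratio_low, ratio_high = ratio_high), 2)
```

When the condition is rare, a 4-point gap triples the proportion; when it is common, the same gap changes the proportion by only a few percent.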

Relative risk

If \(p_F, p_M\) are the (population) proportions of women and men with asthma, then the relative risk of asthma among women compared with men is defined as:

\[ RR = \frac{p_F}{p_M} \qquad \left(\frac{\text{risk among women}}{\text{risk among men}}\right) \]

An estimate of the relative risk is simply the ratio of estimated proportions. For the asthma data, an estimate is: \[ \widehat{RR} = \frac{\hat{p}_F}{\hat{p}_M} = \frac{0.059}{0.038} = 1.57 \]

It is estimated that the risk of asthma among women is 1.57 times greater than among men.

Confidence intervals for relative risk

A normal model can be used to approximate the sampling distribution of \(\log(\widehat{RR})\) and construct a confidence interval. If \(\hat{p}_1\) and \(\hat{p}_2\) are the two estimated proportions, based on samples of size \(n_1\) and \(n_2\), and \(c\) is the critical value from the normal model:

\[\log\left(\widehat{RR}\right) \pm c \times SE\left(\log\left(\widehat{RR}\right)\right) \quad\text{where}\quad SE\left(\log\left(\widehat{RR}\right)\right) = \sqrt{\frac{1 - \hat{p}_1}{\hat{p}_1n_1} + \frac{1 - \hat{p}_2}{\hat{p}_2n_2}}\]

Exponentiate endpoints to obtain an interval for relative risk.
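
The interval can also be computed by hand in base R, which may help connect the formula to software output. The sketch below uses the counts from the asthma table; the variable names are my own, and the 90% level is chosen to match the earlier analysis.

```r
# Counts from the asthma table above.
x_m <- 30; n_m <- 30 + 769    # men with asthma, total men
x_f <- 49; n_f <- 49 + 781    # women with asthma, total women

p_m <- x_m / n_m
p_f <- x_f / n_f

rr_hat <- p_f / p_m           # women relative to men
se_log <- sqrt((1 - p_f) / (p_f * n_f) + (1 - p_m) / (p_m * n_m))

crit <- qnorm(1 - 0.10 / 2)   # critical value for 90% confidence
ci   <- exp(log(rr_hat) + c(-1, 1) * crit * se_log)

round(c(estimate = rr_hat, lower = ci[1], upper = ci[2]), 3)
```

The interval is built on the log scale and then exponentiated, which is why it is not symmetric about the point estimate.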

table(asthma$sex, asthma$asthma) |>
  riskratio(rev = 'columns', 
            method = 'wald', 
            conf.level = 0.9,
            correction = T)
         risk ratio with 90% C.I.
Predictor estimate    lower    upper
   male   1.000000       NA       NA
   female 1.572329 1.083353 2.282007

With 90% confidence, the risk of asthma is estimated to be between 1.08 and 2.28 times greater for women than for men.

Implementation with epitools

table(asthma$sex, asthma$asthma) |>
  riskratio(rev = 'columns', 
            method = 'wald', 
            conf.level = 0.9,
            correction = T)
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        risk ratio with 90% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.572329 1.083353 2.282007

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.05703135

$correction
[1] TRUE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

riskratio is picky about input format:

  • outcome of interest should be second column
  • group of interest should be second row

It will return the relative risk

\[ RR = \frac{n_{22}/n_2}{n_{12}/n_1} \]

where \(n_{ij}\) is the count in row \(i\), column \(j\) and \(n_i\) is the total for row \(i\).

The data table can be reoriented using rev

  • rev = 'neither' keeps original orientation
  • rev = 'rows' reverses order of rows
  • rev = 'columns' reverses order of columns
  • rev = 'both' reverses both
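
A base R sketch of what reversing the columns accomplishes for the asthma table. It does not call epitools; it only reorders the columns by hand and applies the formula above, so treat it as an illustration of the orientation rule rather than of the package internals.

```r
# Asthma counts entered in the orientation of the first table
# (asthma in the first column).
tbl <- matrix(c(30, 769,
                49, 781),
              nrow = 2, byrow = TRUE,
              dimnames = list(sex = c("male", "female"),
                              outcome = c("asthma", "no asthma")))

# Reverse the column order so the outcome of interest is second.
tbl_rev <- tbl[, 2:1]

# Relative risk from the formula: second row relative to first row,
# using the second column as the outcome.
n_row <- rowSums(tbl_rev)
rr <- (tbl_rev[2, 2] / n_row[2]) / (tbl_rev[1, 2] / n_row[1])
unname(rr)
```

Without the column reversal, the same formula would give the relative "risk" of not having asthma, which is rarely the quantity of interest.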

Reporting results

$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        risk ratio with 90% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.572329 1.083353 2.282007

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.05703135

$correction
[1] TRUE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

The data provide evidence of an association between asthma and sex (\(\chi^2\) = 3.62 on 1 degree of freedom, p = 0.057). With 90% confidence, the risk of asthma is estimated to be between 1.08 and 2.28 times greater for women than for men, with a point estimate of 1.57.

Conventional style:

  • first report the test result
  • then the measure of association
  • include point estimates (since interval is asymmetric)

Treatment efficacy: reduction in risk

In a randomized trial for a malaria vaccine, 20 individuals were randomly allocated to receive a dose of the vaccine or a placebo.

Vaccine trials often estimate relative reduction in risk or “efficacy”:

\[ \underbrace{\frac{\hat{p}_\text{ctrl} - \hat{p}_\text{trt}}{\hat{p}_\text{ctrl}}}_\text{efficacy} = 1 - \widehat{RR} \quad\text{where}\quad \widehat{RR} = \frac{\hat{p}_\text{trt}}{\hat{p}_\text{ctrl}} \]

         no infection  infection
placebo             0          6
vaccine             9          5

# relative risk
rr.out <- table(malaria$treatment, malaria$outcome) |>
            riskratio(method = 'wald', correction = T)
rr.out$measure
         risk ratio with 95% C.I.
           estimate     lower     upper
  placebo 1.0000000        NA        NA
  vaccine 0.3571429 0.1768593 0.7212006
# efficacy
1 - rr.out$measure
         risk ratio with 95% C.I.
           estimate     lower     upper
  placebo 0.0000000        NA        NA
  vaccine 0.6428571 0.8231407 0.2787994

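With 95% confidence, the vaccine is estimated to reduce the risk of infection by between 27.9% and 82.3%, with a point estimate of 64.3%. (Subtracting from 1 reverses the order of the endpoints, so the columns labelled lower and upper in the output are swapped.)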
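
A short sketch of the same conversion with the endpoints put back in order. The relative risk values are copied from the output above; the object names are my own.

```r
# Relative risk estimate and 95% limits copied from the output above.
rr <- c(estimate = 0.3571429, lower = 0.1768593, upper = 0.7212006)

# Efficacy is 1 - RR. Subtracting from 1 reverses the ordering, so the
# upper limit for RR gives the lower limit for efficacy and vice versa.
efficacy <- c(estimate = 1 - rr[["estimate"]],
              lower    = 1 - rr[["upper"]],
              upper    = 1 - rr[["lower"]])

round(100 * efficacy, 1)   # as percentages
```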

Odds ratios

Example: case-control study

         Smokers  NonSmokers  total
Cancer        83           3     86
Control       72          14     86

Recall: the case rate (cancer prevalence) cannot be estimated from these data, because the study design fixed the numbers of cases and controls (86 each).

# chi square test of association
table(smoking$group, smoking$smoking) |>
  chisq.test()

    Pearson's Chi-squared test with Yates' continuity correction

data:  table(smoking$group, smoking$smoking)
X-squared = 6.5275, df = 1, p-value = 0.01062

The data provide evidence of an association between smoking and lung cancer (\(\chi^2 = 6.53\) on 1 degree of freedom, \(p = 0.0106\)).

How do we measure the association, considering we can’t estimate case rates?

Odds

If \(p\) is the true cancer prevalence (a population proportion), then the odds of cancer are:

\[ \text{odds} = \frac{p}{1 - p} \]

The odds represent the relative likelihood of an outcome.

  • \(\text{odds} = 2\): the outcome (cancer) is twice as likely to occur as to not occur
  • \(\text{odds} = 1/2\): the outcome (cancer) is half as likely to occur as to not occur
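
A minimal sketch of converting between a proportion and odds. The helper names odds_from_p and p_from_odds are hypothetical, introduced only for this illustration.

```r
# Convert between a proportion and odds (hypothetical helper functions).
odds_from_p <- function(p) p / (1 - p)
p_from_odds <- function(odds) odds / (1 + odds)

odds_from_p(2 / 3)   # outcome twice as likely to occur as not
odds_from_p(1 / 3)   # outcome half as likely to occur as not
p_from_odds(2)       # odds of 2 converted back to a proportion
```

So odds of 2 correspond to a proportion of 2/3, and odds of 1/2 correspond to a proportion of 1/3.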

Odds ratios

Let \(a, b, c, d\) denote population proportions.

              outcome 1 (O1)  outcome 2 (O2)
group 1 (G1)  a               b
group 2 (G2)  c               d

The odds of outcome 1 (O1) in each group are:

  • \(\frac{\textcolor{red}{a}}{\textcolor{blue}{b}}\) in group 1 (G1)
  • \(\frac{\textcolor{orange}{c}}{\textcolor{purple}{d}}\) in group 2 (G2)

The odds ratio or “relative odds” is:

\[ \omega = \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} = \frac{\textcolor{red}{a}/\textcolor{blue}{b}}{\textcolor{orange}{c}/\textcolor{purple}{d}} = \frac{\textcolor{red}{a}\textcolor{purple}{d}}{\textcolor{blue}{b}\textcolor{orange}{c}} \]

A surprising algebraic fact is that:

\[ \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} =\frac{\text{odds}_{O1}(G1)}{\text{odds}_{O2}(G1)} \]

relative odds of cancer given smoking status = relative odds of smoking given cancer status
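
To see why, write out the right-hand side using the table of proportions \(a, b, c, d\). Among those with outcome 1, the odds of being in group 1 are \(a/c\); among those with outcome 2, they are \(b/d\). Therefore:

\[ \frac{\text{odds}_{O1}(G1)}{\text{odds}_{O2}(G1)} = \frac{a/c}{b/d} = \frac{ad}{bc} = \omega \]

Both ratios reduce to the same cross-product \(ad/bc\), so the odds ratio does not depend on which variable is treated as the outcome.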

Estimating odds ratios

The estimate is the same calculation as on the previous slide, but with sample counts.

              Smoker (O1)  NonSmoker (O2)
Case (G1)              83               3
Control (G2)           72              14

Estimate of \(\omega\): \[ \hat{\omega} = \frac{\textcolor{red}{83}\times\textcolor{purple}{14}}{\textcolor{blue}{3}\times\textcolor{orange}{72}} = 5.38 \]

Interpretation:

It is estimated that the odds of lung cancer are 5.38 times greater for smokers than for nonsmokers.

Confidence intervals for odds ratios

The sampling distribution of the log odds ratio can be approximated by a normal model.

\[ \log\left(\hat{\omega}\right) \pm c \times SE\left(\log\left(\hat{\omega}\right)\right) \quad\text{where}\quad SE\left(\log\left(\hat{\omega}\right)\right) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} \]

  • critical value \(c\) from the normal model
  • exponentiate to obtain an interval for \(\omega\)

The oddsratio(...) function in the epitools package will compute and back-transform the interval for you.

table(smoking$group, smoking$smoking) |>
  oddsratio(rev = 'both', method = 'wald')
         odds ratio with 95% C.I.
          estimate    lower    upper
  Control  1.00000       NA       NA
  Cancer   5.37963 1.486376 19.47045

With 95% confidence, the odds of lung cancer are estimated to be between 1.49 and 19.47 times greater for smokers than for nonsmokers.
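
The same interval can be sketched by hand in base R from the four cell counts, following the standard error formula for the log odds ratio. Variable names are my own.

```r
# Cell counts from the case-control table.
n11 <- 83; n12 <- 3     # cases:    smokers, nonsmokers
n21 <- 72; n22 <- 14    # controls: smokers, nonsmokers

or_hat <- (n11 * n22) / (n12 * n21)
se_log <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

crit <- qnorm(1 - 0.05 / 2)    # critical value for 95% confidence
ci   <- exp(log(or_hat) + c(-1, 1) * crit * se_log)

round(c(estimate = or_hat, lower = ci[1], upper = ci[2]), 2)
```

The small count of 3 nonsmoking cases dominates the standard error, which is why the interval is so wide.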

Implementation with epitools

table(smoking$group, smoking$smoking) |>
  oddsratio(rev = 'both', 
            method = 'wald',
            conf.level = 0.95,
            correction = T)
$data
         
          NonSmokers Smokers Total
  Control         14      72    86
  Cancer           3      83    86
  Total           17     155   172

$measure
         odds ratio with 95% C.I.
          estimate    lower    upper
  Control  1.00000       NA       NA
  Cancer   5.37963 1.486376 19.47045

$p.value
         two-sided
           midp.exact fisher.exact chi.square
  Control          NA           NA         NA
  Cancer  0.005116319  0.008822805 0.01062183

$correction
[1] TRUE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

oddsratio is picky about data inputs:

  • outcome of interest should be second column
  • group of interest should be second row

It will return the odds ratio

\[ \frac{\text{odds}_\text{R2} (\text{C2})}{\text{odds}_\text{R1}(\text{C2})} = \frac{n_{22}/n_{21}}{n_{12}/n_{11}} = \frac{n_{22} n_{11}}{n_{12}n_{21}} = \frac{da}{cb} \]

The data table can be reoriented using rev

  • rev = 'neither' keeps original orientation
  • rev = 'rows' reverses order of rows
  • rev = 'columns' reverses order of columns
  • rev = 'both' reverses both

Reporting results

$data
         
          NonSmokers Smokers Total
  Control         14      72    86
  Cancer           3      83    86
  Total           17     155   172

$measure
         odds ratio with 95% C.I.
          estimate    lower    upper
  Control  1.00000       NA       NA
  Cancer   5.37963 1.486376 19.47045

$p.value
         two-sided
           midp.exact fisher.exact chi.square
  Control          NA           NA         NA
  Cancer  0.005116319  0.008822805 0.01062183

$correction
[1] TRUE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

First report the test result, then the measure of association:

The data provide evidence of an association between smoking and lung cancer (\(\chi^2 = 6.53\) on 1 degree of freedom, \(p = 0.0106\)). With 95% confidence, the odds of cancer are estimated to be between 1.49 and 19.47 times greater among smokers compared with nonsmokers, with a point estimate of 5.38.

Be sure to include the point estimate, since the interval estimate is asymmetric.

Risk or odds?

Rules of thumb

If a study design employs outcome-based sampling, outcome proportions (risks) are not estimable.

  • analysis must use relative odds

Otherwise, analysis may use any measure of association.

  • difference in proportions
  • relative risk
  • treatment efficacy
  • relative odds
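
One way to see why outcome-based sampling rules out risk-based measures is to change the sampling ratio and watch what happens. The sketch below takes the smoking case-control counts and imagines a hypothetical design with ten times as many controls showing the same exposure pattern; the helper functions are my own and purely illustrative.

```r
# Smoking case-control counts (rows: cases, controls; columns: smokers, nonsmokers).
tbl <- matrix(c(83, 3,
                72, 14),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("case", "control"),
                              smoking = c("smoker", "nonsmoker")))

# Hypothetical alternative design: same exposure pattern, ten times as many controls.
tbl10 <- tbl
tbl10["control", ] <- 10 * tbl["control", ]

# Cross-product odds ratio.
odds_ratio <- function(x) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])

# Naive "risk" of being a case among smokers relative to nonsmokers.
naive_rr <- function(x) (x[1, 1] / sum(x[, 1])) / (x[1, 2] / sum(x[, 2]))

c(odds_ratio(tbl), odds_ratio(tbl10))   # unchanged by the sampling ratio
c(naive_rr(tbl), naive_rr(tbl10))       # depends on the sampling ratio
```

The odds ratio is the same under both designs, while the naive relative risk moves with the arbitrary choice of how many controls to sample.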

Risk or odds? Cyclosporiasis outbreak

An outbreak of cyclosporiasis was detected among residents of New Jersey. In a case-control study, investigators found that 21 of 30 cases and 4 of 60 controls had eaten raspberries.

Outcome-based sampling means…

  • can’t estimate risk
  • analysis should use odds ratio
oddsratio(outbreak$exposure, outbreak$group, 
          rev = 'columns', method = 'wald', 
          correction = T)
                odds ratio with 95% C.I.
Predictor        estimate    lower    upper
  no raspberries  1.00000       NA       NA
  raspberries    32.66667 9.081425 117.5048

         no raspberries  raspberries
case                  9           21
control              56            4

The data provide evidence of an association between raspberry consumption and illness (\(\chi^2\) = 36.89 on 1 degree of freedom, p < 0.0001). With 95% confidence, the odds of illness are estimated to be between 9.08 and 117.5 times higher among those who consumed raspberries during the outbreak than among those who did not, with a point estimate of 32.67.

Risk or odds? smoking and CHD

A cohort study of 3,000 smokers and 5,000 nonsmokers investigated the link between smoking and the development of coronary heart disease (CHD) over 1 year.

Two independent samples but not outcome-based sampling…

  • can estimate risk
  • analysis can use any measure
  • RR (arguably) most natural
riskratio(chd$smoking, chd$chd, rev = 'columns',
          method = 'wald', correction = T)
           risk ratio with 95% C.I.
Predictor   estimate    lower    upper
  nonsmoker 1.000000       NA       NA
  smoker    1.609195 1.196452 2.164325

           CHD  no CHD
nonsmoker   87    4913
smoker      84    2916

The data provide evidence that smoking is associated with coronary heart disease (\(\chi^2\) = 9.57 on 1 degree of freedom, p = 0.002). With 95% confidence, the risk of CHD is estimated to be between 1.20 and 2.16 times greater among smokers compared with nonsmokers, with a point estimate of 1.61.
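
Because risks are estimable in a cohort design, all of the measures listed in the rules of thumb can be computed from the same table. A base R sketch for the CHD data follows; variable names are my own.

```r
# Counts from the CHD cohort table.
chd_smoker    <- 84; n_smoker    <- 84 + 2916
chd_nonsmoker <- 87; n_nonsmoker <- 87 + 4913

p_s <- chd_smoker / n_smoker          # one-year risk among smokers
p_n <- chd_nonsmoker / n_nonsmoker    # one-year risk among nonsmokers

risk_difference <- p_s - p_n
relative_risk   <- p_s / p_n
odds_ratio      <- (p_s / (1 - p_s)) / (p_n / (1 - p_n))

round(c(difference = risk_difference, RR = relative_risk, OR = odds_ratio), 4)
```

Since CHD is uncommon in both groups over one year, the odds ratio comes out close to the relative risk here; the two can differ substantially when the outcome is common.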