Lab: Two-sample inference for proportions

With solutions

The goal of this lab is to learn how to perform two-sample inference for categorical data. This includes both binomial and multinomial data. Examples follow those given in lecture.

load('data/nhanes500.RData')
load('data/vitc.RData')
load('data/asthma.RData')

Two-way tables

For study data in which a values of categorical variable are recorded for two or more groups, analysis proceeds from a two-way table of the category counts for each group. Two-way tables are constructed by cross-tabulating the categorical variable with the group labels. In R, this is achieved using the table(...) function with two arguments. For the data from the Vitamin C experiment discussed in lecture:

# two-way table
table(vitc$trt, vitc$out)
         
          Cold NoCold
  Placebo  335     76
  VitC     302    105

By convention, the categories for the variable of interest are shown on the column dimension and the group labels are shown in the row dimension. In R, this means entering the grouping variable (vitc$trt above) as the first argument to table(...).

To see groupwise point estimates of the category proportions, use prop.table(...):

# groupwise proportions
table(vitc$trt, vitc$out) |>
  prop.table(margin = 1)
         
               Cold    NoCold
  Placebo 0.8150852 0.1849148
  VitC    0.7420147 0.2579853

The estimated chance of getting a cold without the supplement is 81.5%, compared with an estimated chance of 74.2% with the supplement.

Notice the margin = 1 argument – this specifies that the counts should be normalized using the row totals (as opposed to the column totals or grand total). It’s very important to specify this correctly – you should do a quick sanity check by hand to make sure the proportions returned match your expectation.

Your turn

The asthma data contains, for a random sample of adults, each subjects’ sex and whether they are asthmatic. Construct a two-way table of asthma prevalence by sex and compute the proportion of asthmatics for each group.

Solution
# two-way table
table(asthma$sex, asthma$asthma)
        
         asthma no asthma
  male       30       769
  female     49       781
# groupwise proportions
table(asthma$sex, asthma$asthma) |>
  prop.table(margin = 1)
        
             asthma  no asthma
  male   0.03754693 0.96245307
  female 0.05903614 0.94096386

30 men and 49 women in the dataset are asthmatic; this amounts to an estimated prevalence of 3.75% and 5.90% for men and women, respectively.

If you obtained different proportions, then either the margin is not specified correctly or the table orientation is reversed.

The construction of two-way tables proceeds identically for multinomial data and/or multiple groups. For the example comparing health perceptions between survey years from the NHANES data:

# two-way table for multinomial data
table(nhanes$surveyyr, nhanes$healthgen)
         
          Excellent Vgood Good Fair Poor
  2009_10        27    78   88   28    5
  2011_12        20    84   89   25    6
# groupwise proportions
table(nhanes$surveyyr, nhanes$healthgen) |>
  prop.table(margin = 1)
         
           Excellent      Vgood       Good       Fair       Poor
  2009_10 0.11946903 0.34513274 0.38938053 0.12389381 0.02212389
  2011_12 0.08928571 0.37500000 0.39732143 0.11160714 0.02678571

An estimated 11.95% of U.S. adults perceive themselves in excellent health in 2009-2010; by comparison, an estimated 8.93% hold the same perception in 2011-2012.

Your turn

Using the nhanes data, estimate the proportions of men and women who perceive themselves as in excellent health.

Solution
# groupwise proportions
table(nhanes$gender, nhanes$healthgen) |>
  prop.table(margin = 1)
        
          Excellent      Vgood       Good       Fair       Poor
  female 0.07327586 0.36206897 0.39655172 0.13793103 0.03017241
  male   0.13761468 0.35779817 0.38990826 0.09633028 0.01834862

An estimated 7.33% of women perceive themselves in excellent health; by comparison, an estimated 13.76% of men hold the same perception.

When the grouping attribute is constrained by the study design (as in the Vitamin C experiment or when grouping NHANES data by survey year), it’s more natural to think of the table as a summary of univariate data for two or more independent samples. In this context, inferences should be described in terms of comparisons among groups. However, when the grouping attribute is determined by observational data (as in the asthma data or when grouping NHANES data by sex), it’s more natural to think of the two-way table as a multivariate summary of a single sample. In this context, inferences should be described in terms of “association” between the variables that define the columns and rows.

Two-sample inference for binomial data

Consider testing whether vitamin C supplements affect the probability of getting a cold. The corresponding hypotheses are:

\[ \begin{cases} H_0: &p_1 = p_2 \\ H_A: &p_1 \neq p_2 \end{cases} \] The subscripts distinguish the treatment groups (not the outcomes). As in the one-sample setting, a test can be constructed by:

  • determining expected counts if \(p_1 = p_2 = p\) from the table margins (see here)
  • measuring the divergence of observations from expectations using the \(\chi^2\) statistic
  • comparing with a \(\chi^2_1\) model to determine a p-value

Implementation in R is provided by prop.test(...):

# inference for a difference in binomial proportions
table(vitc$trt, vitc$out) |>
  prop.test()

    2-sample test for equality of proportions with continuity correction

data:  table(vitc$trt, vitc$out)
X-squared = 5.9196, df = 1, p-value = 0.01497
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01391972 0.13222111
sample estimates:
   prop 1    prop 2 
0.8150852 0.7420147 

Since the data come from a randomized experiment, we can interpret the difference in proportions as a causal effect of the treatment:

The data provide evidence that vitamin C supplements affect the probability of getting a common cold (\(\chi^2\) = 5.92 on 1 degree of freedom, p = 0.01497).

The function also provides an interval estimate for the difference in proportions:

With 95% confidence, vitamin C supplements lower the chance of getting a common cold by an estimated 1.39 to 13.22 percentage points.

Your turn

Test whether asthma prevalence differs between men and women. Report the result of the test in context following conventional style and provide an interval estimate for the difference.

Solution
# inference for a difference in binomial proportions
table(asthma$sex, asthma$asthma) |>
  prop.test()

    2-sample test for equality of proportions with continuity correction

data:  table(asthma$sex, asthma$asthma)
X-squared = 3.6217, df = 1, p-value = 0.05703
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.0434742223  0.0004958005
sample estimates:
    prop 1     prop 2 
0.03754693 0.05903614 

The data do not provide evidence that asthma prevalence differs between men and women (\(\chi^2\) = 3.62 on 1 degree of freedom, p = 0.05703).

A 95% interval is returned by default. To change this, add a conf.level = ... argument:

# adjust confidence level
table(vitc$trt, vitc$out) |>
  prop.test(conf.level = 0.99)

    2-sample test for equality of proportions with continuity correction

data:  table(vitc$trt, vitc$out)
X-squared = 5.9196, df = 1, p-value = 0.01497
alternative hypothesis: two.sided
99 percent confidence interval:
 -0.003898486  0.150039319
sample estimates:
   prop 1    prop 2 
0.8150852 0.7420147 

One can also perform a directional test by adding an alternative = ... argument. For instance, to test whether vitamin C prevents common cold, we’d test:

\[ \begin{cases} H_0: &p_1 = p_2 \\ H_A: &p_1 > p_2 \end{cases} \] Here \(p_1\) denotes the binomial proportion for the placebo group and \(p_2\) denotes the same for the treatment group. To perform this test, add alternative = 'greater':

# upper-sided test
table(vitc$trt, vitc$out) |>
  prop.test(alternative = 'greater')

    2-sample test for equality of proportions with continuity correction

data:  table(vitc$trt, vitc$out)
X-squared = 5.9196, df = 1, p-value = 0.007487
alternative hypothesis: greater
95 percent confidence interval:
 0.02303649 1.00000000
sample estimates:
   prop 1    prop 2 
0.8150852 0.7420147 

The interpretation of this test is:

The data provide evidence that vitamin C is effective at preventing common cold (\(\chi^2\) = 5.92 on 1 degree of freedom, p = 0.007487).

And the interval:

With 95% confidence, vitamin C reduces the chance of getting a common cold by at least 2.3 percentage points.

While directional tests are less common, the above provides an example in which a one-sided test more directly answers the research question (since the premise of the study is that vitamin C supplements at worst have no effect on the chances of getting a cold).

Two-sample inference for multinomial data

Now let’s consider comparing health perceptions between NHANES survey years. If \(\mathbf{p}_1\) denotes the set of category proportions in the 2009-2010 survey and \(\mathbf{p}_2\) denotes the same in the 2011-2012 survey, this corresponds to the hypothesis:

\[ \begin{cases} H_0: &\mathbf{p}_1 = \mathbf{p}_2 \\ H_A: &\mathbf{p}_1 \neq \mathbf{p}_2 \end{cases} \] As in the two-sample comparison of binomial proportions, inference proceeds by:

  • determining expected counts if \(\mathbf{p}_1 = \mathbf{p}_2 = \mathbf{p}\) from the table margins
  • measuring the divergence of observations from expectations using the \(\chi^2\) statistic
  • comparing with a \(\chi^2_{K-1}\) model to determine a p-value (where \(K\) is the number of categories)

The R implementation is provided by chisq.test(...) given the two-way table as input:

# inference comparing multinomial proportions
table(nhanes$surveyyr, nhanes$healthgen) |>
  chisq.test()

    Pearson's Chi-squared test

data:  table(nhanes$surveyyr, nhanes$healthgen)
X-squared = 1.5223, df = 4, p-value = 0.8227

The data do not provide evidence that health perceptions differ between survey years (\(\chi^2\) = 1.52 on 4 degrees of freedom, p = 0.8227)

Although there is not a significant difference, if there had been we’d look at the residuals of the \(\chi^2\) test to explain the result. This is parallel to the one-sample setting, except that the residuals will be arranged in a two-way table. We need to first store the output of chisq.test(...) and then retrieve the appropriate element:

# store test result
chisq.rslt <- table(nhanes$surveyyr, nhanes$healthgen) |>
  chisq.test()

# inspect test residuals
chisq.rslt$residuals
         
            Excellent       Vgood        Good        Fair        Poor
  2009_10  0.69889824 -0.37250646 -0.09474994  0.26791188 -0.22312857
  2011_12 -0.70201139  0.37416574  0.09517199 -0.26910526  0.22412247

While none of the changes were significant, an increased share of respondents reported being in very good, good, and poor health in the later survey year and fewer respondents reported being in excellent or fair health.

The general rule of thumb for inspecting residuals is to look for values in excess of \(\pm 2\).

Your turn

Use the nhanes data to test whether health perceptions differ between men and women. Report the result in context following conventional style and inspect residuals to identify significant differences (if any).

Solution
# store test result
chisq.rslt <- table(nhanes$gender, nhanes$healthgen) |>
  chisq.test()

# show test result
chisq.rslt

    Pearson's Chi-squared test

data:  table(nhanes$gender, nhanes$healthgen)
X-squared = 6.767, df = 4, p-value = 0.1487
# inspect test residuals
chisq.rslt$residuals
        
           Excellent       Vgood        Good        Fair        Poor
  female -1.46898841  0.05252257  0.07816321  0.89445305  0.55802620
  male    1.51542384 -0.05418284 -0.08063399 -0.92272715 -0.57566568

The data do not provide evidence that health perceptions differ by sex (\(\chi^2\) = 6.77 on 4 degrees of freedom, p = 0.1487). Although there is not a statistically significant difference, the largest discrepancy is in the share of individuals who perceive themselves in excellent health – the data suggest that men hold this perception more frequently than women.