Biostatistics: Exercise 4

Contingency tables

The data sets are available for download in a .zip file.

  1. Data set HairEyeGender contains the joint distribution of hair color, eye color and gender in a sample of 592 statistics students.

    • Explore the relations among the three variables in the data set. Use ftable to inspect the counts; the auxiliary functions margin.table and prop.table compute table margins and express table entries as fractions of these margins, respectively. Which relations can you find from these figures?

    • Use mosaicplot to analyze the data set graphically. Try rearranging the order of the variables with aperm to find the most informative display. Which relations can you find?

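    • If the three variables were independent, their joint distribution would be the product of the marginal distributions, so the expected count in each cell would be the total sample size times the product of the corresponding marginal proportions. For which cells is the difference between observed and expected counts most extreme?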

    • Try visualizing the HairEyeGender data set using dot plots.
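Computing margins and proportions amounts to summing a multi-way table over the axes you drop and then dividing by a total. The Python sketch below mimics what margin.table and prop.table (without a margin argument) do on a three-way table stored as a nested list indexed [hair][eye][gender]. The counts are small hypothetical numbers, not the HairEyeGender data, and the helper names `margin` and `proportions` are made up for illustration.

```python
from itertools import product

# Hypothetical 2 x 2 x 2 counts indexed [hair][eye][gender]; NOT the HairEyeGender data.
HAIR, EYE = ["Dark", "Light"], ["Brown", "Blue"]
counts = [
    [[30, 25], [10, 15]],  # Dark hair:  Brown eyes [M, F], Blue eyes [M, F]
    [[12, 18], [28, 22]],  # Light hair: Brown eyes [M, F], Blue eyes [M, F]
]

def margin(table, keep):
    """Sum a 3-way nested-list table over the axes not in `keep` (cf. margin.table).

    Returns a dict mapping each index tuple of the kept axes to its total.
    """
    dims = (len(table), len(table[0]), len(table[0][0]))
    out = {}
    for i, j, k in product(*(range(d) for d in dims)):
        key = tuple((i, j, k)[a] for a in keep)
        out[key] = out.get(key, 0) + table[i][j][k]
    return out

def proportions(marg):
    """Divide every entry by the grand total (cf. prop.table without a margin)."""
    total = sum(marg.values())
    return {key: value / total for key, value in marg.items()}

n = margin(counts, ())[()]          # grand total (all axes summed out)
hair_margin = margin(counts, (0,))  # one-way margin of hair color
hair_eye = margin(counts, (0, 1))   # two-way hair x eye table, summed over gender
print("n =", n, "hair margin:", hair_margin)
for (i, j), p in proportions(hair_eye).items():
    print(HAIR[i], EYE[j], round(p, 3))
```

Keeping the margins as dictionaries keyed by index tuples lets the same helper produce one-, two- or zero-way margins just by changing `keep`.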
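Under mutual independence of hair color, eye color and gender, the expected count in cell (i, j, k) is E_ijk = n_{i++} n_{+j+} n_{++k} / n^2. One common way to rank cells by how extreme the deviation is uses Pearson residuals (O - E)/sqrt(E); that choice of measure is an assumption here, since the exercise does not prescribe one. (In R, mosaicplot with shade = TRUE shades tiles by residuals from an independence model, which is fitted by default.) The sketch uses the same kind of hypothetical counts, not the real data.

```python
import math
from itertools import product

# Hypothetical 2 x 2 x 2 counts indexed [hair][eye][gender]; NOT the HairEyeGender data.
counts = [
    [[30, 25], [10, 15]],
    [[12, 18], [28, 22]],
]
I, J, K = len(counts), len(counts[0]), len(counts[0][0])
cells = list(product(range(I), range(J), range(K)))

def obs(c):
    """Observed count for a cell index tuple (i, j, k)."""
    i, j, k = c
    return counts[i][j][k]

n = sum(obs(c) for c in cells)
# One-way margins n_{i++}, n_{+j+}, n_{++k}
hair = [sum(obs(c) for c in cells if c[0] == i) for i in range(I)]
eye = [sum(obs(c) for c in cells if c[1] == j) for j in range(J)]
gender = [sum(obs(c) for c in cells if c[2] == k) for k in range(K)]

# Expected counts under mutual independence: E_ijk = n_{i++} n_{+j+} n_{++k} / n^2
expected = {c: hair[c[0]] * eye[c[1]] * gender[c[2]] / n**2 for c in cells}
# Pearson residuals (O - E) / sqrt(E); a large |r| flags a strongly deviating cell
residual = {c: (obs(c) - expected[c]) / math.sqrt(expected[c]) for c in cells}
# Their sum of squares is Pearson's X^2 for mutual independence, df = IJK - I - J - K + 2
chi2 = sum(r * r for r in residual.values())

for c in sorted(cells, key=lambda c: -abs(residual[c])):
    print(c, obs(c), round(expected[c], 2), round(residual[c], 2))
print("X^2 =", round(chi2, 2), "df =", I * J * K - I - J - K + 2)
```

Sorting by absolute residual lists the cells that contribute most to the lack of independence first.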
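A mosaic plot of a two-way table splits the unit square first according to the marginal proportions of one variable and then, within each strip, according to the conditional proportions of the other, so tile areas are proportional to cell counts. Permuting the variables (which is what aperm does) changes which variable is split first and therefore which conditional distributions are easy to compare. The sketch below computes tile rectangles for a hypothetical hair × eye table in both orders. It is a geometric illustration only, not a reimplementation of R's mosaicplot, which also adds gaps between tiles and alternates split directions for more variables; `mosaic_tiles` and `transpose` are hypothetical helper names.

```python
# Hypothetical 2 x 2 hair x eye counts; NOT the HairEyeGender data.
table = [[55, 25],   # Dark hair:  Brown, Blue eyes
         [30, 50]]   # Light hair: Brown, Blue eyes

def transpose(t):
    """Swap the two variables, the two-way analogue of aperm(t, c(2, 1))."""
    return [list(col) for col in zip(*t)]

def mosaic_tiles(t):
    """Return (row, col, x, y, width, height) tiles on the unit square.

    Strip widths follow the row margin; tile heights within a strip follow
    the conditional distribution of the column variable given that row.
    Assumes no row total is zero.
    """
    n = sum(map(sum, t))
    tiles, x = [], 0.0
    for i, row in enumerate(t):
        width = sum(row) / n
        y = 0.0
        for j, count in enumerate(row):
            height = count / sum(row)
            tiles.append((i, j, x, y, width, height))
            y += height
        x += width
    return tiles

for name, t in [("hair first", table), ("eye first", transpose(table))]:
    print(name)
    for i, j, x, y, w, h in mosaic_tiles(t):
        print(f"  cell ({i},{j}): x={x:.3f} y={y:.3f} w={w:.3f} h={h:.3f}")
```

Both orders give the same tile areas; what changes is whether the eye distribution is compared across hair colors or the other way round.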

  2. Data set Titanic provides information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival. Use mosaicplot in combination with the above-mentioned methods for obtaining margins to analyze the data set graphically. Are the data in agreement with the principle of ‘women and children first’?
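Checking "women and children first" comes down to comparing conditional survival proportions: each survival count divided by the total of its group, which is what prop.table does when given a margin argument. The sketch below uses invented counts for a sex × age × survival table (the real Titanic table also has a class dimension); `survival_rate` is a hypothetical helper name.

```python
# Hypothetical counts indexed by (sex, age) -> {"No": died, "Yes": survived}.
# These numbers are invented for illustration; they are NOT the Titanic data.
counts = {
    ("Male", "Adult"):   {"No": 600, "Yes": 150},
    ("Male", "Child"):   {"No": 20,  "Yes": 15},
    ("Female", "Adult"): {"No": 60,  "Yes": 240},
    ("Female", "Child"): {"No": 10,  "Yes": 20},
}

def survival_rate(cell):
    """Fraction surviving within one group: Yes / (Yes + No)."""
    return cell["Yes"] / (cell["Yes"] + cell["No"])

# Conditional survival proportion for every (sex, age) group, highest first
rates = {group: survival_rate(cell) for group, cell in counts.items()}
for (sex, age), r in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{sex:6} {age:5} {r:.2f}")

# Collapsing over age gives the sex x survival margin
by_sex = {}
for (sex, _age), cell in counts.items():
    agg = by_sex.setdefault(sex, {"No": 0, "Yes": 0})
    agg["No"] += cell["No"]
    agg["Yes"] += cell["Yes"]
print({sex: round(survival_rate(c), 2) for sex, c in by_sex.items()})
```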

  3. Data set Virginia contains the death rates in Virginia in 1940, cross-classified by age group (rows) and population group (columns). (The population groups are Rural/Male, Rural/Female, Urban/Male and Urban/Female, hence this is really 3-way data classified according to the variables age, site, and gender.) Use Cleveland dot plots to visualize the data set. Could you also use mosaic plots?

  4. The following table shows the approval of the U.S. President’s performance in office in two surveys, one month apart, for a random sample of 1600 voting-age Americans.

    [Table: approval ratings, Survey 1 (rows) by Survey 2 (columns)]

    Is there a significant association between the two successive ratings?
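A standard test for association in a 2 × 2 table is Pearson's chi-square test of independence with 1 degree of freedom, often reported together with the odds ratio as a measure of strength. The pure-Python sketch below computes both; the example uses hypothetical counts that sum to 1600, not the survey data, and `chi_square_2x2` is a hypothetical helper name. For 1 df the upper-tail probability is erfc(sqrt(x/2)). No continuity correction is applied here, whereas R's chisq.test applies Yates' correction to 2 × 2 tables by default.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for [[a, b], [c, d]], no continuity correction.

    Returns (statistic, p_value, odds_ratio).
    """
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    odds_ratio = (a * d) / (b * c)
    return stat, p_value, odds_ratio

# Hypothetical counts (rows: Survey 1 approve/disapprove; columns: Survey 2), total 1600
stat, p, or_ = chi_square_2x2(700, 200, 150, 550)
print(f"X^2 = {stat:.1f}, p = {p:.3g}, odds ratio = {or_:.2f}")
```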

  5. Data set Admissions contains data on applicants to graduate school at Berkeley for the six largest departments in 1973, classified by admission and gender. At issue is whether the data show evidence of gender bias in admission practices.

    • Compute the overall odds ratio for the association between the binary traits admission and gender, obtained by aggregating over the departments, and compare it with the odds ratios within the individual departments (strata).

    • Analyze the data set graphically, and try displaying the aggregated table along with the tables for each department in one plot.
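Comparing the odds ratio of the collapsed table with the per-department odds ratios can reveal Simpson's paradox: the aggregated ratio may point the other way when departments differ both in admission rate and in the share of female applicants. The sketch below uses two invented departments, not the Berkeley data. It also computes the Mantel-Haenszel estimate of a common odds ratio, sum(a_k d_k / n_k) / sum(b_k c_k / n_k), a standard summary across strata that the exercise does not itself mention (in R it is reported by mantelhaen.test).

```python
# Hypothetical 2 x 2 tables per department; NOT the Berkeley data.
# Cells: (men admitted, men rejected, women admitted, women rejected)
strata = {
    "Dept X": (80, 20, 18, 2),
    "Dept Y": (4, 16, 30, 70),
}

def odds_ratio(a, b, c, d):
    """Odds of admission for men divided by odds of admission for women."""
    return (a * d) / (b * c)

for dept, cells in strata.items():
    print(dept, round(odds_ratio(*cells), 3))

# Aggregate over departments by summing corresponding cells
pooled = tuple(sum(cells[i] for cells in strata.values()) for i in range(4))
print("aggregated", pooled, round(odds_ratio(*pooled), 3))

# Mantel-Haenszel estimate of a common odds ratio across strata
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
print("Mantel-Haenszel", round(num / den, 3))
```

With these invented numbers each department favors women (odds ratio below 1), yet the aggregated table favors men (odds ratio 3.5), because most women applied to the department with the low admission rate.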