Biostatistics: Exercise 4
Contingency tables
The data sets are available for download in a .zip file.

Data set HairEyeGender contains the distribution of hair color, eye color and gender in a sample of 592 statistics students.
Explore the relation between the three variables in the data set. Use ftable for inspecting the counts. The auxiliary functions margin.table and prop.table can be used for computing table margins and for expressing table entries as fractions of these margins, respectively. Can you find any relations using these figures?
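To make clear what margin.table and prop.table compute, here is a rough sketch of the same two operations in plain Python. It assumes a table stored as a dictionary from tuples of category labels (one per dimension) to counts, and that no margin total is zero. This representation and the names margin_table and prop_table are illustrative, not R's API, and dimension indices are 0-based rather than 1-based as in R's margin argument.

```python
from collections import defaultdict

def margin_table(table, keep):
    """Sum an N-way table over every dimension not listed in `keep`.

    `table` maps tuples of category labels (one per dimension) to counts.
    `keep` is a tuple of 0-based dimension indices; keep=() gives the grand total.
    """
    out = defaultdict(int)
    for cell, count in table.items():
        out[tuple(cell[i] for i in keep)] += count
    return dict(out)

def prop_table(table, given=()):
    """Divide each cell by the total of the margin over the dimensions `given`.

    given=() divides by the grand total; given=(0,) divides by the row totals
    of a 2-way table, so that each row sums to 1.
    """
    totals = margin_table(table, given)
    return {cell: count / totals[tuple(cell[i] for i in given)]
            for cell, count in table.items()}
```

Proportions conditional on one variable (for example, the eye color distribution within each hair color) correspond to prop_table with that variable's index in `given`.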
Use mosaicplot to analyze the data set graphically. Try rearranging the order of the variables using aperm to find the most informative display. Which relations can you find?
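The order of the variables matters because a mosaic plot splits the unit square recursively: first by the marginal proportions of the first variable, then each piece by the conditional proportions of the second variable given the first, and so on. Every tile's area equals the cell's joint proportion, but which conditional proportions sit side by side depends on the order. The Python sketch below lists those successive proportions for each cell, with permute_dims playing the role of aperm. It assumes tables stored as dictionaries from label tuples to counts; both function names are hypothetical.

```python
from collections import defaultdict

def permute_dims(table, perm):
    """Reorder dimensions like R's aperm: new position k holds old dimension perm[k]."""
    return {tuple(cell[i] for i in perm): n for cell, n in table.items()}

def mosaic_splits(table):
    """Successive proportions a mosaic plot uses to split the unit square.

    For a cell (x1, ..., xm) the list is P(x1), P(x2 | x1), ..., P(xm | x1, ..., xm-1);
    their product (the tile area) equals the cell's joint proportion.
    """
    ndim = len(next(iter(table)))
    # prefix[k][labels] = total count of cells whose first k labels equal `labels`
    prefix = [defaultdict(int) for _ in range(ndim + 1)]
    for cell, count in table.items():
        for k in range(ndim + 1):
            prefix[k][cell[:k]] += count
    splits = {}
    for cell in table:
        splits[cell] = [
            # an empty parent tile (total 0) gets a split of 0.0 instead of dividing by zero
            prefix[k + 1][cell[:k + 1]] / prefix[k][cell[:k]] if prefix[k][cell[:k]] else 0.0
            for k in range(ndim)
        ]
    return splits
```

Comparing mosaic_splits(t) with mosaic_splits(permute_dims(t, (2, 0, 1))) for a hypothetical 3-way table t shows which conditional proportions each ordering puts next to each other.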
If the variables were independent, their joint distribution would be the product of the marginal distributions, so the expected count of each cell would be the sample size times the product of the corresponding marginal proportions. For which cells is the difference between observed and expected counts “most” extreme?
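One common way to make “most extreme” precise is the Pearson residual (O - E) / sqrt(E), which scales each difference by the size of its expected count. The sketch below computes expected counts under mutual independence of all dimensions and ranks cells by absolute residual. It assumes the same dictionary representation (label tuples to counts) with every cell present, zero counts included; the function names are hypothetical.

```python
import math
from collections import defaultdict

def independence_residuals(table):
    """Expected counts and Pearson residuals under mutual independence.

    E(cell) = n * prod_k p_k(label_k), where p_k is the marginal proportion of the
    cell's label in dimension k; the residual is (O - E) / sqrt(E).
    """
    ndim = len(next(iter(table)))
    n = sum(table.values())
    margins = [defaultdict(int) for _ in range(ndim)]
    for cell, count in table.items():
        for k, label in enumerate(cell):
            margins[k][label] += count
    result = {}
    for cell, observed in table.items():
        expected = n * math.prod(margins[k][label] / n for k, label in enumerate(cell))
        resid = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
        result[cell] = (expected, resid)
    return result

def most_extreme(table, top=5):
    """The `top` cells with the largest absolute Pearson residuals."""
    res = independence_residuals(table)
    return sorted(res, key=lambda cell: abs(res[cell][1]), reverse=True)[:top]
```

Other choices are possible (raw differences, or ratios O / E); the squared Pearson residuals summed over all cells give Pearson's chi-squared statistic for the independence model.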

Try visualizing the HairEyeGender data set using dot plots.
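A Cleveland dot plot shows one labelled row per category and marks the values on a common horizontal scale, usually with light guide lines. In R, dotchart draws such plots; the text-only Python sketch below merely illustrates the layout. It assumes non-negative values, sorts rows so that the largest first value is at the top, and draws each group with its own symbol (a later group overwrites an earlier one at the same position). text_dotplot is a hypothetical helper, not a library function.

```python
def text_dotplot(values, width=50, symbols="ox"):
    """Render a Cleveland-style dot plot as lines of text.

    `values` maps a row label to a list of non-negative numbers, one per group;
    group j is drawn with symbols[j] on a common scale from 0 to the maximum.
    Rows are sorted so that the largest first value appears at the top.
    """
    vmax = max(v for vs in values.values() for v in vs)
    label_w = max(len(label) for label in values)
    lines = []
    for label in sorted(values, key=lambda lab: values[lab][0], reverse=True):
        row = ["."] * (width + 1)  # dotted guide line from 0 to vmax
        for j, v in enumerate(values[label]):
            pos = round(v / vmax * width) if vmax > 0 else 0
            row[pos] = symbols[j]  # a later group overwrites an earlier one
        lines.append(f"{label:>{label_w}} |{''.join(row)}")
    lines.append(f"{'':>{label_w}}  scale: 0 to {vmax}")
    return lines
```

For HairEyeGender, the rows could be the hair/eye combinations and the two groups the male and female counts, so that gender differences show up as horizontal gaps between the two symbols.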


Data set Titanic provides information on the fate of passengers on the fatal maiden voyage of the ocean liner ‘Titanic’, summarized according to economic status (class), sex, age and survival. Use mosaicplot in combination with the above-mentioned methods for obtaining margins to analyze the data set graphically. Are the data in agreement with the principle of ‘women and children first’?
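The ‘women and children first’ question comes down to conditional survival proportions: the fraction surviving within each sex and age group, pooled over (or split by) class. The sketch below computes such a proportion from an N-way table in one pass; in R the same numbers come from margin.table followed by prop.table with a margin. It assumes the dictionary representation (label tuples to counts) with dimensions ordered Class, Sex, Age, Survived, as in R's built-in Titanic table; conditional_rate is a hypothetical name.

```python
from collections import defaultdict

def conditional_rate(table, by, outcome_dim, outcome_level):
    """P(outcome == outcome_level | labels in dimensions `by`), pooled over the rest.

    Sums the table over all dimensions not in `by` (as margin.table would), then
    divides the outcome count by the group total (as prop.table would).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cell, count in table.items():
        key = tuple(cell[i] for i in by)
        totals[key] += count
        if cell[outcome_dim] == outcome_level:
            hits[key] += count
    # groups with no observations at all are skipped rather than divided by zero
    return {key: hits[key] / totals[key] for key in totals if totals[key] > 0}
```

For example, with the counts held in a dictionary named titanic (a hypothetical name), conditional_rate(titanic, by=(1, 2), outcome_dim=3, outcome_level="Yes") gives the survival proportion for each sex and age combination, pooled over class; adding 0 to `by` splits these further by class.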
Data set Virginia contains the death rates in Virginia in 1940, cross-classified by age group (rows) and population group (columns). (The population groups are Rural/Male, Rural/Female, Urban/Male and Urban/Female, hence this is really 3-way data classified according to the variables age, site and gender.) Use Cleveland dot plots to visualize the data set. Could you also use mosaic plots?
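Because each column label combines two variables, a first step toward a dot plot (or any 3-way analysis) is to split the labels into site and gender. The sketch assumes the rates are held in a nested dictionary {age group: {"Rural/Male": rate, ...}} with labels written as in the text above; if the labels use a space instead (for example "Rural Male"), pass sep=" ". Both function names are hypothetical. dotplot_rows then groups the rates into one row per age group and site with the male and female rates side by side, a layout suited to a grouped dot plot.

```python
def split_columns(rates, sep="/"):
    """Turn {age: {"Rural/Male": rate, ...}} into {(age, site, gender): rate}."""
    out = {}
    for age, columns in rates.items():
        for label, rate in columns.items():
            site, gender = label.split(sep)
            out[(age, site, gender)] = rate
    return out

def dotplot_rows(rates3, genders=("Male", "Female")):
    """Group 3-way rates into rows labelled "age site", one value per gender."""
    rows = {}
    for (age, site, gender), rate in rates3.items():
        row = rows.setdefault(f"{age} {site}", [None] * len(genders))
        row[genders.index(gender)] = rate
    return rows
```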
The following table shows the approval of the U.S. President’s performance in office in two surveys, one month apart, for a random sample of 1600 voting-age Americans.
                Survey 2
Survey 1        Approve   Disapprove
Approve             794          150
Disapprove           86          570
Is there a significant association between the two successive ratings?
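A sketch of two standard summaries of association for this 2x2 table: the sample odds ratio with a Wald 95% confidence interval computed on the log scale, and Pearson's chi-squared statistic without continuity correction. It uses the counts from the table, taking rows as Survey 1 and columns as Survey 2 (both the odds ratio and the statistic are unchanged if the table is transposed). The function names are hypothetical. Note that R's chisq.test applies Yates's continuity correction to 2x2 tables by default, so its statistic will come out slightly smaller.

```python
import math

# Counts from the table: rows = Survey 1, columns = Survey 2,
# each in the order (Approve, Disapprove).
a, b = 794, 150
c, d = 86, 570

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample odds ratio a*d/(b*c) with a Wald CI computed on the log scale."""
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

def pearson_chi2_2x2(a, b, c, d):
    """Pearson X^2 without continuity correction, and its p-value on 1 df."""
    n = a + b + c + d
    x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Under independence X^2 is approximately chi-squared with 1 df, i.e. a squared
    # standard normal, so P(X^2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p

print(odds_ratio_ci(a, b, c, d))
print(pearson_chi2_2x2(a, b, c, d))
```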

Data set Admissions contains data on applicants to graduate school at Berkeley for the six largest departments in 1973, classified by admission and gender. At issue is whether the data show evidence of gender bias in admission practices.
Compute the overall odds ratio for the association between the binary traits admission and gender, obtained by aggregating over the departments, and compare it with the odds ratios within the strata (the individual departments).
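The sketch below computes the odds ratio within each department, the odds ratio of the table collapsed over departments, and, for reference, the Mantel-Haenszel estimate of an odds ratio common to all strata. It assumes each department is stored as a 2x2 tuple ((a, b), (c, d)) with rows (Male, Female) and columns (Admitted, Rejected), so that a ratio above 1 means higher admission odds for men, and that no cell is zero. The layout and all function names are hypothetical.

```python
def odds_ratio(t):
    """Odds ratio a*d / (b*c) of a 2x2 table ((a, b), (c, d)); cells must be non-zero."""
    (a, b), (c, d) = t
    return (a * d) / (b * c)

def aggregate(strata):
    """Collapse a dict of 2x2 tables over the strata by adding them cell by cell."""
    tables = list(strata.values())
    return tuple(tuple(sum(t[i][j] for t in tables) for j in range(2)) for i in range(2))

def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio: sum_k(a*d/n) / sum_k(b*c/n)."""
    num = den = 0.0
    for (a, b), (c, d) in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

def compare_odds_ratios(strata):
    """Within-stratum odds ratios next to the aggregated and Mantel-Haenszel ones."""
    result = {name: odds_ratio(t) for name, t in strata.items()}
    result["(aggregated)"] = odds_ratio(aggregate(strata))
    result["(Mantel-Haenszel)"] = mantel_haenszel_or(strata)
    return result
```

A marked difference between the aggregated ratio and the within-department ratios suggests that the department variable distorts the crude association, the pattern known as Simpson's paradox.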

Analyze the data set graphically, and try displaying the aggregated table along with the tables for each department in one plot.
