The data sets are available for download in a .zip file.
A retrospective study on the effects of smoking gives the following numbers of smokers in four different patient groups:
| Group    | 1  | 2  | 3   | 4  |
|----------|----|----|-----|----|
| Smokers  | 83 | 90 | 129 | 70 |
| Patients | 86 | 93 | 136 | 82 |
- What are the proportions of smokers in the respective samples?
- Use `prop.test` to test the null hypothesis that the four populations from which the patients were drawn have the same true proportion of smokers. The alternative is that this proportion differs in at least one of the populations.
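A minimal sketch of how the two parts of this exercise could be approached, using the counts from the table above. The variable names `smokers`, `patients`, `props`, and `res` are illustrative choices, not part of the exercise.

```r
# Counts copied from the table in the exercise
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)

# Sample proportion of smokers in each of the four groups
props <- smokers / patients
print(round(props, 3))

# Test of equal proportions across the four groups
# (H0: all four populations share one true proportion of smokers)
res <- prop.test(smokers, patients)
print(res)
```

With four groups the test statistic is referred to a chi-squared distribution with three degrees of freedom, which `res$parameter` reports.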
- Under simple Mendelian inheritance, the distribution of human
genotypes for a diallelic marker system should be p^2^ : 2pq :
q^2^, where p and q are the allele frequencies (Hardy-Weinberg
equilibrium).
- Construct a simple chi^2^ goodness-of-fit test for the null hypothesis of Hardy-Weinberg equilibrium.
In a sample of schizophrenic patients, observed genotype counts for the Dopamine 3 receptor polymorphism were
| Genotype | A1A1 | A1A2 | A2A2 |
|----------|------|------|------|
| Count    | 45   | 35   | 15   |
Is there evidence for deviation from Hardy-Weinberg equilibrium in the underlying population?
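One possible way to set up the goodness-of-fit test by hand is sketched below. It assumes the allele frequency p is estimated from the sample by allele counting, so one degree of freedom is used (three genotype classes, minus one, minus one estimated parameter). All variable names are illustrative.

```r
# Observed genotype counts from the exercise
obs <- c(A1A1 = 45, A1A2 = 35, A2A2 = 15)
n   <- sum(obs)

# Estimated frequency of allele A1: each A1A1 carries two copies,
# each A1A2 carries one, out of 2n alleles in total
p <- unname((2 * obs["A1A1"] + obs["A1A2"]) / (2 * n))
q <- 1 - p

# Expected counts under Hardy-Weinberg proportions p^2 : 2pq : q^2
expected <- n * c(A1A1 = p^2, A1A2 = 2 * p * q, A2A2 = q^2)

# Pearson chi-squared statistic and p-value on 1 degree of freedom
stat    <- sum((obs - expected)^2 / expected)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

print(rbind(observed = obs, expected = round(expected, 2)))
print(c(statistic = stat, p.value = p_value))
```

Note that calling `chisq.test(obs, p = expected / n)` would use two degrees of freedom, because it does not know that p was estimated from the same data. That is why the p-value is computed directly here.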
- Data set `cars` gives the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s.
- Plot the data set. Can a linear model (straight line) be used for describing the relation between the variables?
- Graphically analyze the relation between the variables using
- Does linear modeling work after taking logarithms?
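A starting point for these steps might look like the sketch below. It assumes the `cars` data frame that ships with R (columns `speed` and `dist`); if the course's .zip file provides its own version, the column names may differ. The object names `fit` and `log_fit` are illustrative.

```r
# Built-in data set: 50 observations of speed and stopping distance
data(cars)

# Scatter plot with a fitted straight line on the original scale
plot(dist ~ speed, data = cars)
fit <- lm(dist ~ speed, data = cars)
abline(fit)

# Same plot and fit after taking logarithms of both variables
plot(log(dist) ~ log(speed), data = cars)
log_fit <- lm(log(dist) ~ log(speed), data = cars)
abline(log_fit)

# Residuals against fitted values help to judge both fits
plot(fitted(fit), resid(fit), main = "Original scale")
abline(h = 0, lty = 2)
plot(fitted(log_fit), resid(log_fit), main = "Log-log scale")
abline(h = 0, lty = 2)
```

Comparing the two residual plots is one way to judge whether the log transformation improves the linear fit: look for curvature or a fan shape in either plot.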
- Data set `GAGUrine` contains data collected by Susan Prosser on the concentration of a chemical GAG in the urine of 314 children aged from zero to seventeen years. Analyze these data, and produce a chart to help a pediatrician assess whether a child’s GAG concentration is “normal”.
- The Janka hardness is an important structural property of Australian
timbers, which is difficult to measure. It is, however, related to
the density of the timber, which is relatively easy to measure. For
the data in `janka`, a low-degree polynomial regression of hardness on density is suggested as appropriate. Fit models and check whether there are obvious outliers or heteroscedasticity, and whether these can be remedied by square root or log transformations.
- Data set `tetrahymena` contains data about the growth of tetrahymena cells: the diameter (μm) and concentration (counts/ml) of the cells, and whether glucose was added to the growth medium. Find an appropriate model for the diameter of the cells explained by the other variables.