Biostatistics: Exercise 3
Generalized linear models
The data sets are available for download in a .zip file.
 Under simple Mendelian inheritance, the distribution of human
genotypes for a diallelic marker system should be p^2^ : 2pq :
q^2^, where p and q = 1 - p are the allele frequencies
(Hardy-Weinberg equilibrium).
 Construct a simple chi^2^ goodness-of-fit test for the null
hypothesis of Hardy-Weinberg equilibrium, using both
methods suggested in the lecture:
 determine the ML estimate for p to obtain the expected values and calculate the chi^2^ statistic, and
 minimize the chi^2^ statistic over p.
Note that the degrees of freedom must be reduced by the number of parameters estimated.

In a sample of schizophrenic patients, observed genotype counts for the Dopamine 3 receptor polymorphism were
Genotype   A1A1   A1A2   A2A2
Count        45     35     15
Is there evidence for deviation from Hardy-Weinberg equilibrium in the underlying population?
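As a rough illustration of the two approaches for the genotype counts above, the following Python sketch uses only the standard library. The helper names (expected_counts, chi2_stat, golden_section_min) are made up for this sketch. The closed-form ML estimate p = (2 n(A1A1) + n(A1A2)) / (2n) is the usual allele-counting estimate and is assumed here to match the lecture's method. With three genotype classes and one estimated parameter, the statistic is referred to a chi^2^ distribution with 3 - 1 - 1 = 1 degree of freedom.

```python
import math

# Observed genotype counts from the exercise: A1A1, A1A2, A2A2
observed = (45, 35, 15)
n = sum(observed)

def expected_counts(p, n):
    """Expected genotype counts under Hardy-Weinberg for allele frequency p."""
    q = 1.0 - p
    return (n * p * p, 2.0 * n * p * q, n * q * q)

def chi2_stat(p, obs=observed):
    """Pearson chi-square statistic for a given allele frequency p."""
    exp = expected_counts(p, sum(obs))
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi2_sf_1df(x):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# Method 1: plug the ML estimate of p into the chi-square statistic.
p_ml = (2 * observed[0] + observed[1]) / (2 * n)
chi2_ml = chi2_stat(p_ml)

# Method 2: minimise the chi-square statistic over p
# (coarse grid first, then a local golden-section refinement).
grid = [i / 1000 for i in range(1, 1000)]
p0 = min(grid, key=chi2_stat)
p_min = golden_section_min(chi2_stat, p0 - 0.001, p0 + 0.001)
chi2_min = chi2_stat(p_min)

# Degrees of freedom: 3 classes - 1 - 1 estimated parameter = 1
pval_ml = chi2_sf_1df(chi2_ml)
pval_min = chi2_sf_1df(chi2_min)

print(f"ML estimate:      p = {p_ml:.4f}, chi2 = {chi2_ml:.3f}, p-value = {pval_ml:.3f}")
print(f"Minimum chi2:     p = {p_min:.4f}, chi2 = {chi2_min:.3f}, p-value = {pval_min:.3f}")
```

The two estimates of p will generally differ slightly, and the minimised statistic can never exceed the one based on the ML estimate.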
 Data set tetrahymena contains data about the growth of Tetrahymena cells: the diameter (μm) and concentration (counts/ml) of the cells and whether glucose was added to the growth medium or not. Find an appropriate model for the diameter of the cells explained by the other variables.
 Data set menarche contains information about the age at menarche in Warsaw female children, collected in 1965. Analyze the proportions of girls who have reached menarche using both logit and probit links.
 Data set coronary provides data about the association between the risk of coronary attack, age and smoking. For each combination of age group and smoking (yes/no) the number of deaths and the number of person-years at risk are given. How does the death rate depend on age and smoking?
 Data set malaria contains a random sample of 100 children, aged 3-15 years, from a village in Ghana. The children were followed for a period of 8 months. At the beginning of the study, values of a particular antibody were assessed. Based on observations during the study period, the children were categorized into two groups: individuals with and without symptoms of malaria. How does the probability of getting malaria depend on the other variables?
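For the coronary data set, one common starting point (an assumption here, not necessarily the model intended in the lecture) is a Poisson log-linear model with the person-years as an offset. The symbols below are notation chosen for this sketch, not variable names from the data file: d_ij is the number of deaths in age group i and smoking status j, T_ij the corresponding person-years, and s_j an indicator for smoking.

```latex
\log \mathrm{E}[d_{ij}] = \log T_{ij} + \beta_0 + \alpha_i + \gamma\, s_j
\qquad\Longleftrightarrow\qquad
\log \frac{\mathrm{E}[d_{ij}]}{T_{ij}} = \beta_0 + \alpha_i + \gamma\, s_j
```

In this parameterization exp(γ) is the death-rate ratio of smokers to non-smokers within an age group. Whether an age-by-smoking interaction is needed is part of the modelling question.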
On the Greek island of Kalythos the male inhabitants suffer from a congenital eye disease, the effects of which become more marked with increasing age. Samples of islander males of various ages were tested for blindness and the results recorded.
Age          20   35   45   55   70
No. tested   50   50   50   50   50
No. blind     6   17   26   37   44
Using a logit or probit model, estimate the LD50, that is, the age at which the probability of blindness is p = 1/2, together with its standard error. Check how much the logit and probit models differ in this respect.
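A minimal sketch of one way to approach this in Python, using only the standard library. The binomial GLM is fitted by hand-coded Fisher scoring (iteratively reweighted least squares) for a model with intercept and age. The LD50 is taken as -b0/b1, and its standard error is obtained with the delta method, which is an assumption of this sketch and only one of several possible approaches. All function names are hypothetical; in practice a GLM routine from a statistics package would normally be used instead.

```python
import math

# Data from the exercise
ages = [20, 35, 45, 55, 70]
tested = [50, 50, 50, 50, 50]
blind = [6, 17, 26, 37, 44]

def logit_inverse(eta):
    """Inverse logit link: returns (mu, d mu / d eta)."""
    mu = 1.0 / (1.0 + math.exp(-eta))
    return mu, mu * (1.0 - mu)

def probit_inverse(eta):
    """Inverse probit link: returns (mu, d mu / d eta)."""
    mu = 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    dmu = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
    return mu, dmu

def fit_binomial_glm(x, n, y, inverse_link, n_iter=50):
    """Fisher scoring for eta = b0 + b1 * x with binomial responses y out of n."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, ni, yi in zip(x, n, y):
            eta = b0 + b1 * xi
            mu, dmu = inverse_link(eta)
            w = ni * dmu * dmu / (mu * (1.0 - mu))   # working weight
            z = eta + (yi / ni - mu) / dmu           # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    # Inverse Fisher information as the asymptotic covariance of (b0, b1).
    cov = ((swxx / det, -swx / det), (-swx / det, sw / det))
    return (b0, b1), cov

def ld50_with_se(coef, cov):
    """LD50 = -b0 / b1 and its delta-method standard error."""
    b0, b1 = coef
    g0, g1 = -1.0 / b1, b0 / (b1 * b1)   # gradient of -b0/b1
    var = g0 * g0 * cov[0][0] + 2.0 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    return -b0 / b1, math.sqrt(var)

results = {}
for name, link in (("logit", logit_inverse), ("probit", probit_inverse)):
    coef, cov = fit_binomial_glm(ages, tested, blind, link)
    results[name] = ld50_with_se(coef, cov)
    print(f"{name:6s}: LD50 = {results[name][0]:.2f} years, SE = {results[name][1]:.2f}")
```

Comparing the two entries of results shows how sensitive the LD50 estimate and its standard error are to the choice of link.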