Problem Set 4
Data Analysis and Statistical Methods (WMEE14000 – 2018/2019)
IMPORTANT NOTE: For all calculations done without R, please type the formulas, values, etc., and DO
NOT use photos of your handwritten calculations.
Question 1
Researchers studied a mutant type of flax seed that they hoped would produce oil for use in
margarine and shortening. The amount of palmitic acid in the flax seed was an important factor in
this research; a related factor was whether the seed was brown or was variegated. The seeds were
classified into six combinations of palmitic acid and color, as shown in the table below. According to a
hypothesized (Mendelian) genetic model, the six combinations should occur in a 3:6:3:1:2:1 ratio.
That is, brown and low acid level should occur with probability 3/16, brown and intermediate acid
level should occur with probability 6/16, and so on.
Color Acid level Observed
Brown Low 15
Brown Intermediate 26
Brown High 15
Variegated Low 0
Variegated Intermediate 8
Variegated High 8
Total 72
a) Design a statistical test to check the correctness of the genetic model, and state the null
and alternative hypotheses. [0.5 pt]
b) Run the test you designed in part a without using R, and draw your conclusion. [0.75 pt]
c) Run the test you designed in part a in R, and draw your conclusion. [0.75 pt]
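As a cross-check for parts b and c, here is a minimal Python sketch of the chi-square goodness-of-fit calculation (the assignment itself asks for a typed hand calculation and R; this is only an illustration, and the helper name `chi_square_gof` is hypothetical). It turns the 3:6:3:1:2:1 ratio into expected counts for n = 72 and sums (O - E)^2 / E over the six cells. The degrees of freedom are k - 1 = 5 because the model's probabilities are fixed in advance rather than estimated from the data.

```python
def chi_square_gof(observed, ratio):
    """Pearson chi-square goodness-of-fit statistic for a hypothesized ratio."""
    n = sum(observed)
    parts = sum(ratio)
    # Expected count in each cell: n * (ratio part / total parts)
    expected = [n * r / parts for r in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1: no model parameters are estimated from the data
    return stat, df, expected


# Cells in table order: Brown Low, Intermediate, High; Variegated Low, Intermediate, High
observed = [15, 26, 15, 0, 8, 8]
ratio = [3, 6, 3, 1, 2, 1]
stat, df, expected = chi_square_gof(observed, ratio)
print(expected)            # [13.5, 27.0, 13.5, 4.5, 9.0, 4.5]
print(round(stat, 4), df)  # 7.7037 5
```

The statistic is then compared with the upper 5% critical value of the chi-square distribution with 5 degrees of freedom (about 11.07). Two expected counts (4.5) fall below the usual rule-of-thumb minimum of 5, which is worth mentioning in the conclusion; in R, a call such as `chisq.test(c(15, 26, 15, 0, 8, 8), p = c(3, 6, 3, 1, 2, 1) / 16)` should give the same statistic and will warn that the chi-square approximation may be inaccurate.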
Question 2
A local magazine reported on the types of office communication preferred by different age groups in
its neighborhood. The results, based on a survey of 500 respondents in each age group, are
cross-classified in the table below:
Type of Communication Preferred, by Age Group
Age Group      Group Meetings   Face-to-Face Meetings with Individuals   E-mails   Other   Total
Generation X   180              260                                      50        10      500
Generation Y   210              190                                      65        35      500
Boomer         205              195                                      65        35      500
Mature         200              195                                      50        55      500
Total          795              840                                      230       135     2000
a) Design and conduct a statistical test (without R, at the 95% confidence level, i.e. α = 0.05) to
check whether there is any evidence of a relationship between age group and type of
communication (i.e., test whether the preferred type of communication depends on age group),
and draw your conclusion. [1 pt]
b) Import the raw data from “communication_per_age.csv” using the “read.csv” function. Design
and run the test of part a in R, analyze the R output, and draw your conclusion. [0.75 pt]
c) In R, process the data of “communication_per_age.csv” to reproduce the table above. [0.25 pt]
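The chi-square test of independence for part a, and the cross-tabulation step of part c, can be sketched in Python as a cross-check (a language-neutral illustration, not the R solution the question asks for). Expected counts under independence are (row total × column total) / n, and the degrees of freedom are (r - 1)(c - 1) = 9. The column names of the raw CSV are not given in the assignment, so `AgeGroup` and `Communication` below are hypothetical, and the three sample rows are made up purely to show the mechanics; with the real file, `csv.DictReader(open("communication_per_age.csv", newline=""))` would take the place of the in-memory sample.

```python
import csv
import io
from collections import Counter


def crosstab(rows, row_key, col_key):
    """Count (row_key, col_key) pairs, similar in spirit to R's table(x, y).

    Levels are sorted alphabetically, so row/column order may differ from
    the printed table in the question.
    """
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_levels = sorted({a for a, _ in counts})
    col_levels = sorted({b for _, b in counts})
    table = [[counts[(a, b)] for b in col_levels] for a in row_levels]
    return row_levels, col_levels, table


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count if independent
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Counts from the Question 2 table (rows: Generation X, Generation Y, Boomer, Mature;
# columns: Group Meetings, Face-to-Face, E-mails, Other).
observed = [
    [180, 260, 50, 10],
    [210, 190, 65, 35],
    [205, 195, 65, 35],
    [200, 195, 50, 55],
]
stat, df = chi_square_independence(observed)
print(round(stat, 2), df)  # 52.66 9

# Made-up rows with hypothetical column names, only to show the cross-tabulation step.
sample = io.StringIO("AgeGroup,Communication\nBoomer,E-mails\nMature,Other\nBoomer,E-mails\n")
rows = list(csv.DictReader(sample))
print(crosstab(rows, "AgeGroup", "Communication"))
```

The statistic is compared with the upper 5% critical value of the chi-square distribution with 9 degrees of freedom (about 16.92). In R, the analogous steps would be along the lines of `table(d$AgeGroup, d$Communication)` followed by `chisq.test()` on that table, where the data frame `d` and its column names are again assumptions.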
Question 3
In an experiment at the University of Wonderland, three types of diet were tested on 76 people. The
data are available in “diet.csv”. This dataset includes the following variables:
Variable Description
Person Participant number
gender Gender, 1 = male, 0 = female
Age Age (years)
Height Height (cm)
preweight Weight before the diet (kg)
Diet Type of diet (one of the three diets tested)
Weight6weeks Weight after 6 weeks (kg)
a) Import the dataset in R using the “read.csv” function. Calculate a new variable DietEffect (i.e.,
the weight lost by each participant after 6 weeks) and add it to the dataset. [0.25 pt]
b) In R, make a boxplot of DietEffect grouped by Diet. [0.25 pt]
c) In R, design and run a hypothesis test to check whether the effects of all diets are the same
or whether at least one of the diets has an effect that differs from the others. Explain each
item in the output of the test and draw your conclusions. [1 pt]
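The two computational ideas in this question, deriving DietEffect = preweight - Weight6weeks and a one-way ANOVA F test across diets, can be sketched in Python as follows. This is an illustration only, assuming the CSV headers match the variable names in the table above; the six rows in the sample are made up and are NOT taken from diet.csv. The F statistic is the between-group mean square divided by the within-group mean square, with k - 1 and N - k degrees of freedom.

```python
import csv
import io
from collections import defaultdict


def add_diet_effect(rows):
    """Add DietEffect = preweight - Weight6weeks (kg lost) to each row dict."""
    for r in rows:
        r["DietEffect"] = float(r["preweight"]) - float(r["Weight6weeks"])
    return rows


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up rows (NOT diet.csv), assuming the column names listed in the variable table.
sample = io.StringIO(
    "Person,Diet,preweight,Weight6weeks\n"
    "1,1,80,79\n2,1,70,68\n3,2,90,86\n4,2,60,57\n5,3,75,69\n6,3,85,78\n"
)
rows = add_diet_effect(list(csv.DictReader(sample)))
by_diet = defaultdict(list)
for r in rows:
    by_diet[r["Diet"]].append(r["DietEffect"])
f_stat, df_b, df_w = one_way_anova([by_diet[d] for d in sorted(by_diet)])
print(round(f_stat, 3), df_b, df_w)  # 25.333 2 3
```

In R, the corresponding steps would typically be `boxplot(DietEffect ~ Diet, data = diet)` for part b and `summary(aov(DietEffect ~ factor(Diet), data = diet))` for part c, where `diet` is an assumed name for the imported data frame; wrapping Diet in `factor()` matters if the diet codes are stored as numbers, since otherwise `aov` would treat Diet as a numeric predictor.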
Question 4
The pressure P, temperature T, and volume V of one mole of an ideal gas are related by the equation
PV = 8.31T, when P is measured in kilopascals, T is measured in kelvins, and V is measured in liters.
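Solving the gas law for T and applying the usual first-order propagation formula gives the expressions needed for the estimate and its uncertainty. This is a sketch that assumes the uncertainties in P and V are independent:

```latex
T = \frac{PV}{8.31}, \qquad
\sigma_T = \sqrt{\left(\frac{\partial T}{\partial P}\,\sigma_P\right)^2 + \left(\frac{\partial T}{\partial V}\,\sigma_V\right)^2}
         = \sqrt{\left(\frac{V}{8.31}\,\sigma_P\right)^2 + \left(\frac{P}{8.31}\,\sigma_V\right)^2},
\qquad
\frac{\sigma_T}{T} = \sqrt{\left(\frac{\sigma_P}{P}\right)^2 + \left(\frac{\sigma_V}{V}\right)^2}
```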
a) Assume that P = 242.52 ± 0.03 kPa and V = 10.103 ± 0.002 L. Estimate T and find the
uncertainty in the estimate. [1 pt]
b) Round the result and its uncertainty to the appropriate number of significant digits, and explain
what significant digits are. [0.5 pt]
c) Calculate T and sT (the standard deviation of T) using a Monte Carlo simulation with 10000
repetitions, and plot the histogram of the simulated values of T. [1 pt]
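A Python sketch of part c, with the linearized propagation result alongside for comparison, is shown below. Two assumptions are mine rather than the problem's: the ± values are treated as standard deviations, and P and V are drawn independently from normal distributions. The seed value is arbitrary and only makes the run repeatable.

```python
import math
import random
import statistics

R_CONST = 8.31  # constant as given in the problem, kPa·L/(mol·K)


def temperature(p_kpa, v_l):
    """T = PV / 8.31 for one mole of ideal gas (kelvins)."""
    return p_kpa * v_l / R_CONST


def propagate(p, sp, v, sv):
    """First-order (linearized) uncertainty of T, assuming independent P and V."""
    t = temperature(p, v)
    rel = math.sqrt((sp / p) ** 2 + (sv / v) ** 2)
    return t, t * rel


def monte_carlo(p, sp, v, sv, reps=10_000, seed=1):
    """Simulate T by drawing P and V from normal distributions (an assumption)."""
    rng = random.Random(seed)
    samples = [temperature(rng.gauss(p, sp), rng.gauss(v, sv)) for _ in range(reps)]
    return statistics.mean(samples), statistics.stdev(samples), samples


t, s_t = propagate(242.52, 0.03, 10.103, 0.002)
mc_mean, mc_sd, samples = monte_carlo(242.52, 0.03, 10.103, 0.002)
print(f"analytic:    T = {t:.4f} K, sT = {s_t:.4f} K")
print(f"Monte Carlo: T = {mc_mean:.4f} K, sT = {mc_sd:.4f} K")
```

Because T is nearly linear in P and V over such small ranges, the simulated mean and standard deviation should agree with the analytic values up to simulation noise. The histogram of `samples` can be drawn with `matplotlib.pyplot.hist(samples, bins=50)` in Python, or with `hist()` on the simulated vector in R.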