In this lab assignment you will learn how to examine data produced in a simple statistical experiment. In
particular, you will examine the experiment design and apply graphical, numerical, and inferential tools
available in StatCrunch to compare two distributions produced by the study. Before you start working on
the assignment, you should review the course material about designing experiments and comparing two
population means.
Caffeine Dependence Experiment
Caffeine is the world's most widely consumed mood-altering substance. In North America, about 90% of
adults consume caffeine daily. Coffee is the leading dietary source of caffeine among adults in Canada,
while soft drinks represent the largest source of caffeine for children. People who consume large amounts
of caffeine each day may experience physical withdrawal symptoms if they stop taking in their usual
amounts of the substance.
In this lab assignment you will follow an experiment on caffeine dependency conducted on a group of
volunteers by researchers from the Johns Hopkins University School of Medicine in Baltimore, and
summarized in the paper “Caffeine Dependence Syndrome. Evidence from Case Histories and
Experimental Evaluations”, JAMA (The Journal of the American Medical Association), Vol. 272, No. 13, 1994.
The researchers recruited twenty-seven volunteers who believed they were psychologically or physically
addicted to caffeine but otherwise were in good health. Of these twenty-seven volunteers, sixteen were
diagnosed as caffeine dependent based on general substance dependence criteria. Of the sixteen
subjects who were diagnosed as caffeine dependent, eleven agreed to participate in a study to evaluate their
caffeine dependency. Before the experiment, each subject's daily caffeine intake was determined from the
participants' food diaries.
The experiment was conducted over two 2-day periods that occurred exactly one week apart. During one of
the 2-day periods, the subjects were given a set of capsules containing the amount of caffeine normally
ingested by the subject in one day. During the other study period, the subjects were given placebos. The
order in which each subject received the two types of capsules was randomized. At the end of each 2-day
study period, subjects were evaluated in three areas: depression symptoms, fatigue, and vigor. Both the
subjects and the experimenters were blinded to whether the subject was receiving the caffeine pills or the
caffeine-free pills.
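The randomized order described above can be illustrated with a short Python sketch. It assumes a simple scheme in which each subject independently receives caffeine first or placebo first with probability 1/2; the handout does not say how the researchers actually randomized, and `assign_orders` is a hypothetical name.

```python
import random


def assign_orders(subject_ids, seed=None):
    """Map each subject to a (first period, second period) pair of conditions.

    Hypothetical scheme: each subject independently gets caffeine-first or
    placebo-first with probability 1/2; the study's actual procedure may differ.
    """
    rng = random.Random(seed)
    orders = {}
    for sid in subject_ids:
        periods = ["caffeine", "placebo"]
        rng.shuffle(periods)  # random order of the two 2-day periods
        orders[sid] = tuple(periods)
    return orders
```

For example, `assign_orders(range(1, 12), seed=151)` returns one order for each of subjects 1 to 11. Independent coin flips can leave the two orders unequally represented; shuffling a list that contains a fixed number of each order is a common way to force balance.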
The data are available in the StatCrunch file lab4.txt located on the STAT 151 Laboratories website at
http://www.stat.ualberta.ca/statslabs/stat151/index.htm (click the Stat 151 link, then Data for Lab 4). The data
are not to be printed in your submission. The following is a description of the variables in the data file:
Variable Name    Description of Variable
SUBJECT          Subject number (a whole number from 1 to 11)
DEPR-CAF         Depression score during the caffeine period
DEPR-NC          Depression score during the no-caffeine period
FATIGUE-CAF      Fatigue score during the caffeine period
FATIGUE-NC       Fatigue score during the no-caffeine period
VIGOR-CAF        Vigor score during the caffeine period
VIGOR-NC         Vigor score during the no-caffeine period
SMOKER           Smoker status (Y if smoked cigarettes daily, N otherwise)
CAFFEINE         Daily intake of caffeine (mg)
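If you want to cross-check your StatCrunch work outside StatCrunch, the following Python sketch reads the data file and forms per-subject changes. It assumes lab4.txt is plain text with a single header row of the variable names listed above and whitespace-separated values, which may not match the actual file; `parse_lab4`, `load_lab4`, and `changes` are hypothetical helper names. Changes are computed as no-caffeine minus caffeine, so a positive value means a higher score without caffeine.

```python
from pathlib import Path

# Every variable except SMOKER (Y/N) is numeric.
NUMERIC_COLUMNS = {"SUBJECT", "DEPR-CAF", "DEPR-NC", "FATIGUE-CAF", "FATIGUE-NC",
                   "VIGOR-CAF", "VIGOR-NC", "CAFFEINE"}


def parse_lab4(text):
    """Parse lab4.txt-style text into one dict per subject.

    ASSUMPTION: the first non-blank line holds the variable names and each
    later line holds one subject's values, separated by tabs or spaces.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = {}
        for name, value in zip(header, line.split()):
            row[name] = float(value) if name in NUMERIC_COLUMNS else value
        rows.append(row)
    return rows


def load_lab4(path="lab4.txt"):
    """Read and parse the data file (path is an assumption)."""
    return parse_lab4(Path(path).read_text())


def changes(rows, measure):
    """No-caffeine minus caffeine score per subject, e.g. measure="DEPR"."""
    return [row[f"{measure}-NC"] - row[f"{measure}-CAF"] for row in rows]
```

For instance, `depr = changes(load_lab4("lab4.txt"), "DEPR")` gives the depression changes, and filtering the rows on `row["SMOKER"] == "Y"` separates smokers from non-smokers.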
Use the data provided to answer the following questions:
1. First you will examine the experiment design.
(a) Can we treat the 11 subjects who agreed to participate in the study as a random sample from the
population of all caffeine dependent individuals? Explain why or why not. Can you generalize the
results of the study to the population of all caffeine dependent individuals? Explain briefly.
(b) Why were both the subjects and the experimenters interviewing the subjects blinded to whether
the subject was receiving the caffeine pills or the caffeine-free pills?
(c) Why was the order in which the two series of capsules were taken randomized?
(d) Why were the two study periods held one week apart instead of using two consecutive 2-day
periods?
2. Now you will use inferential tools in StatCrunch to compare the levels of depression for the caffeine
and no-caffeine periods.
(a) Do the data give evidence that being deprived of caffeine raises depression scores? Use an
appropriate test to answer the question. In particular, state the hypotheses, report the value of the
test statistic from the output, specify the distribution of the test statistic under the null hypothesis,
provide the p-value, and state your conclusion.
(b) Also obtain a 95% confidence interval for the mean increase in depression scores from the caffeine
period to the no-caffeine period. Is the confidence interval consistent with the outcome of the test in
part (a)? Explain briefly.
(c) What assumptions must be satisfied to justify the procedures you used in (a) and (b)? Are the
assumptions met in this case? Obtain the appropriate plot to verify the assumption and paste it into
your report. What is the chief threat to the validity of the results obtained in parts (a) and (b)?
3. Now you will compare the changes in depression levels for smokers and non-smokers.
(a) Use an appropriate test in StatCrunch to see whether the change in the depression scores is
different for smokers and non-smokers. State the null and alternative hypotheses, specify the
distribution of the test statistic under the null hypothesis, report the value of the test statistic, and
the p-value of the test. What is your conclusion?
(b) Explain the choice of your test in (a) and specify the assumptions necessary to apply the test. You
do not need to verify the assumptions.
(c) Obtain a 95% confidence interval for the difference between the mean changes in depression scores
for non-smokers and smokers. Interpret the 95% confidence interval. Is the confidence interval
consistent with the test in part (a)?
4. You have studied the change in depression scores exhibited by caffeine-dependent individuals when
they are deprived of caffeine. Now you will explore changes in fatigue and vigor.
(a) Do the data give evidence that being deprived of caffeine raises fatigue scores? Use an appropriate
test to answer the question. In particular, state the hypotheses, report the value of the test statistic
and p-value from the output, and state your conclusion.
(b) Do the data give evidence that being deprived of caffeine lowers vigor scores? Use an appropriate
test to answer the question. In particular, state the hypotheses, report the value of the test statistic
and the p-value from the output, and state your conclusion.
5. Summarize your findings from Questions 1-4 in the form of a brief report. In particular, indicate
which of the three withdrawal symptoms (more depressive mood, more fatigue, less vigor) seems to be
the most intense. Refer to the plots and inferences in your summary.
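The computations behind Questions 2 to 4 can be cross-checked outside StatCrunch with the Python sketch below, which uses only the standard library. `paired_analysis` carries out the paired t procedure on d = (no-caffeine score) - (caffeine score), with t = mean(d)/(s_d/sqrt(n)) and n - 1 degrees of freedom; `welch_analysis` carries out an unpooled (Welch) two-sample t procedure; and `normal_scores` returns the points of a normal quantile plot. All names are hypothetical, and this is not StatCrunch's algorithm: tail probabilities come from Simpson's-rule integration of the t density and critical values from bisection, so expect agreement with StatCrunch to the reported decimals rather than bit-for-bit. If you select pooled variances in StatCrunch for Question 3, the standard error and the degrees of freedom (n1 + n2 - 2) will differ from the Welch values.

```python
import math
from statistics import NormalDist, mean, stdev, variance


def t_pdf(x, df):
    """Density of Student's t distribution; df may be non-integer (Welch)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t): Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def t_quantile(p, df):
    """q with P(T <= q) = p, for 0.5 <= p < 1, by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_upper_tail(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def paired_analysis(nc, caf, alternative="greater", conf=0.95):
    """Paired t analysis of d = nc - caf (same subject order in both lists).

    alternative="greater" tests Ha: mean d > 0; "less" tests Ha: mean d < 0.
    Returns (t, df, one-sided p-value, two-sided confidence interval).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    d = [x - y for x, y in zip(nc, caf)]
    n = len(d)
    se = stdev(d) / math.sqrt(n)
    t = mean(d) / se
    df = n - 1
    upper = t_upper_tail(t, df)
    p = upper if alternative == "greater" else 1 - upper
    t_star = t_quantile(1 - (1 - conf) / 2, df)
    return t, df, p, (mean(d) - t_star * se, mean(d) + t_star * se)


def welch_analysis(group1, group2, conf=0.95):
    """Unpooled two-sample t analysis of mean(group1) - mean(group2).

    Returns (t, Welch-Satterthwaite df, two-sided p-value, confidence interval).
    Each group needs at least two values so its sample variance exists.
    """
    n1, n2 = len(group1), len(group2)
    v1, v2 = variance(group1) / n1, variance(group2) / n2  # squared SEs
    se = math.sqrt(v1 + v2)
    diff = mean(group1) - mean(group2)
    t = diff / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * t_upper_tail(abs(t), df)
    t_star = t_quantile(1 - (1 - conf) / 2, df)
    return t, df, p, (diff - t_star * se, diff + t_star * se)


def normal_scores(values):
    """(normal quantile, sorted value) pairs for a normal quantile plot.

    Plotting positions (i - 0.5) / n are one common choice among several.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(NormalDist().inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(ordered, start=1)]
```

For Question 2, pass the DEPR-NC and DEPR-CAF columns in subject order; for Question 4, use the fatigue columns with `alternative="greater"` and the vigor columns with `alternative="less"`. For Question 3, pass the non-smokers' depression changes as `group1` and the smokers' as `group2`, so the interval estimates the non-smoker minus smoker difference. A 95% two-sided interval matches a two-sided test at the 5% level; an interval lying entirely above 0 implies a one-sided p-value below 0.025. For the normality check, plot the pairs returned by `normal_scores` applied to the differences; a roughly straight pattern is consistent with approximately normal differences.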
LAB 4 ASSIGNMENT: MARKING SCHEMA
Proper header and appearance: 10 marks
Question 1
(a) Random sample or not: 2 marks
Generalizations: 2 marks
(b) Double-blind study discussion: 2 marks
(c) Order of administration of pills: 2 marks
(d) Timing of the two study periods: 2 marks
Question 2
(a) Hypotheses: 3 marks
Value of the test statistic: 2 marks
Null Distribution: 2 marks
P-value: 2 marks
Conclusion: 2 marks
(b) 95% confidence interval: 4 marks
Comparison of the interval with the test: 2 marks
(c) Specifying assumptions (in general): 2 marks
Normality assumption for the data: 2 marks
Plot to verify the assumption of normality: 3 marks
SRS assumption for the data: 2 marks
Chief threat to validity: 2 marks
Question 3
(a) Hypotheses: 3 marks
Value of the test statistic: 2 marks
Null distribution: 2 marks
P-value: 2 marks
Conclusions: 2 marks
(b) Choice of the test: 2 marks
Assumptions of the two-sample t test: 2 marks
(c) Confidence interval: 4 marks
Consistency of the confidence interval with the test: 2 marks
Question 4
(a) Hypotheses: 3 marks
Value of the test statistic: 2 marks
P-value: 2 marks
Conclusions: 2 marks
(b) Hypotheses: 3 marks
Value of the test statistic: 2 marks
P-value: 2 marks
Conclusions: 2 marks
Question 5
Brief summary (including the answer to the question): 5 marks
TOTAL: 92 marks