3. Smoking and Birthweight Consider the dataset birthweight.Rdata. It contains data from
a random sample of 50,000 live births in the US from 1997. The meaning of the variables should
be self-explanatory from their names. The outcome of interest is the weight of the baby at birth
(measured in grams). To give you some idea about magnitudes: in the western world an average
baby weighs about 3,500g at birth, and most babies’ weight is between 2,500g and 4,500g. Birth
weights lower than 2,500g are associated with a number of adverse health outcomes, including
mortality. In this exercise, we want to study the relationship between birth weight and various
explanatory variables. We will run a number of regressions for this purpose. You may report
results by hand in a table where each of the columns is a regression. A variable's entry is left
blank if it is not included in that regression. When asked to interpret the results, there is no
need to do tests; simply explain the meaning of the estimated coefficient. E.g., for a log-levels
regression, explain how to interpret the effect of an increase in X on Y.
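As a reminder of what such an interpretation looks like, the standard log-level case can be written out as below. The symbols \beta_0, \beta_1, and u are generic notation assumed for illustration; they are not names from the dataset.

```latex
% Log-level model: the dependent variable is in logs, the regressor in levels.
% Assumes amsmath is not needed; plain LaTeX math only.
\[
  \log(Y) = \beta_0 + \beta_1 X + u
  \quad\Longrightarrow\quad
  \%\Delta Y \approx 100\,\beta_1\,\Delta X
\]
% Exact percentage change, for when \beta_1 \Delta X is not small:
\[
  \%\Delta Y = 100\,\bigl(e^{\beta_1 \Delta X} - 1\bigr)
\]
```

In words: a one-unit increase in X is associated with roughly a 100·\beta_1 percent change in Y, and the approximation is good when \beta_1 is small.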
(a) What is the proportion of babies in the data that weigh less than 2,500g?
(b) Run a regression of weight on boy. Interpret the results. Without running any new regressions,
explain how your interpretation would change if the left-hand side were the log of birth weight.
(c) Run a regression of weight on boy and smoke. Interpret the results.
(d) Run a regression of weight on boy, smoke, and their interaction. Interpret the results.
(e) Using the regression in (d), test (by hand and assuming homoscedasticity) whether there are
gender differences in birth weight. Hint: You’ll need to run two regressions here and use the
R².
[Footnote: No need to look up how to make tables in R, but the curious may research the stargazer package.]
(f) Run a regression of weight on mom age. Interpret the results.
(g) Run a regression of weight on a linear and quadratic term in mom age. Interpret the results.
(h) Test the null hypothesis that the relationship between weight and mom age is linear against
the alternative that the relationship can be described by a quadratic function. What do you
conclude?
(i) In the quadratic model, test whether the relationship between weight and mom age is
significant. To be sure: this means jointly testing the coefficients on both the linear and
quadratic terms. Please do this assuming homoscedasticity. That is, calculate the R² in the
restricted and unrestricted models and perform the F-test by hand.
(j) Under the quadratic model, what is the marginal effect of increasing mother’s age for a mother
who is 20? Who is 40? Comment on the difference.
4. A Proof! Show that in a single variable regression,