1. Data Inspection and Statistical Inference:
1. (a) How many variables and observations does the dataset contain? Which variables
are dummy variables? Which variables are categorical variables (more than 2
categories)?
2. (b) Show price, the dependent variable, as a histogram. Describe the distribution.
3. (c) Determine the relationship between sqft and price, and garage type and price.
4. (d) Show saledate as a histogram (because it is a time variable, declare how you want
your bins divided, hist(saledate, "break"), where break is days, weeks, months, or years).
Is there any useful information here?
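The inspection steps in (a) through (d) could be sketched in R roughly as follows. This is only an illustration on simulated data: the data frame name homes, the columns garage and pool, and all simulated values are assumptions, since the actual dataset is not shown here. Only price, sqft, and saledate are names taken from the questions.

```r
# Hypothetical stand-in data; replace with the real dataset.
set.seed(1)
n <- 200
homes <- data.frame(
  price    = round(exp(rnorm(n, mean = 12, sd = 0.4))),
  sqft     = round(runif(n, 800, 3500)),
  garage   = factor(sample(c("none", "attached", "detached"), n, replace = TRUE)),
  pool     = rbinom(n, 1, 0.2),   # assumed 0/1 dummy
  saledate = as.Date("2015-01-01") + sample(0:729, n, replace = TRUE)
)

# (a) Number of observations and variables, then a rough classification:
# columns with exactly two distinct values are treated as dummies,
# factor/character columns with more than two values as categorical.
dim(homes)
str(homes)
n_levels     <- sapply(homes, function(x) length(unique(x)))
is_cat       <- sapply(homes, function(x) is.factor(x) || is.character(x))
dummies      <- names(homes)[n_levels == 2]
categoricals <- names(homes)[is_cat & n_levels > 2]

# (b) Distribution of the dependent variable.
hist(homes$price, breaks = 30, main = "Sale price", xlab = "price")

# (c) Continuous regressor: scatter plot and correlation.
#     Categorical regressor: box plot and group means.
plot(homes$sqft, homes$price, xlab = "sqft", ylab = "price")
cor(homes$sqft, homes$price)
boxplot(price ~ garage, data = homes)
aggregate(price ~ garage, data = homes, FUN = mean)

# (d) Histogram of a Date variable with calendar-based bins.
hist(homes$saledate, breaks = "months", freq = TRUE,
     main = "Sales per month", xlab = "saledate")
```

Counting distinct values is only a first pass. A numeric column coded 1/2/3 would not be flagged as categorical by this check, so the codebook should still be consulted.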
2. Model Selection and Output
1. (a) Construct your model by theory and statistical inference. What are your
determinants of sale price? Are there determinants that are missing from the dataset
(you may need to create new variables)? Present your regressors in a table and
explain why you include them (state the effect you expect each to have on
price).
2. (b) Are there any variables missing from the dataset? Which ones may cause omitted
variable bias?
3. (c) Run your regressions and diagnostics to determine if there are any OLS violations. Is
heteroskedasticity present? Demonstrate how you came to your conclusion. If
heteroskedasticity is present, use robust standard errors (> library(sandwich)
> library(lmtest) > coeftest(regressionname, vcov = vcovHC(regressionname,
type = "HC1"))). Are there any other clear patterns?
4. (d) What is your chosen model(s)? Defend your model selection. Present the results in a
table. What corrections did you make to your initial theorized model, and why did you
select it/them?
5. (e) Interpret all of your coefficients (mainly dummy variables, and where you use
natural log transformations or polynomials).
6. (f) Provide an explanation for any surprising or counterintuitive coefficients.
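A minimal sketch of the workflow in (c) and (e), assuming the sandwich and lmtest packages are installed. The data are simulated with an error variance that grows with sqft, so heteroskedasticity is present by construction. The variable garage, the model formulas, and all numbers are illustrative assumptions, not the assignment's dataset or a recommended specification. The Breusch-Pagan test (bptest) is one common diagnostic, chosen here for illustration.

```r
library(sandwich)  # vcovHC()
library(lmtest)    # coeftest(), bptest()

# Hypothetical data: the error standard deviation is proportional to sqft.
set.seed(2)
n      <- 500
sqft   <- runif(n, 800, 3500)
garage <- factor(sample(c("none", "attached", "detached"), n, replace = TRUE),
                 levels = c("none", "attached", "detached"))
price  <- 50000 + 120 * sqft + 15000 * (garage == "attached") +
          rnorm(n, sd = 20 * sqft)
homes  <- data.frame(price, sqft, garage)

fit <- lm(price ~ sqft + garage, data = homes)

# Visual check: a fan shape in residuals vs. fitted values suggests
# non-constant error variance.
plot(fitted(fit), resid(fit), xlab = "fitted", ylab = "residuals")

# Breusch-Pagan test; the null hypothesis is homoskedasticity, so a small
# p-value is evidence of heteroskedasticity.
bp <- bptest(fit)
bp

# Same coefficient estimates, heteroskedasticity-robust (HC1) standard errors.
robust <- coeftest(fit, vcov = vcovHC(fit, type = "HC1"))
robust

# (e) With log(price) as the outcome, a dummy coefficient b corresponds to a
# 100 * (exp(b) - 1) percent difference relative to the omitted category.
log_fit      <- lm(log(price) ~ log(sqft) + garage, data = homes)
pct_attached <- 100 * (exp(coef(log_fit)[["garageattached"]]) - 1)
pct_attached
```

Robust standard errors change the inference (standard errors, t-statistics, p-values) but leave the point estimates unchanged, so the coefficient interpretation in (e) is the same either way.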
3. Hypothesis Testing (at the αmax = 0.05 significance level) for your selected model.
1. (a) What variables are statistically significant?
2. (b) Interpret the R² and the F-statistic.
3. (c) Test the null hypothesis that only the physical characteristics of the house matter
(i.e. jointly test if all variables you include other than the physical characteristics like
square footage, number of bathrooms, etc. are equal to zero). Interpret the result.
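One way to carry out the joint test in (c) is to compare a restricted model (physical characteristics only) with the unrestricted model using an F-test. The sketch below uses base R on simulated data. The variables baths, near_park, and sale_year are hypothetical placeholders for whatever physical and non-physical regressors end up in the selected model.

```r
# Hypothetical data with two physical and two non-physical regressors.
set.seed(3)
n <- 300
homes <- data.frame(
  sqft      = runif(n, 800, 3500),
  baths     = sample(1:4, n, replace = TRUE),
  near_park = rbinom(n, 1, 0.3),                            # non-physical
  sale_year = factor(sample(2015:2016, n, replace = TRUE))  # non-physical
)
homes$price <- 40000 + 110 * homes$sqft + 9000 * homes$baths +
               12000 * homes$near_park + rnorm(n, sd = 30000)

# Restricted model: physical characteristics only.
restricted   <- lm(price ~ sqft + baths, data = homes)
# Unrestricted model: adds the non-physical variables.
unrestricted <- lm(price ~ sqft + baths + near_park + sale_year, data = homes)

# (a), (b): individual t-tests, R-squared, and the overall F-statistic.
fit_summary <- summary(unrestricted)
fit_summary

# (c): H0 is that the coefficients on near_park and sale_year are all zero.
joint   <- anova(restricted, unrestricted)
joint
p_joint <- joint[["Pr(>F)"]][2]
reject  <- p_joint < 0.05   # TRUE means reject H0 at the 0.05 level
```

This anova() comparison relies on homoskedastic errors. If heteroskedasticity was found earlier, a Wald test with a robust covariance matrix, for example waldtest() from lmtest with a vcov argument, is the usual alternative.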