PREPARE THE TASKS IN BOLD.
Section 1.1. Description of the data and cleaning
1. Pick the data set you will be working on.
2. Pick 5 variables to include in your project: 2 numerical continuous, 2 numerical
discrete/categorical, and 1 of either type. An ID (i.e., a name or other type of identifier) does not count toward the 5.
3. Sketch the workload for the cleaning procedure and descriptive statistics.
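One way to sketch the cleaning and descriptive-statistics workload is shown below. This is a minimal illustration, not a required procedure: the rows, the column names (`id`, `price`, `units`, `region`), and the cleaning rules (drop rows with missing values, drop repeated IDs, normalize category labels) are all made-up assumptions that you would replace with your own data set and decisions.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical placeholder rows; replace with your own data set.
rows = [
    {"id": 1, "price": 12.5, "units": 3, "region": "north"},
    {"id": 2, "price": None, "units": 5, "region": "south"},   # missing value
    {"id": 3, "price": 9.0, "units": 2, "region": "north"},
    {"id": 3, "price": 9.0, "units": 2, "region": "north"},    # duplicate ID
    {"id": 4, "price": 15.0, "units": 4, "region": " South "}, # messy label
]


def clean(rows):
    """Drop rows with missing values or repeated IDs; tidy category labels."""
    seen_ids, cleaned = set(), []
    for row in rows:
        if any(value is None for value in row.values()):
            continue
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        cleaned.append({**row, "region": row["region"].strip().lower()})
    return cleaned


def describe(values):
    """Basic descriptive statistics for a numerical variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


cleaned = clean(rows)
price_summary = describe([row["price"] for row in cleaned])
region_counts = Counter(row["region"] for row in cleaned)  # categorical variable
print(price_summary)
print(region_counts)
```

Numerical variables get a summary such as `describe`, while categorical variables are usually summarized with frequency counts.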
Section 1.2: Confidence Intervals and Hypothesis Testing
For each section, formulate your question and think about why the question, and its answer, matter.
Section 2.2.1. Confidence Interval Estimation
To formulate the question here, pick one continuous variable and one categorical variable. In the categorical variable, pick two levels. A viable question would then compare the average of the continuous variable across those two levels.
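As an illustration of the comparison described above, the sketch below computes a confidence interval for the difference between the two group averages. It is a rough sketch under stated assumptions: the two samples are made-up placeholder numbers, the function name `diff_in_means_ci` is hypothetical, and the critical value comes from the normal distribution, which is only a reasonable approximation for large samples. With small samples, a critical value from the t distribution would be used instead.

```python
import math
from statistics import NormalDist, mean, variance


def diff_in_means_ci(group_a, group_b, confidence=0.95):
    """Approximate CI for mean(group_a) - mean(group_b).

    Assumption: uses a normal critical value (large-sample approximation)
    and does not assume equal variances in the two groups.
    """
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


# Hypothetical values of the continuous variable for the two chosen levels.
level_1 = [5.1, 4.9, 5.6, 5.8, 6.0]
level_2 = [4.1, 4.5, 4.0, 4.8, 4.6]

low, high = diff_in_means_ci(level_1, level_2)
print(f"95% CI for the difference in means: ({low:.3f}, {high:.3f})")
```

If the interval excludes zero, the data suggest that the two levels differ in their average; if it includes zero, a difference of zero remains plausible.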
Question:
Why should you care?
Section 2.2.2. T-test
The process here is the same as above. Pick a continuous variable and a categorical variable. Pick two levels. Now, a viable question would compare the average of the continuous variable for different levels of the categorical one.
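A two-sample t-test for this kind of question can be sketched as follows. The version shown is Welch's t-test, which does not assume equal variances; that choice is an assumption of this sketch, not something the assignment prescribes. The data are made-up placeholders, the function names are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule so that the example needs only the standard library.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on [0, |t_stat|] (steps is even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 <= T <= |t_stat|)
    return max(0.0, 1 - 2 * central_area)


def welch_t_test(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite df, two-sided p-value)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1)
                           + vb ** 2 / (len(group_b) - 1))
    return t_stat, df, two_sided_p_value(t_stat, df)


# Hypothetical values of the continuous variable for the two chosen levels.
level_1 = [5.1, 4.9, 5.6, 5.8, 6.0]
level_2 = [4.1, 4.5, 4.0, 4.8, 4.6]

t_stat, df, p_value = welch_t_test(level_1, level_2)
print(f"t = {t_stat:.3f}, df = {df:.2f}, p = {p_value:.4f}")
```

The null hypothesis is that the two levels have the same average; a p-value below the chosen significance level (commonly 0.05) is evidence against it.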
Question:
Why should you care?
Section 2.2.3. Chi-square test
The chi-square test considers the relationship between TWO categorical variables. So, pick two categorical variables and consider the contingency table between them:
| Var1/Var2 | Category 1 | Category 2 | …. |
| Category 1 | ? | ? | |
| Category 2 | ? | ? | |
| …. | | | |
The question here would inquire about the distribution of one variable across the levels of another variable.
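To make the contingency-table idea concrete, the sketch below builds the table from two categorical variables and computes the chi-square statistic for a test of independence. The observations, the labels (`a`, `b`, `x`, `y`), and the function names are made-up assumptions for illustration. The decision step compares the statistic with standard 5% critical values for small degrees of freedom rather than computing an exact p-value.

```python
from collections import Counter


def contingency_table(var1, var2):
    """Count how often each pair of levels occurs together."""
    counts = Counter(zip(var1, var2))
    row_levels = sorted(set(var1))
    col_levels = sorted(set(var2))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Standard 5% critical values of the chi-square distribution for df = 1..4.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical observations of two categorical variables.
var1 = ["a"] * 40 + ["b"] * 60
var2 = ["x"] * 30 + ["y"] * 10 + ["x"] * 20 + ["y"] * 40

_, _, table = contingency_table(var1, var2)
stat, df = chi_square_statistic(table)
print(table, round(stat, 3), df, stat > CRITICAL_5PCT[df])
```

A statistic above the critical value suggests that the distribution of one variable differs across the levels of the other.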
Question:
Why should you care?
Section 2.3: Regression Analysis
Formulate the question for the regression. Think about one continuous variable that could be explained using other variables (this is called the dependent variable). Then think about which variables you can use to try to explain it (these are called independent variables).
Think about the relationship between the dependent variable and each independent variable: sketch out your hypothesis about the sign of each relationship (positive vs. negative) and why you expect it.
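A hypothesis about the sign of a relationship can be checked against the estimated slope of a regression. The sketch below fits a least-squares line with a single independent variable, which is a simplifying assumption: your project may use several independent variables, in which case a statistics package would be the more practical tool. The data and the function name `simple_ols` are made up for illustration.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Share of the variation in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: independent variable x, dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = simple_ols(x, y)
sign = "positive" if slope > 0 else "negative" if slope < 0 else "zero"
print(f"slope = {slope:.3f} ({sign}), intercept = {intercept:.3f}, "
      f"R^2 = {r_squared:.3f}")
```

The sign of the slope is what you compare with your hypothesis; the R-squared value indicates how much of the variation in the dependent variable the line accounts for.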
Question:
Section 2.3.1. Results
Think about who would care about the results of the regression (whether the relationship exists or doesn’t). Could it be consumers? Firms? Other stakeholders?
Why should you care?
Section 2.4: Infographic
What is going to be the central message of your infographic? Pick 3-5 graphs and 3-5 numbers that best support the message you have chosen.