STAB57: Assignment 2

Shahriar Shams

Winter 2022

Submission deadline: April 01, 2022; 11.59pm (Local Toronto time)

Late penalty: 10% per day.

Instructions on creating documents for submission

• We will use Crowdmark for submission and grading, which only accepts PDF, JPG and PNG files.

• I recommend using R Markdown. 1 mark of this assignment is assigned for using R Markdown.

• If you do not want to use R Markdown, you can write your answers in Microsoft Word and save them as PDFs at the end, but you will lose that 1 mark.

• Answers that are fully handwritten will not be accepted.

• If you are a Python user, feel free to use Python in place of R to answer any of the

questions.

• For each answer, make sure you have provided your code and output.

• Make sure your answers are easy to read and nicely presented.

Academic Integrity

Each student will work alone. You are not allowed to ask anyone for help on any platform. Do not ask anyone for solutions. Do not share your code or answers. If you need clarification on any of these questions, you may ask on Ed or during office hours (please do not email us). Please do not post your solution on Ed and ask “does it look ok?”.

When submitting your assignment on Crowdmark, there will be a space for an academic integrity statement. Write the following statement on paper/iPad/Surface and upload a screenshot of it.

Statement:

I am attesting to the fact that I, [name] (write your full name here), [stnum] (write your student number here), have abided fully by the Code of Behaviour on Academic Matters. I have not committed academic misconduct, and am aware of the penalties that may be imposed if I have committed an academic offence.

Question 1 (4 points)

In question 1 of assignment 1, you created all possible combinations (4096 of them) of samples of size n = 4 from the following population.

11, 12, 13, 14, 15, 16, 17, 18

Here is the code again that will produce the 4096 values of X-bar (one for each sample).

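X = c(11, 12, 13, 14, 15, 16, 17, 18)   # population values
d = expand.grid(X, X, X, X)             # all 8^4 = 4096 ordered samples of size 4
X_bar = apply(d, 1, mean)               # sample mean of each row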

Strictly speaking, the central limit theorem requires the sample size n to be large, but let’s apply it anyway. Suppose you know that the population variance is σ² = 5.25.
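As a quick sanity check on the setup above, the following sketch confirms the number of samples and the stated population variance. The names n_samples, pop_mean and pop_var are introduced here for illustration only. Note that the population variance divides by N = 8, which is an assumption consistent with the stated value of 5.25; R's built-in var() divides by N − 1 instead.

```r
# Rebuild the population and all ordered samples of size 4
X <- c(11, 12, 13, 14, 15, 16, 17, 18)
d <- expand.grid(X, X, X, X)
X_bar <- apply(d, 1, mean)

n_samples <- nrow(d)               # number of ordered samples, 8^4
pop_mean  <- mean(X)               # population mean
pop_var   <- mean((X - pop_mean)^2) # population variance (divide by N, not N - 1)

print(n_samples)
print(pop_mean)
print(pop_var)
print(var(X))                      # sample-variance formula, divides by N - 1
```

The last line differs from pop_var because var() uses the N − 1 denominator, so it should not be used to reproduce the population variance quoted above.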

a. Suppose someone observes only one of these 4096 combinations as a sample: (15, 16, 17, 18). That person is testing the null hypothesis H0: µ = 14.5 at significance level α = 0.05, based on this observed set of four numbers. Calculate the p-value that the person will get, using the central limit theorem.

b. Calculate the p-value numerically by using the 4096 X-bar values (do not use the CLT here).

Hint: think about the graph from assignment 1 [question 1(f)] and use the second definition of p-value from slide 20, week-7.

c. Why do you see a difference between your results in part (a) and part (b)? Under what condition would you expect these two numbers to be similar?

Question 2 (5 points)

The goal of this question is to "see" the distribution of a likelihood ratio test statistic. In

other words, we want to see, if H0 is really true, what will be the distribution of W where

$$W = -2 \log \frac{L(\theta_0)}{L(\hat{\theta})}$$
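For numerical work it can help to rewrite the ratio on the log scale. Writing ℓ(θ) = log L(θ) (a symbol introduced here for illustration, not used elsewhere in this handout), the definition of W can be expanded as follows:

```latex
\begin{aligned}
W &= -2 \log \frac{L(\theta_0)}{L(\hat{\theta})} \\
  &= -2 \left[ \log L(\theta_0) - \log L(\hat{\theta}) \right] \\
  &= 2 \left[ \ell(\hat{\theta}) - \ell(\theta_0) \right]
\end{aligned}
```

Assuming θ̂ is the maximum likelihood estimate, L(θ̂) ≥ L(θ0), so W is never negative. The two forms are algebraically identical; the log-scale form only avoids very small likelihood values in floating-point arithmetic.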

We will do this under two scenarios.

(a) Suppose X1, X2, ..., X10 ~ iid N(µ, σ² = 9). Treat σ² = 9 as a known constant.

We want to test H0: µ = 5 vs H1: µ ≠ 5 at significance level α.

(i) Write a function in R that

• generates a sample of 10 observations from a N(µ = 5, σ² = 9) distribution

• evaluates the likelihood function at µ = 5 (save it under the name L_theta0)

• evaluates the likelihood function at µ = x-bar (save it under the name L_theta1)

• calculates and returns -2 * log(L_theta0/L_theta1)

(ii) Run this function using the replicate() command (or something similar) and save the

output under the name LRT_vec.

(iii) Plot a density histogram using LRT_vec.

code hint: use hist() with options freq=FALSE, breaks=50.

(iv) Overlay a χ²(df=1) density curve on top of this histogram.

code hint: generate 100000 random samples from a χ²(df=1) distribution, then use density() and lines()
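The mechanics named in the code hints (replicate(), hist() with freq=FALSE, density() and lines()) can be sketched on a deliberately simple placeholder. The function one_stat and the vectors demo_vec and chisq_draws are hypothetical names used only for this illustration; one_stat returns the square of a single standard normal draw and is not the likelihood ratio statistic that part (i) asks for.

```r
set.seed(57)  # arbitrary seed, used only so the sketch is reproducible

# Hypothetical stand-in statistic: the square of one standard normal draw.
# This is NOT the likelihood ratio statistic from part (i).
one_stat <- function() {
  z <- rnorm(1)
  z^2
}

# replicate() calls the function repeatedly and collects the results in a vector
demo_vec <- replicate(10000, one_stat())

# Density histogram of the simulated values
hist(demo_vec, freq = FALSE, breaks = 50,
     main = "Simulated values", xlab = "value")

# Kernel density estimate from a large chi-square(df = 1) sample, drawn on top
chisq_draws <- rchisq(100000, df = 1)
lines(density(chisq_draws), col = "red")
```

Because hist() is called with freq = FALSE, the bar heights are on the density scale, which is what makes them comparable with the curve added by lines().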

(b) (We will repeat the process of part (a), but with a different distribution.)

Suppose X1, X2, ..., X10 ~ iid Pois(λ).

We want to test H0: λ = 5 vs H1: λ ≠ 5 at significance level α.

(i) Write a function in R that

• generates a sample of 10 observations from a Pois(λ = 5) distribution

• evaluates the likelihood function at λ = 5 (save it under the name L_theta0)

• evaluates the likelihood function at λ = x-bar (save it under the name L_theta1)

• calculates and returns -2 * log(L_theta0/L_theta1)

(ii) Run this function (100000 times) using the replicate() command (or something similar)

and save the output under the name LRT_vec.

(iii) Plot a density histogram using LRT_vec.

(iv) Overlay a χ²(df=1) density curve on top of this histogram.

the χ²(df=1) density. Make a brief comment on what role you expect the sample size to play in the closeness of the histograms to the χ²(df=1) density. (In other words, do you expect this kind of closeness irrespective of the value of n?)

Question 3 (5 points)

In lecture-8 (day 2), we demonstrated the Newton-Raphson algorithm and Fisher’s scoring algorithm for solving the score equation for the Exponential distribution. You will implement these two algorithms for the Bernoulli distribution.
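As a reminder of how the iteration works, here is a minimal Newton-Raphson sketch for the Exponential case. It is not the lecture's code: the data vector x, the rate parametrisation, the starting value and the tolerance are all assumptions made for this illustration.

```r
# Hypothetical data, assumed to be draws from an Exponential(rate = lambda)
x <- c(0.5, 1.2, 0.3, 2.0, 1.0)
n <- length(x)

# Score function: first derivative of the log-likelihood n*log(lambda) - lambda*sum(x)
score <- function(lambda) n / lambda - sum(x)

# Derivative of the score: second derivative of the log-likelihood
score_deriv <- function(lambda) -n / lambda^2

lambda <- 0.5  # starting value (an arbitrary choice)
for (iter in 1:50) {
  # Newton-Raphson update: move to the root of the local linear approximation
  lambda_new <- lambda - score(lambda) / score_deriv(lambda)
  if (abs(lambda_new - lambda) < 1e-10) break
  lambda <- lambda_new
}
lambda_hat <- lambda_new

print(lambda_hat)
print(1 / mean(x))  # closed-form MLE of the rate, for comparison
```

Under this rate parametrisation the expected information n/λ² equals the negative of score_deriv(λ), so the Fisher scoring update coincides with the Newton-Raphson update here. That coincidence is specific to this example and need not hold for other models or parametrisations.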

Suppose the following observations are randomly drawn from a Bernoulli(θ) distribution.