
STAB57: Assignment 2


Shahriar Shams
Winter 2022
Submission deadline: April 1, 2022, 11:59 pm (local Toronto time)
Late penalty: 10% per day.
Instructions on creating documents for submission
• We will use Crowdmark for submission and grading, which accepts only PDF, JPG and
PNG files.
• I recommend using R Markdown. 1 mark of this assignment is assigned for using R
Markdown.
• If you do not want to use R Markdown, you can write your answers in Microsoft
Word and save them as a PDF at the end. But you will lose that 1 mark.
• Answers that are fully handwritten will not be accepted.
• If you are a Python user, feel free to use Python in place of R to answer any of the
questions.
• For each answer, make sure you have provided your code and output.
• Make sure your answers are easy to read and nicely presented.
Academic Integrity
Each student will work alone. You are not allowed to ask anyone for help on any platform.
Do not ask anyone for solutions. Do not share your code or answers. If you need
clarification on any of these questions, you are allowed to ask questions on Ed or
during office hours (please do not email us). And please do not post your solution
on Ed and ask “does it look ok?”.
When submitting your assignment on Crowdmark, there will be a space for an academic
integrity statement. Write the following statement on paper, an iPad, or a Surface and
upload a screenshot of it.
Statement:
I am attesting to the fact that I, [name] (write your full name here), [stnum] (write your
student number here), have abided fully by the Code of Behaviour on Academic Matters.
I have not committed academic misconduct, and am aware of the penalties that may be
imposed if I have committed an academic offence.
Question 1 (4 points)
In question 1 of assignment 1, you created all possible samples (4096 of them) of size
n = 4 from the following population.
11, 12, 13, 14, 15, 16, 17, 18
Here is the code again; it produces the 4096 values of X-bar, one per sample.
X=c(11, 12, 13, 14, 15, 16, 17, 18)
d=expand.grid(X,X,X,X)
X_bar=apply(d,1,mean)
Although the central limit theorem requires the sample size n to be large, let’s apply
it anyway. Suppose you know that the population variance is σ2 = 5.25.
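As a quick sanity check on the setup above, the following sketch recomputes the population variance and the variance of the 4096 X-bar values. The names pop_mean, pop_var and var_X_bar are illustrative and not part of the assignment; the divisor is the population size rather than size minus one, because the eight numbers are treated as the whole population.

```r
# Population and all 8^4 = 4096 ordered samples of size 4 (drawn with replacement)
X <- c(11, 12, 13, 14, 15, 16, 17, 18)
d <- expand.grid(X, X, X, X)
X_bar <- apply(d, 1, mean)

# Population mean and population variance (divide by N, not N - 1)
pop_mean <- mean(X)
pop_var <- mean((X - pop_mean)^2)

# Variance of X-bar across all 4096 samples; theory says sigma^2 / n
var_X_bar <- mean((X_bar - mean(X_bar))^2)

print(c(pop_mean = pop_mean, pop_var = pop_var, var_X_bar = var_X_bar))
```

The last value should agree with σ2/n = 5.25/4, which is the variance the central limit theorem approximation uses for X-bar.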
a. Suppose someone observes only one of these 4096 samples: (15, 16,
17, 18). That person is testing the null hypothesis H0 : µ = 14.5 at level of significance
α = 0.05, based on this observed set of four numbers. Calculate the p-value that the
person will get, using the central limit theorem.
b. Calculate the p-value numerically by using the 4096 X-bar values (do not use the CLT here).
Hint: think about the graph from assignment 1 [question 1(f)] and use the second
definition of p-value from slide 20, week-7.
c. Why do you see a difference between your calculations in part (a) and part (b)? And under
what condition do you expect these two numbers to be similar?
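One way to set up the two calculations in parts (a) and (b) is sketched below. It rests on assumptions that the handout does not state: the alternative is taken to be two-sided (H1 : µ ≠ 14.5), and the "second definition" from the slides, which is not reproduced here, is read as the probability under H0 of an X-bar at least as far from 14.5 as the observed one. The names x_obs, z, p_clt and p_enum are illustrative.

```r
# Rebuild the 4096 sample means from the population
X <- c(11, 12, 13, 14, 15, 16, 17, 18)
d <- expand.grid(X, X, X, X)
X_bar <- apply(d, 1, mean)

x_obs <- c(15, 16, 17, 18)   # the single observed sample
mu0 <- 14.5                  # value of mu under H0
sigma2 <- 5.25               # known population variance
n <- length(x_obs)

# (a) CLT-based p-value (assumed two-sided): treat X-bar as N(mu0, sigma2 / n)
z <- (mean(x_obs) - mu0) / sqrt(sigma2 / n)
p_clt <- 2 * (1 - pnorm(abs(z)))

# (b) Enumeration-based p-value: share of the 4096 equally likely X-bar values
#     that are at least as far from mu0 as the observed mean
p_enum <- mean(abs(X_bar - mu0) >= abs(mean(x_obs) - mu0))

print(c(p_clt = p_clt, p_enum = p_enum))
```

If the course defines the p-value one-sided, drop the factor of 2 in p_clt and the abs() calls in p_enum.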
Question 2 (5 points)
The goal of this question is to "see" the distribution of a likelihood ratio test statistic. In
other words, we want to see, if H0 is really true, what will be the distribution of W where
$$W = -2 \log \frac{L(\theta_0)}{L(\hat{\theta})}$$
We will do this under two scenarios.
(a) Suppose $X_1, X_2, \dots, X_{10} \overset{iid}{\sim} N(\mu, \sigma^2 = 9)$. Treat $\sigma^2 = 9$ as a known constant.
We want to test H0 : µ = 5 vs H1 : µ ≠ 5 at level of significance α.
(i) Write a function in R that
• generates a sample of 10 observations from a N(µ = 5, σ2 = 9) distribution
• evaluates the likelihood function at µ = 5 (save it under the name L_theta0)
• evaluates the likelihood function at µ = x-bar (save it under the name L_theta1)
• calculates and returns −2 ∗ log(L_theta0/L_theta1)
(ii) Run this function using the replicate() command (or something similar) and save the
output under the name LRT_vec.
(iii) Plot a density histogram using LRT_vec.
Code hint: use hist() with options freq=FALSE, breaks=50.
(iv) Overlay a χ2(df=1) density curve on top of this histogram.
Code hint: generate 100000 random samples from a χ2(df=1), use density() and lines().
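Steps (i) to (iv) can be sketched as follows. This is one possible arrangement, not a model answer: the function name lrt_normal, its arguments, the seed, and the choice of 100000 replications (part (a) does not fix a number) are all assumptions made for illustration.

```r
set.seed(57)  # arbitrary seed, only so the sketch is reproducible

# (i) One simulated value of W under H0: mu = 5, with sigma^2 = 9 known (sd = 3)
lrt_normal <- function(n = 10, mu0 = 5, sigma = 3) {
  x <- rnorm(n, mean = mu0, sd = sigma)
  L_theta0 <- prod(dnorm(x, mean = mu0, sd = sigma))      # likelihood at mu = 5
  L_theta1 <- prod(dnorm(x, mean = mean(x), sd = sigma))  # likelihood at mu = x-bar
  -2 * log(L_theta0 / L_theta1)
}

# (ii) Repeat the experiment many times (replication count is an assumption)
LRT_vec <- replicate(100000, lrt_normal())

# (iii) Density histogram of the simulated statistic
hist(LRT_vec, freq = FALSE, breaks = 50)

# (iv) Overlay an estimated chi-square(1) density, following the code hint
lines(density(rchisq(100000, df = 1)), col = "red")

# Optional: the exact chi-square(1) density, started slightly above 0
# because the density is unbounded at 0
curve(dchisq(x, df = 1), from = 0.01, to = max(LRT_vec), add = TRUE, lty = 2)
```

The kernel estimate from density() smooths over the spike of the χ2(df=1) density at zero, so the red curve can sit below the first histogram bars; the dashed exact curve does not have that problem.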
(b) (we will repeat the process of part(a) but with a different distribution here)
Suppose $X_1, X_2, \dots, X_{10} \overset{iid}{\sim} \mathrm{Pois}(\lambda)$.
We want to test H0 : λ = 5 vs H1 : λ ≠ 5 at level of significance α.
(i) Write a function in R that
• generates a sample of 10 observations from a Pois(λ = 5) distribution
• evaluates the likelihood function at λ = 5 (save it under the name L_theta0)
• evaluates the likelihood function at λ = x-bar (save it under the name L_theta1)
• calculates and returns −2 ∗ log(L_theta0/L_theta1)
(ii) Run this function (100000 times) using the replicate() command (or something similar)
and save the output under the name LRT_vec.
(iii) Plot a density histogram using LRT_vec.
(iv) Overlay a χ2(df=1) density curve on top of this histogram.
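The Poisson version follows the same pattern. The sketch below is again only illustrative: lrt_pois, its arguments and the seed are assumed names and values, not something the assignment prescribes.

```r
set.seed(57)  # arbitrary seed, only so the sketch is reproducible

# (i) One simulated value of W under H0: lambda = 5
lrt_pois <- function(n = 10, lambda0 = 5) {
  x <- rpois(n, lambda = lambda0)
  L_theta0 <- prod(dpois(x, lambda = lambda0))   # likelihood at lambda = 5
  L_theta1 <- prod(dpois(x, lambda = mean(x)))   # likelihood at lambda = x-bar
  -2 * log(L_theta0 / L_theta1)
}

# (ii) 100000 replications, as the question asks
LRT_vec <- replicate(100000, lrt_pois())

# (iii) Density histogram and (iv) chi-square(1) overlay
hist(LRT_vec, freq = FALSE, breaks = 50)
lines(density(rchisq(100000, df = 1)), col = "red")
```

Because the data are counts, x-bar can only take values that are multiples of 1/10, so W takes a discrete set of values and the histogram may look more jagged than in the normal case.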
(c) In both parts (a) and (b), your histograms should match (almost, if not completely)
the χ2(df=1) density. Make a brief comment on what role you expect the sample size to play in
the closeness of the histograms and the χ2(df=1) density. (In other words, do you expect this
kind of closeness irrespective of the value of n?)
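One way to explore the role of n empirically is to compare how often W exceeds the χ2(df=1) critical value for a few sample sizes. The sketch below makes several assumptions of its own: α = 0.05 (the question leaves α unspecified), the sample sizes 2, 10 and 100, 20000 replications per size, and the helper name lrt_pois_log. It works with log-likelihoods instead of raw likelihoods, since a product of many small probabilities can underflow to zero for large n.

```r
set.seed(57)  # arbitrary seed, only so the sketch is reproducible

# W computed on the log scale: -2 * (log L(lambda0) - log L(x-bar))
lrt_pois_log <- function(n, lambda0 = 5) {
  x <- rpois(n, lambda = lambda0)
  loglik0 <- sum(dpois(x, lambda = lambda0, log = TRUE))
  loglik1 <- sum(dpois(x, lambda = mean(x), log = TRUE))
  -2 * (loglik0 - loglik1)
}

crit <- qchisq(0.95, df = 1)   # critical value for an assumed alpha of 0.05
n_values <- c(2, 10, 100)      # illustrative sample sizes

# Share of simulated W values above the critical value, for each n
reject_rate <- sapply(n_values, function(n) {
  mean(replicate(20000, lrt_pois_log(n)) > crit)
})
names(reject_rate) <- n_values
print(reject_rate)
```

If the χ2(df=1) approximation is good for a given n, the corresponding rate should be close to the assumed α.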
Question 3 (5 points)
In lecture 8 (day 2), we demonstrated the Newton-Raphson algorithm and Fisher’s scoring
algorithm for solving score equations for the Exponential distribution. You will implement these
two algorithms for the Bernoulli distribution.
Suppose the following observations are randomly drawn from a Bernoulli(θ) distribution.
## 1 1 1 1 0 1 0 1 1 1
a) Using a random number between 0 and 1 as the initial guess, implement the N-R
algorithm in R to find the MLE of θ. Use the following updating equation that we
learned in class:
$$\theta_{n+1} = \theta_n + \frac{l'(\theta_n)}{-l''(\theta_n)}$$
b) Change the updating equation to the following (the so-called Fisher’s scoring method) and
implement the algorithm to find the MLE of θ:
$$\theta_{n+1} = \theta_n + \frac{l'(\theta_n)}{-E[l''(\theta_n)]}$$
c) Try different values as your initial guess and compare the performance of the two
updating equations. (Hint: one of them will sometimes fail to give you the solution or
will require more iterations to converge. Keep trying different initial values until you
are able to see this difference.)
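Both updating equations can share one iteration loop that differs only in the denominator. The sketch below assumes the standard Bernoulli log-likelihood l(θ) = s log θ + (m − s) log(1 − θ), where m is the number of observations and s the number of ones, so that l′(θ) = s/θ − (m − s)/(1 − θ), −l′′(θ) = s/θ² + (m − s)/(1 − θ)², and −E[l′′(θ)] = m/(θ(1 − θ)). The function names, tolerance, iteration cap and seed are illustrative choices, not taken from the lecture.

```r
x <- c(1, 1, 1, 1, 0, 1, 0, 1, 1, 1)   # observed Bernoulli data
m <- length(x)                          # number of observations
s <- sum(x)                             # number of ones

score    <- function(theta) s / theta - (m - s) / (1 - theta)       # l'(theta)
obs_info <- function(theta) s / theta^2 + (m - s) / (1 - theta)^2   # -l''(theta)
exp_info <- function(theta) m / (theta * (1 - theta))               # -E[l''(theta)]

# Generic update theta + score / info; stops on convergence,
# on leaving the interval (0, 1), or after max_iter steps
iterate <- function(theta, info, tol = 1e-10, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    theta_new <- theta + score(theta) / info(theta)
    if (!is.finite(theta_new) || theta_new <= 0 || theta_new >= 1) {
      return(list(theta = theta_new, iterations = i, converged = FALSE))
    }
    if (abs(theta_new - theta) < tol) {
      return(list(theta = theta_new, iterations = i, converged = TRUE))
    }
    theta <- theta_new
  }
  list(theta = theta, iterations = max_iter, converged = FALSE)
}

set.seed(57)          # arbitrary seed, only so the sketch is reproducible
theta0 <- runif(1)    # random initial guess between 0 and 1

nr <- iterate(theta0, obs_info)   # Newton-Raphson
fs <- iterate(theta0, exp_info)   # Fisher scoring

print(rbind(newton_raphson = unlist(nr), fisher_scoring = unlist(fs)))
```

For part (c), calling iterate() over a grid of starting values such as seq(0.01, 0.99, by = 0.01) and tabulating the iterations field is one simple way to compare the two methods.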
Question 4 (1 point)
• Questions 1-3 are worth 14 points in total.
• Here is a video on how to use R Markdown:
https://play.library.utoronto.ca/watch/db75830ca374b589e5453aeadf248ba2
• The final 1 point of the assignment will be awarded if you use R Markdown to write
your assignment (i.e. to write your answers to Questions 1-3).
 