
Ioannis Papastathopoulos
November 21, 2017
• Submission: The number of attempts is set to Single Attempt.
All answers related to the calculation of posterior distributions must be written in a pdf or word
document called matriculationnumberA2math. Your code must be written separately in the
script file scriptA2.R, which is available on Learn. When submitting, rename the script file from
scriptA2.R to matriculationnumberA2.R, where matriculationnumber refers to your
matriculation number. Failure to rename the script file will incur a 5% penalty.
You have to submit a single script file and a single pdf/word document. Failure to
comply with this will incur a 5% penalty. Guidance on what your answer must contain is given in
each question. Your answer for each question must be included in the corresponding section of the
scriptA2.R file or the corresponding section of matriculationnumberA2math. For example,
your answer/code for question 1.1 must be included in the section below
## ;;
## ---------------------------------------------
## Q1: -- add your code below
## ---------------------------------------------
## ;;
## 1.1
code goes here
## ---------------------------------------------
• Guidance - Assessment criteria.
– A marking scheme is given. In addition to the marking scheme, your code will be
assessed according to the following criteria:
∗ Style: follow https://google.github.io/styleguide/Rguide.xml with care;
∗ Writing of functions: avoid common pitfalls of local vs global assignments;
wrap your code in a coherent set of instructions and try to make it as generic as
possible. Also, functions that are meant to be optimized with optim must be written
accordingly, see ?optim.
∗ Executability: your code must be executable and should not require additional
code in order to run. A common pitfall is failure to load R packages required by your
code.
• Deadline: Sunday 3rd December 23:59.
• Individual feedback will be given.
1 Question 1
Consider a set of data on two test scores, LSAT and GPA, at a sample of 15 American law
schools. Of interest are the correlation cor(lsat, gpa) between these measurements and the
variance ratio var(lsat)/var(gpa).
list(lsat = c(576, 635, 558, 578, 666, 580, 555,
661, 651, 605, 653, 575, 545, 572, 594),
gpa = c(3.39, 3.30, 2.81, 3.03, 3.55, 3.07, 3.00,
3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96),
n = 15)
1. Write a function in R called CI.cor that returns 95% bootstrap confidence intervals for the
correlation parameter using the basic bootstrap interval method and the percentile interval
method.
Your answer should contain the R function CI.cor.
(12 marks)
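The two interval types can be illustrated with a minimal sketch, assuming nonparametric resampling of the (lsat, gpa) pairs. The helper name boot.ci.sketch and its arguments are hypothetical and are not part of the assignment template. Writing t for the sample statistic and q_lo, q_hi for the lower and upper bootstrap quantiles, the percentile interval is (q_lo, q_hi) and the basic interval is (2t - q_hi, 2t - q_lo).

```r
# Hypothetical helper (not from the assignment): bootstrap intervals for a
# statistic of paired data, resampling the pairs jointly with replacement.
boot.ci.sketch <- function(x, y, stat = cor, R = 2000, level = 0.95) {
  n <- length(x)
  stopifnot(length(y) == n)
  theta.hat <- stat(x, y)
  # Bootstrap replicates of the statistic
  theta.star <- replicate(R, {
    idx <- sample.int(n, n, replace = TRUE)
    stat(x[idx], y[idx])
  })
  a <- (1 - level) / 2
  q <- unname(quantile(theta.star, probs = c(a, 1 - a)))
  list(estimate   = theta.hat,
       percentile = q,
       basic      = c(2 * theta.hat - q[2], 2 * theta.hat - q[1]))
}

# Data as given in the question
law <- list(lsat = c(576, 635, 558, 578, 666, 580, 555,
                     661, 651, 605, 653, 575, 545, 572, 594),
            gpa = c(3.39, 3.30, 2.81, 3.03, 3.55, 3.07, 3.00,
                    3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96))
set.seed(1)
ci <- boot.ci.sketch(law$lsat, law$gpa)
```

Passing a different stat, for example function(x, y) var(x) / var(y), reuses the same resampling scheme for another statistic.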
2. Write a function in R called CI.var.ratio that returns 95% bootstrap confidence intervals
for the variance ratio using the basic bootstrap interval method and the percentile interval
method.
Your answer should contain the R function CI.var.ratio.
(13 marks)
2 Question 2
Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables. An
asymptotically justified model for the distribution function of the excesses $X_i \mid X_i > u$ above a
large threshold $u$ is given by the generalized Pareto distribution function
$$F_{X_i \mid X_i > u}(x) = 1 - \left\{1 + \xi\left(\frac{x - u}{\sigma}\right)\right\}_+^{-1/\xi}, \quad x > u,$$
where $\xi \in \mathbb{R}$, $\sigma > 0$ and $x_+ = \max(x, 0)$.
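Two consequences of this distribution function are useful for the likelihood and simulation tasks that follow. The block below is a short derivation, writing $\sigma$ for the scale and $\xi$ for the shape parameter (symbol names assumed here): differentiating gives the density, and solving $F(x) = p$ gives the quantile function, which allows simulation by inversion of uniform variates. The case $\xi = 0$ is understood as the limit $\xi \to 0$.

```latex
% Density on the support x > u, 1 + \xi (x - u)/\sigma > 0
f(x) = \frac{1}{\sigma}\left\{1 + \xi\left(\frac{x-u}{\sigma}\right)\right\}_+^{-1/\xi - 1}

% Quantile function, 0 < p < 1, \xi \neq 0
F^{-1}(p) = u + \frac{\sigma}{\xi}\left\{(1-p)^{-\xi} - 1\right\}

% Limit \xi \to 0: exponential excesses
F(x) = 1 - \exp\{-(x-u)/\sigma\}, \qquad F^{-1}(p) = u - \sigma\log(1-p)
```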
In all subsequent exercises, you are allowed to use the following function called fit.gpd which
fits the generalized Pareto distribution to exceedances of data x above the threshold u = thresh.
fit.gpd <- [...] x[x > thresh]
sigma <- [...] > -Inf, llik, -1e40)
return(llik)
}
fit [...] fit.gpd(par=c(0,0),
x=burlington$Precipitation,
thresh=quantile(burlington$Precipitation,0.8))
[1] 1.4869389 0.2784732
## ----------------------------------------------------------
2.1 Parametric bootstrap
If $X_1, \ldots, X_n$ is a sequence of independent and identically distributed random variables and
$p_u = \Pr(X_i > u)$, it then follows that the number of exceedances above $u$, say
$N_u = \#\{X_i > u\}$, has probability mass function
$$\Pr(N_u = k) = \binom{n}{k} p_u^k (1 - p_u)^{n-k}, \quad k = 0, \ldots, n,$$
that is, $N_u \sim \text{Binomial}(n, p_u)$. This suggests the following
PARAMETRIC BOOTSTRAP ALGORITHM
1. Fit by maximum likelihood the generalized Pareto model to exceedances above a threshold $u$.
Set counter $I = 0$, bootstrap sample size $R$;
2. Increment $I$ to $I + 1$. Simulate $N_u^{I}$ from $\text{Binomial}(n, \hat{p}_u)$;
3. Simulate $N_u^{I}$ exceedances from the fitted generalized Pareto model;
4. Fit the generalized Pareto model to the simulated exceedances to get
$\hat{\theta}^{I} = (\hat{\sigma}^{I}, \hat{\xi}^{I})$;
5. If $I < R$, return to step 2; otherwise stop and return $(\hat{\theta}^{1}, \ldots, \hat{\theta}^{R})$.
Write a function called parboot.gpd that takes as inputs a vector of numeric values for the raw
data, a threshold argument thresh and bootstrap size R. The function should return a named list
with elements
• $mle: a vector containing the maximum likelihood estimates of $\sigma$, $\xi$ and $p_u$;
• $bias: a vector containing the estimated bias of $\hat{\sigma}$, $\hat{\xi}$ and $\hat{p}_u$;
• $se: a vector containing the standard error of $\hat{\sigma}$, $\hat{\xi}$ and $\hat{p}_u$;
• $distn: a matrix of dimension R by 3 with the bootstrap distribution of $\hat{\sigma}$, $\hat{\xi}$ and $\hat{p}_u$.
Your answer should contain the R function parboot.gpd.
(10 marks)
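The parametric bootstrap steps above can be sketched as follows. This is an illustration under stated assumptions rather than a reference solution: gpd.nll, gpd.fit, gpd.sim and parboot.sketch are hypothetical helpers that stand in for the course-provided fit.gpd, the fit uses a log-scale parametrisation chosen here for convenience, and the data in the usage lines are simulated.

```r
# Hypothetical helpers; they do NOT reproduce the course function fit.gpd.

# Negative log-likelihood of excesses y = x - u > 0, par = (log sigma, xi)
gpd.nll <- function(par, y) {
  sigma <- exp(par[1]); xi <- par[2]
  z <- 1 + xi * y / sigma
  if (any(z <= 0)) return(1e10)                 # outside the support
  if (abs(xi) < 1e-8) return(length(y) * log(sigma) + sum(y) / sigma)
  length(y) * log(sigma) + (1 / xi + 1) * sum(log(z))
}

gpd.fit <- function(y) {
  opt <- optim(c(log(mean(y)), 0.1), gpd.nll, y = y)  # Nelder-Mead
  c(sigma = exp(opt$par[1]), xi = opt$par[2])
}

# Simulate k excesses by inverting the distribution function
gpd.sim <- function(k, sigma, xi) {
  p <- runif(k)
  if (abs(xi) < 1e-8) -sigma * log(1 - p) else sigma / xi * ((1 - p)^(-xi) - 1)
}

parboot.sketch <- function(x, thresh, R = 200) {
  n <- length(x)
  exc <- x[x > thresh] - thresh
  mle <- c(gpd.fit(exc), pu = length(exc) / n)         # step 1
  distn <- matrix(NA_real_, R, 3, dimnames = list(NULL, names(mle)))
  for (i in seq_len(R)) {
    k <- rbinom(1, n, mle[["pu"]])                     # step 2
    if (k < 2) next            # too few exceedances to fit; row stays NA
    y <- gpd.sim(k, mle[["sigma"]], mle[["xi"]])       # step 3
    distn[i, ] <- c(gpd.fit(y), k / n)                 # step 4
  }
  list(mle   = mle,
       bias  = colMeans(distn, na.rm = TRUE) - mle,
       se    = apply(distn, 2, sd, na.rm = TRUE),
       distn = distn)
}

# Usage on simulated (hypothetical) data
set.seed(42)
x <- 1 + gpd.sim(2000, sigma = 1.5, xi = 0.2)
out <- parboot.sketch(x, thresh = quantile(x, 0.8), R = 50)
```

The bias is estimated as the mean of the bootstrap replicates minus the original estimate, and the standard error as the standard deviation of the replicates.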
2.2 Non-parametric bootstrap
NON-PARAMETRIC BOOTSTRAP ALGORITHM
1. Fit by maximum likelihood the generalized Pareto model to exceedances above a threshold $u$.
Set counter $I = 0$, bootstrap sample size $R$;
2. Increment $I$ to $I + 1$. Simulate $n$ variates from the empirical distribution function
$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} 1(X_i \le x);$$
3. Fit the generalized Pareto model to the exceedances above $u$ to get
$\hat{\theta}^{I} = (\hat{\sigma}^{I}, \hat{\xi}^{I})$;
4. If $I < R$, return to step 2; otherwise stop and return $(\hat{\theta}^{1}, \ldots, \hat{\theta}^{R})$.
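The resampling step of this algorithm can be sketched as below. Simulating n variates from the empirical distribution function amounts to sampling the observed data with replacement, while the threshold stays fixed. The function npboot.sketch and its argument fit.fun are hypothetical: fit.fun stands for any function that maps a vector of exceedances to a named vector of estimates, and a simple mean excess is used in the usage lines in place of a generalized Pareto fit.

```r
# Sketch of the nonparametric resampling step only; fit.fun is a
# hypothetical stand-in for a generalized Pareto fitting function.
npboot.sketch <- function(x, thresh, R, fit.fun) {
  n <- length(x)
  one.rep <- function() {
    x.star <- sample(x, n, replace = TRUE)  # n draws from the empirical cdf
    exc <- x.star[x.star > thresh]          # exceedances of the fixed threshold
    c(fit.fun(exc), pu = length(exc) / n)
  }
  t(replicate(R, one.rep()))                # R rows, one per bootstrap sample
}

# Usage on simulated (hypothetical) data with a toy estimator
set.seed(3)
x <- rexp(500)
b <- npboot.sketch(x, thresh = 1, R = 100,
                   fit.fun = function(e) c(m = mean(e - 1)))
```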
Write a function called npboot.gpd that takes as inputs a vector of numeric values for the raw
data, a threshold argument thresh and bootstrap size R. The function should return a named list [...]
[...] used to obtain all confidence intervals.
(5 marks)
3 Question 3
Consider the following dataset
list(t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5),
x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22),
n = 10)
on observed failures $x_i$ of $n = 10$ power plant pumps. Here $t_i$ denotes the length of operation time of
the pump (in 1000s of hours). The number of failures $X_i$ is assumed to follow a Poisson distribution,
$X_i \mid \theta_i \sim \text{Poisson}(\theta_i t_i)$. Consider the hierarchical model
$$\theta_i \mid \alpha, \beta \sim \text{Gamma}(\alpha, \beta),$$
where
$$\alpha \sim \text{Exp}(1), \quad \beta \sim \text{Gamma}(0.01, 0.01).$$
1. What are the conditional posterior distributions of
- $\theta_i \mid \theta_{-i}, \alpha, \beta, x$;
- $\alpha \mid \beta, \theta, x$;
- $\beta \mid \alpha, \theta, x$?
Write your answer in a section called Question 3 (a) of the file.
(10 marks)
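A useful starting point for these derivations is the joint posterior written up to proportionality. The expression below is a sketch that writes $\theta_i$ for the failure rate of pump $i$ and $\alpha$, $\beta$ for the Gamma hyperparameters, and that assumes the shape-rate parametrisation of the Gamma distribution together with the priors $\alpha \sim \text{Exp}(1)$ and $\beta \sim \text{Gamma}(0.01, 0.01)$. Each conditional then follows by keeping only the factors that involve the parameter of interest.

```latex
\pi(\theta, \alpha, \beta \mid x) \;\propto\;
  \prod_{i=1}^{n} (\theta_i t_i)^{x_i} e^{-\theta_i t_i}
  \;\times\;
  \prod_{i=1}^{n} \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,
      \theta_i^{\alpha - 1} e^{-\beta \theta_i}
  \;\times\; e^{-\alpha}
  \;\times\; \beta^{0.01 - 1} e^{-0.01 \beta}
```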
2. Write a function in R called mcmc.pumps that implements a general McMC algorithm with
components $\theta$, $\alpha$, $\beta$, where $\theta$ and $\beta$ are updated from the posterior conditional distribution and
$\log \alpha$ is updated via a random walk Metropolis step with normal increments. The function
should return samples from the posterior distribution of $\theta$, $\alpha$ and $\beta$ as well as an estimate of
the probability of accepting a proposal for $\alpha$.
Your answer must contain the code of the function mcmc.pumps.
(10 marks)
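A Metropolis-within-Gibbs sampler of this kind can be sketched as below. The sketch rests on assumptions that should be checked against your own derivations: a shape-rate Gamma parametrisation, priors alpha ~ Exp(lambda) with lambda = 1 and beta ~ Gamma(g, d) with g = d = 0.01, and conjugate Gamma conditionals for theta_i and beta. The function name mcmc.pumps.sketch, its arguments and the step size are hypothetical.

```r
# Sketch under the stated assumptions; not a reference solution.
mcmc.pumps.sketch <- function(x, t, n.iter = 5000, step = 0.5,
                              lambda = 1, g = 0.01, d = 0.01) {
  n <- length(x)
  theta <- matrix(NA_real_, n.iter, n)
  alpha <- beta <- numeric(n.iter)
  th <- (x + 0.5) / t; a <- 1; b <- 1; n.acc <- 0     # starting values
  # Log conditional density of phi = log(alpha), including the Jacobian term
  log.target <- function(phi, th, b) {
    a <- exp(phi)
    n * a * log(b) - n * lgamma(a) + a * sum(log(th)) - lambda * a + phi
  }
  for (s in seq_len(n.iter)) {
    th <- rgamma(n, shape = a + x, rate = b + t)            # Gibbs: theta_i
    b  <- rgamma(1, shape = n * a + g, rate = sum(th) + d)  # Gibbs: beta
    phi.new <- log(a) + rnorm(1, 0, step)                   # random walk on log(alpha)
    log.ratio <- log.target(phi.new, th, b) - log.target(log(a), th, b)
    if (log(runif(1)) < log.ratio) {
      a <- exp(phi.new); n.acc <- n.acc + 1
    }
    theta[s, ] <- th; alpha[s] <- a; beta[s] <- b
  }
  list(theta = theta, alpha = alpha, beta = beta, accept = n.acc / n.iter)
}

# Usage with the data from the question
pumps <- list(t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5),
              x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
set.seed(1)
out <- mcmc.pumps.sketch(pumps$x, pumps$t, n.iter = 2000)
```

Proposing on the log scale keeps alpha positive without rejecting out-of-range proposals; the extra phi term in log.target accounts for the change of variable.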
3. Write a function in R called predictive.pumps that returns estimates of the predictive
distribution of failures for the $i$-th pump for a given length of operation time $t_i$, using Monte
Carlo integration.
Your answer must contain the code of the function predictive.pumps.
(5 marks)
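Monte Carlo integration of the predictive distribution can be sketched as an average of Poisson probabilities over posterior draws. In the sketch below, predictive.sketch, theta.draws, t.new and k.max are hypothetical names, and the draws in the usage lines are simulated from an arbitrary Gamma distribution purely for illustration; in practice they would come from an McMC run.

```r
# Sketch: theta.draws is a hypothetical vector of posterior draws of theta_i
predictive.sketch <- function(theta.draws, t.new, k.max = 50) {
  k <- 0:k.max
  # Pr(X = k | data) is approximated by the mean of the Poisson pmf
  # evaluated at each posterior draw of the rate theta_i * t.new
  prob <- vapply(k, function(kk) mean(dpois(kk, theta.draws * t.new)),
                 numeric(1))
  data.frame(k = k, prob = prob)
}

# Usage with illustrative (not posterior) draws
set.seed(2)
draws <- rgamma(4000, shape = 6, rate = 100)
p <- predictive.sketch(draws, t.new = 94.3)
```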
4 Question 4
Let $x_1, \ldots, x_n$ be a sample of independent and identically distributed observations assumed to have
been generated from a t-distribution with $\nu$ degrees of freedom located at $\mu$, i.e.,
$$f(x \mid \mu) \propto \left\{1 + \nu^{-1}(x - \mu)^2\right\}^{-(\nu+1)/2}, \quad x \in \mathbb{R}. \tag{1}$$
Assume $\nu = 10$ and suppose $\mu \sim N(0, 1)$.
1. Write down, up to a proportionality constant, the posterior of $\mu \mid x$. Is this a recognisable density?
Write a function in R called mcmc.t that implements an McMC algorithm for evaluating the
posterior distribution of $\mu$, where the updating is done using random walk Metropolis with a
normal candidate generator.
Your answer should contain the posterior density of $\mu \mid x$ in a section called Question 4 (1)
of the matriculationnumberA2math file and the function mcmc.t separately in scriptA2.R.
(5 marks)
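A random walk Metropolis update for a location parameter can be sketched as follows. The sketch assumes the log posterior is the N(0, 1) log prior plus the sum over observations of -(nu + 1)/2 * log(1 + (x_i - mu)^2 / nu), which is the form obtained by combining (1) with the prior; mcmc.t.sketch, its arguments and the simulated data are hypothetical, and the step size would normally be tuned by monitoring the acceptance rate.

```r
# Sketch under the stated assumptions; not a reference solution.
mcmc.t.sketch <- function(x, n.iter = 5000, step = 0.5, nu = 10,
                          init = median(x)) {
  # Log posterior of mu up to an additive constant
  log.post <- function(mu) {
    dnorm(mu, 0, 1, log = TRUE) - (nu + 1) / 2 * sum(log1p((x - mu)^2 / nu))
  }
  mu <- numeric(n.iter)
  cur <- init; lp <- log.post(cur); n.acc <- 0
  for (s in seq_len(n.iter)) {
    prop <- cur + rnorm(1, 0, step)          # normal candidate generator
    lp.prop <- log.post(prop)
    if (log(runif(1)) < lp.prop - lp) {      # Metropolis accept/reject
      cur <- prop; lp <- lp.prop; n.acc <- n.acc + 1
    }
    mu[s] <- cur
  }
  list(mu = mu, accept = n.acc / n.iter)
}

# Usage on simulated (hypothetical) data
set.seed(7)
x <- 0.5 + rt(50, df = 10)
out <- mcmc.t.sketch(x)
```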
2. Using the fact that the t-distribution is a scale mixture of normals, the sampling model in (1)
can alternatively be represented as
$$x_i \mid \mu, z_i \sim N(\mu, 1/z_i),$$
where the $z_i$ are a priori independent of $\mu$ and
$$f(z_i) \propto z_i^{(\nu/2) - 1} \exp\{-(\nu/2) z_i\}.$$
Using this representation together with the prior distribution $\mu \sim N(0, 1)$, write down, up to a
proportionality constant, the conditional posterior densities of $\mu$ and of each $z_i$, $i = 1, \ldots, n$