
STAT 385 Fall 2019 - Homework Assignment 03

Due by 12:00 PM 10/13/2019

The Homework Problems

Below you will find problems for you to complete as an individual. It is fine to discuss the homework problems with classmates, but cheating is prohibited and will be harshly penalized if detected.

1. Create a custom volume measurement function that will convert the following units of volume:

13 imperial (liquid) cups to cubic inches.

2.5 US customary (liquid) gallons to fluid ounces.

3 US customary (dry) teaspoons to milliliters.

75 (dry) liters to imperial quarts.
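
One way to structure such a function is to route every conversion through a common base unit. The sketch below is a minimal illustration, not a reference solution. The function name `convert_volume`, the unit labels, and the choice of milliliters as the base unit are assumptions made here. The imperial cup is taken to be half an imperial pint, which is a convention rather than a legal definition, and the sketch does not distinguish dry from liquid measures.

```r
# Milliliters per unit. The labels are hypothetical names chosen for this sketch.
ml_per_unit <- c(
  milliliter     = 1,
  liter          = 1000,
  cubic_inch     = 16.387064,      # exact: (2.54 cm)^3
  us_teaspoon    = 4.92892159375,  # 1/6 of a US fluid ounce
  us_fl_oz       = 29.5735295625,  # 1/128 of a US liquid gallon
  us_gallon      = 3785.411784,    # 231 cubic inches
  imperial_cup   = 284.130625,     # ASSUMPTION: half an imperial pint
  imperial_quart = 1136.5225       # 1/4 of an imperial gallon (4.54609 L)
)

# Convert `value` from unit `from` to unit `to` by passing through milliliters.
convert_volume <- function(value, from, to) {
  stopifnot(from %in% names(ml_per_unit), to %in% names(ml_per_unit))
  unname(value * ml_per_unit[from] / ml_per_unit[to])
}

# Usage: 2.5 US gallons at 128 fluid ounces per gallon
convert_volume(2.5, "us_gallon", "us_fl_oz")
```

Keeping the factors in one named vector means a new unit needs only one new entry rather than a new branch for every pair of units.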

2. Do the following:

create a $25 \times 25$ matrix with autoregressive structure with $p = 9/10$; every element in the matrix should be equal to $(9/10)^{|i-j|}$, where $i$ is the row index and $j$ is the column index. Report the row and column sums of this matrix.

run the commands:

set.seed(13)

x <- c(10, 10)

n <- 2

Create a while loop which concatenates a new mean-zero normal random variable with $\sigma = 2$ to the existing vector x at every iteration. Have this loop terminate when the standard error (estimated standard deviation of x divided by $\sqrt{n}$) is lower than 1/10. Report $n$.

repeat part b and report $n$ after running the commands:

set.seed(13)

x <- rnorm(1, sd = 2)

n <- 1

The sample size required to get a standard error lower than 1/10 was smaller in part c than it was in part b. We would expect this to be the case before running any code. Why?
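
The mechanics of this problem can be sketched as follows. This is only one possible reading of the instructions, and the helper name `grow_until_se` is hypothetical. Note that `sd(c(10, 10))` is 0, so a loop that tests the stopping condition before drawing anything would stop immediately; the sketch assumes at least one draw is intended and therefore starts with an infinite standard error. It also assumes the starting vector has at least one element, so that `sd()` is defined after the first draw.

```r
# Autoregressive matrix with entries rho^|i - j|
rho <- 9 / 10
A <- outer(1:25, 1:25, function(i, j) rho^abs(i - j))
row_sums <- rowSums(A)
col_sums <- colSums(A)  # same values as row_sums because A is symmetric

# Append N(0, sigma^2) draws to x until sd(x) / sqrt(n) falls below threshold.
grow_until_se <- function(x, threshold = 1 / 10, sigma = 2) {
  n <- length(x)
  se <- Inf  # ASSUMPTION: force at least one draw before checking
  while (se >= threshold) {
    x <- c(x, rnorm(1, mean = 0, sd = sigma))
    n <- n + 1
    se <- sd(x) / sqrt(n)
  }
  list(x = x, n = n, se = se)
}

set.seed(13)
res <- grow_until_se(c(10, 10))
res$n  # final sample size for this starting vector
```

The reported value depends on the seed and on the order of the random draws, so a loop that calls `rnorm` differently may give a different answer.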

3. Do the following (Efron’s bootstrap):

load in the dataset dataHW3.csv

call the first column of this dataset x. Compute the statistic (mean(x) - 10)/se(x) where se is shorthand for standard error (see the previous problem for the definition of standard error).

now resample the elements of x with replacement 10000 times, and compute and store the statistic (mean(x') - mean(x))/se(x') at each iteration, where x' corresponds to the resample of the elements of x. Call the vector which contains these resampled statistics 'resamples'. Use an apply function for this part.

run the command 'hist(resamples, breaks = 20)' to make a histogram, and include this histogram in your assignment.

repeat parts b through d with respect to the second column of dataHW3.csv. Would you say that the test statistic calculated from each column has the same distribution?
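
The resampling step can be sketched as below. Because dataHW3.csv is not reproduced here, the vector `x` is a simulated, hypothetical stand-in and the seed is arbitrary; with the real file, `x` would instead be taken from the first column of the data frame returned by `read.csv`. The helper name `se` follows the shorthand used in the problem statement.

```r
# Standard error: estimated standard deviation divided by sqrt(n)
se <- function(v) sd(v) / sqrt(length(v))

set.seed(1)                   # arbitrary seed for this illustration
x <- rexp(50, rate = 1 / 10)  # HYPOTHETICAL stand-in for a column of dataHW3.csv

# Statistic computed on the observed data
t_obs <- (mean(x) - 10) / se(x)

# Bootstrap: resample x with replacement and recompute the centered statistic
resamples <- sapply(seq_len(10000), function(b) {
  x_star <- sample(x, size = length(x), replace = TRUE)
  (mean(x_star) - mean(x)) / se(x_star)
})
```

Using `sapply` over `seq_len(10000)` satisfies the "apply function" requirement while keeping each resample independent of the others.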

4. Do the following:

make sure you have the dataset WPP2010.csv (your file location may need to change) and then run the commands:

# load in UN dataset and remove irrelevant variables

options(warn=-1)

WPP2010 <- read.csv("WPP2010.csv", header = TRUE, stringsAsFactors = TRUE)

colnames(WPP2010)[3] <- c("region")

colnames(WPP2010)[6] <- c("year")

colnames(WPP2010)[7:17] <- paste("age", 0:10 * 5, sep = "")

WPP2010 <- WPP2010[, c(3, 6, 11, 12)]

# restrict attention to countries of interest

countries <- c("Canada", "Mexico", "United States of America")

# obtain population data for all countries for all years

dataset <- WPP2010[WPP2010[, 1] %in% countries, ]

dataset[, 3] <- as.numeric(levels(dataset[, 3]))[dataset[, 3]]

dataset[, 4] <- as.numeric(levels(dataset[, 4]))[dataset[, 4]]

dataset[, 3:4] <- dataset[, 3:4] / 1000

# get population dataset for this analysis corresponding to the

# Census years

dataset.years <- dataset[dataset[, 2] %in%

c("1960", "1970", "1980", "1990", "2000", "2010"), ]

dataset.years[, 2] <- factor(dataset.years[, 2])

dataset.years.list <- split(dataset.years, f = as.factor(dataset.years[, 2]))

pops <- unlist(lapply(dataset.years.list, function(x) sum(x[, 3:4])))

The code in part a is partially commented. Add comments to all remaining lines of code to make the script clear.

Determine the proportion of mainland North American males aged 20-29 that lived in 1970 or before.
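
The split-then-summarize pattern at the end of the script can be seen on a small example. The data frame `toy` below is entirely hypothetical (its regions, years, and counts are made up for illustration), and the final line shows only the mechanics of turning per-year totals into a proportion, not the intended interpretation of the question.

```r
# HYPOTHETICAL toy data with the same column layout as dataset.years
toy <- data.frame(
  region = rep(c("A", "B"), each = 3),
  year   = factor(rep(c("1960", "1970", "1980"), times = 2)),
  age20  = c(1, 2, 3, 4, 5, 6),
  age25  = c(1, 1, 1, 2, 2, 2)
)

# One data frame per year, then the total of columns 3 and 4 within each year
toy_list <- split(toy, f = toy$year)
pops <- unlist(lapply(toy_list, function(d) sum(d[, 3:4])))

# Share of the grand total that falls in 1970 or earlier
early <- as.numeric(names(pops)) <= 1970
prop_early <- sum(pops[early]) / sum(pops)
```

Here `pops` is a named vector whose names are the year labels, which is why the years can be recovered with `names(pops)`.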

5. With the tidyverse package and its functions, do the following with the CCSO Bookings Data:

show only the 2012 bookings for people aged 17-23 not residing in Illinois and show the data dimension

show only the bookings for people who have employment status as “student” booked after the year 2012 residing in Danville and show the data dimension

show only the bookings for Asian people residing in the cities of Champaign or Urbana and show the data dimension

repeat parts a-c using only pipe operators
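
The difference between a nested call and a piped call can be sketched on a toy table. The data frame `bookings` and its column names (`booking_year`, `age`, `state`, `city`) are hypothetical; the real CCSO Bookings Data uses its own column names, which would need to be substituted.

```r
library(dplyr)

# HYPOTHETICAL toy data; the column names are assumptions for this sketch
bookings <- data.frame(
  booking_year = c(2012, 2012, 2013, 2012),
  age          = c(18, 30, 20, 22),
  state        = c("INDIANA", "ILLINOIS", "ILLINOIS", "ILLINOIS"),
  city         = c("LAFAYETTE", "URBANA", "DANVILLE", "CHAMPAIGN")
)

# Nested form: filter first, then take the dimension of the result
dim_nested <- dim(filter(bookings,
                         booking_year == 2012,
                         age >= 17, age <= 23,
                         state != "ILLINOIS"))

# Piped form: the same steps read left to right
dim_piped <- bookings %>%
  filter(booking_year == 2012, age >= 17, age <= 23, state != "ILLINOIS") %>%
  dim()
```

Both forms give the same dimension; the piped version simply makes the order of operations explicit.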

Select in-class tasks

Completion of select in-class tasks will be worth 1 point and will be graded largely by completion. Obvious errors and incomplete work will receive deductions. Problems 3-5 are directly copied from your notes. Problems 1-2 are copied from the notes with minor alterations. In these problems I ask that you display the first 5 rows of the dataset instead of the entire dataset.

Load in the CCSO dataset, discover 3 factor (or categorical) variables and 3 numeric variables. Show the first 5 rows of this dataset with only those 6 variables.

Rename one of the factor variables to a name that is easier to understand than the original variable name. Show the first 5 rows of the dataset with all variables such that the variable with the new name is the first column in the dataset.

Write 3 separate loops: a for loop, a while loop, and a repeat loop that give the same result. The result should be the cumulative sum of Days in jail among Black people with Arrest Ages 18-24 and Student as Employment status within the CCSO Bookings Data.
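
The three loop forms can be compared on a small vector. The vector `days` below is a hypothetical stand-in for the Days in jail values that remain after subsetting the CCSO data, and the sketch accumulates a single running total.

```r
days <- c(3, 0, 12, 5)  # HYPOTHETICAL values standing in for the filtered column

# for loop: iterate directly over the elements
total_for <- 0
for (d in days) {
  total_for <- total_for + d
}

# while loop: manage the index explicitly
total_while <- 0
i <- 1
while (i <= length(days)) {
  total_while <- total_while + days[i]
  i <- i + 1
}

# repeat loop: loop forever and break once the index runs past the end
total_repeat <- 0
i <- 1
repeat {
  if (i > length(days)) break
  total_repeat <- total_repeat + days[i]
  i <- i + 1
}
```

Checking the exit condition at the top of the `repeat` body keeps it safe even when the filtered vector is empty.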

Here are some images of R code. Read the code, debug it if necessary, and judge it on its efficiency and correctness. Decide on which set of code is better and improve the better one.

Using the vector y below

set.seed(385)

y <- rnorm(100)

Use the which.min and which.max functions to display the index corresponding to the minimum and maximum elements of y.

Do the which.min and which.max functions work? (try: max(y) == y[which.max(y)]).

Use the which function and the length function to report the proportion of the elements of y that are greater than 0.

Discuss why the proportion in part c is close to 0.5. Hint: What is the mean of the normal distribution that generated the elements in y?

Create a factor variable with 50 values of A and 50 values of B, and name this factor variable trt.

Create a data frame consisting of y and trt.
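
The steps of this last task can be sketched as follows. Variable names other than `y` and `trt` are chosen here for illustration, and the sketch assumes the data frame is meant to pair `trt` with the vector `y` defined above.

```r
set.seed(385)
y <- rnorm(100)

# Positions of the smallest and largest elements
i_min <- which.min(y)
i_max <- which.max(y)

# Sanity check: the value at the reported index is the extreme value
max_ok <- max(y) == y[i_max]
min_ok <- min(y) == y[i_min]

# Proportion of elements greater than 0, using which() and length()
prop_pos <- length(which(y > 0)) / length(y)

# Factor with 50 values of A followed by 50 values of B
trt <- factor(rep(c("A", "B"), each = 50))

# ASSUMPTION: the data frame pairs y with trt
dat <- data.frame(y = y, trt = trt)
```

`which.min` and `which.max` return only the first index when there are ties, which does not matter for continuous draws such as these.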
