STAT 385 Fall 2019 - Homework Assignment 03

Due by 12:00 PM 10/13/2019

The Homework Problems

Below you will find problems for you to complete as an individual. It is fine to discuss the homework problems with classmates, but cheating is prohibited and will be harshly penalized if detected.

1. Create a custom volume measurement function that will convert the following units of volume:

a. 13 imperial (liquid) cups to cubic inches.

b. 2.5 US customary (liquid) gallons to fluid ounces.

c. 3 US customary (dry) teaspoons to milliliters.

d. 75 (dry) liters to imperial quarts.
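
One possible shape for such a function is a lookup table that expresses every unit in milliliters, so that any pair of units can be converted through a common base. The sketch below is only an illustration: the function name `convert_volume`, the unit labels, and the choice of units in the table are assumptions, and the table deliberately covers only a few units whose definitions are exact.

```r
# Milliliters per unit. The unit labels and the lookup-table design are
# illustrative assumptions, not part of the assignment.
ml_per_unit <- c(
  liter      = 1000,
  cubic_inch = 16.387064,      # exact: (2.54 cm)^3
  us_gallon  = 3785.411784,    # exact: 231 cubic inches
  us_fl_oz   = 29.5735295625   # exact: 1/128 of a US gallon
)

convert_volume <- function(value, from, to) {
  # Fail early on a unit the table does not know about
  stopifnot(from %in% names(ml_per_unit), to %in% names(ml_per_unit))
  # Convert to milliliters, then from milliliters to the target unit
  unname(value * ml_per_unit[from] / ml_per_unit[to])
}

convert_volume(1, "us_gallon", "cubic_inch")
convert_volume(2, "liter", "us_fl_oz")
```

Adding a unit then only requires one new table entry, provided its milliliter equivalent is looked up from a reliable source.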

2. Do the following:

a. Create a $25 \times 25$ matrix with autoregressive structure with $p = 9/10$. Every element in the matrix should be equal to $(9/10)^{|i-j|}$, where $i$ is the row index and $j$ is the column index. Report the row and column sums of this matrix.

b. Run the commands:

set.seed(13)

x <- c(10, 10)

n <- 2

Create a while loop which concatenates a new mean-zero normal random variable with $\sigma = 2$ to the existing vector x at every iteration. Have this loop terminate when the standard error (the estimated standard deviation of x divided by $\sqrt{n}$) is lower than 1/10. Report $n$.

c. Repeat part b and report $n$ after running the commands:

set.seed(13)

x <- rnorm(0, sd = 2)

n <- 1

d. The sample size required to get a standard error lower than 1/10 was smaller in part c than it was in part b. We would expect this to be the case before running any code. Why?
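
Two building blocks from this problem can be sketched on a smaller scale. The example below is a hedged illustration rather than a solution: the value `rho = 0.5`, the 4 x 4 size, the seed, the threshold of 0.5, and the names `A`, `std_error`, and `draws` are all assumptions chosen for the sketch.

```r
# Autoregressive structure: outer() evaluates the function on every (i, j) pair.
rho <- 0.5                                   # illustrative value
A <- outer(1:4, 1:4, function(i, j) rho^abs(i - j))
rowSums(A)
colSums(A)   # equal to the row sums because A is symmetric

# Standard error: estimated standard deviation divided by sqrt(sample size)
std_error <- function(v) sd(v) / sqrt(length(v))

# Grow a vector one draw at a time until its standard error is small enough
set.seed(1)            # illustrative seed
draws <- rnorm(2)      # start with two values so that sd() is defined
threshold <- 0.5       # illustrative threshold
while (std_error(draws) >= threshold) {
  draws <- c(draws, rnorm(1))
}
length(draws)
```

Note that `sd()` returns `NA` for a vector with fewer than two elements, so a loop condition built on it needs at least two values before it can be evaluated.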

3. Do the following (Efron’s bootstrap):

a. Load in the dataset dataHW3.csv.

b. Call the first column of this dataset x. Compute the statistic (mean(x) - 10)/se(x), where se is shorthand for standard error (see the previous problem for the definition of standard error).

c. Now resample the elements of x with replacement 10000 times, and compute and store the statistic (mean(x’) - mean(x))/se(x’) at each iteration, where x’ corresponds to the resample of the elements of x. Call the vector which contains these resampled statistics `resamples`. Use an apply function for this part.

d. Run the command `hist(resamples, breaks = 20)` to make a histogram, and include this histogram in your assignment.

e. Repeat parts b through d with respect to the second column of dataHW3.csv. Would you say that the test statistic calculated from each column has the same distribution?
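
The resampling scheme described above can be sketched on synthetic data. This is an illustration under stated assumptions: the data are simulated with `rexp()` rather than read from dataHW3.csv, and the seed, the number of resamples `B`, and the names `v`, `std_error`, and `boot_stats` are hypothetical.

```r
set.seed(1)                      # illustrative seed
v <- rexp(30)                    # synthetic stand-in for a data column
std_error <- function(u) sd(u) / sqrt(length(u))

B <- 2000                        # illustrative number of resamples
boot_stats <- sapply(seq_len(B), function(b) {
  v_star <- sample(v, replace = TRUE)            # resample with replacement
  (mean(v_star) - mean(v)) / std_error(v_star)   # centered at the observed mean
})

length(boot_stats)
quantile(boot_stats, c(0.025, 0.975))
```

Each resampled statistic is centered at the observed mean, since the observed sample plays the role of the population in the bootstrap world.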

4. Do the following:

a. Make sure you have the dataset WPP2010.csv (your file location may need to change) and then run the commands:

# load in UN dataset and remove irrelevant variables

options(warn=-1)

WPP2010 <- read.csv("WPP2010.csv", header = TRUE)

colnames(WPP2010)[3] <- c("region")

colnames(WPP2010)[6] <- c("year")

colnames(WPP2010)[7:17] <- paste("age", 0:10 * 5, sep = "")

WPP2010 <- WPP2010[, c(3, 6, 11, 12)]

# restrict attention to countries of interest

countries <- c("Canada", "Mexico", "United States of America")

# obtain population data for all countries for all years

dataset <- WPP2010[WPP2010[, 1] %in% countries, ]

dataset[, 3] <- as.numeric(levels(dataset[, 3]))[dataset[, 3]]

dataset[, 4] <- as.numeric(levels(dataset[, 4]))[dataset[, 4]]

dataset[, 3:4] <- dataset[, 3:4] / 1000

# get population dataset for this analysis corresponding to the

# Census years

dataset.years <- dataset[dataset[, 2] %in%

c("1960", "1970", "1980", "1990", "2000", "2010"), ]

dataset.years[, 2] <- factor(dataset.years[, 2])

dataset.years.list <- split(dataset.years, f = as.factor(dataset.years[, 2]))

pops <- unlist(lapply(dataset.years.list, function(x) sum(x[, 3:4])))

b. The code in part a is partially commented. Add comments to all remaining lines of code to make the script clear.

c. Determine the proportion of mainland North American males aged 20-29 that lived in 1970 or before.
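
Two idioms in the script above are easy to misread, so a toy illustration may help. The data below are made up, and the names `f`, `toy`, `groups`, and `group_sums` are hypothetical. The first idiom only applies when a column was read in as a factor, which was the default for character columns in `read.csv()` before R 4.0.0.

```r
# Converting a factor of digit strings to numbers
f <- factor(c("33", "10", "2"))
as.numeric(f)              # 3 1 2: the internal level codes, not the values
as.numeric(levels(f))[f]   # 33 10 2: the numbers the labels spell out

# split() + lapply(): one total per group
toy <- data.frame(
  year = c("1960", "1960", "1970"),
  a    = c(1, 2, 3),
  b    = c(10, 20, 40)
)
groups <- split(toy, f = as.factor(toy$year))
group_sums <- unlist(lapply(groups, function(d) sum(d[, c("a", "b")])))
group_sums                 # named vector: 1960 -> 33, 1970 -> 43
```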

5. With the tidyverse package and its functions, do the following with the CCSO Bookings Data:

a. Show only the 2012 bookings for people ages 17-23 years old not residing in Illinois and show the data dimension.

b. Show only the bookings for people who have employment status as “student” booked after the year 2012 residing in Danville and show the data dimension.

c. Show only the bookings for Asian people residing in the cities of Champaign or Urbana and show the data dimension.

d. Repeat parts a-c using only pipe operators.
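
The difference between nested calls and pipes can be sketched on a small made-up data frame. The column names below (`booking_year`, `age`, `state`) are hypothetical and are not the column names of the CCSO Bookings Data. The sketch assumes the dplyr package (part of the tidyverse) is installed.

```r
library(dplyr)

# Hypothetical stand-in data; these are NOT the CCSO column names
bookings <- data.frame(
  booking_year = c(2012, 2012, 2013, 2014),
  age          = c(19, 40, 22, 18),
  state        = c("INDIANA", "ILLINOIS", "ILLINOIS", "OHIO")
)

# Without pipes: nested calls are read from the inside out
nested_dim <- dim(
  filter(bookings, booking_year == 2012, age >= 17, age <= 23, state != "ILLINOIS")
)

# With pipes: the same steps read from left to right
piped_dim <- bookings %>%
  filter(booking_year == 2012, age >= 17, age <= 23, state != "ILLINOIS") %>%
  dim()

nested_dim
piped_dim
```

Conditions separated by commas inside `filter()` are combined with a logical AND, so both versions keep only rows that satisfy every condition.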

Select in-class tasks

Completion of select in-class tasks will be worth 1 point and will be graded largely by completion. Obvious errors and incomplete work will receive deductions. Problems 3-5 are directly copied from your notes. Problems 1-2 are copied from the notes with minor alterations. In these problems I ask that you display the first 5 rows of the dataset instead of the entire dataset.

1. Load in the CCSO dataset, discover 3 factor (or categorical) variables and 3 numeric variables. Show the first 5 rows of this dataset with only those 6 variables.

2. Rename one of the factor variables to a name that is easier to understand than the original variable name. Show the first 5 rows of the dataset with all variables such that the variable with the new name is the first column in the dataset.

3. Write 3 separate loops: a for loop, a while loop, and a repeat loop that give the same result. The result should be the cumulative sum of Days in jail among Black people with Arrest Ages 18-24 and Student as Employment status within the CCSO Bookings Data.

4. Here are some images of R code. Read the code, debug it if necessary, and judge it on its efficiency and correctness. Decide on which set of code is better and improve the better one.

5. Using the vector y below:

set.seed(385)

y <- rnorm(100)

a. Use the which.min and which.max functions to display the index corresponding to the minimum and maximum elements of y.

b. Do the which.min and which.max functions work? (try: max(y) == y[which.max(y)]).

c. Use the which function and the length function to report the proportion of the elements of y that are greater than 0.

d. Discuss why the proportion in part c is close to 0.5. Hint: What is the mean of the normal distribution that generated the elements in y?

e. Create a factor variable with 50 values of A and 50 values of B, and name this factor variable trt.

f. Create a data frame consisting of y and trt.
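
For the three-loop task above, the following sketch shows how a for loop, a while loop, and a repeat loop can accumulate the same total. The vector `days` is a made-up stand-in for an already filtered Days-in-jail column, and all variable names are hypothetical.

```r
days <- c(3, 0, 12, 5)   # hypothetical stand-in for a filtered column

# for loop: iterates directly over the values
total_for <- 0
for (d in days) {
  total_for <- total_for + d
}

# while loop: the index and the stopping condition are managed by hand
total_while <- 0
i <- 1
while (i <= length(days)) {
  total_while <- total_while + days[i]
  i <- i + 1
}

# repeat loop: runs until an explicit break
total_repeat <- 0
k <- 1
repeat {
  if (k > length(days)) break
  total_repeat <- total_repeat + days[k]
  k <- k + 1
}

c(total_for, total_while, total_repeat, sum(days))
```

Placing the `break` check at the top of the repeat loop keeps it safe for an empty vector, where the other two loops also do nothing.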
