GV900 Political Explanation, 2017/2018

30 October, 2018

Homework assignment 2

Due Week 7 (13 November)

Write an R code file named (gv900-HW2.R) to complete the following tasks. The easiest

way to write an R code file is to start with an existing file: Duplicate an existing R code file

that you have (e.g., gv900-week4-JointExercise.R) and modify the file contents accordingly.

Rules

• Submit two files, and two files only. That is, submit (1) the coversheet (ESSAY COVERSHEET

2018-2019.docx, available on Moodle) and (2) your R code file (gv900-HW2.R). Don’t

submit your graph or other outputs. You’ll earn 5 points if you do all of these correctly.

• Make sure that you delete your name from your R code file. You’ll earn 5 points if you

do this correctly.

• Execute everything before you submit (e.g., CTRL + A & CTRL + Return on a Windows

PC; Command + A & Command + Enter on a Mac machine), and make sure your file

runs without an error. I will execute your file to check if you did it. You’ll earn 5 points

if your R file runs without an error.

• Your file must have a proper header. You’ll earn 5 points if you do this correctly.

• Add comments and annotations to everything you do. Try to make your code file look

like my code file. If your code file doesn’t have a proper annotation, you’ll lose 5 points.

Don’t copy and paste all the questions into your R code file, but do show me the question

number for each question. You’ll earn 5 points if you do this correctly.

Tasks (5 points each × 15 = 75 points)

1. Load the “world” dataset (world.csv), and store it as an object named world.data.

2. The data set contains a dummy variable (i.e., a nominal variable with two categories)

named oecd that classifies countries into two groups, OECD member countries and non-member

countries. One way to describe and summarize the information contained in a

nominal variable is to describe the distribution numerically. As we learned during the

past weeks, we describe the distribution of a nominal variable numerically by creating a

frequency table. Create a frequency table of this variable and store it into a data frame

object ft.oecd. The table has to have three columns: values (initially called “Var1”),

frequency (called “Freq”), and percentage (should be called “Percentage”). Change the

column name of the first column to “OECD Member?”.
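
As a rough illustration of the steps described in task 2, here is a minimal sketch on a made-up data frame. The names toy, member, and ft.toy are hypothetical and are not part of the world data set.

```r
# Toy data (hypothetical; not the course data set)
toy <- data.frame(member = c("Yes", "No", "No", "Yes", "No"))

# table() counts each category; as.data.frame() turns the counts into
# a data frame whose columns are called Var1 and Freq
ft.toy <- as.data.frame(table(toy$member))

# Percentage = share of each category in the total number of rows
ft.toy$Percentage <- 100 * ft.toy$Freq / sum(ft.toy$Freq)

# Rename the first column (initially "Var1")
colnames(ft.toy)[1] <- "Member?"

print(ft.toy)
```

The same pattern should carry over to any nominal variable: count, convert to a data frame, add the percentage column, then rename.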

3. According to the frequency table you created above, (A) how many countries in the data

set are OECD members? (B) How many countries in the data set are not? (C) What

percentage of countries are OECD members? (D) What percentage of countries are non-members?

Give me four answers (four numbers) as a comment. Note: for this task, you

don’t need an R command. Just read the table and tell me the numbers. Don’t forget to

comment them out.

4. Another way to describe and summarize a nominal variable is to draw a frequency distribution

graph. For nominal variables, we draw a bar chart. Using the functions available

in the ggplot2 package (e.g., geom_bar), draw a bar chart of the dummy variable that

measures OECD membership.

• Hint 1: Don’t forget to load the package using the library function. It’s usually a

good idea to do so at the beginning of your R code file.

• Hint 2: Don’t forget to change the axis labels using the xlab and ylab options. The

appropriate label for the X axis would be “OECD membership”, whereas the label

for the Y axis could be “Number of countries”.
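
The two hints can be sketched on a made-up data frame as follows. This assumes the ggplot2 package is installed; toy and member are hypothetical names, and the axis labels are placeholders.

```r
# Load the package first (usually done at the top of the code file)
library(ggplot2)

# Toy data (hypothetical; not the course data set)
toy <- data.frame(member = c("Yes", "No", "No", "Yes", "No"))

# geom_bar() counts the rows in each category and draws one bar per category;
# xlab() and ylab() replace the default axis labels
p <- ggplot(toy, aes(x = member)) +
  geom_bar() +
  xlab("Membership") +
  ylab("Number of cases")

print(p)
```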

5. List three countries that are coded as OECD member states. List three countries that

are non-democratic according to the democracy dummy variable. Note: Again, you don’t

need a command for this one; I only need six country names.

6. The data set contains a numerical variable (interval-level variable) named gdp_10_thou

that records a country’s per capita GDP in 10,000 US dollars. Note that this variable

measures per capita GDP in 10,000 dollars, not in dollars. This means that, when this

variable takes a value of 4, for example, then that country’s per capita GDP is 40,000

dollars, not 4 dollars. Describe this variable numerically by calculating the following

statistics:

• Range (minimum and maximum), median, mean, 1st and 3rd quartile values (Hint:

this can be done at once with one command)

• Standard deviation (Hint: you need to take care of missing values using the na.rm

option)

Note: You need to provide R commands, not just numerical answers for this one.
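
To see why the na.rm option matters, consider this small sketch on a made-up numeric vector (x is a hypothetical name, not a variable in the data set).

```r
# Toy vector with one missing value (hypothetical data)
x <- c(0.2, 0.5, 1.1, NA, 4.3)

# summary() reports minimum, quartiles, median, mean, maximum,
# and the number of NAs in a single call
summary(x)

# Without na.rm, a single missing value makes the result NA
sd(x)

# With na.rm = TRUE, missing values are dropped before the calculation
sd(x, na.rm = TRUE)
```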

7. It appears that the mean and the median of this per capita GDP variable are far apart:

the mean is 6,018 dollars whereas the median is 1,897 dollars. Given that the mean is

much higher than the median, the distribution of this variable is very skewed (i.e., not

symmetric). In which way does the skew go? Answer this question by choosing between

two options: (A) negatively skewed (skewed to the left) or (B) positively skewed (skewed

to the right). Note: Give me your answer in words, not in R commands.

8. Describe this per capita GDP variable graphically by drawing a histogram.

• Hint: Don’t forget to change the axis labels using the xlab and ylab options. The

appropriate label for the X axis would be “Per capita GDP (in 10,000 US dollars)”,

whereas the label for the Y axis could be “Number of countries”.

9. There are two countries in the data set whose per capita GDP is greater than 40,000 US

dollars. Identify these two countries. For this task, I need an R command that gives us

the names of the two countries. Your command will probably also return the NA rows

(14 of them), along with the names of the two countries, but that’s fine.
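
The extra rows appear because a comparison with a missing value is itself missing. A minimal sketch on made-up data (toy, country, and income are hypothetical names):

```r
# Toy data with one missing income value (hypothetical data)
toy <- data.frame(country = c("A", "B", "C", "D"),
                  income  = c(0.5, 4.2, NA, 1.3))

# The condition is FALSE, TRUE, NA, FALSE; the NA produces an extra
# row filled with NAs in the result
with.na <- toy[toy$income > 4, ]
print(with.na)

# Wrapping the condition in which() keeps only the rows where it is TRUE
without.na <- toy[which(toy$income > 4), ]
print(without.na)
```

Either form identifies the matching rows; the second simply omits the NA rows.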

10. We have calculated the sample mean of this per capita GDP variable in task 6. We

have also calculated its standard deviation. We also know from task 6 that there are 14

observations (countries) where this variable is missing, so we have 191 (total number of

countries in the data set) −14 = 177 observations (i.e., n = 177). Therefore, we have all

the building blocks to calculate the standard error of the mean. Calculate the standard

error (the answer should be 0.07091015). Note that I need R commands, not just the

numerical answer.
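
As a reminder of how the building blocks fit together, the standard error of the mean is the standard deviation divided by the square root of n. A minimal sketch on a made-up vector (x, n, and se are hypothetical names):

```r
# Toy vector with one missing value (hypothetical data)
x <- c(2, 4, 4, 6, NA)

# n counts only the non-missing observations (here 4)
n <- sum(!is.na(x))

# Standard error of the mean = standard deviation / sqrt(n)
se <- sd(x, na.rm = TRUE) / sqrt(n)
print(se)
```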

11. Using the calculated standard error and the mean value, construct the 95 % confidence

interval of the sample mean of gdp_10_thou. For this, I need both R commands and the

numerical answer.
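
One common way to build the interval is mean plus or minus 1.96 standard errors. The sketch below assumes that normal-approximation multiplier; ci95 is a hypothetical helper name, and the data are made up.

```r
# Hypothetical helper: 95% confidence interval of the mean,
# assuming the normal-approximation multiplier 1.96
ci95 <- function(x) {
  x  <- x[!is.na(x)]                 # drop missing values
  se <- sd(x) / sqrt(length(x))      # standard error of the mean
  c(lower = mean(x) - 1.96 * se,
    upper = mean(x) + 1.96 * se)
}

# Toy vector with one missing value (hypothetical data)
ci95(c(2, 4, 4, 6, NA))
```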

12. Draw histograms of the per capita GDP variable, one for democracies and the other for non-democracies.

• Hint 1: Use the democ_regime variable to classify the countries into democracies and

non-democracies.

• Hint 2: Use the facet_wrap option. For this task, you may actually have three

histograms (No, Yes, and NA), and that’s OK. We will correct it below.
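
A minimal sketch of faceted histograms on made-up data, assuming the ggplot2 package is installed. The names toy, income, and group are hypothetical; the missing value in group is there to show where the extra NA panel comes from.

```r
library(ggplot2)

# Toy data with one missing group value (hypothetical data)
toy <- data.frame(income = c(0.3, 0.8, 1.5, 2.9, 4.1, 0.6),
                  group  = c("No", "Yes", "Yes", "Yes", NA, "No"))

# facet_wrap() draws one panel per value of group,
# including a separate panel for NA
p <- ggplot(toy, aes(x = income)) +
  geom_histogram(bins = 5) +
  facet_wrap(~ group) +
  xlab("Income") +
  ylab("Number of cases")

print(p)
```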

13. We find (I mean, I find) a few things about this graph unsatisfactory. First, it is a little

bit aesthetically unpleasing that we have a blank graph on the far right. This happens

because there are missing values. Second, the labels “No” and “Yes” are not intuitive at

all (readers can’t know what “Yes” and “No” mean simply by looking at the graph). So

let’s now fix these two things. Create a new data frame named dem.gdp that excludes

those rows where the democ_regime variable is missing. Use the is.na function for this.

Then, create a new variable dem.dum within this new data set, which has two nominal

values, “Democracy” and “Autocracy”, instead of “Yes” and “No”. Then, recreate the

histograms you drew in 12. It should look like the following:

[Figure: two histogram panels titled “Autocracy” and “Democracy”. X axis: Per capita GDP (in 10,000 US dollars). Y axis: Number of countries.]
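
The two fixes in task 13 (dropping rows with a missing value, then relabelling the categories) can be sketched on made-up data as follows. The names toy, toy.sub, regime, and label are hypothetical; ifelse is one of several ways to recode.

```r
# Toy data with one missing regime value (hypothetical data)
toy <- data.frame(income = c(0.3, 0.8, 1.5, NA, 4.1),
                  regime = c("No", "Yes", NA, "Yes", "Yes"))

# Keep only the rows where regime is not missing
toy.sub <- toy[!is.na(toy$regime), ]

# Create a new variable with more readable labels
toy.sub$label <- ifelse(toy.sub$regime == "Yes", "Democracy", "Autocracy")

# Check the result with a quick frequency count
table(toy.sub$label)
```

Faceting on the new label variable instead of the original one then gives readable panel titles and no NA panel.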

14. The graph above appears to suggest that democracies tend to have higher per capita

GDP. Let’s document this relationship by calculating the mean value of per capita GDP

for each group. In doing so, report the 95 % confidence intervals as well. For task 14,

calculate the mean of per capita GDP for democracies, along with the 95 % confidence

interval. Please provide both the commands as well as the results (numbers).

15. Similarly, calculate the mean of per capita GDP for autocracies (non-democracies), along

with the 95 % confidence interval.
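
For tasks 14 and 15, one possible approach is to split the variable by group and apply the same calculation to each part. The sketch below uses made-up data and assumes the multiplier 1.96; toy, income, label, and mean_ci are hypothetical names.

```r
# Toy data (hypothetical; not the course data set)
toy <- data.frame(
  income = c(0.2, 0.4, 0.6, 1.0, 2.0, 3.0, NA),
  label  = c("Autocracy", "Autocracy", "Autocracy",
             "Democracy", "Democracy", "Democracy", "Democracy")
)

# Hypothetical helper: mean and 95% confidence interval,
# assuming the normal-approximation multiplier 1.96
mean_ci <- function(x) {
  x  <- x[!is.na(x)]
  se <- sd(x) / sqrt(length(x))
  c(mean = mean(x), lower = mean(x) - 1.96 * se, upper = mean(x) + 1.96 * se)
}

# split() separates income by group; sapply() applies mean_ci to each group
res <- sapply(split(toy$income, toy$label), mean_ci)
print(res)
```

Subsetting each group separately and repeating the commands by hand gives the same numbers; the helper only avoids the repetition.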

End of file
