ECON30025/ECOM90020 Assignment 1

 
Assignment 1, 2022
Due 11:59 pm Monday April 11, 2022
This assignment is worth 20% of your final grade for those in ECON30025.
This assignment is worth 25% of your final grade for those in ECOM90020.
Make sure to include the coversheet with your answers. Read the instructions on the coversheet. Try 
to keep your answers short and clear. Please submit copies of all the programs. Write no more than 
5/8 pages of text (not including the programs) and cut and paste any results into a Word document. 
The program code should be included as an appendix. When writing code, remember to add 
comments to the programs so that they look like ones you would write for someone else to 
use. One method for saving space is to paste the code and results as a picture and then reduce the size. 
All assignments are to be submitted as PDFs. The exam in this subject will be in a similar form.
It is not enough just to provide the computer results – you will be graded on your interpretation of 
what you find. If you are in doubt as to a particular definition or question – state your assumption 
and move on.
There are 4 parts to this assignment. The assessment for this subject differs depending on your 
enrolment. All students are to submit answers to the questions in parts I and II. 
Students enrolled in ECON30025 are to answer only the non-starred questions in part III
(parts a to e) and not part IV. 
Students enrolled in ECOM90020 are to answer all questions in part III (including the 
starred ones) and the question in part IV. 
J. Hirschberg ECON30025/ECOM90020
Part I. (10pts) 
1. (4 pts) List all the errors in the following code. Fix them and run the code. 
DATA CLASS;
INPUT NAME $ SEX AGE HEIGHT WEIGHT; 
CAROL F 14 62.8 102.5
HENRY 14 63.S 102.5
JAMES M 12 57.3 B3.o 
ALFRED M 14 69.Ø 112.5
ROBERT M 12 64.8 128:0 
RONALD M 15 67.0 133.0.
ALICE F 13 56.5 84.0
BARBARA F 13 65.3 98.0
JEFFREY M 13 62.5 -84.0
JOHN M 12 59.0 99.5
JOYCE F 11 51.3 50.5
RUN
2. (6pts) Answer the following questions. 
(a)(2pt) Briefly explain what the following code does after the code from part 1. 
DATA CLASS1;
SET CLASS;
BY AGE WEIGHT; 
IF ?? THEN OUTPUT;
RUN;
(b)(1pt) Can the code in I.2a immediately follow the code in I.1? 
(c)(1pt) Replace the ?? in this code so that the CLASS1 data set contains the heaviest student in each 
age group. Print out CLASS1. 
(d)(1pt) Generate a new data set called CLASS2 that computes the Body-Mass index (BMI), where 
$\mathrm{BMI} = 703\,(\text{weight}/\text{height}^2)$, for all students in CLASS. 
(e)(1pt) Create another data set called CLASS3 of the students in each age group with the lowest 
BMI and print it out. 
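The questions above rely on BY-group processing in a DATA step. As a reminder of how that mechanism works, here is a minimal sketch on a made-up data set (`pets`, `species` and `lightest` are hypothetical names and values, not part of the assignment). It assumes nothing beyond base SAS.

```sas
/* Hypothetical data, unrelated to the CLASS data set. */
data pets;
  input species $ weight;
  datalines;
cat 4.1
dog 20.5
cat 5.3
dog 12.0
;
run;

/* A BY statement in a DATA step expects the input to be sorted
   (or indexed) by the BY variables, so sort first. */
proc sort data=pets;
  by species weight;
run;

data lightest;
  set pets;
  by species weight;
  /* first.species equals 1 on the first row of each species group.
     After sorting by weight within species, that row is the lightest. */
  if first.species then output;
run;

proc print data=lightest;
run;
```

SAS also creates a matching `last.` indicator for every BY variable, which flags the final row of each group.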
Part II. (5 pts) Linear Algebra 
 Consider a system of linear equations defined as:
1234
1 2 34
34 2
3 4 3 1
3
 2 3 3 -9
10 2 4 5 3
 5 2
 2 6 5 14
4
5
xxxx
x x xx
xx x
x x xx
x
+++=
−+ = −
+ =
+ + −+
=
1.(3 pts) Define the matrix A, and the vectors b and x where this system of equations is 
equivalent to the expression: Ax = b . Then modify the IML routine lms_example1 to solve for the 
elements of x. 
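The contents of lms_example1 are not reproduced in this handout, so the following is only a sketch of how a square system Ax = b can be solved in SAS/IML. The 2x2 system is made up for illustration and is not the system in this question.

```sas
proc iml;
  /* Hypothetical system:  2*x1 +   x2 =  5
                             x1 + 3*x2 = 10  */
  A = {2 1,
       1 3};
  b = {5, 10};

  /* SOLVE returns the x that satisfies A*x = b for a nonsingular
     square A, without forming inv(A) explicitly. */
  x = solve(A, b);
  print x;   /* x1 = 1, x2 = 3 */
quit;
```

Using `solve(A, b)` rather than `inv(A)*b` is generally preferred for numerical accuracy, although both give the same answer for a well-conditioned matrix.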
2.(2 pts) We now have two additional equations to include defined as: 
3 42
14 2
 2 12 3
 10 5 
7 x xx
xx x
− + =
+ +=
Describe how one might find a solution for the vector x. By modifying the IML routine you 
used for part 1 above, compute this solution. (Hint: We might consider the minimization of the 
squared errors). 
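For reference, the hint points to ordinary least squares. A sketch of the algebra, assuming the stacked system with the additional equations is written as $\tilde{A}x = \tilde{b}$ (the tilde notation is introduced here and does not appear in the assignment) and that $\tilde{A}$ has full column rank:

```latex
\hat{x} = \arg\min_{x}\,(\tilde{b} - \tilde{A}x)^{\top}(\tilde{b} - \tilde{A}x)
\quad\Longrightarrow\quad
\tilde{A}^{\top}\tilde{A}\,\hat{x} = \tilde{A}^{\top}\tilde{b},
\qquad
\hat{x} = (\tilde{A}^{\top}\tilde{A})^{-1}\tilde{A}^{\top}\tilde{b}.
```

With more equations than unknowns there is in general no x that satisfies every equation exactly, so $\hat{x}$ is the vector that makes the sum of squared residuals as small as possible.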
Part III. (5pts-ECON30025 /8pts-ECOM90020) The World's Super Yachts [1]
 This question requires you to consider two data series. This assignment combines the 
methods used in the AFL football attendance program and the multivariate statistics routines.
Recent events have brought to the fore the existence of Super Yachts that are owned by 
individuals of very high wealth. The syt data set, read by the code below, is a sample of over 1000 
so-called super yachts and their characteristics [2]. These characteristics include: country of the owner, 
the value in US$, the length in metres, number of crew, number of guests, and the year in which it 
was built.
 The other dataset to consider is the United Nations' (UN) cross-country series entitled 
un_plus, which we considered for the principal component analysis example with quality of life indicators 
(see PCA_Example.sas). This data set records several country-specific characteristics that include 
GDP per capita, population, infant mortality rates, and many other national characteristics [3].
I have written a program to read these data series from the subject datasets online, as listed 
below. It is called assign1Q3_22.sas; you should use it at the start of your routine. If you have any 
questions about the interpretation of the UN variables, go to the source website.
[1] https://edition.cnn.com/travel/article/skyscraper-superyacht-concept/index.html 
[2] The list provided here is a modification of information that can be found on the internet.
[3] Most of these variables can be found at http://hdr.undp.org/en/data. Note that we add c_n for the country number to the 
dataset.

assign1Q3_22.sas
/* Read a file of data found from lists of Super Yachts online.
These data list the value of the yachts and other characteristics 
by country of their owners. */
filename csvFile1 url 
"https://www.online.fbe.unimelb.edu.au/t_drive/ECOM/ECOM90020/data/Super_Yachts_v.csv" 
termstr=crlf;
proc import datafile=csvFile1 out=syt replace dbms=csv; run; 
data syt ; set syt ;
age = 2022 - yr; 
label
Value = "Estimated value (mill $US)"
Guests = "Number of Guests that can be accommodated"
Crew = "Number of Crew"
yr = "Year Built"
size = "Length in metres"
up_cnty = "Country of Owner"
c_n = "Country Number"
age = "Age of Yacht" ;
run; 
/* The United Nations' (UN) cross country series. 
As considered for the principal component analysis example with quality of life indicators 
(see PCA_Example.sas). This data set records several country specific characteristics that 
include GDP per capita, population, infant mortality rates, and many other national 
characteristics. */
filename csvFile2 url 
"https://www.online.fbe.unimelb.edu.au/t_drive/ECOM/ECOM90020/data/un_plus.csv" 
termstr=crlf;
proc import datafile=csvFile2 out=un_plus replace dbms=csv; run; 
data un_plus ; set un_plus ; 
productivity = productivity / 1000 ; * rescale productivity to be in 1,000s ; 
gini = gini * 100 ; * rescale the gini coefficient ; 
/* Define the uppercase name of the country and change some of the country names 
on the UN data set. As assembled from: http://hdr.undp.org/en/data */
up_cnty = upcase(country) ; 
if up_cnty = "CZECH REPUBLIC" then up_cnty = "CZECHIA"; 
if up_cnty = "BURKIAN FASO" then up_cnty = "BURKINA FASO"; 
label 
ARTICLES= "Scientific articles per capita"
CO2= "Carbon dioxide emissions per capita, (tonnes), 2011"
EMP_RATIO_15= "Employment to population ratio, (% ages 15 and older), 2013"
EN_SEC= "Gross enrolment ratio, Secondary, (% of secondary school age population), 2008"
EN_TER= "Gross enrolment ratio, Tertiary, (% of tertiary school age population), 2008-2012"
EQ_MATH= "Education quality, Performance of 15-year-old students, Mathematics, 2012"
EQ_READING= "Education quality, Performance of 15-year-old students, Reading, 2012"
EQ_SCIENCE= "Education quality, Performance of 15-year-old students, Science, 2012"
EQ_SEC= "Education quality, Population with at least some secondary education, (% ages 25+)"
FER_2010= "Total fertility rate, (births per woman),2010/2015"
GDP_CAP= "Gross Domestic Product per capita"
GINI= "GINI index (World Bank estimate)"
GR_HDI= "Average annual HDI growth, (%), 1990-2014"
HDI= "Rank of Human Development Index 2013"
HDI_VALUE= "HDI, Value, 2014"
IMMIGRANTS= "Human mobility, Stock of immigrants (% of population), 2013"
INEQ_GEN= "Gender Inequality Index Value, 2014"
INEQ_PALMA= "Income inequality, Palma ratio, 2005-2013"
INEQU_GINI= "Income inequality, Gini coefficient, 2005-2013"
INEQU_QUIN= "Income inequality, Quintile ratio, 2005-2013"
INT_STUDENTS= "Human mobility, International student mobility, (% of tertiary enrolment)"
INTERNET= "Communication, Internet users, (% of population), 2014"
LGDP_CAP= "Log GDP per capita"
LIFE_EXP= "Healthy life expectancy at birth"
MED_AGE= "Population, Median age, (years), 2015"
MIGRATION= "Human mobility, Net migration rate, (per 1,000 people), 2010/2015"
MORT_INF= "Mortality rates, (per 1,000 live births), Infant, 2013"
POP= "Population, Total, (millions), 2014"
POP_GR_2000= "Population, Average annual growth (%), 2000/2005"
POP_GR_2010= "Population, Average annual growth (%), 2010/2015"
POP_OV_65= "Population, Ages 65 and older, (millions), 2014"
POP_URBAN= "Population, Urban, (%), 2014"
PRIS_POP= "Prison population, (per 100,000 people), 2002-2013"
PRODUCTIVITY= "Labour productivity, Output per worker, (2011 PPP $), 2005-2012"
R_N_D= "Research and development expenditure (% of GDP), 2005-2012"
SCH_YR_F= "Mean years of schooling(years), Female, 2014"
SCH_YR_M= "Mean years of schooling(years), Male, 2014"
SEX_RATIO= "Sex ratio at birth, (male to female births), 2010/2015"
SOILSUIT= "Soil fertility"
TEMP= "Geographic temperature average 1961-1990"
TOURISTS= "Human mobility, International inbound tourists, (thousands), 2013"
WOMEN_MPS= "Share of seats in parliament (% held by women), 2014"
 ; 
run; 
proc sort data = un_plus ; by up_cnty ; run; 
data un_plus ; set un_plus ; c_n = _n_ ;
label c_n = Country Number ; run;
a) (1pt) Using proc sgscatter, create scatter plots of how the value of the yachts in syt
varies with the characteristics of the yachts, as in the example below with a variable y on the 
y-axis and x1 and x2 on the x-axis [4].
proc sgscatter data=syt ;
plot (y ) * (x1 x2 ) / columns = 2 
loess=( )reg=(degree=3) datalabel = up_cnty; run;
b) (1pt) Again using the super yacht data (syt), estimate a hedonic regression with proc reg to 
predict the value of the yacht based on the characteristics of the yacht. Do the signs and 
significance of the coefficients match your prior opinions? 
c) (1pt) Sort the super yacht data by country number (keep the country name as an id variable) 
and compute the mean, the median and the maximum of the value and the characteristics 
by country, using the code listed below to create c_syt. Once this is done, sort both the new 
dataset c_syt and the un_plus data by country using the code given below: 
 proc sort data = syt; by c_n ; run; 
proc summary data = syt ; by c_n ; id up_cnty ; 
var value crew guests yr size age; 
output out = c_syt 
mean = avg_value avg_crew avg_guests avg_yr avg_size avg_age
median = med_value med_crew med_guests med_yr med_size med_age
 
4 The loess line is a non-parametric fit to the data and the reg line is a 3rd order polynomial.
J. Hirschberg ECON30025/ECOM90020
max = max_value max_crew max_guests max_yr max_size max_age; run; 
d) (1 pt) Once this is done, sort the un_plus dataset by country number and merge it with 
c_syt using the code given below. What is in the result? What do you learn from this? 
 proc sort data = un_plus ; by c_n ; run; 
data match miss1 miss2 ; merge un_plus(in=i1) c_syt(in=i2) ; by c_n ; 
if i1 & i2 then output match ; 
if i1 & not i2 then output miss1 ; 
if i2 & not i1 then output miss2 ; 
run; 
e) (1 pt) Using the data set match, create two new variables that could be considered measures 
of wealth inequality in each country. The first is the ratio of the average value of a super yacht 
owned by someone in the country to the average income (assuming it is equal to GDP per capita). 
This is equivalent to the number of years it would take at that income 
level to buy it. The other is the proportion of the population that owns a super yacht, 
assuming only one super yacht per owner [5]. Using proc sgscatter, plot the two new 
variables on the y-axis and the UN variables on the x-axis. Limit your analysis to no more than five 
variables and choose ones that you can justify using 
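One possible way to write the two measures described in (e), as a sketch only. The names $\text{years}_c$ and $\text{share}_c$ are made up here. The formulas assume that avg_value is in millions of US$ and POP is in millions of people (as their labels state), that GDP_CAP is in US$ per person (an assumption, since its label gives no units), and that _FREQ_ counts the yachts recorded for country $c$:

```latex
\text{years}_c = \frac{10^{6}\times \text{avg\_value}_c}{\text{GDP\_CAP}_c},
\qquad
\text{share}_c = \frac{\text{\_FREQ\_}_c}{10^{6}\times \text{POP}_c}.
```

If the units of GDP_CAP differ from this assumption, the first ratio needs to be rescaled accordingly.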
f) *(1 pt) Use the Proc Princomp routine to compute the principal components of the average 
yacht characteristics excluding the value using the full data set (syt). Base this computation 
on the correlation matrix of the characteristics. Interpret the results of this routine by 
commenting on the variables that have the greatest influence on the first two components. 
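As a reminder of what a correlation-based principal component analysis computes, here is a sketch in generic notation. The symbols $R$, $v_j$, $\lambda_j$, $z_j$ and $\tilde{x}$ are introduced for illustration: $R$ is the $p \times p$ sample correlation matrix of the characteristics and $\tilde{x}$ is the vector of standardized characteristics for one yacht.

```latex
R\,v_j = \lambda_j v_j,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0,
\qquad
z_j = v_j^{\top}\tilde{x},
\qquad
\frac{\lambda_j}{\sum_{k=1}^{p}\lambda_k} = \frac{\lambda_j}{p}.
```

The elements of $v_j$ are the loadings of each characteristic on component $j$, and the last expression is the share of total variance explained by that component, since the trace of a correlation matrix equals $p$.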
g) *(2 pt) Construct a new variable formed by the ratio of the average value to the average 
size in the data matched to the UN data set (match). Find a regression relationship that best 
explains the variation in this new variable when including only the average age of the yachts 
and at least two variables from the UN data. Interpret the signs of the estimated 
coefficients.
*Part IV 3-D Fun (2 pts-ECOM90020) 
 Consider the following function: 
$q = \min\left(\log\left(f(x, y)^{-1}\right), 5\right)$, where $f(x, y) = 2x^2 - 1.05x^4 + \frac{1}{6}x^6 + xy + y^2$,
with $-5 \le x \le 5$ and $-5 \le y \le 5$, in increments of 0.1 for both x and y. 
1. *(1 pt) Using a program similar to the one that we used to plot the Hat in three dimensions 
(three_d_plot), construct a 3-D plot of this function. Try changing the perspective to obtain 
the best view.
2. *(1 pt) Use the contour program to locate the extreme values of this function in the 
neighbourhood of the range of the xs and ys specified. Identify the values of x and y where 
these points occur. Is there a global extremum? 
 
 Recall that the _freq_ variable in the c_syt data set is the count of the values used in the computation for each country.