---
title: "CMPT 459.1-19. Programming Assignment 1"
subtitle: "FIFA 19 Players"
author: "Name - Student ID"
output: html_notebook
---
### Introduction
The data contains detailed attributes for every player registered in
the latest edition of the FIFA 19 database, obtained by scraping the
website “sofifa.com”. Each instance is a different player, and
the attributes give basic information about the players and
their football skills. Basic pre-processing was done and
goalkeepers were removed for this assignment.
Please see here for the original data overview and attribute
descriptions:
- https://www.kaggle.com/karangadiya/fifa19
And here to get a better view of the information:
- https://sofifa.com/
---
### First look
**[Task 1]**: Load the dataset, completing the code below (keep
the dataframe name as **fifa**)
```{r}
# Loading
fifa <- read.csv("fifa.csv")
```
**[Checkpoint 1]**: How many rows and columns exist?
```{r}
cat(ifelse(all(dim(fifa) == c(16122, 68)), "Correct results!",
"Wrong results.."))
```
---
**[Task 2]**: Give a very brief overview of the types of each
attribute and their values. **HINT**: Functions *str*, *table*,
*summary*.
```{r}
# Overview
str(fifa)
```
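A minimal sketch of the three hinted functions on a tiny hypothetical data frame (`toy` and its values are illustrative, not read from fifa.csv): `str` shows each column's type, `sapply(..., class)` gives one class per column, `table` counts the levels of a categorical attribute and `summary` describes a numeric one.

```r
# Hypothetical toy frame; column names mirror the dataset, values are made up
toy <- data.frame(
  Age = c(31, 27, 24),
  Preferred.Foot = c("Left", "Right", "Right"),
  Value = c("\u20ac110.5M", "\u20ac77M", "\u20ac118.5M"),  # euro-prefixed strings
  stringsAsFactors = FALSE
)
str(toy)                        # type and first values of every column
col_types <- sapply(toy, class) # one class per column
col_types
foot_counts <- table(toy$Preferred.Foot)
foot_counts                     # frequency of each category
summary(toy$Age)                # min, quartiles, mean and max of a numeric column
```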
**[Checkpoint 2]**: Were functions used to display data types
and give some idea of the information of the attributes?
---
### Data Cleaning
Suggested functions for this part: *ifelse*, *substr*,
*nchar*, *str_split*, *map_dbl*.
Five attributes need to be cleaned.
- **Value**: Remove the euro sign, handle the trailing
"K" (thousands) and "M" (millions), define missing values and
make it numeric.
- **Wage**: Same as above.
- **Release.Clause**: Same as above.
- **Height**: Convert to "cm" and make it numeric.
- **Weight**: Remove "lbs" and make it numeric.
**[Task 3]**: The first three of the five attributes listed above
are formatted alike. Create a single function that cleans them the
same way: it should take the vector of attribute values as its
parameter and return the cleaned vector, so call it three times,
once for each column. **Encode zeroes or blanks as NA.**
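```{r}
# Function used to clean attributes
library(stringr)
attr_fix <- function(attribute){
  # Work on trimmed character data (read.csv may have produced factors)
  x <- str_trim(as.character(attribute))
  # Remove the euro sign
  x <- str_replace_all(x, fixed("€"), "")
  # Multiplier from the trailing suffix: "K" = thousands, "M" = millions
  last <- substr(x, nchar(x), nchar(x))
  mult <- ifelse(last == "K", 1e3, ifelse(last == "M", 1e6, 1))
  # Strip the suffix, convert to numeric and scale
  digits <- ifelse(last %in% c("K", "M"), substr(x, 1, nchar(x) - 1), x)
  cleaned_attribute <- suppressWarnings(as.numeric(digits)) * mult
  # Encode zeroes and blanks as NA
  cleaned_attribute[is.na(cleaned_attribute) | cleaned_attribute == 0] <- NA
  return(cleaned_attribute)
}
# Cleaning attributes
fifa$Value <- attr_fix(fifa$Value)
fifa$Wage <- attr_fix(fifa$Wage)
fifa$Release.Clause <- attr_fix(fifa$Release.Clause)
```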
**[Checkpoint 3]**: How many NA values?
```{r}
cat(ifelse(sum(is.na(fifa))==1779, "Correct results!", "Wrong results.."))
```
---
**[Task 4]**: Clean the other two attributes. **Hint**: To
convert to "cm" use http://www.sengpielaudio.com/calculatorbodylength.htm.
```{r}
# Cleaning attribute Weight:
```
```{r}
# Cleaning attribute Height:
```
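A sketch of one way to clean these two columns, assuming the raw values look like `5'7` (feet'inches) for Height and `159lbs` for Weight; check this against `head(fifa$Height)` and `head(fifa$Weight)` before relying on it. Height is converted as (feet * 12 + inches) * 2.54 cm, while Weight stays in pounds and only loses its suffix. The helper names `height_to_cm` and `weight_to_num` are hypothetical, and base `strsplit`/`vapply` stand in for *str_split*/*map_dbl*.

```r
# Hypothetical raw values in the assumed formats
height_raw <- c("5'7", "6'2", "5'11")
weight_raw <- c("159lbs", "190lbs", "")

# Height: split feet and inches, convert total inches to cm (1 in = 2.54 cm)
height_to_cm <- function(h) {
  parts <- strsplit(as.character(h), "'", fixed = TRUE)
  feet <- as.numeric(vapply(parts, `[`, character(1), 1))
  inches <- as.numeric(vapply(parts, `[`, character(1), 2))
  (feet * 12 + inches) * 2.54
}

# Weight: drop the "lbs" suffix; blanks become NA
weight_to_num <- function(w) {
  suppressWarnings(as.numeric(sub("lbs", "", as.character(w), fixed = TRUE)))
}

height_cm <- height_to_cm(height_raw)
weight_lbs <- weight_to_num(weight_raw)
height_cm
weight_lbs
```

If the assumed formats hold, the same helpers would be applied as `fifa$Height <- height_to_cm(fifa$Height)` and `fifa$Weight <- weight_to_num(fifa$Weight)`.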
**[Checkpoint 4]**: What are the mean values of these two
columns?
```{r}
cat(ifelse(all(c(round(mean(fifa[,8]),4)==164.1339,
round(mean(fifa[,7]),4)==180.3887)), "Correct results!", "Wrong results.."))
```
---
### Missing Values
**[Task 5]**: Which columns have missing values? List them below
(replace the placeholders). Impute (do not remove) all missing
values, that is, every NA found, and explain the reasons for the
method used. Suggestion: MICE imputation based on random
forests (R package mice: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/).
Use *set.seed(1)*. **HINT**: If MAR (Missing at Random) is assumed,
do not use "ID" or "International.Reputation" for the imputation,
and remember to put them back into the "fifa" dataframe afterwards.
Columns with missing values:
-
-
- ...
```{r}
# Handling NA values
```
```{r}
# Putting columns not used in the imputation back into the "fifa" dataframe
```
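The bookkeeping around the imputation can be sketched on a toy frame (all values hypothetical): list the columns with NAs, hold out "ID" and "International.Reputation", impute the rest, then bind the held-out columns back in their original order. The per-column median fill is only a stand-in so the sketch runs with base R; it is not the method the task suggests. With the mice package, that step would instead be along the lines of `mice::complete(mice::mice(to_impute, method = "rf", seed = 1))`.

```r
# Hypothetical toy frame: an ID, an excluded attribute and two columns with gaps
toy <- data.frame(
  ID = 1:5,
  International.Reputation = c(1, 2, 1, 3, 1),
  Value = c(1e6, NA, 5e5, 2e6, NA),
  Wage = c(1e4, 2e4, NA, 3e4, 1.5e4)
)

# 1. Columns that contain at least one NA
na_cols <- names(toy)[colSums(is.na(toy)) > 0]
na_cols

# 2. Set aside the columns that must not take part in the imputation
held_out <- toy[, c("ID", "International.Reputation")]
to_impute <- toy[, setdiff(names(toy), names(held_out))]

# 3. Impute. Stand-in only: median fill per column (the task suggests mice with "rf")
imputed <- as.data.frame(lapply(to_impute, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
}))

# 4. Put the held-out columns back, restoring the original column order
toy_complete <- cbind(held_out, imputed)[, names(toy)]
```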
**[Checkpoint 5]**: How many instances have at least one NA? It
should be 0 now. How many columns are there? It should be 68
(remember to put back "ID" and "International.Reputation").
```{r}
cat(ifelse(all(sum(is.na(fifa))==0, ncol(fifa)==68), "Correct results!",
"Wrong results.."))
```
---
### Feature Engineering
**[Task 6]**: Create a new attribute called "Position.Rating"
that holds each player's rating for the position given in the
"Position" attribute. For example, if the player has the value
"CF" in "Position", then "Position.Rating" should take the value
of that player's "CF" attribute. **After that, remove the
"Position" attribute from the data**.
```{r}
# Creating the attribute "Position.Rating"
```
```{r}
# Removing the attribute "Position"
```
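A sketch using matrix indexing on a hypothetical three-position frame: each row is paired with the column whose name matches that player's Position, so a single vectorised lookup replaces a loop. With the real data, the rating columns would be the 26 position columns listed in Task 7, and a Position with no matching column would yield NA.

```r
# Hypothetical toy frame with three rating columns
toy <- data.frame(
  Position = c("CF", "LB", "CM"),
  CF = c(85, 60, 70),
  LB = c(50, 78, 65),
  CM = c(72, 66, 81),
  stringsAsFactors = FALSE
)
ratings <- as.matrix(toy[, c("CF", "LB", "CM")])

# (row, column) index pairs: each player's row and the column named by their Position
idx <- cbind(seq_len(nrow(toy)), match(toy$Position, colnames(ratings)))
toy$Position.Rating <- ratings[idx]

# Drop the original categorical attribute
toy$Position <- NULL
toy
```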
**[Checkpoint 6]**: What's the mean of the "Position.Rating"
attribute created? How many columns are there in the dataframe?
It should be 68 (remember to remove "Position").
```{r}
cat(ifelse(all(c(round(mean(fifa$Position.Rating),5) ==
66.87067, ncol(fifa)==68)), "Correct results!", "Wrong results.."))
```
---
### Dimension Reduction
**[Task 7]**: Perform PCA (Principal Component Analysis) on the
columns representing ratings of positions (that is, attributes:
LS, ST, RS, LW, LF, CF, RF, RW, LAM, CAM, RAM, LM, LCM, CM, RCM,
RM, LWB, LDM, CDM, RDM, RWB, LB, LCB, CB, RCB, RB). Show the
summary of the components obtained. **Keep the minimum number of
components needed to have at least 98.50% of the variance explained by
them.** Remove the columns used for PCA. **HINT**: Function
*prcomp*; remember to center and scale.
```{r}
# Perform PCA
# Show Summary
```
```{r}
# Put the components back into the "fifa" dataframe
# Remove original columns used for PCA
```
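A sketch of the variance-threshold rule on synthetic data: the columns ST, CF and CB are simulated to be strongly correlated, and Age plays the role of a column kept out of the PCA. The proportion of variance of component i is sdev_i^2 / sum(sdev^2), and k is the first index where the cumulative proportion reaches 0.985. `prcomp` names its scores PC1, PC2 and so on, which is what ends up back in the frame.

```r
set.seed(1)
# Synthetic stand-in: three position ratings driven by one latent skill
skill <- rnorm(100)
toy <- data.frame(
  ST  = skill + rnorm(100, sd = 0.1),
  CF  = skill + rnorm(100, sd = 0.1),
  CB  = -skill + rnorm(100, sd = 0.1),
  Age = rnorm(100, mean = 26, sd = 4)
)
rating_cols <- c("ST", "CF", "CB")

pca <- prcomp(toy[, rating_cols], center = TRUE, scale. = TRUE)
summary(pca)

# Smallest k whose cumulative proportion of variance reaches 98.5%
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
k <- which(cumsum(var_explained) >= 0.985)[1]

# Keep the first k scores and drop the original rating columns
toy <- cbind(toy[, setdiff(names(toy), rating_cols), drop = FALSE],
             pca$x[, seq_len(k), drop = FALSE])
```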
**[Checkpoint 7]**: How many columns exist in the dataset? It
should be 45.
```{r}
cat(ifelse(ncol(fifa)==45, "Correct results!", "Wrong results.."))
```
**[Bonus]**: Use the code below to see graphically which columns
contributed most to each component. Replace "fifa.pca" with the
object returned by the *prcomp* function.
```{r}
library(factoextra)
fviz_pca_var(fifa.pca,
col.var = "contrib", # Color by contributions to the PC
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
```
---
### Binarization
**[Task 8]**: Perform binarization on the following categorical
attributes: "Preferred.Foot" and "Work.Rate". **HINT**: R
package "dummies", function *dummy.data.frame*.
```{r}
# Binarize categorical attributes
```
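If the dummies package is not available, base R's `model.matrix` can produce the same kind of one-hot columns; the sketch below swaps it in on a hypothetical frame. Each categorical attribute gets one 0/1 column per level (no level is dropped), named by pasting the attribute name and the level. The exact column names `dummy.data.frame` would produce may differ.

```r
# Hypothetical toy frame with the two categorical attributes
toy <- data.frame(
  Preferred.Foot = c("Left", "Right", "Right"),
  Work.Rate = c("High/ Low", "Medium/ Medium", "High/ Low"),
  Age = c(31, 27, 24),
  stringsAsFactors = FALSE
)
cat_cols <- c("Preferred.Foot", "Work.Rate")

# One 0/1 column per level of each categorical attribute
dummy_cols <- do.call(cbind, lapply(cat_cols, function(col) {
  f <- factor(toy[[col]])
  m <- model.matrix(~ f - 1)          # "- 1" keeps every level
  colnames(m) <- paste0(col, levels(f))
  m
}))

# Replace the categorical columns with their binary columns
toy <- cbind(toy[, setdiff(names(toy), cat_cols), drop = FALSE], dummy_cols)
toy
```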
**[Checkpoint 8]**: How many columns exist in the dataset? It
should be 54.
```{r}
cat(ifelse(ncol(fifa)==54, "Correct results!", "Wrong results.."))
```
---
### Normalization
**[Task 9]**: Remove the attribute "ID" from the "fifa" dataframe, save
the attribute "International.Reputation" in a vector named "IntRep"
and then also remove "International.Reputation" from the "fifa"
dataframe. Perform z-score normalization on "fifa", except for the
columns that came from PCA. Finally, combine the normalized
attributes with the PCA columns and save the result in the "fifa" dataframe.
**HINT**: Function *scale*.
```{r}
# Normalize with Z-Score
```
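A sketch of these steps on a hypothetical frame, assuming the PCA scores can be recognised by their `PC` prefix (as `prcomp` names them). `scale` subtracts each column's mean and divides by its standard deviation, so the normalized columns end up with mean 0 and standard deviation 1; the PC columns are passed through unchanged.

```r
set.seed(1)
# Hypothetical stand-in for "fifa" after the PCA step
toy <- data.frame(
  ID = 1:6,
  International.Reputation = c(1, 1, 2, 3, 1, 2),
  Age = c(21, 25, 30, 33, 28, 19),
  Overall = c(70, 75, 80, 88, 72, 65),
  PC1 = rnorm(6),
  PC2 = rnorm(6)
)
IntRep <- toy$International.Reputation     # kept for external validation later
toy <- toy[, !(names(toy) %in% c("ID", "International.Reputation"))]

pc_cols <- grep("^PC", names(toy), value = TRUE)   # assumed PCA score columns
# z-score: (x - mean(x)) / sd(x), column by column
scaled <- as.data.frame(scale(toy[, setdiff(names(toy), pc_cols)]))
toy <- cbind(scaled, toy[, pc_cols])
colMeans(toy)
```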
**[Checkpoint 9]**: How many columns exist in the dataset? It
should be 52. What is the mean of all the attribute means? It
should be around zero.
```{r}
cat(ifelse(ncol(fifa)==52, "Correct results!", "Wrong results.."))
```
---
### K-Means
**[Task 9]**: Perform K-Means for values of K ranging from 2 to
15. Find the best number of clusters for K-means clustering
based on the silhouette score. Report the best number of
clusters and the silhouette score of the corresponding
clustering, and state how strong the discovered cluster
structure is (fill in the placeholders below). Use
"set.seed(1)". **HINT**: Functions *kmeans* (make use of the
parameters *nstart* and *iter.max*) and *silhouette* (from the
package "cluster").
```{r}
# K-Means and Silhouette scores
```
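A sketch of the silhouette search on synthetic data with three well-separated groups, using a shorter K range (2 to 6) to keep it quick. The distance matrix is computed once and reused for every K. On the 16,122-row data, `dist` stores about 1.3e8 distances (roughly 1 GB), so memory may be the limiting factor. For judging strength, a commonly cited rule of thumb from Kaufman and Rousseeuw reads an average silhouette above 0.70 as strong structure, 0.51 to 0.70 as reasonable, 0.26 to 0.50 as weak, and 0.25 or below as no substantial structure.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Synthetic stand-in: three groups of 30 points centred at (0,0), (5,5), (10,10)
toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 5), ncol = 2),
             matrix(rnorm(60, mean = 10), ncol = 2))
d <- dist(toy)  # computed once and reused for every K

ks <- 2:6       # shortened range for the sketch; the task asks for 2:15
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(toy, centers = k, nstart = 25, iter.max = 50)
  # average silhouette width over all points for this K
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- ks

best_k <- ks[which.max(avg_sil)]
best_sil <- max(avg_sil)
avg_sil
```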
Results found:
- Best number of clusters:
- Silhouette score:
- How strong is the cluster?
**[Checkpoint 9]**: Are there silhouette scores for K-Means with
K ranging from 2 to 15? Were the best K and the corresponding
silhouette score reported?
---
**[Task 10]**: Perform K-means with the K chosen and get the
resulting groups. Try out several pairs of attributes and
produce scatter plots of the clustering from task 9 for these
pairs of attributes. By inspecting these plots, determine a pair
of attributes for which the clusters are relatively well separated
and submit the corresponding scatter plot.
```{r}
# K-Means for best K and Plot
```
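A sketch on simulated data, where the column names Dribbling and Marking are borrowed from the dataset only for readability and `best_k = 2` is a hypothetical stand-in for the K chosen above. `pairs` draws every attribute pair at once, which makes it quick to spot a well-separated pair; the chosen pair is then plotted on its own.

```r
set.seed(1)
# Simulated stand-in: two groups that differ in Dribbling and Marking only
toy <- data.frame(
  Dribbling = c(rnorm(50, 40, 5), rnorm(50, 75, 5)),
  Marking   = c(rnorm(50, 70, 5), rnorm(50, 35, 5)),
  Age       = rnorm(100, 26, 4)
)
best_k <- 2  # hypothetical: the K selected by the silhouette search
km <- kmeans(toy, centers = best_k, nstart = 25, iter.max = 50)

# Every attribute pair, coloured by cluster
pairs(toy, col = km$cluster, pch = 19)

# The pair that separates the clusters best, on its own
plot(toy$Dribbling, toy$Marking, col = km$cluster, pch = 19,
     xlab = "Dribbling", ylab = "Marking",
     main = paste("K-means clusters, K =", best_k))
```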
**[Checkpoint 10]**: Is there at least one plot showing two
attributes and the groups (colored or circled) reasonably
separated?
---
### Hierarchical Clustering
**[Task 11]**: Randomly sample 1% of the data (use set.seed(1)).
Perform hierarchical cluster analysis on this sample using
complete linkage, average linkage and single linkage, applying
all three methods to the same 1% sample, and plot the resulting
dendrograms. Discuss the commonalities and differences between
the three dendrograms and try to explain the reasons leading to
the differences (fill in the placeholder below).
```{r}
# Sample and calculate distances
```
```{r}
# Complete
```
```{r}
# Average
```
```{r}
# Single
```
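A sketch on random stand-in data: a 1% row sample is drawn once, one distance matrix is computed on it, and the same matrix feeds all three linkage methods, so the dendrograms differ only in how the distance between two clusters is defined (largest pairwise distance for complete, mean for average, smallest for single).

```r
set.seed(1)
toy <- matrix(rnorm(2000 * 4), ncol = 4)  # hypothetical stand-in for the normalized data

# 1% random sample of the rows, shared by all three methods
idx <- sample(nrow(toy), size = round(0.01 * nrow(toy)))
d <- dist(toy[idx, ])

linkages <- c(complete = "complete", average = "average", single = "single")
hc <- lapply(linkages, function(m) hclust(d, method = m))

# Three dendrograms side by side
op <- par(mfrow = c(1, 3))
for (m in names(hc)) {
  plot(hc[[m]], main = paste(m, "linkage"), xlab = "", sub = "")
}
par(op)
```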
Discussion:
-
**[Checkpoint 11]**: Does the discussion show commonalities and
differences between the three dendrograms and explain the
differences?
---
### Clustering comparison
**[Task 12]**: Now perform hierarchical cluster analysis on the
**ENTIRE dataset** using complete linkage, average linkage and
single linkage, the same three methods as in task 11. Cut all three
resulting dendrograms to obtain a flat clustering with the
number of clusters determined as the best number in task 9.
To perform an external validation of the clustering results, use
the vector "IntRep" created earlier. What is the Rand Index for the
best K-means clustering? And what are the values of the Rand
Index for the flat clusterings obtained in this task from
complete linkage, average linkage and single linkage? Discuss the
results (fill in the placeholder below). **HINT**: Function
*cluster_similarity* from package "clusteval".
```{r}
# Hierarchical Clusterings (Complete, Average and Single)
```
```{r}
# Flat Clusterings
```
```{r}
# Cluster Similarities
```
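The hint points to `cluster_similarity` in clusteval (with `similarity = "rand"`). The sketch below instead computes the Rand index directly with base R, by pair counting on the contingency table, so it runs without that package; it is a swapped-in implementation, not clusteval's. The data, `truth` (standing in for IntRep) and `k` are hypothetical.

```r
# Rand index: share of point pairs on which two labelings agree
rand_index <- function(x, y) {
  tab <- table(x, y)                      # contingency table of the two labelings
  total <- choose(sum(tab), 2)            # all pairs of points
  both <- sum(choose(tab, 2))             # pairs together in x and in y
  in_x <- sum(choose(rowSums(tab), 2))    # pairs together in x
  in_y <- sum(choose(colSums(tab), 2))    # pairs together in y
  # agreements = together in both + apart in both
  (total + 2 * both - in_x - in_y) / total
}

set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
truth <- rep(1:2, each = 20)  # hypothetical stand-in for IntRep
k <- 2                        # hypothetical stand-in for the best K

d <- dist(toy)
# Flat clusterings: one column per linkage method
flat <- sapply(c("complete", "average", "single"),
               function(m) cutree(hclust(d, method = m), k = k))
km <- kmeans(toy, centers = k, nstart = 25)

ri <- c(kmeans = rand_index(km$cluster, truth),
        apply(flat, 2, rand_index, y = truth))
ri
```

The Rand index ignores how clusters are labelled and can compare labelings with different numbers of groups, which matters when the number of clusters differs from the number of distinct IntRep values.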
Discussion:
-
**[Checkpoint 12]**: Does the discussion include a relevant
comparison of the clusterings, and does it make sense?