

Assignment Part A: Storytelling with data (30 marks)

Statistics is the science of learning from data. Turning data into information is a critical aspect of decision-making in business. In this world of big data, storytelling through data has emerged as an important aspect of all data analysis. Complex ideas can be understood easily through storytelling. In this part, we build your storytelling skills by improving your ability to visualise and communicate findings.

In this task, you are required to do the following. Please read carefully.

Problem background: House Prices

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Link: https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques

Data definition:

We have selected a sample with fewer variables in the data file provided. Refer to the data file labelled Assignment Data.xlsx and open the worksheet labelled House Prices. In the worksheet, you are provided with both numeric and categorical data. Note that this data has already been cleaned for you, and any records with missing values have been removed. The following table contains the data definition.

| Column | Column Name | Data Definition |
|---|---|---|
| A | ID | Unique identifier for each property record |
| B | MSZoning | The general zoning classification: RH = Residential High Density; RL = Residential Low Density |
| C | LotArea | Lot size in square feet |
| D | Street | Type of road access: Grvl = Gravel; Pave = Paved |
| E | LotShape | General shape of property: Reg = Regular; IR1 = Slightly Irregular; IR2 = Moderately Irregular; IR3 = Irregular |
| F | LandContour | Flatness of the property: Lvl = Near Flat/Level; Bnk = Banked (quick and significant rise from street grade to building); HLS = Hillside (significant slope from side to side); Low = Depression |
| G | LandSlope | Slope of property: Gtl = Gentle Slope; Mod = Moderate Slope; Sev = Severe Slope |
| H | BldgType | Type of dwelling: 1Fam = Single-family Detached; 2FmCon = Two-family Conversion (originally built as one-family dwelling); Duplx = Duplex; TwnhsE = Townhouse End Unit; TwnhsI = Townhouse Inside Unit |
| I | OverallQual | Overall material and finish quality: 10 = Very Excellent; 9 = Excellent; 8 = Very Good; 7 = Good; 6 = Above Average; 5 = Average; 4 = Below Average; 3 = Fair; 2 = Poor; 1 = Very Poor |
| J | OverallCond | Overall condition rating: 10 = Very Excellent; 9 = Excellent; 8 = Very Good; 7 = Good; 6 = Above Average; 5 = Average; 4 = Below Average; 3 = Fair; 2 = Poor; 1 = Very Poor |
| K | SalePrice | The property's sale price in dollars. This is the target variable that you're trying to predict |

Using the data set, answer the following questions.

Q1 About the data

In fewer than 100 words, write a summary of the data based on the following points:

•  what is the data about (do not copy from the website)

•  what information (variables) does it contain

•  select one numerical and one categorical variable from your data set and provide full data classification for each variable you selected.

•  explain your choice of classification for each variable selected above

(4 marks)

Q2 Pivot table

Instruction for Q2

Create a new column L labelled “Price Category”. Use the information provided in Table 1 to categorise Sale Price (column K), labelling each sale price as “High”, “Medium” or “Low”. Use the VLOOKUP function to complete this task. Once you have done this, filter the MSZoning variable (column B) to select the RM and RL categories. Based on the filtered data, answer the following questions.
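As a rough illustration of what the VLOOKUP step does, the sketch below mimics an approximate-match lookup in Python: the price is matched to the last band whose lower bound does not exceed it, which is how VLOOKUP behaves when its first column is sorted in ascending order. Table 1 is not reproduced in this text, so the band boundaries, `LOWER_BOUNDS`, `LABELS` and `price_category` are hypothetical placeholders, not the assignment's values.

```python
from bisect import bisect_right

# HYPOTHETICAL lookup table: these lower bounds are placeholders only,
# because Table 1 of the assignment is not available here.
LOWER_BOUNDS = [0, 150_000, 250_000]   # must be sorted ascending, as VLOOKUP requires
LABELS = ["Low", "Medium", "High"]     # label for each band, in the same order


def price_category(sale_price):
    """Return the label of the last band whose lower bound is <= sale_price."""
    if sale_price < LOWER_BOUNDS[0]:
        raise ValueError("sale price is below the smallest lower bound")
    index = bisect_right(LOWER_BOUNDS, sale_price) - 1
    return LABELS[index]


sample_prices = [120_000, 150_000, 249_999, 400_000]
categories = [price_category(p) for p in sample_prices]
print(categories)  # ['Low', 'Medium', 'Medium', 'High']
```

A price that equals a lower bound falls into the band that starts at that bound, so 150,000 is "Medium" under these placeholder bands.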

a.  Construct a pivot table of counts, including grand totals, with “Price Category” in rows and the two selected general zoning classifications (MSZoning) in columns. Label and format the pivot table accordingly. Marks will be deducted for poor presentation. (3 marks)
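The structure of such a pivot table can be sketched outside Excel as a cross-tabulation of counts. The rows below are made-up (Price Category, MSZoning) pairs, and `build_pivot` is a hypothetical helper written for this illustration; the assignment itself still requires an Excel pivot table.

```python
from collections import Counter

# HYPOTHETICAL rows standing in for the filtered worksheet (RL and RM only).
rows = [
    ("Low", "RM"), ("Low", "RL"), ("Medium", "RL"),
    ("High", "RL"), ("Medium", "RM"), ("Low", "RM"),
]

counts = Counter(rows)                  # (category, zone) -> number of houses
categories = ["High", "Medium", "Low"]  # row labels
zones = ["RL", "RM"]                    # column labels


def build_pivot(counts, categories, zones):
    """Build a nested dict of counts with row, column and overall totals."""
    table = {}
    for cat in categories:
        row = {zone: counts[(cat, zone)] for zone in zones}
        row["Grand Total"] = sum(row.values())
        table[cat] = row
    total_row = {zone: sum(table[cat][zone] for cat in categories) for zone in zones}
    total_row["Grand Total"] = sum(total_row.values())
    table["Grand Total"] = total_row
    return table


pivot = build_pivot(counts, categories, zones)
for label, row in pivot.items():
    print(label, row)
```

The right-hand column and bottom row hold the marginal totals, and the bottom-right cell holds the overall count.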

b.  Using your pivot table from (a), provide one example of each of the following:

•   Marginal probability

•   Conditional probability

•   Joint probability

In your answers, you are required to show the following for each of the three probabilities:

1.  probability statement

2.  workings

3.  final answer

Your answers must be stated to 2 decimal places. (3 marks)
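To show how the three kinds of probability are read off a table of counts, here is a minimal sketch with invented numbers. The counts are hypothetical and are not taken from Assignment Data.xlsx; the choice of "High" and "RL" as the events is also just for illustration.

```python
# HYPOTHETICAL pivot-table counts; NOT taken from Assignment Data.xlsx.
counts = {
    "High":   {"RL": 30, "RM": 10},
    "Medium": {"RL": 50, "RM": 20},
    "Low":    {"RL": 20, "RM": 70},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # all houses
high_total = sum(counts["High"].values())                        # row total for High
rl_total = sum(row["RL"] for row in counts.values())             # column total for RL
high_and_rl = counts["High"]["RL"]                               # single cell

# Marginal: one variable only, row (or column) total over the grand total.
p_high = high_total / grand_total
# Joint: both events together, cell count over the grand total.
p_high_and_rl = high_and_rl / grand_total
# Conditional: cell count over the total of the condition (here the RL column).
p_high_given_rl = high_and_rl / rl_total

print(f"P(High) = {high_total}/{grand_total} = {p_high:.2f}")                    # 0.20
print(f"P(High and RL) = {high_and_rl}/{grand_total} = {p_high_and_rl:.2f}")     # 0.15
print(f"P(High | RL) = {high_and_rl}/{rl_total} = {p_high_given_rl:.2f}")        # 0.30
```

The only difference between the joint and the conditional probability is the denominator: the grand total for the joint, and the total of the conditioning group for the conditional.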

c. Provide a contextual interpretation for each of the probability values obtained above. (3 marks)

d.  State two methods that can be used to investigate if Price Category and Zoning Classification (MSZoning) are related. (2 marks)

e.  Use the data provided and apply both methods mentioned in your answer to (d) to investigate if the two variables in (d) are related. (6 marks)
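One possible pair of methods, offered only as an illustration and not necessarily the pair the marking scheme expects, is (1) comparing conditional probabilities with the matching marginal probabilities and (2) a chi-square test of independence. The sketch below applies both to hypothetical counts; the numbers are invented, and the 5% significance level is an assumption.

```python
# HYPOTHETICAL observed counts; NOT taken from Assignment Data.xlsx.
observed = {
    "High":   {"RL": 30, "RM": 10},
    "Medium": {"RL": 50, "RM": 20},
    "Low":    {"RL": 20, "RM": 70},
}
zones = ["RL", "RM"]

grand_total = sum(sum(row.values()) for row in observed.values())
row_totals = {cat: sum(row.values()) for cat, row in observed.items()}
col_totals = {zone: sum(row[zone] for row in observed.values()) for zone in zones}

# Method 1: if the variables were independent, P(category | zone) would equal
# P(category) for every category and zone.
for cat in observed:
    marginal = row_totals[cat] / grand_total
    for zone in zones:
        conditional = observed[cat][zone] / col_totals[zone]
        print(f"P({cat} | {zone}) = {conditional:.2f} vs P({cat}) = {marginal:.2f}")

# Method 2: chi-square test of independence.
# Expected count under independence = row total * column total / grand total.
chi_square = 0.0
for cat in observed:
    for zone in zones:
        expected = row_totals[cat] * col_totals[zone] / grand_total
        chi_square += (observed[cat][zone] - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(zones) - 1)

# Standard chi-square table value for 2 degrees of freedom at the 5% level
# (the 5% level is an assumption made for this sketch).
CRITICAL_VALUE_5PCT_DF2 = 5.991
related = chi_square > CRITICAL_VALUE_5PCT_DF2

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, related = {related}")
```

With these invented counts the conditional probabilities differ clearly from the marginals and the statistic far exceeds the critical value, so both methods would point to a relationship. Real data may of course behave differently.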

Q3 Visualisation and Overall Summary

In this section, you are required to draw a visualisation and write an overall report on your observations.

a.  Select at least two appropriate variables from your dataset and provide ONE suitable visualisation (graph). The variables you select here can be the same as or different from the variables you selected above. Be sure to select variables that tell an interesting story, as you are required to write a summary in part (b).

Instructions for this task

•   You are required to use Microsoft Excel for this task.

•   Your visualisation must be appropriately labelled and formatted. (4 marks)

b.  Using the visualisation (graph) you provided in your answer above:

•  briefly explain why you have chosen this type of visualisation (graph)

•  write a summary describing the main findings and patterns visible from the visualisation.

The word limit here is strictly fewer than 150 words. (5 marks)
