
 
 
Informatics MSc Programme Area 
Henley Business School 
University of Reading 
 
Assessed Coursework Set Front Page 
 
 
Module code: INMR77 
Module name: Business Intelligence and Data Mining 
Lecturer responsible: Dr Yin Leng Tan 
 
Work to be handed in by: 
Full time students: 26 May 2020 
Part time students: 15 June 2020 
 
Assignment Specification 
 
The module is assessed 100% through this coursework assignment. 
 
The aim of this coursework is to assess your understanding of business intelligence and ability 
to perform data mining tasks by applying concepts, methods and techniques learned during 
the lectures and practical sessions. 
 
The coursework is carried out individually. Students are required to produce an individual 
report for the tasks as set out below. The complete report should not exceed 20 pages of A4 
(with a variation of 20%) with a minimum font size of 10, including tables and diagrams but 
excluding references and appendices. An appendix can be used to include more detailed 
materials to back up main body points but will not be assessed. In addition, you are also 
required to submit the supplementary materials of your output from SAS Enterprise Miner via 
Blackboard by the specified deadline. 
 
 
Case Study - Airbnb and Inside Airbnb 
 
Airbnb - Holiday Lets, Homes, Experiences & Places (airbnb.co.uk) 
 
Airbnb is an online marketplace for arranging or offering lodging i.e. temporary 
accommodation, primarily homestays, or tourism experiences. It was founded in August 2008 
and has 12,736 employees as of 2019. 
 
Service overview: Airbnb provides a platform for hosts to accommodate guests with short-term 
lodging and tourism-related activities. Guests can search for accommodation using filters such 
as location, price, and specific types of homes. Before booking, users must provide personal 
and payment information. Some hosts also require a scan of government-issued identification 
before accepting a reservation. Hosts provide prices and other details for their rental or listing 
e.g. number of guests included in the price, type of property, type of room, number of 
bathrooms, number of bedrooms, number of beds and type of bed, minimum number of nights 
for a reservation, and amenities. In addition, Airbnb also provides a review system where hosts 
and guests can leave reviews about their experience, and rate each other after a stay. By 
October 2019, two million people were staying with Airbnb each night. 
 
Cancellation policy: Airbnb allows hosts to choose between five types of cancellation policies, 
made to protect both hosts and guests. Options include: strict_14_with_grace_period, 
moderate, flexible, super_strict_30, super_strict_60. 
(see https://www.airbnb.co.uk/home/cancellation_policies for the definition of each category) 
 
Security Deposits: some reservations include a security deposit, which can be required by either 
Airbnb or the host. This helps build trust for both guests and hosts. Some hosts require a 
security deposit for their listing. If you are a guest and you are booking a listing with a host with 
host-required security deposit, you will be shown the amount before you make your 
reservation. The amount is set by the host, not Airbnb. In this case, no authorisation hold will 
be placed, and you will only be charged if a host makes a claim on the security deposit. 
(see https://www.airbnb.co.uk/help/article/140/how-does-airbnb-handle-security-deposits) 
 
Sources: Wikipedia, Airbnb.co.uk 
For further information on Airbnb, please visit: https://www.airbnb.co.uk/ 
 
 
Inside Airbnb – adding data to the debate (http://insideairbnb.com/index.html) 
 
Inside Airbnb is an independent, non-commercial set of tools and data that allows an individual 
to explore how Airbnb is really used in cities around the world. It was set up by Murray Cox and 
John Morris in 2016. 
 
Airbnb claims to be part of the “sharing economy” and to be disrupting the hotel industry. However, 
data shows that the majority of Airbnb listings in most cities are entire homes, many of which 
are rented all year round – disrupting housing and communities. For example, local residents 
and governments are more concerned with hosts who are not present when the rental takes 
place and those who have multiple listings on the site than with a user who is renting out a 
spare room. 
 
By analysing publicly available information about a city’s Airbnb listings, Inside Airbnb 
provides filters and key metrics so users can see how Airbnb is being used to compete with the 
residential housing market. With Inside Airbnb, users can ask fundamental questions about 
Airbnb in any neighbourhood, or across the city as a whole, such as: 
• how many listings are in my neighbourhood and where are they? 
• how many houses and apartments are being rented out frequently to tourists and not 
to long-term residents? 
• how much are hosts making from renting to tourists (compare that to long-term 
rentals)? 
• which hosts are running a business with multiple listings and where are they? 
 
These questions (and the answers) get to the core of the debate for many cities around the 
world, with Airbnb claiming that their hosts only occasionally rent the homes in which they live. 
In addition, many city or state laws and ordinances that address residential housing, short-term 
or vacation rentals, and zoning make reference to allowed use, including: 
• how many nights a dwelling is rented per year 
• minimum nights stay 
• whether the host is present 
• how many rooms are being rented in a building 
• the number of occupants allowed in a rental 
• whether the listing is licensed 
 
The Inside Airbnb tool or data can be used to answer some of these questions. Some 
understanding of how the Airbnb platform is being used will help clear up the laws as they 
change. 
 
Source: insideairbnb.com 
For further information on Inside Airbnb, please visit: http://insideairbnb.com/index.html 
 
 
Airbnb in Greater Manchester, UK 
 
Dataset: Airbnb_man_reduced.csv (available to download on Blackboard). Two additional 
datasets, man_reviews.csv and man_calander.csv, are also provided for information only. 
 
Description of the dataset: The Airbnb data for Greater Manchester is made available by Inside 
Airbnb. The original data set was downloaded from the website in November 2019. The 
number of variables, however, has been reduced from the original data set. There are 4,848 
listings in the data set with a total of 57 variables. Each row represents a single listing and 
contains information about the host of the property, the property’s characteristics, and the 
guests’ overall rating of the property and its associated features. Table 1 shows the name, 
description, and type of the 57 variables. 
 
Table 1: Variable names, descriptions and types for the dataset. 

# | Variable Name | Description | Variable Type
1 | listing_id | Unique identifier for each Airbnb listing | Numeric
2 | listing_url | URL of the listing | Text
3 | description | Description of the listing | Text
4 | house_rule | Description of house rules | Text
5 | host_id | Unique identifier of the host | Numeric
6 | host_url | URL of the host | Text
7 | host_name | Name of the host | Text
8 | host_since | Date the host became a member | Date
9 | host_about | Description of the host | Text
10 | host_response_time | How quickly the host responds to inquiries. 5 categories: within a day, within an hour, a few days or more, within a few hours, N/A | Categorical
11 | host_response_rate | Rate at which host responded to inquiries (percentage value) | Numeric
12 | host_is_superhost | Is the host a superhost (1 = Yes, 0 = No) | Binary
13 | host_identity_verified | Whether the host is verified or not (1 = Yes, 0 = No) | Binary
14 | neighbourhood_cleased | Name of the neighbourhood (41 categories) | Categorical
15 | borough | Name of the borough (10 categories) | Categorical
16 | property_type | Type of the property (30 categories) | Categorical
17 | room_type | Type of the room. 4 categories: Entire home/apt, Private room, shared room, hotel room | Categorical
18 | accomodates | Number of people that can be accommodated | Numeric
19 | bathrooms | Number of bathrooms | Numeric
20 | bedrooms | Number of bedrooms | Numeric
21 | beds | Number of beds | Numeric
22 | bed_type | Type of bed. 6 categories: Real Bed, Pull-out Sofa, Futon, Couch, Airbed | Categorical
23 | amenities | List of amenities included | Text
24 | price | Price per night (in GBP) | Numeric
25 | weekly_price | Price per week (in GBP) | Numeric
26 | monthly_price | Price per month (in GBP) | Numeric
27 | Security_deposit | Amount of host-required security deposit | Numeric
28 | cleaning_fee | One-time fee charged by host to cover the cost of cleaning their space | Numeric
29 | guest_included | Number of guests included in the price | Numeric
30 | extra_people | Additional charge per person (GBP) | Numeric
31 | minimum_nights | Minimum number of nights for a reservation | Numeric
32 | maximum_nights | Maximum number of nights for a reservation | Numeric
33 | calendar_updated | Calendar last updated by the host (70 categories) | Categorical
34 | has availability | Whether the host has availability or not (1 = Yes, 0 = No) | Binary
35 | availability_30 | Number of days available for the next 30 days | Numeric
36 | availability_60 | Number of days available for the next 60 days | Numeric
37 | availability_90 | Number of days available for the next 90 days | Numeric
38 | availability_365 | Number of days available for the next 365 days | Numeric
39 | number_reviews | Number of reviews in total | Numeric
40 | first_review | Date of first review | Date/Time
41 | last_review | Date of last review | Date/Time
42 | review_scores_rating | Overall rating of the property (percentage value) | Numeric
43 | review_scores_accuracy | Rating for the accuracy of the description | Numeric
44 | review_scores_cleanliness | Rating for the cleanliness of the property | Numeric
45 | review_scores_checkin | Rating for the check-in experience | Numeric
46 | review_scores_communication | Rating for the host communication with guests | Numeric
47 | review_scores_location | Rating for the location of the property | Numeric
48 | review_scores_value | Rating for the value of the property | Numeric
49 | instant_bookable | Whether the property can be booked instantly (1 = Yes, 0 = No) | Binary
50 | cancellation_policy | The cancellation policy for the host. 5 categories: strict_14_with_grace_period, moderate, flexible, super_strict_30, super_strict_60 | Categorical
51 | require_guest_profile_picture | Whether guest profile picture is required or not (1 = Yes, 0 = No) | Binary
52 | require_guest_phone_verification | Whether guest phone verification is required or not (1 = Yes, 0 = No) | Binary
53 | host_listings_count | The number of listings of the host | Numeric
54 | host_listings_count_entire_homes | The number of entire-home listings of the host | Numeric
55 | host_listings_count_private_rooms | The number of private-room listings of the host | Numeric
56 | host_listings_count_shared_rooms | The number of shared-room listings of the host | Numeric
57 | reviews_per_month | Number of reviews per month for the property | Numeric
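
As a quick way to get a feel for the variables in Table 1 before modelling, the following is a minimal Python sketch that counts missing values per column. It is only an illustration outside SAS Enterprise Miner, which remains the required tool for the coursework. The column names come from Table 1, but the three sample rows and the helper name `missing_counts` are invented for this example.

```python
import csv
import io

# Hypothetical sample rows; the real file is Airbnb_man_reduced.csv.
SAMPLE = """listing_id,room_type,price,availability_365,reviews_per_month
1,Entire home/apt,85,300,2.5
2,Private room,30,40,
3,Entire home/apt,,365,4.0
"""

def missing_counts(rows):
    """Count empty cells per column in a list of csv.DictReader rows."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            if value is None or value.strip() == "":
                counts[name] = counts.get(name, 0) + 1
    return counts

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(missing_counts(rows))
```

Columns with many missing values (for example review scores on listings that have never been reviewed) would need a documented treatment, such as imputation or exclusion, before clustering.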
 
The local government and residents would like to know how Airbnb is used in the region and 
seek your help on this. They would particularly like to know how many of the listings/hosts are 
offering genuine lodging (i.e. temporary accommodation, primarily homestays, or tourism 
experiences) and not running as a business, as opposed to hosts offering long-term lets across 
multiple listings with no owner present (likely to be running a business), which could be illegal. 
Your goals are to: 
 
a) identify clusters of listings based on different sets (or combinations) of variables e.g. 
host’s characteristics, listings/property’s characteristics and availability, and reviews 
from guests so as to provide insights to the local government and residents. 
 
Note: There are many measurements that could be used to differentiate the two, e.g. single 
listing vs multiple listings (although a host may list separate rooms in the same 
apartment, or multiple apartments or entire homes). Availability is another measure, 
as is occupancy. You are asked to justify the variables/measurements used for your 
clustering tasks. Greater Manchester uses the following parameters for the 
measurements: 
• a high availability metric and filter of 60 days per year 
• a frequently rented filter of 60 days per year 
• a review rate of 50% for the number of guests making a booking who leave a 
review 
• an average booking of 3 nights unless a higher minimum nights is configured 
for a listing 
• a maximum occupancy rate of 70% to ensure the occupancy model does not 
produce artificially high results based on the available data (see 
http://insideairbnb.com/greater-manchester/?neighbourhood=&filterEntireHomes=false&filterHighlyAvailable=false&filterRecentReviews=false&filterMultiListings=false) 
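
To make the parameters above concrete, here is a minimal Python sketch of how they might combine into an occupancy estimate for one listing. It assumes reviews are converted to bookings through the review rate, bookings are multiplied by the average stay (or the listing's minimum nights, if higher), and the result is capped at the maximum occupancy rate. The function names, the use of `reviews_per_month` as the review input, and the strict "greater than 60" comparison are assumptions made for illustration, not a confirmed specification.

```python
REVIEW_RATE = 0.5          # share of bookings assumed to leave a review
AVERAGE_STAY_NIGHTS = 3    # assumed nights per booking
MAX_OCCUPANCY = 0.7        # cap on the share of the year that can be booked
THRESHOLD_DAYS = 60        # high availability / frequently rented threshold

def estimated_nights_per_year(reviews_per_month, minimum_nights):
    """Estimate booked nights per year from the review frequency."""
    bookings = reviews_per_month * 12 / REVIEW_RATE
    # Use the listing's minimum stay when it exceeds the assumed average.
    nights_per_booking = max(AVERAGE_STAY_NIGHTS, minimum_nights)
    cap = MAX_OCCUPANCY * 365
    return min(bookings * nights_per_booking, cap)

def listing_flags(availability_365, reviews_per_month, minimum_nights):
    """Apply the two 60-day filters to a single listing."""
    nights = estimated_nights_per_year(reviews_per_month, minimum_nights)
    return {
        "highly_available": availability_365 > THRESHOLD_DAYS,
        "frequently_rented": nights > THRESHOLD_DAYS,
        "estimated_nights": nights,
    }

# Hypothetical listing: available 300 days, one review a month, no minimum stay.
print(listing_flags(availability_365=300, reviews_per_month=1.0, minimum_nights=1))
```

Derived measures of this kind could serve as clustering inputs alongside the raw variables, provided the report justifies the assumptions behind them.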
 
b) select what you think is the best segmentation/clustering based on the results obtained 
in a) and comment on the characteristics, e.g. clusters that best separate listings that 
are genuine lodging from those that could be illegal, i.e. running as a business. 
 
c) develop a classification model to identify listings/hosts that are genuine vs those that 
could be considered illegal, based on your results obtained in b). 
 
 
Useful information/websites: 
• Clampet (2014) Airbnb in NYC: The Real Numbers Behind the Sharing Story – available 
at https://skift.com/2014/02/13/airbnb-in-nyc-the-real-numbers-behind-the-sharing-story/ 
• Inside Airbnb http://insideairbnb.com/index.html 
 
 
 
What to deliver in the final report: 
Your report should include the following sections: 
 
1. Introduction: This should include background of Airbnb and Inside Airbnb, 
opportunities and challenges of the sharing economy to the business (Airbnb), home 
owners (hosts), local residents and governments, and guests/tourists, and how 
business intelligence and data mining could be used to address the opportunities and 
challenges for the various stakeholders. It should also outline how the report is 
structured. Justify your answer with examples/data and findings from literature and 
related work in this area. 
 
2. Model building and Results Discussion 
a) Identify clusters of listings 
In this section, you should discuss the purpose of the data mining tasks, the data 
mining process, including data exploration and data preparation/preprocessing, 
and approaches taken e.g. variables used for the clustering. You are expected to 
justify and discuss any action/decision you made during the data mining process 
and model building, and to make references to your output in SAS Enterprise Miner 
within your report where necessary. 
 
Note: In deciding what k to use (and also how many variables to include), the 
following factors should be considered: How distinct are the clusters? Is good 
separation achieved? How consistent are they? If cluster #1 shows low values on 
one measure, does it also show low values on other measures? How simple are they 
to describe? Simple clusters are more interpretable by domain knowledge experts, 
easier to take action on, and are more likely to be statistically stable and not the 
result of random chance. 
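
One way to quantify "how distinct are the clusters" when comparing values of k is the mean silhouette score, where values closer to 1 indicate better separation. The sketch below is a plain-Python illustration only, not the SAS Enterprise Miner Cluster node: the k-means routine, the evenly spaced starting centroids, and the eight two-dimensional points (imagined as already standardised variables) are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means on tuples of floats; returns one cluster label per point."""
    # Deterministic start (an assumption for reproducibility): evenly spaced points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels

def mean_silhouette(points, labels):
    """Average silhouette over all points (needs at least two non-empty clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical, already-scaled values for two variables (two obvious groups).
points = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2), (0.2, 0.2),
          (0.9, 0.9), (0.8, 0.9), (0.9, 0.8), (0.8, 0.8)]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

On real data the variables would need to be standardised first, since k-means distances are otherwise dominated by whichever variable has the largest range. The score should also be read alongside the interpretability questions in the note, not instead of them.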
 
b) Discuss what is the best segmentation/clustering based on the results obtained 
from the process in a). You should discuss what you think is the best 
segmentation and comment on the characteristics of these clusters. Consider how 
this information could be used by local government and residents. Use 
screenshots and/or make references to your output in SAS Enterprise Miner to 
illustrate important and interesting findings where necessary. 
 
c) Develop a classification model that classifies the data into these segments. 
In this section, you should discuss the purpose of the data mining, including the 
target segment/cluster, the data mining process, including data 
preparation/preprocessing, and the rationale and approaches taken e.g. variables used 
for the model building. You are expected to justify and discuss any action/decision 
you made during the data mining process and model building, as well as the model 
evaluation, and to make references to your output in SAS Enterprise Miner within your 
report where necessary. 
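
For the model evaluation mentioned above, a confusion matrix and the measures derived from it are a common starting point. The following is a minimal Python sketch, independent of SAS Enterprise Miner, in which the label coding (1 = possibly running as a business, 0 = genuine lodging), the function name `evaluate`, and the eight validation cases are hypothetical.

```python
def evaluate(actual, predicted, positive=1):
    """Return confusion-matrix counts and basic measures for a binary target."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        # Precision: of the listings flagged, how many were truly positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of the truly positive listings, how many were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical validation labels and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(evaluate(actual, predicted))
```

Which measure matters most depends on the stakeholder: a local authority may accept more false alarms (lower precision) in exchange for missing fewer business-run listings (higher recall).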
 
3. Conclusion, critical evaluation and suggestions for improvement 
In this section, you are required to conclude and provide a summary of your key 
findings, and discuss the limitations of your data models/mining/analyses and 
suggestions for improvement, taking into consideration current research issues in 
data mining. 
 
 
 
 
 
 
The criteria used for grading the assignment: 

Each aspect/criterion below is followed by its % range bands and their descriptors. 

Introduction (ILO1, ILO3, ILO5) 
• 70% and above: A highly effective introduction, setting context and indicating content 
that will follow. Wide background reading; novel examples and use of relevant 
literature/sources in supporting the arguments/viewpoints. 
• 60-69%: A very good introduction, setting context and indicating content that will 
follow. Good background reading; generally very good use of examples and relevant 
sources/literature in supporting the arguments/viewpoints. 
• 50-59%: Adequate introduction incorporating one or more of the above, yet lacking in 
clarity in some area(s). Good use of examples and sources/literature in supporting the 
arguments/viewpoints. 
• 49% and below: A basic introduction with a narrow or limited reference to defining the 
area, setting the context and indicating content that will follow. Little evidence of 
appropriate reading or ability to synthesise information. Few or no examples given. 

Model Building, Results Discussion and Model Evaluation (ILO2, ILO3, ILO4, ILO6) 
• 70% and above: Novelty and originality. A coherent, well-focused, original approach to 
the model building, entirely relevant to the tasks, with excellent support and 
justifications for the variables and techniques used for the modelling. Excellent 
discussion and interpretation of the obtained results/analysis with original insights. 
Excellent model evaluations and comparisons provided with clear evidence of critical 
analysis of findings. 
• 60-69%: A generally clear and coherent discussion with good support or justification 
for the model building, which is directly relevant to the tasks. Clear rationale for the 
approaches taken. Very good discussion and interpretation of the obtained 
results/analysis. Very good model evaluations and comparisons provided with some 
critical analysis of findings. 
• 50-59%: Reasonable attempt at the modelling but prone to being descriptive or 
narrative; little rationale for the approaches taken or justification of the variables used. 
Generally relevant to the stated tasks. Reasonable discussion and interpretation of the 
obtained results/analysis. Reasonable discussion of model evaluations and comparisons 
though with little evidence of critical analysis of findings. 
• 49% and below: Little discussion and evidence of model building. Failure to understand 
the purpose of the task. Little discussion and interpretation of the obtained 
results/analysis. Little or no discussion of model evaluations and comparisons. 

Conclusion, critical evaluation and future improvements (ILO1, ILO5 and ILO6) 
• 70% and above: Comprehensive and extremely well discussed with original insights 
drawing from the analyses conducted and suggestions for future improvements. 
• 60-69%: Very well discussed with interesting insight, drawing from the results/analyses 
conducted. Very good critical evaluation and suggestions for future improvement. 
• 50-59%: Reasonably discussed but prone to being descriptive with little critical analysis 
based on the results/analyses conducted. Generally relevant to the stated tasks. Some 
critical analysis but prone to being descriptive or narrative; evidence supports the 
conclusion, but not always very directly/clearly. The question is not fully addressed. 
• 49% and below: Largely descriptive. The discussion is limited in scope and/or relevance. 
The question is only partially addressed. 
 
 
 
 