MSBA7012/MACC7022 Individual Assignment 2: Fraudulent Job Post Detection

Deadline: Sunday, April 28, 2024 11:59pm

Objective:

•    Leverage Alteryx to develop a workflow that can preprocess data, engineer features, and build a machine learning model to predict whether a job posting is fraudulent.

Dataset:

•    The Balanced_Fraudulent_Job_Posts.xlsx dataset includes attributes related to job postings, with key columns like 'title', 'company_profile', 'description', 'requirements', 'benefits', and 'fraudulent'.

Tasks:

1.    Data Preprocessing:

•     Use Alteryx to load the dataset and create a new column that combines the textual data in the 'title', 'company_profile', 'description', 'requirements', and 'benefits' columns.

•     Perform text pre-processing on the combined text column.
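
The preprocessing step above can be illustrated with a minimal Python sketch. It is not the required Alteryx workflow; it only shows one plausible interpretation of "combine" and "pre-process". The column names come from the assignment, while the helper names `combine_text` and `preprocess`, the sample row, and the specific cleaning rules (lowercasing, dropping non-letters, collapsing whitespace) are assumptions made for illustration.

```python
import re

# Column names are taken from the assignment; the helper names are hypothetical.
TEXT_COLUMNS = ["title", "company_profile", "description", "requirements", "benefits"]


def combine_text(row):
    """Join the five text fields into one string, treating None/empty as missing."""
    parts = [str(row.get(col) or "") for col in TEXT_COLUMNS]
    return " ".join(p for p in parts if p)


def preprocess(text):
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Made-up example row, not taken from the dataset.
row = {
    "title": "Data Entry Clerk",
    "company_profile": None,
    "description": "Earn $500/day from home!",
    "requirements": "No experience needed.",
    "benefits": "",
}
combined = preprocess(combine_text(row))
print(combined)
```

Further steps such as stop-word removal or stemming could be added in the same style if the analysis calls for them.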

2.    Feature Engineering:

•     Implement TF-IDF vectorization in Alteryx using the Python Tool to convert the text data into a numerical format suitable for machine learning.
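
To make clear what TF-IDF vectorization produces, here is a small standard-library sketch of one common variant: term frequency as count divided by document length, and a smoothed inverse document frequency. The function name `tfidf` and the sample documents are assumptions for illustration. Inside the Alteryx Python Tool a library vectorizer such as scikit-learn's `TfidfVectorizer` would more likely be used, and its weights differ in details such as normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    tf  = term count / number of tokens in the document
    idf = ln((1 + N) / (1 + df)) + 1   (smoothed variant, an assumption here)
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix


# Made-up, already pre-processed documents.
docs = [
    "earn money fast from home",
    "software engineer with python experience",
    "earn money no experience needed",
]
vocab, matrix = tfidf(docs)
for term, weight in zip(vocab, matrix[0]):
    if weight > 0:
        print(f"{term}: {weight:.4f}")
```

Each row of `matrix` corresponds to one job post and each column to one term, which is the tabular shape a downstream model tool expects. Terms that appear in fewer documents receive a higher weight than terms shared across many documents.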

3.    Model Building and Evaluation:

•     Split the data into a training set and a testing set with a ratio of 70:30.

•     Utilize Alteryx's Forest Model tool to train a model using the training set.

•     Use only the TF-IDF values as the model features.

•     Evaluate the model's performance on the testing set through the Model Comparison tool and record the metrics (accuracy, F1-score, AUC, and confusion matrix).
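
The 70:30 split and the requested metrics can be sketched in plain Python to show what each number means. This does not replace the Forest Model or Model Comparison tools. The function names, the fixed seed, the 0.5 decision threshold, and the toy labels and scores are all assumptions for illustration; class 1 is taken to mean "fraudulent".

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 = fraudulent as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


train, test = train_test_split(list(range(10)))

# Toy labels and predicted fraud probabilities, not real model output.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
f1 = 2 * tp / (2 * tp + fp + fn)
auc_value = auc(y_true, scores)
print(len(train), len(test), (tp, fp, fn, tn), accuracy, f1, auc_value)
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is why the three can disagree on the same model.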

4.    Reporting:

•     Create a report in Word to summarize the model evaluation results and insights into the key factors that help predict fraudulent job postings.

Deliverables:

•    An Alteryx workflow (.yxmd) containing the complete analysis, with annotations explaining each tool and step. Use relative paths for workflow dependencies in Alteryx so that the grader can run your workflow without making any changes.

•    A Word document (.docx) summarizing the findings and insights from the model.

•    Compress the above two files into a zip file named with your student ID, e.g., 123456.zip.

•    You should not make any modifications to the input file: Balanced_Fraudulent_Job_Posts.xlsx. Also, DO NOT include this input file in your zip file.
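
The packaging rules above can be checked with a short script. The following is a minimal sketch, assuming hypothetical file names `workflow.yxmd` and `report.docx` and a hypothetical helper `package_submission`; only the input file name and the `123456.zip` naming pattern come from the assignment. The demonstration runs in a temporary directory with placeholder files.

```python
import tempfile
import zipfile
from pathlib import Path

INPUT_FILE = "Balanced_Fraudulent_Job_Posts.xlsx"  # must not be included in the zip


def package_submission(student_id, files, out_dir="."):
    """Zip the given files as <student_id>.zip, refusing to include the input dataset."""
    files = [Path(f) for f in files]
    if any(f.name == INPUT_FILE for f in files):
        raise ValueError(f"{INPUT_FILE} must not be included in the submission")
    archive = Path(out_dir) / f"{student_id}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)  # store without directory components
    return archive


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    workflow = Path(tmp) / "workflow.yxmd"  # hypothetical file name
    report = Path(tmp) / "report.docx"      # hypothetical file name
    workflow.write_text("<AlteryxDocument/>")
    report.write_text("placeholder")
    archive = package_submission("123456", [workflow, report], out_dir=tmp)
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
print(archive.name, names)
```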

Evaluation Criteria:

•    Correctness and completeness of the preprocessing and feature engineering steps implemented in Alteryx.

•    Accuracy and thoroughness of the model evaluation and interpretation of results within Alteryx.

•    Quality and clarity of the final report, including insights and conclusions drawn from the analysis.




