Project Proposal
Exploratory Analysis & Prediction on PM2.5 dataset
Liu, Jianke - liu.jiank@husky.neu.edu
Krishnamoorthy, Yeshwin - krishnamoorthy.y@husky.neu.edu
Ramakrishnan, Ganapathy Subramaniam - ramakrishnan.ga@husky.neu.edu
Objective
The objective is to analyze how meteorological factors such as temperature, pressure, and wind direction affect the aggregation, diffusion, and spread of PM2.5 in Beijing, China, and to predict future PM2.5 levels.
Approach
Firstly, using hourly data recorded by the US Embassy in Beijing from 2010 to 2014, we define the problem by examining the input parameters (temperature, pressure, dew point, wind direction, timestamp, etc.) and the output parameter (PM2.5 concentration).
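To make the split between inputs and output concrete, the sketch below lists the columns we expect to work with. The column names and units (`DEWP`, `TEMP`, `PRES`, `cbwd`, `Iws`, `Is`, `Ir`, `pm2.5`) are our reading of the UCI dataset description and should be checked against the header of the downloaded file; the helper `split_record` is hypothetical.

```python
from typing import Dict, Tuple

# Input columns and their meaning, as we read the UCI description
# (an assumption to verify against the actual CSV header).
FEATURE_COLUMNS: Dict[str, str] = {
    "year": "year of the observation",
    "month": "month",
    "day": "day",
    "hour": "hour",
    "DEWP": "dew point (deg C)",
    "TEMP": "temperature (deg C)",
    "PRES": "pressure (hPa)",
    "cbwd": "combined wind direction",
    "Iws": "cumulated wind speed (m/s)",
    "Is": "cumulated hours of snow",
    "Ir": "cumulated hours of rain",
}

# Output column: PM2.5 concentration in ug/m^3.
TARGET_COLUMN = "pm2.5"


def split_record(record: Dict[str, str]) -> Tuple[Dict[str, str], str]:
    """Separate one raw row into its input features and its PM2.5 value."""
    features = {name: record[name] for name in FEATURE_COLUMNS}
    return features, record[TARGET_COLUMN]
```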
Secondly, we analyze and prepare the data by preprocessing (data cleansing, formatting, and sampling) and transformation (scaling and aggregation).
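The preprocessing and transformation steps could look roughly like the following dependency-free sketch; in the project itself we would use pandas and NumPy. It assumes each row is a dict of strings keyed by the dataset's column names (here `year`, `month`, `day` and `pm2.5`), that missing PM2.5 readings are marked `NA`, and that the wind direction column takes the values `NE`, `NW`, `SE` and `cv`. All function names are hypothetical, and the sampling step is omitted.

```python
from collections import defaultdict
from statistics import mean

# Assumed category labels for the combined wind direction column (cbwd).
WIND_DIRECTIONS = ("NE", "NW", "SE", "cv")


def clean(rows):
    """Cleaning: drop rows whose PM2.5 reading is missing."""
    return [r for r in rows if r["pm2.5"] not in ("NA", "", None)]


def one_hot(value, categories=WIND_DIRECTIONS):
    """Formatting: turn a categorical value into a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scale(values):
    """Scaling: map a numeric column linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def daily_mean_pm25(rows):
    """Aggregation: average the hourly PM2.5 readings of each calendar day."""
    by_day = defaultdict(list)
    for r in rows:
        day = (int(r["year"]), int(r["month"]), int(r["day"]))
        by_day[day].append(float(r["pm2.5"]))
    return {day: mean(vals) for day, vals in sorted(by_day.items())}
```

In practice the minimum and maximum used for scaling should be computed on the training set only and reused for the validation and test sets, so that no information leaks from the held-out data.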
Thirdly, we choose the type of machine learning algorithm. Since every record contains both the inputs and the output (labeled data), we plan to use supervised learning algorithms such as naive Bayes, linear regression, and neural networks.
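As a minimal illustration of the supervised setting, the sketch below fits ordinary least squares with a single input feature (for example dew point) using the closed-form slope and intercept. It is only a one-variable baseline; a multi-feature regression or a neural network would normally come from a library such as Keras. Note that naive Bayes is usually a classifier, so using it would require binning PM2.5 into levels first. The function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept with one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-deviations around the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input feature is constant; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]
```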
Next, we partition the data into three subsets: training, validation, and test sets. We first split the data into training and test sets in a proportion of 80 to 20 percent, which is a common choice, and then hold out 20 percent of the training set as a validation set.
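The two-stage split leaves 64 percent of the data for training, 16 percent for validation and 20 percent for testing (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16). A minimal sketch, with the hypothetical function name `train_val_test_split`:

```python
import random


def train_val_test_split(rows, test_frac=0.2, val_frac=0.2, shuffle=True, seed=0):
    """Hold out test_frac of the data for testing, then val_frac of the
    remaining training data for validation. Without shuffling, the last
    (most recent) rows become the validation and test sets."""
    data = list(rows)
    if shuffle:
        random.Random(seed).shuffle(data)  # seeded for reproducibility
    n_test = round(len(data) * test_frac)
    train_full, test = data[: len(data) - n_test], data[len(data) - n_test :]
    n_val = round(len(train_full) * val_frac)
    train = train_full[: len(train_full) - n_val]
    val = train_full[len(train_full) - n_val :]
    return train, val, test
```

Because the observations form an hourly time series, shuffling lets neighboring, strongly autocorrelated hours fall into both the training and test sets; passing `shuffle=False` instead keeps the rows in time order so that the most recent data is held out. Which variant suits our evaluation is still an open choice.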
Finally, we select the best-performing model based on its results on the test data and use it to predict PM2.5 levels. To evaluate the predictions, we may compare them with actual measured values obtained online.
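The proposal does not yet fix an evaluation metric. Root-mean-square error (RMSE) and mean absolute error (MAE) are common choices for regression, so the sketch below assumes them; `pick_best` is a hypothetical helper that returns the model whose predictions have the lowest RMSE on a held-out set.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted PM2.5 values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between measured and predicted PM2.5 values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_best(predictions_by_model, actual):
    """Return the name of the model whose predictions have the lowest RMSE."""
    return min(predictions_by_model, key=lambda name: rmse(actual, predictions_by_model[name]))
```

RMSE penalizes large errors more heavily than MAE, which matters for PM2.5 because pollution peaks are the cases we most want to predict well; reporting both gives a fuller picture.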
Data Acquisition
We obtain our data mainly from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data.
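A dependency-free sketch of reading a local copy of the CSV from that page, assuming (as we understand the file) that missing values are written as `NA`. The function `load_pm25_csv` and the file name in its docstring are hypothetical; with pandas, `read_csv` would do the same job, since it treats `NA` as missing by default.

```python
import csv


def load_pm25_csv(lines):
    """Parse the raw CSV into a list of dicts, mapping the 'NA' marker to None.

    `lines` is any iterable of CSV text lines, for example a file opened with
    open("beijing_pm25.csv", newline="")  (hypothetical local file name).
    """
    rows = []
    for raw in csv.DictReader(lines):
        # Keep values as strings; only the missing-value marker is converted.
        rows.append({key: (None if value == "NA" else value) for key, value in raw.items()})
    return rows
```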
Coding
Language – Python
Libraries – pandas, NumPy, TensorFlow, Keras
Timeline
Oct. 19 – Oct. 26: Initial analysis and data preprocessing
Oct. 27 – Nov. 12: Model selection and training
Nov. 13 – Nov. 28: Evaluation and prediction
Nov. 29 – Dec. 2: Final preparation of report and presentation
Team-member roles
XXX: Algorithm implementation and coding
XXX: Coding and testing data
XXX: Analysis, testing and documentation