AMS 315
Data Analysis, Fall 2017
Multiple Regression Computing Project
Introduction

This assignment is due on Thursday, December 7, 2017. Each group has one
database to analyze, with a single file containing the data. Each file contains one
dependent variable and nineteen independent variables. The values of the dependent
variable are in the DV column. The values of the twenty independent variables are in the
columns with names of E1 to E5 and G1 to G15. Records are in correct order in each file
so you do not need to process the data. There are no missing values. This project is worth
up to 150 points. Each student in the group will receive the same number of points. Each
group should analyze their own dataset. Failure to use the correct dataset will lead to a
grade of zero. The data sets are named by group number, as group#.csv. The
corresponding member of each group will receive the data file.

Background

The class blackboard has a pdf file of a paper by Caspi et al. that reports a finding
of a gene-environment interaction. This paper used multiple regression techniques as the
methodology for its findings. You should read it for background, as it is the genesis of the
models that you will be given. The data that you are analyzing is synthetic. That is, the
TA used a model to generate the data. Your task is to find the model that the TA used for
your data. For example, one possible model writes Y_i as a function of selected E and G
variables and an error term Z_i. [The explicit example equation is garbled in the source.]

The class blackboard also contains a paper by Risch et al. that uses a larger
collection of data to assess the findings in Caspi et al. These researchers confirmed that
Caspi et al. calculated their results correctly but that no other dataset had the relation
reported in Caspi et al. That is, Caspi et al. seem to have reported a false positive (Type I
error). The class blackboard contains a recent paper about the genetics of mental illness
and a technical appendix giving the specifics. Together these papers are an example of
the response of the research community to studying the genetics of mental illness, which
is a notoriously difficult research area.

Report

The report that your group submits should be no more than 2500 words with no
more than 3 tables and 2 figures. It should include references (which do not count in the
2500 words). The report may have a technical appendix. The appendix could include
your computer programs or describe your procedures for computation. Your group should
include whatever additional material it feels is necessary to report your results in the
technical appendix. There are no length restrictions on the appendix. A submission of
only computer output without a report is not sufficient and will receive a grade of zero.
Analyses that report an incorrect number of observations will also receive a grade of zero.

Your report should be in standard scientific report format. It should contain an
introduction, methods section, results section, and a section with conclusions and
discussion. You may add whatever other material you wish in a technical appendix. The
introduction should contain the statement of your problem (namely estimating the
function that the TA used to generate your data). It should discuss the context of finding
GxE interactions, as given by Caspi et al. and others. The methods section should discuss
how you performed your statistical calculations, which independent variables you
considered, and other methodological issues, such as how you dealt with interaction
variables. The results section should contain an objective statement of your findings. That
is, it should contain the statement of the model that your group proposes for the data, the
analysis of variance table for this model, and other key summary results. The discussion
and conclusion section should include the limitations of your procedures. The class
blackboard has an editorial (by Cummings) that discusses reporting statistical information.

Guidelines for analysis

The first task for this problem is to use the statistical package of your choice to
find the correlations between the independent variables and the dependent variable.
Transformations of variables may be necessary. The Box-Cox transformation may find
potentially nonlinear transformations of a dependent variable. After selecting the
transformations of the dependent variable, use stepwise regression methods to select the
important independent variables. The Lasso technique was helpful to many groups in past
semesters. The TA will usually use at most two-way interactions of the independent
variables (that is, terms like E1*G2 or G3*G4) in generating your data. There may also be
non-linear environmental variables, such as E3^2 or E4^0.5. The TA may well have used
three factor interactions in the models for a few of the groups.
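
As a minimal sketch of the screening step described above, the following Python example builds every two-way product of a handful of columns and ranks all candidate terms by the absolute value of their Pearson correlation with the dependent variable. The toy data, the generating model DV = 10 + 2*E1*G1 + noise, and the helper names (pearson, candidate_terms, rank_by_correlation) are assumptions made for illustration only; they are not the TA's model or a required procedure. The sketch does not cover the Box-Cox, stepwise, or Lasso steps.

```python
import itertools
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def candidate_terms(columns):
    """Return the main effects plus all two-way products of the columns."""
    terms = dict(columns)
    for a, b in itertools.combinations(sorted(columns), 2):
        terms[f"{a}*{b}"] = [u * v for u, v in zip(columns[a], columns[b])]
    return terms


def rank_by_correlation(terms, dv):
    """Sort candidate terms by |correlation with dv|, largest first."""
    scores = {name: pearson(values, dv) for name, values in terms.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: two environmental columns and two 0/1 gene columns.
rng = random.Random(315)
n = 500
columns = {
    "E1": [rng.gauss(0, 1) for _ in range(n)],
    "E2": [rng.gauss(0, 1) for _ in range(n)],
    "G1": [rng.choice([0, 1]) for _ in range(n)],
    "G2": [rng.choice([0, 1]) for _ in range(n)],
}
# Assumed generating model for the toy data (not from the assignment).
dv = [10 + 2.0 * e * g + rng.gauss(0, 1)
      for e, g in zip(columns["E1"], columns["G1"])]

ranking = rank_by_correlation(candidate_terms(columns), dv)
for name, r in ranking[:3]:
    print(f"{name:8s} r = {r:+.3f}")
```

With a real group file, the columns dictionary would instead be filled from group#.csv (for example with csv.DictReader), and the ranking would only be a first screen before any formal model selection.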

Hints

Chapter 12 and Chapter 13 in your text contain important information, especially
Chapter 12. Also remember to consider multiple testing issues (as described in Chapter 9).
The p-value for the variables that you select should be much smaller than 0.01.
Remember that you have 4 environmental variables, 15 genes, 60 gene-environment
variables, 105 gene-gene interaction variables, and a very large number of three gene
interaction variables.
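
The counts in this hint can be checked with a few lines of arithmetic. The sketch below reproduces them and, purely as an illustration, computes a Bonferroni-adjusted threshold over the main effects and two-way terms listed above. The family-wise level of 0.05 and the choice of the Bonferroni correction are assumptions, not something the assignment prescribes; Chapter 9 of the course text may recommend a different adjustment.

```python
import math

n_env, n_gene = 4, 15                  # counts as stated in the hint

main_effects = n_env + n_gene          # environmental variables plus genes
gxe = n_env * n_gene                   # gene-environment products
gxg = math.comb(n_gene, 2)             # gene-gene products
gxgxg = math.comb(n_gene, 3)           # three-gene products

# Candidate terms up to two-way interactions, as listed in the hint.
n_tests = main_effects + gxe + gxg

alpha = 0.05                           # assumed family-wise error rate
bonferroni = alpha / n_tests           # per-test threshold under Bonferroni

print(f"GxE terms:          {gxe}")
print(f"GxG terms:          {gxg}")
print(f"Three-gene terms:   {gxgxg}")
print(f"Tests (<= two-way): {n_tests}")
print(f"Bonferroni cutoff:  {bonferroni:.6f}")
```

Under these assumptions the per-test cutoff is far below 0.01, which is in line with the hint that selected variables should have very small p-values.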

Your technical appendix may include:
(a) Your SAS or R script (if you are using SAS or R)
(b) Additional information that you want to report
(c) Any comments or suggestions

End of Project Assignment
 
