Assignment 3: The NHANES Dataset in R

Due Friday, November 18th 2022, 11:59pm

Assignment 3 uses the same dataset as Assignment 1: the NHANES dataset, available in R through the NHANES package. You can view the data dictionary by running help(NHANES) in R after loading the package, or at this link: https://cran.r-project.org/web/packages/NHANES/NHANES.pdf
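As a quick orientation, the following is a minimal sketch of loading the data and opening the data dictionary. It assumes the NHANES package is already installed from CRAN; the inspection calls after help(NHANES) are only suggestions and are not required by the assignment.

```r
# Assumption: the NHANES package is installed, e.g. via install.packages("NHANES")
library(NHANES)

# Open the data dictionary in the R help viewer
help(NHANES)

# Quick look at the size of the dataset and the first few variable names
dim(NHANES)
head(names(NHANES))
```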

In this assignment you are asked to create a conference abstract answering the question: what influences sleep time? You are expected to use a multiple regression model to answer the question, which will involve controlling for other variables, either taken from the dataset or newly created by you. Sleep time is measured using the variable SleepHrsNight. You must use a minimum of 3 variables in your model, though I would strongly encourage you to pick 5-7 predictor variables to make the model interesting to write about.
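To make the modelling step concrete, here is a minimal sketch of a multiple regression with SleepHrsNight as the outcome. The predictors (Age, Gender, BMI, PhysActive, Depressed) are an arbitrary illustration rather than a recommended model, and dropping rows with missing values is an assumption made here for simplicity; your own choices should be justified in your methods section.

```r
library(NHANES)

# Outcome plus a few candidate predictors (illustrative choice only)
vars <- c("SleepHrsNight", "Age", "Gender", "BMI", "PhysActive", "Depressed")

# Assumption: complete-case analysis, i.e. rows with any missing value are dropped
sleep_data <- na.omit(as.data.frame(NHANES[, vars]))

# Multiple linear regression: each coefficient is adjusted for the other predictors
fit <- lm(SleepHrsNight ~ Age + Gender + BMI + PhysActive + Depressed,
          data = sleep_data)

summary(fit)   # coefficient estimates, standard errors, p-values
confint(fit)   # 95% confidence intervals for the coefficients
```

Comparing nrow(sleep_data) with nrow(NHANES) shows how many observations the complete-case step removed, which is worth reporting when describing data handling.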

The abstract will follow a traditional conference abstract structure with the following sections: Background, Objectives, Methods, Results, Discussion. You are limited to 500 words total. You may be brief with the background and objectives, since you did not collect the data, and the methods section should focus on statistical methods. A sample abstract from a previous year is provided below.

Two files will need to be submitted:

1. A PDF document of your conference abstract

2. A .R or .Rmd file with your analytic code

Collaboration on assignments is acceptable, but each student must produce and submit their own assignment. To improve the marking process, you are required to list the people that you worked with on your submitted assignment. This makes it easier to detect where groups are having issues and improves the learning process for all. Not listing collaboration with other students constitutes plagiarism and will be prosecuted as such.

Grading Rubric

Presentation (30%): Does the format fit the conference structure? Is the abstract within the word limit? Grammar and writing quality will be evaluated as well.

Methods (30%): Is the analysis plan well described? Does it make sense for the data? Were the proper relationships explored and reported on? Were data handling measures taken and well described?

Results (30%): Are the results correct? Do they make sense in relation to the objectives of the project? Do they answer the stated hypotheses?

Code (10%): Was the code file included? Does it produce the results in your submitted assignment?

The table below is from the evaluation rubric from the course syllabus, to help guide the evaluation of these categories:

Understanding
Excellent: Shows in-depth and comprehensive understanding of the statistical methods and their application
Average: Shows in-depth or comprehensive understanding of the statistical methods and their application
Weak: Shows neither in-depth nor comprehensive understanding of the statistical methods and their application

Assessment
Excellent: Demonstrates ability to identify and weigh key strengths and limitations of the statistical methods and their application
Average: Identifies key strengths and limitations of the statistical methods and their application in own terms
Weak: Identifies key strengths and limitations of the statistical methods and their application already stated by others

Synthesis
Excellent: Synthesizes information by developing or applying a relevant structure
Average: Summarizes information but is unable to prioritize important information
Weak: Gathers information unevenly

Writing
Excellent: Writes clearly and concisely with sufficient detail and appropriate citation*; exhibits the development of own writing style beyond grammatically correct use of language
Average: Writes clearly with appropriate citation* and mostly grammatically correct use of language
Weak: Unable to write clearly; unable to cite appropriately*; and/or use of language is often grammatically incorrect

*Note that, in this project, no citation is necessary

Example:

Background: Cardiovascular disease (CVD) is a serious problem in Canada. As our population ages, this risk is only expected to increase; it is therefore imperative that we understand the nature of CVD and how CVD-related conditions relate to death.

Objectives: The purpose of this research is to investigate the relationship between CVD indicators and survival among patients discharged from a cardiology inpatient unit.

Methods: 500 consecutive patients were drawn from the QEII Health Sciences Centre cardiology unit starting January 1, 2013. For all 500 patients, information on age, sex, height and weight, blood pressure, and history of both heart attack and diabetes was captured at the time of discharge, and survival 5 years after discharge was then recorded. BMI and hypertensive status variables were both created as 4-level ordinal variables.

Before the analysis was conducted, outliers were detected in two blood pressure values, and the 5-year follow-up could not be completed for 5 patients. These 7 patients were removed from the sample, leaving a final sample of 493 patients.

Simple summary statistics were computed and univariate analyses were done using logistic regression, and then a multiple logistic regression model was built to predict death within 5 years using age, diabetes, BMI category, hypertensive category, sex and history of heart attack. Sex-stratified analyses were performed to investigate potential effect modification, and patients without a history of heart attack were investigated as a subgroup.

Results: The sample included 251 females (51%) and had an average age of 49 years (range: 20-74). 26 patients (5%) had a history of heart attack and 19 (4%) had diabetes. 234 subjects had normal BMI (48%), with 184 (37%) being overweight and 67 (14%) being obese. Only 104 patients (21%) were normotensive, with 49 and 124 patients (10% and 25%) being in stages 1 and 2 of pre-hypertension and 216 (44%) being hypertensive.

The only unadjusted effect that achieved statistical significance was a history of heart attack: those with a history had an OR of 7.1 (95% CI: [3.1, 16.2]). This pattern continued into the multiple model, with the OR dropping to 6.9 (95% CI: [2.9, 16.6]) after controlling for age, diabetes, sex, BMI and hypertensive group. No other variable approached statistical significance.

In the sex-stratified analysis, the effect of heart attack remained for both males (OR = 8.1, 95% CI: [2.5, 26.3]) and females (OR = 12.0, 95% CI: [2.5, 58.6]), and there was no meaningful change in the effects of the other variables. For females, diabetes approached significance and BMI showed a statistically significant but non-linear pattern; both results are most likely due to small sample sizes.

In the subgroup analysis of those without a history of heart attack, there were no results of statistical or clinical interest.

Discussion: History of heart attack was the only variable that showed a strong relationship with survival. Sex-stratification revealed nothing of interest, and surprisingly neither blood pressure nor BMI group had a strong relationship. A larger sample might reveal more interesting patterns in the sex-stratified analysis, but moving forward other more relevant predictors should be investigated.