首页 > > 详细

Midterm project At this point, we’ve covered

Midterm project
Context
At this point, we’ve covered Building Blocks, Data Wrangling I, Visualization and EDA, and Data Wrangling II. These three topics give a broad introduction into the commonly-used tools of data science, and are the main focus of this project.
Due date
Due: October 26 at 4:00pm.
Reproducibility
The course’s emphasis on workflow – especially the use git and GitHub for reproducibility, R Projects to organize your work, R Markdown to write reproducible reports, relative paths to load data from local files, and reasonable naming structures for your files– will be reflected in your Midterm Project submission.
To that end:
create a private GitHub repo + local R Project; we suggest naming this repo / directory p8105_mtp_YOURUNI (e.g. p8105_mtp_ajg2202 for Jeff), but that’s not required
onon-private repos will be treated as inconsistent with the independent work requirement and as violations of the academic integrity policy
add the GitHub user “bst-p8105” as a collaborator on the project, which will give us (and only us) access to your repo
create a data directory and add data/ to the .gitignore file
othe data folder and its contents should not appear in your git history or on GitHub, but you can still access files in this folder using relative paths
oafter downloading your submission, we’ll add the data directory and data files to
create a single .Rmd file named p8105_mtp_YOURUNI.Rmd that renders to github_document
submit a link to your repo via Courseworks
We will assess adherence to the instructions above and whether we are able to knit your .Rmd in the grading of this project. Adherence to appropriate styling and clarity of code will be assessed. This project includes figures; the readability of your embedded plots (e.g. font sizes, axis labels, titles) will be assessed.
Data
Accelerometers have become an appealing alternative to self-report techniques for studying physical activity in observational studies and clinical trials, largely because of their relative objectivity. During observation periods, the devices measure electrical signals that are a proxy for acceleration. “Activity counts” are then devised by summarizing the voltage signals across a short period known as an epoch; one-minute epochs are common. Because accelerometers can be worn comfortably and unobtrusively, they produce around-the-clock observations of many kinds of activity.
The data for this project was collected on a 63 year-old male with BMI 25, who was admitted to the Advanced Cardiac Care Center of Columbia University Medical Center and diagnosed with congestive heart failure (CHF). He wore an accelerometerdevice that continuously recorded physical activity for several months. The goal of this study was to understand patterns of physical activity over long periods of time; more generally, it is hoped that trends that emerge prior to and happen right after adverse clinical eventsand can be potentially used for dynamic update of patient status.
The data can be downloaded here. In this spreadsheet, variables activity.* are the activity counts for each minute of a 24-hour day starting at midnight.
Problem
Throughout, you should describe your work in writing that targets a reasonably sophisticated collaborator – not an expert data scientist, but an interested observer. Include code and output; structure your report clearly. Write in a reproducible way (e.g. using inline R code where necessary). Include only relevant information, and adhere to a strict-500 word limit (this excludes figures and tables, code chunks, inline code, YAML, and other non-text elements). You can check your word count using wordcountaddin::text_stats("p8105_mtp_YOURUNI.Rmd"); installation instructions can be found on the wordcountaddin package website. We’ll use the “koRpus” count.
First, load, tidy, and otherwise wrangle the data. Your final dataset should include all originally observed variables and values; have useful variable names; and code data with reasonable variable classes. Describe the resulting dataset (e.g. what variables exist, how many observations, etc). Discuss any additional exploratory analyses you conduct, but only include results you think are interesting (e.g. visually inspect distributions for outliers, but include only if these are informative).
Traditional analyses of accelerometer data focus on the total activity over the day. Using your tidied dataset, aggregate accross minutes to create a total activity variable. Using these data, explore the hypothesis that this participant became more active over time. You may want to want to make comparisons visually and / or quantitatively, or using formal statistical analyses. Additionally, examine the possibility that day of the week affects activity (in isolation and in addition to the effect of time).
One advantage of accelerometer data is the ability to inspect activity over the course of the day, rather than aggregating over 24 hours. Explore the distribution of activity counts in the full dataset, taking into account other variables of interest. Make a visualization that shows the 24-hour activity “profiles” for each day. Also visualize effects of time and day of the week on 24-hour activity profiles; incorporating smooth estimates of mean activity profiles may clarify these effects. Comment on relationships you think are interesting. (No formal statistical analyses are needed.)

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!