Analysis Goal

Use the given data to detect the presence of heart disease. There are three tasks associated with this analysis:


Data

The response variable num has five levels. The documentation is not entirely clear about this, so we will assume that the levels mean the following.

In other words, the response variable num is the number of vessels with greater than 50% diameter narrowing.

The following code is available to show how the data was created, but please use the .csv linked above.

# load packages
library("tidyverse")
library("caret")
# install.packages("devtools")
# devtools::install_github("coatless/ucidata")

# load data for each location from ucidata package
hd_ch = as_tibble(ucidata::heart_disease_ch)
hd_cl = as_tibble(ucidata::heart_disease_cl)
hd_hu = as_tibble(ucidata::heart_disease_hu)
hd_va = as_tibble(ucidata::heart_disease_va)

# add location variable for each dataset
hd_ch$location = "ch"
hd_cl$location = "cl"
hd_hu$location = "hu"
hd_va$location = "va"

# determine number of NA values in each column of "combined" dataset
# bind_rows(hd_ch, hd_cl, hd_hu, hd_va) %>%
#   mutate_all(is.na) %>% 
#   summarise_all(sum)

# combine the four locations into one dataset
# remove columns with large proportion of NA values (reasonable in practice)
# remove remainder of rows with NA values (not the best idea in practice)
hd = bind_rows(hd_ch, hd_cl, hd_hu, hd_va) %>% 
  select(-slope, -ca, -thal) %>%
  na.omit()

# coerce location variable to factor
# may need to do this again after reading data
hd$location = factor(hd$location)

# re-define response variable (names will play better with caret)
hd$num = factor(case_when(
  hd$num == 0 ~ "v0",
  hd$num == 1 ~ "v1",
  hd$num == 2 ~ "v2",
  hd$num == 3 ~ "v3",
  hd$num == 4 ~ "v4"
))

# write to disk
write_csv(hd, "analyses/analysis-02/heart-disease.csv")
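
As the comment in the code above notes, factor columns may need to be coerced again after the .csv is read back in, because a .csv stores factors as plain text. The following is a minimal sketch of that round trip. It uses a small made-up tibble (hd_small, a hypothetical stand-in, not the real data) and a temporary file rather than the actual heart-disease.csv, and it assumes a recent version of readr (for the show_col_types argument).

```r
# load packages
library("tidyverse")

# hypothetical stand-in for a few rows of the processed data (not real values)
hd_small = tibble(
  age = c(63, 41, 57),
  location = factor(c("cl", "hu", "va")),
  num = factor(c("v0", "v2", "v1"))
)

# round trip through a temporary .csv file
tmp = tempfile(fileext = ".csv")
write_csv(hd_small, tmp)
hd_read = read_csv(tmp, show_col_types = FALSE)

# after reading, location and num are character columns, not factors
class(hd_read$location)
class(hd_read$num)

# coerce back to factors
# listing the response levels explicitly keeps all five levels,
# even if some do not appear in the rows that were read
hd_read$location = factor(hd_read$location)
hd_read$num = factor(hd_read$num, levels = c("v0", "v1", "v2", "v3", "v4"))

# check the result
levels(hd_read$num)
```

Specifying the levels of num explicitly is a design choice for this sketch: it keeps the set of response levels the same across any subsets of the data, such as train and test splits, which tends to matter when fitting models with caret.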

IMRAD

For this analysis, do the following:


IMRAD Submission

Submit a .zip file to Compass that contains:

The zip file should contain no other files. (Whether or not these two files are within another folder does not matter.)

Submit your .zip file to the correct assignment on Compass2g. You are granted an unlimited number of submissions. Only your final submission will be graded.


R Environment

We assume that your R, R packages, and RStudio are all up-to-date. (Or at least as recent as the versions found on RStudio Cloud.) You’ve been warned.


R Style

Your code will be graded based on its style. We don’t expect you to have a mature coding style, so we have a list of rules which must be followed.

The following will be explicitly checked for in your code:

The following are suggested, but will not be directly assessed:

Much of this is derived from the tidyverse style guide. If you follow the tidyverse guide, be aware of our use of ^ and =.


Analysis Quiz

There will be a PL quiz associated with this analysis to check some of the “objective” numeric results of your analysis.


Self Assessment

After submission of the analysis, an example “solution” will be released. In addition, a set of reflection questions will be released. By comparing your submitted analysis to the “solution” together with the reflection questions, you will write a short self-assessment of your analysis.


Grading

This analysis is worth a total of 10 points.

Failure to submit the correct files will result in 0 points for the IMRAD.

Quiz grading will be similar to regular quizzes.

Grading of the self-assessment will largely be based on completion. A template will be provided after submission of the analysis.

Late Policy

The late policy will apply to each individual task. See above for due dates.

Late submissions for both will be accepted up to 48 hours after the initial deadline.

  • Up to 24 hours late, the assignment will incur a 10 percent reduction.
  • Up to 48 hours late, the assignment will incur a 30 percent reduction.
  • No exceptions! Start early and make sure your environment is working correctly and you are able to produce a working document.

If you submit multiple attempts, the final attempt will be graded. If your first submission is on time, but your final submission is late, you will incur the late submission penalty.