Grading

Grading will be based almost entirely on effort. (Point loss could occur for missing the deadline or submitting in a format other than PDF.) A member of the course staff will quickly review the document to determine if some thought was put into the writing.


Notes


Purpose

There are several reasons we’re performing these reflections:

What follows are some loosely organized questions that could spark some reflection. These are not questions that need to be answered, and you do not need to write about each topic or question. (But you should, on some level, think about all of them when performing analyses and reviewing your work.)


Template

Consider using the following as the start of an R Markdown document when writing your reflection.


---
title: "Analysis Reflection"
author: "Your Name Here"
output: pdf_document
---

***

# Some Header

This is some text.

## A Subheader

- This
- Is
- A
- List

***

# Another Header


```{r}
# a code chunk
x = 42
```

***

Reflection Topics


General

  • Did you find this dataset interesting?
    • Do you feel that you had enough documentation to understand this dataset?
  • Did the assignment seem useful? That is, did you get something out of performing this analysis? (Other than points.)
  • Does this dataset seem useful? That is, did this dataset allow you to create a useful model?

Abstract

  • Do you feel that a reader has a reasonable idea of what they are about to read?

Introduction

  • Do you feel that a reader would have a reasonable idea what the data is, and why the analysis is being performed?

Methods

  • Do you feel that a reader would have a reasonable idea how you have modeled the data?
  • Do you feel that a reader would have a reasonable idea how you are validating your decisions?

Results

  • Does your results section contain useful information? That is, have you reported your results, but only those that are most relevant to your decision and discussion?

Discussion

  • Do you feel that you properly contextualized your results?

Effort

  • Do you feel that you spent enough time on this assignment?
  • Do you feel like your time was well spent completing this assignment?
  • Are some parts of the analysis easier than others? Where did you spend your time when you did this analysis?

Formatting

  • Does your report pass the eye test? That is, when quickly scrolling through, do you notice any obvious formatting or rendering issues?
    • Consider both the R Markdown document, and the rendered document.
  • Does your document contain spelling errors? (Note: There is a spell-check feature in RStudio.)
  • When displaying results, are a reasonable number of digits shown? (Enough that differences can be seen, but not so many that it is hard to read.)
  • Is code kept to a minimum in the final document?
    • Code should only be shown sparingly, if it is the easiest way to convey what is happening.
    • Code for graphics should always be hidden.
    • Output of code should almost never be included in the report without modification.
  • Are numeric results reported in well formatted tables?
  • Are graphics and tables free of “code”?
    • For example, use “Validation RMSE” instead of val_rmse.
  • Have you given your analysis a title? (That isn’t specific to the assignment. That is, not something like: “Analysis 01.”)
  • Are warnings and messages suppressed from your final knitted report?
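
Several of the formatting items above map onto knitr settings. The following is a minimal sketch, assuming the report is knit with knitr and that results are stored in a data frame. The `results` data frame, its column names, and its numbers are made up purely for illustration.

```r
# In a setup chunk near the top of the .Rmd: hide code, messages, and
# warnings in every chunk unless a chunk overrides these options.
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Hypothetical results with made-up numbers, only for illustration.
results = data.frame(
  model    = c("Linear", "KNN", "Tree"),
  val_rmse = c(1.23456789, 1.19876543, 1.30123456)
)

# Report a readable table: human-friendly column names and three digits
# instead of raw variable names and full precision.
knitr::kable(
  results,
  digits    = 3,
  col.names = c("Model", "Validation RMSE"),
  caption   = "Validation results for each candidate model."
)
```

Setting the options once in a setup chunk keeps individual chunks uncluttered. A single chunk can still show its code by setting `echo = TRUE` in its own header.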

Modeling

  • Did you make the same modeling decision as the example solution?
    • If not, what did you do differently? Can you justify this difference?
  • Do your numeric results “differ” from the example solution? If so, can you find a reason why?
    • A difference here doesn’t imply something is wrong.
  • How are you evaluating your modeling choices? Did you make any choices that seem unjustified?

Graphics

  • Do your graphics appear to be publication quality? (Well labeled, easy to read, generally well formatted.)
    • Quickly made EDA plots in an appendix are somewhat exempt from this, but should still look reasonable.

Code

  • How does your code compare to the example solution?
    • Do you find the code in the example solution easy to read? Is yours easier to read?
    • Do you see changes you could make to your coding habits and style?
  • Does your code follow the tidyverse style guide? (With the known exceptions for STAT 432.)

Questions?

  • Do you have any questions about an assignment, course materials, or course administration?
    • If so, consider asking a few! Do so in the Piazza thread for the specific analysis in question. (See below for link.)
      • We will attempt to answer as many of these as possible!
      • Please try to not submit duplicate questions! (You should read questions from other students!)

Analysis 01 (Airbnb)

  • Is there some obvious data missing here?
    • Hint: Airbnb actually makes more data available publicly.

Questions Thread: https://piazza.com/class/jzeh4ckghek5cd?cid=152


Analysis 02 (Heart Disease)

  • Did you view the original source data? Did you use additional source data? Extra variables? Extra observations?
    • If you used additional data, did you encounter missing data? If so, how did you deal with it?
  • Did you do any additional EDA? (Beyond the very basic analysis in the quiz.)
    • If so, did you notice anything interesting?
  • Were you careful to consider the population that this data was sampled from when recommending this model for use? (It is sampled from four very specific locations, and is seemingly sampled from patients that were already seeking medical attention, in particular from cardiology.)
    • It would be a terrible idea to use this model on randomly chosen individuals from a large population, for example, the United States.
  • How did you evaluate your models? What metrics did you use? How did you define and weight false positives and false negatives in your decision making?
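
One way to make the weighting of false positives and false negatives explicit is to count each type of error and attach a cost to it. This is only a sketch under stated assumptions: the labels, predictions, and the two cost values below are hypothetical and are not taken from the quiz or the example solution.

```r
# Hypothetical true labels and predictions (1 = disease, 0 = no disease).
actual    = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted = c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

# Count each type of outcome.
tp = sum(predicted == 1 & actual == 1)  # true positives
fn = sum(predicted == 0 & actual == 1)  # false negatives (missed disease)
fp = sum(predicted == 1 & actual == 0)  # false positives (false alarms)
tn = sum(predicted == 0 & actual == 0)  # true negatives

sensitivity = tp / (tp + fn)  # proportion of diseased patients detected
specificity = tn / (tn + fp)  # proportion of healthy patients cleared

# Assumed costs: a missed case is treated as five times worse than a
# false alarm. These values are placeholders, not a recommendation.
cost_fn = 5
cost_fp = 1
avg_cost = (cost_fn * fn + cost_fp * fp) / length(actual)
```

Comparing `avg_cost` across candidate models (or across classification cutoffs) makes the trade-off visible in a way that accuracy alone does not.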

Questions Thread: [ @193 ] Example Solution: [ .html ] [ .Rmd ]


Analysis 03 (Credit Card Fraud)

  • Did you use the full data or the provided subset? Did increasing the sample size make the performance better?
  • How did you quantify the performance of your model?
    • Did you utilize the loss values given in the quiz? Did you consider the amount of the transaction in some other way?
  • Did you consider the time it takes to make predictions?
  • Note that in the “solution” there are some meta “instructor notes” at the bottom.
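
Prediction time can be measured directly with `system.time()`. The sketch below uses simulated data and a logistic regression as stand-ins, since the actual transactions and model will differ. The variable names `x1`, `x2`, and `y` are hypothetical.

```r
set.seed(42)

# Simulated data standing in for the real transactions.
n = 10000
train = data.frame(x1 = rnorm(n), x2 = rnorm(n))
train$y = rbinom(n, size = 1, prob = plogis(-2 + train$x1))

# A simple logistic regression as a placeholder model.
fit = glm(y ~ x1 + x2, data = train, family = binomial)

# Time how long it takes to score new observations.
new_data = data.frame(x1 = rnorm(n), x2 = rnorm(n))
timing = system.time({
  prob = predict(fit, newdata = new_data, type = "response")
})

timing["elapsed"]  # wall-clock seconds for n predictions
```

Dividing the elapsed time by `n` gives a rough per-transaction figure, which can then be compared across models.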

Questions Thread: [ @248 ] Example Solution: [ .html ] [ .Rmd ]