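There are four assignments for the project. Their due dates are:
- Group formation: Friday, November 8, 2019, 11:59 PM
- Proposal: Monday, November 18, 2019, 11:59 PM
- Final report: Monday, December 16, 2019, 11:59 PM
- Peer evaluation: Monday, December 16, 2019, 11:59 PM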
The overall goal of the project is to work in groups to apply (mostly supervised) statistical learning methods to a dataset of your choice.
You may use any dataset of your choice, so long as it contains at minimum 500 observations and was not previously used in class. This dataset might be relevant to research outside of this course, another field, or some other interest of yours. If you have any questions about whether your data is appropriate, do not hesitate to ask. If you plan to use data from another endeavor of yours, such as a research project, be sure to gain permission from the controlling authority first.
The two most common sources of data used by students:
If you find an interesting data source (not necessarily a dataset) feel free to share it here:
The final product of this project will be a written report in the IMRAD style used throughout the course.
Your goal is not to use as many methods as possible. Your task is to use appropriate methods to find a good model that can perform the desired statistical learning task. Most importantly, you should motivate and discuss why that task is being completed, and how well it is being completed.
An email requesting a group formation must be received by Friday, November 8, 2019, 11:59 PM if you would like to select your own group. (If you skip this task, you will be assigned a group.) Groups may be as small as three members and as large as four members. You may submit a group of two, but the instructor will either merge it with another group or assign it additional students who do not have a group.
For a group to be considered, an email must be sent to dalpiaz2@illinois.edu by the deadline. It must do the following:
- Use the subject line [STAT 432] Project Group - "Some Name Here", replacing "Some Name Here" with some clever name for your group. (This is just to prevent the emails from being grouped by subject.)

Failure to complete any of the above steps will result in an email that simply says “try again.” Absolutely no late requests will be considered, for any reason. A non-conforming email before the deadline does not count. Please send one email per group. (This is easy to accomplish via communication and CCing group members on the request email.)
A proposal of your intended project is due by Monday, November 18, 2019, 11:59 PM. It should be submitted online via Compass by a single group member.
After review of the proposal, it will be evaluated in one of two ways:
A proposal of your intended project should include the following:
- Load the data into R, and print the first few values of the response variable as evidence.
- Create at least one plot that helps the reader understand the data.
- In R, use either lm() (regression) or glm() (classification), then call predict() on the results and return the first few values. You may need to perform some data cleaning before this step.

As a group, you will submit a .zip file, as you would for an analysis, that contains an .html and an .Rmd file, as well as the data if it cannot be linked online. If your data is too large to submit, and cannot be linked, please let us know and we will find an alternative. There is no required format or template, but you should follow reasonable R Markdown practices discussed in class.
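The loading, plotting, and modeling evidence requested for the proposal could look roughly like the sketch below. It is only an illustration: the built-in mtcars data, the object names, and the chosen variables are all assumptions made for the example, and mtcars itself would not qualify as a project dataset because it has far fewer than 500 observations.

```r
# Stand-in data (assumption): mtcars ships with R and is only a placeholder.
# Replace it with your own data, e.g. read.csv("your-data.csv") (hypothetical file).
project_data <- mtcars

# Evidence that the data loads: first few values of the response variable.
head(project_data$mpg)

# At least one plot that helps the reader understand the data.
plot(mpg ~ wt, data = project_data,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel efficiency versus weight")

# Regression task: fit with lm(), then return the first few predictions.
reg_fit <- lm(mpg ~ wt + hp, data = project_data)
reg_pred <- predict(reg_fit, newdata = project_data)
head(reg_pred)

# Classification task: fit a logistic regression with glm(),
# then return the first few predicted probabilities.
clf_fit <- glm(am ~ wt + hp, data = project_data, family = binomial)
clf_prob <- predict(clf_fit, newdata = project_data, type = "response")
head(clf_prob)
```

A proposal only needs one of the two model fits, whichever matches the task. Note that predict() on a glm() fit returns values on the link scale by default, so type = "response" is used here to obtain probabilities.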
The final report of your analysis is due by Monday, December 16, 2019, 11:59 PM. It should be submitted online via Compass by a single group member.
As a group, you will submit a .zip file, as you would for an analysis, that contains an .html and an .Rmd file, as well as the data if it cannot be linked to online.
A peer evaluation of the group members is due by Monday, December 16, 2019, 11:59 PM. It should be submitted online via Compass by each group member.
Individually, you will write a short review of each of your group members, including yourself. For each member, comment on:
Individually, you will submit a single file (.pdf preferred) that contains your reviews. (A template will likely be released for this task.)
You will be graded on formatting, clarity, appropriateness of data, and motivation of task.
Points Possible: 70
- R is used appropriately.
- rmarkdown is used appropriately.
- rmarkdown? (Headers, chunks, etc.)

It is more important that you honestly review your team than give each member good remarks. You will be graded on how well you review your group members. If you simply give each of your team members good marks, you will likely receive far fewer points for the portion of the grade dedicated to evaluating your peers. (For example, if you give everyone 90%+ in each category, your grade will be reduced.)
Formatting and clarity will also account for a portion of the grade for your evaluations.
Results of the peer evaluation will not be shared with your group members.
The instructor reserves the right to further reduce a student's overall project grade if their team reports that they did not attempt to make a significant contribution to the project.
This section will likely be updated as we progress through the remainder of the semester.
How long should the report be?
Isn’t this a lot to do at the end of the course while we have other things going on in the course? And it’s due during finals week?
How should we split up this analysis?
How do we deal with an unresponsive group member?