STAT 29000: Project 14 — Spring 2022
Motivation: Rearranging data to and from "long" and "wide" formats sounds like a difficult task; however, tidyverse has a variety of functions that make it easy.
Context: This is the last project for the course. It focuses on how data can change when grouped differently, and on using the pivot functions.
Scope: R, tidyverse, ggplot
Dataset(s)
The following questions will use this dataset:
- /depot/datamine/data/death_records/DeathRecords.csv
Questions
Question 1
Calculate the average age of death for each of the MaritalStatus values and create a barplot using ggplot and geom_col.
- Code used to solve this problem.
- Output from running the code.
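A minimal sketch of the group_by and summarize step followed by geom_col, assuming dplyr and ggplot2 are installed. The data frame toy is a hypothetical stand-in for DeathRecords.csv that keeps only the two columns this question needs; its values are made up for illustration, so only the shape of the result is meaningful.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for DeathRecords.csv.
# Only the MaritalStatus and Age columns are used here.
toy <- tibble(
  MaritalStatus = c("D", "D", "M", "M", "S", "W"),
  Age           = c(70, 66, 72, 70, 50, 85)
)

# One row per MaritalStatus value with the mean age at death.
avg_age <- toy %>%
  group_by(MaritalStatus) %>%
  summarize(age_of_death = mean(Age))

# geom_col draws bars whose heights are the y values as given.
p <- ggplot(avg_age, aes(x = MaritalStatus, y = age_of_death)) +
  geom_col()
p  # display the plot
```

geom_col is used rather than geom_bar because the bar heights are already computed; geom_bar would count rows instead.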
Question 2
Now, let’s further group our data by Sex to see how the patterns change (if at all). Create a side-by-side bar plot where Sex is shown for each of the 5 MaritalStatus values.
- Code used to solve this problem.
- Output from running the code.
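One way to get side-by-side bars, sketched on another hypothetical toy data frame: map Sex to fill and set position = "dodge" in geom_col. The .groups = "drop" argument (dplyr 1.0 or later) returns an ungrouped result and suppresses the message about regrouping.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; the column names match the question.
toy <- tibble(
  MaritalStatus = c("D", "D", "M", "M", "S", "S"),
  Sex           = c("F", "M", "F", "M", "F", "M"),
  Age           = c(70, 66, 70, 73, 57, 49)
)

# One row per (MaritalStatus, Sex) pair.
avg_age <- toy %>%
  group_by(MaritalStatus, Sex) %>%
  summarize(age_of_death = mean(Age), .groups = "drop")

# fill = Sex splits each MaritalStatus bar by sex;
# position = "dodge" puts those bars next to each other instead of stacking them.
p <- ggplot(avg_age, aes(x = MaritalStatus, y = age_of_death, fill = Sex)) +
  geom_col(position = "dodge")
p  # display the plot
```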
Question 3
In the previous question, before you piped the data into ggplot functions, you likely used group_by and summarize. Take, for example, the following.
dat %>%
  group_by(MaritalStatus, Sex) %>%
  summarize(age_of_death=mean(Age))
MaritalStatus  Sex    age_of_death
<chr>          <chr>  <dbl>
D              F      70.34766
D              M      65.60564
M              F      69.81002
M              M      73.05787
S              F      56.83075
S              M      49.12891
U              F      80.80274
U              M      80.27476
W              F      85.69817
W              M      83.98783
Is this data "long" or "wide"?
There are multiple ways we could make this data "wider". Let’s say, for example, we want to rearrange the data so that we have the MaritalStatus column, an M column, and an F column. The M column contains the average age of death for males, and the F column the same for females. While this may sound complicated, pivot_wider makes it very easy.
Use pivot_wider to rearrange the data as described.
- Code used to solve this problem.
- Output from running the code.
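A sketch of pivot_wider, assuming dplyr and tidyr are installed. The long tibble below copies the first four rows of the example output above. names_from names the column whose values (F and M) become new column names, and values_from names the column that fills them. The last step uses pivot_longer to return to the long shape.

```r
library(dplyr)
library(tidyr)

# Long format: one row per (MaritalStatus, Sex) pair,
# values copied from the example output above.
long <- tibble(
  MaritalStatus = c("D", "D", "M", "M"),
  Sex           = c("F", "M", "F", "M"),
  age_of_death  = c(70.34766, 65.60564, 69.81002, 73.05787)
)

# Wide format: one row per MaritalStatus, one column per Sex value.
wide <- long %>%
  pivot_wider(names_from = Sex, values_from = age_of_death)
wide

# pivot_longer undoes the reshape.
back <- wide %>%
  pivot_longer(cols = c("F", "M"), names_to = "Sex", values_to = "age_of_death")
```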
Question 4
Create a ggplot plot for each month. Each plot should be a barplot with as.factor(DayOfWeekOfDeath) on the x-axis and the count on the y-axis. The code below provides some structure to help you get started.
g <- list() # to save plots to
for (i in 1:12) {
  g[[i]] <- dat %>%
    filter(...) %>%
    ggplot() +
    geom_bar(...)
}
library(patchwork)
library(repr)
# change plot size to 12 by 12
options(repr.plot.width=12, repr.plot.height=12)
# use patchwork to display all plots in a grid
# https://cran.r-project.org/web/packages/patchwork/vignettes/patchwork.html
- Code used to solve this problem.
- Output from running the code.
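A sketch of the loop from the scaffold, run on hypothetical toy data. It assumes the month is stored in a column called MonthOfDeath (check names(dat) for the actual name) and that patchwork is installed; wrap_plots takes a list of ggplot objects and lays them out in a grid.

```r
library(dplyr)
library(ggplot2)
library(patchwork)

# Hypothetical toy data: 20 deaths per month with random weekdays.
# MonthOfDeath is an assumed column name; DayOfWeekOfDeath is from the question.
set.seed(1)
toy <- tibble(
  MonthOfDeath     = rep(1:12, each = 20),
  DayOfWeekOfDeath = sample(1:7, 240, replace = TRUE)
)

g <- list()  # one plot per month
for (i in 1:12) {
  g[[i]] <- toy %>%
    filter(MonthOfDeath == i) %>%      # keep only month i
    ggplot() +
    geom_bar(aes(x = as.factor(DayOfWeekOfDeath))) +  # bar height = row count
    labs(title = month.abb[i], x = "Day of week")
}

# Arrange the 12 plots in a grid with 4 columns.
wrap_plots(g, ncol = 4)
```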
Question 5
Question 4 is a bit tedious. tidyverse provides a much more ergonomic way to create plots like this. Use facet_wrap to create the same plot.
You no longer need a loop to solve this problem. In fact, you only need to add one more line of code to the ggplot call.
Are there any patterns in the data that you find interesting?
- Code used to solve this problem.
- Output from running the code.
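The same hypothetical toy data with facet_wrap, again assuming a MonthOfDeath column: a single ggplot call produces one panel per month, so neither the loop nor patchwork is needed.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data (MonthOfDeath is an assumed column name).
set.seed(1)
toy <- tibble(
  MonthOfDeath     = rep(1:12, each = 20),
  DayOfWeekOfDeath = sample(1:7, 240, replace = TRUE)
)

# facet_wrap(~ MonthOfDeath) splits the data by month and
# draws one small barplot per month in a 4-column grid.
p <- toy %>%
  ggplot() +
  geom_bar(aes(x = as.factor(DayOfWeekOfDeath))) +
  facet_wrap(~ MonthOfDeath, ncol = 4)
p  # display the plot
```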
Question 6
It has been a fun year. We hope that you learned something new!
- Write down 3 (or more) of your least favorite topics and/or projects from this past year (for STAT 29000).
- Write down 3 (or more) of your favorite projects and/or topics you wish you were able to learn more about.
- Code used to solve this problem.
- Output from running the code.
Please make sure to double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted was what you actually submitted. In addition, please review our submission guidelines before submitting your project.