Submission Details

Due date: the homework is due before class on Thursday.

Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit both the .Rmd and the .html files separately and do not zip the two files together.


Measles Vaccination Rates

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. The data this week comes from The Wallstreet Journal. The data set includes immunization rate data for schools across the U.S. The accompanying article is published here.

library(dplyr)
library(ggplot2)
library(readr)
measles <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-25/measles.csv')
  1. For how many schools do we have data? How many of these schools recorded their Measles, Mumps, and Rubella (MMR) vaccination rate? Use the variable mmr to answer this question. Only consider schools with a rate > 0 for the remainder of the homework.
dim(measles)
## [1] 66113    16
measles <- measles %>% filter(mmr > 0)
dim(measles)
## [1] 44157    16

We have data for 66,113 schools and 44,157 of those have recorded their MMR vaccination rate.

  1. Using mutate(), reorder the levels of the variable state according to the median MMR vaccination rate. Then “pipe” your results into ggplot and create box plots of MMR vaccination rates for each state. Map the variable state to color, include the parameter show.legend = FALSE within geom_boxplot(), and flip the coordinates. Interpret.
measles %>% 
  mutate(state = forcats::fct_reorder(state, mmr)) %>% 
  ggplot(aes(x = state, y = mmr, color = state)) +
  geom_boxplot(show.legend = FALSE) + 
  coord_flip()

Most schools have a MMR vaccination rate over 95% with South Dakota and New York having the highest median rates. Most states have quite a few low rate outliers. Arkansas has the lowest median MMR vaccination rate.

  1. According to the CDC, 95% of a population needs to be vaccinated to stop the spread of measles and preserve herd immunity. Using mutate() and case_when(), introduce a new variable into the data set mmr_threshold where the value is “above” when mmr is greater than 95 and “below” otherwise. Is there a relationship between the type of school and the proportion of schools that did not reach that threshold? For each type of school, calculate the mean MMR vaccination rate. On how many responses are the averages based? Show these numbers together with the averages. Additionally, calculate the percentage of schools that did not reach that threshold. Arrange your results from greatest percentage to lowest. Comment on your results.
measles %>% 
  mutate(mmr_threshold = case_when(mmr > 95 ~ "above",
                                   TRUE ~ "below")) %>% 
  group_by(type) %>% 
  summarise(n = n(),
            mean_mmr = mean(mmr, na.rm = TRUE),
            perc_below = mean(mmr_threshold == "below")) %>% 
  arrange(desc(perc_below))
## # A tibble: 7 x 4
##   type             n mean_mmr perc_below
##   <chr>        <int>    <dbl>      <dbl>
## 1 Charter        217     88.0     0.733 
## 2 Private       4597     93.3     0.462 
## 3 Kindergarten  1486     94.2     0.380 
## 4 <NA>         18030     94.5     0.310 
## 5 Nonpublic       18     94.4     0.278 
## 6 Public       19762     96.2     0.239 
## 7 BOCES           47     98.8     0.0426

Charter school are the type of school with the lowest percentage of schools meeting the reccommended threshold whlie public and BOCES schools have the highest percentage of schools meeting the threshold.

  1. Use dplyr functions to:
  1. Now use the previous data set to draw a scatter plot with the mean MMR rate for each city on the y-axis and the student enrollment on the x-axis and color by the state mean MMR rate. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarise the chart.
question6_data %>%
  ggplot(aes( )) + 
  geom_hline(yintercept = 95, linetype = "dashed", size = 0.25, color = "grey40") +
  geom_point(size = 2, alpha = .3) +
  scale_color_gradient(low = "red", high = "blue", limits=c(88, 96), oob = scales::squish, 
  guide = guide_colorbar(direction = "horizontal", title.position = "top", title = "State average immunization rate", barwidth = 15, barheight = 0.25, ticks = FALSE, title.hjust = 0.5)) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  ggtitle("MMR immunization rates at schools grouped across US cities") +
  labs(subtitle="According to data collected by The Wall Street Journal",
       x = "Student Enrollment", y = "") +
  scale_x_continuous(labels = scales::comma) 
measles %>% 
  filter(enroll > 0, name != "West Valley School Prek-6") %>%
  distinct(year, city, state, name, type, enroll, mmr) %>%
  group_by(state) %>%
  # get state averages (for color scale)
  mutate(state_avg = weighted.mean(mmr, w = enroll, na.rm = T)) %>% 
  group_by(city, state) %>%
  summarise(mmr_rate = weighted.mean(mmr, w = enroll, na.rm = T),
            enroll = sum(enroll, na.rm = T),
            state_avg = mean(state_avg, na.rm = T)) %>% 
  filter(enroll < 100000, enroll > 250) %>%
  ggplot(aes(enroll, mmr_rate, color = state_avg, text = city)) + 
    geom_hline(yintercept = 95, linetype = "dashed", size = 0.25, color = "grey40") +
  geom_point(size = 2, alpha = .3) +
  scale_color_gradient(low = "red", high = "blue", 
                       limits=c(88, 96),
                       oob = scales::squish,
                       guide = guide_colorbar(direction = "horizontal", title.position = "top", title = "State average immunization rate", barwidth = 15, barheight = 0.25, ticks = FALSE, title.hjust = 0.5)) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  ggtitle("MMR immunization rates at schools grouped across US cities") +
  labs(subtitle="According to data collected by The Wall Street Journal",
       x = "Student Enrollment", y = "") +
  scale_x_continuous(labels = scales::comma)