Submission Details

Due date: the homework is due before class on Thursday.

Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit both the .Rmd and the .html files separately and do not zip the two files together.


Flying etiquette

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. FiveThirtyEight is a website founded by Statistician and writer Nate Silver to publish results from opinion poll analysis, politics, economics, and sports blogging. One of the featured articles considers flying etiquette. This article is based on data collected by FiveThirtyEight and publicly available on github. Use the code below to read in the data from the survey:

fly <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/flying-etiquette-survey/flying-etiquette.csv")

The next couple of lines of code provide a bit of cleanup of the demographic information by reordering the levels of the corresponding factor variables. Run this code in your session.

fly$Age <- factor(fly$Age, levels=c("18-29", "30-44", "45-60", "> 60", ""))
fly$Household.Income <- factor(fly$Household.Income, levels = c("$0 - $24,999","$25,000 - $49,999", "$50,000 - $99,999", "$100,000 - $149,999", "150000", ""))
fly$Education <- factor(fly$Education, levels = c("Less than high school degree", "High school degree", "Some college or Associate degree", "Bachelor degree",  "Graduate degree", ""))
  1. Some people do not travel often by plane. Provide a breakdown of travel frequency (use variable How.often.do.you.travel.by.plane.). Reorder the levels in the variable by travel frequency from least frequent travel to most frequent. Draw a barchart of travel frequency and comment on it.
summary(fly$`How.often.do.you.travel.by.plane.`)
## A few times per month  A few times per week             Every day 
##                    29                     4                     3 
##                 Never  Once a month or less   Once a year or less 
##                   166                   205                   633
fly$`How.often.do.you.travel.by.plane.` <- forcats::fct_relevel(fly$`How.often.do.you.travel.by.plane.`, "Never", "Once a year or less", "Once a month or less", "A few times per month", "A few times per week", "Every day")
ggplot(fly) + geom_bar(aes(x = `How.often.do.you.travel.by.plane.`)) + coord_flip()

Most of the survey respondents fly once a year or less and the vast majority fly once a month or less.

  1. In the demographic variables (Education, Age, and Houshold.Income), replace all occurrences of the empty string "" by a missing value NA. How many responses do not have any missing values? (Hint: the function is.na might come in handy)
levels(fly$Education)[6] <- NA
levels(fly$Age)[5] <- NA
levels(fly$Household.Income)[6] <- NA

sum(is.na(fly$Education))
## [1] 39
sum(is.na(fly$Age))
## [1] 33
sum(is.na(fly$Household.Income))
## [1] 214
  1. Run the command below and interpret the output. What potential purpose can you see for the chart?
library(ggplot2)
fly$Education = with(fly, factor(Education, levels = rev(levels(Education))))

ggplot(data = fly, aes(x = 1)) + 
  geom_bar(aes(fill=Education), position="fill") +
  theme(legend.position="bottom") +
  scale_fill_brewer(na.value = "grey50") + 
  labs(y = "Ratio") +
  coord_flip()

Because there is a natural ordering of the levels, this chart would allow us to lump categories together mentally. For instance, we can easily see that approximately 40% of the respondents replied that their education level is less than a Bachelors degree.

  1. Rename the variable In.general..is.itrude.to.bring.a.baby.on.a.plane. to baby.on.plane.. How many levels does the variable baby.on.plane have, and what are these levels? Rename the level labeled "" to “Not answered”. Reorder the levels of baby.on.plane from least rude to most rude. Put the level “Not answered” last. Draw a barchart of variable baby.on.plane. Interpret the result.
names(fly)[19] <- "baby.on.plane."
levels(fly$`baby.on.plane.`)[1] <- "Not answered"
fly$`baby.on.plane.` <- forcats::fct_relevel(fly$`baby.on.plane.`, "No, not at all rude", "Yes, somewhat rude", "Yes, very rude", "Not answered")
ggplot(fly) + geom_bar(aes(x = `baby.on.plane.`))

Only a small number of respondents consider it somewhat rude to bring a baby on a plane and an even smaller number consider it to be very rude.

  1. Investigate the relationship between gender and the variables Do.you.have.any.children.under.18. and baby.on.plane. How is the attitude towards babies on planes shaped by gender and having children under 18? Find a plot that summarises your findings (use ggplot2).
levels(fly$`Do.you.have.any.children.under.18.`)[1] <- NA
levels(fly$Gender)[1] <- NA

ggplot(fly) + geom_bar(aes(x = Gender, fill = `baby.on.plane.`), position = "fill") + facet_grid(~`Do.you.have.any.children.under.18.`) 

It seems that those with children under 18 are less likely to be bothered by a baby on a plane. Additionally, whether or not the respondent had a child under 18 or not, if the respondent was a female, they were less likely to be bothered by a baby on a plane.