The Joy of Painting with Bob Ross


Submission Details

Due date: the homework is due before class on Thursday.

Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit both the .Rmd and the .html files separately and do not zip the two files together.


Questions

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.
# load necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(forcats)
library(readr)
  1. The data this week comes from fivethiryeight. The data set includes information on the 403 episodes of “The Joy of Painting”. The accompanying article is published here.
# read in the data
bob_ross <- read_csv('https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/homework/data/bob-ross.csv')
  1. Variables alizarin_crimson through burnt_umber correspond to the binary presence (0 or 1) of that color in the painting. Gather all of these variables and create a long form of the data, introducing two new variables called color and presence. Save the result in a data frame called bob_ross_colors.
## your answer here
  1. Does Bob Ross have a favorite color to paint with? Use the data bob_ross_colors as your starting point and for each color calculate the number of times that color was used throughout the series. After using this number to reorder the levels of the variable color, create a bar chart using the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarize the chart.
question4_data %>%
  ggplot(aes( )) + 
  geom_bar(show.legend = FALSE) + 
  coord_flip() + 
  theme_minimal() +
  labs(y = "Number of paintings",
       x = "", 
       title = "Frequency of colors in Bob Ross Paintings") +
  scale_fill_manual(values = c("#CD5C5C", "#8A3324", "#2C6436", "#3C67A7", "#643914", "#E7BD2F", "#546F1F", "#C36A4A", "#346BB1", "#B58A30", "#F8ED5F", "#372518", "#973B29")) 
## your answer here
  1. For this question use the original data again. Variables aurora_borealis through winter correspond to the binary presence (0 or 1) of that element in the painting. Use pivot_longer() as shown in class to transform the data into a tidier format with new variables element and presence. Save the result in a data frame called bob_ross_elements.
## your answer here
  1. What are the most frequent elements in his paintings? Use the data bob_ross_elements as your starting point and for each element calculate the number of times that element was included. Then use this number to reorder the levels of element. Exclude elements that were featured in fewer than 50 paintings and create a bar chart. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarize the chart.
question6_data %>%
  ggplot(aes( )) + 
  geom_bar(fill = "seagreen") + 
  coord_flip() + 
  theme_minimal() +
  labs(y = "Number of paintings",
       x = "", 
       title = "What were most common features in Bob Ross paintings?",
       subtitle = "Paintings by the numbers")
## your answer here
  1. Did the content of the paintings change over time? We will attempt to answer this question by looking at elements that appeared in more than 90 paintings and their trends over the seasons. Use the data bob_ross_elements as your starting point and for each season and element, calculate the number of times an element was included. Exclude elements that were included in less than 90 paintings total. Create a line plot showing number of times an element was included for each season with season on the x-axis and facet by element. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )) and add in the faceting. Describe and summarize the chart.
question7_data %>%
  ggplot(aes( )) + 
  geom_line(color = "deepskyblue") + 
  # add faceting here
  labs(y = "Number of paintings with element",
       title = "The content of Bob Ross paintings over time",
       subtitle = "Dashed line is number of episodes in the season") + 
  geom_hline(yintercept = 13, lty = 2, color = "grey70") + 
  theme_minimal() + 
  expand_limits(y = 0) 
## your answer here