The Joy of Painting with Bob Ross


Submission Details

Due date: the homework is due before class on Thursday.

Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit both the .Rmd and the .html files separately and do not zip the two files together.


Questions

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.
# load necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(forcats)
library(readr)
  1. The data this week comes from fivethiryeight. The data set includes information on the 403 episodes of “The Joy of Painting”. The accompanying article is published here.
# read in the data
bob_ross <- read_csv('https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/homework/data/bob-ross.csv')
  1. Variables alizarin_crimson through burnt_umber correspond to the binary presence (0 or 1) of that color in the painting. Gather all of these variables and create a long form of the data, introducing two new variables called color and presence. Save the result in a data frame called bob_ross_colors.
bob_ross_colors <- bob_ross %>% pivot_longer(cols= alizarin_crimson:burnt_umber, names_to = "Color", values_to= "presence")
  1. Does Bob Ross have a favorite color to paint with? Use the data bob_ross_colors as your starting point and for each color calculate the number of times that color was used throughout the series. After using this number to reorder the levels of the variable color, create a bar chart using the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarize the chart.
question4_data %>%
  ggplot(aes( )) + 
  geom_bar(show.legend = FALSE) + 
  coord_flip() + 
  theme_minimal() +
  labs(y = "Number of paintings",
       x = "", 
       title = "Frequency of colors in Bob Ross Paintings") +
  scale_fill_manual(values = c("#CD5C5C", "#8A3324", "#2C6436", "#3C67A7", "#643914", "#E7BD2F", "#546F1F", "#C36A4A", "#346BB1", "#B58A30", "#F8ED5F", "#372518", "#973B29")) 
question4_data <- bob_ross_colors %>%
  group_by(Color) %>%  
  summarise(ColorCount = sum(presence)) %>% 
  mutate(Color = fct_reorder(Color, ColorCount)) 

question4_data %>% 
  ggplot(aes(x = Color)) + 
  geom_bar(aes(weight=ColorCount, fill = Color), show.legend = FALSE) + 
  coord_flip() + 
  theme_minimal() +
  labs(y = "Number of paintings",
       x = "", 
       title = "Frequency of colors in Bob Ross Paintings") +
  scale_fill_manual(values = c("#CD5C5C", "#8A3324", "#2C6436", "#3C67A7", "#643914", "#E7BD2F", "#546F1F", "#C36A4A", "#346BB1", "#B58A30", "#F8ED5F", "#372518", "#973B29"))

Ross uses alizarin crimson and van dyke brown the most, with it showing up in over 350 of his paintings. Most of the other colors show up in about the same amount of paintings ranging from about 275 to 350. Finally, indian red shows up in the least amount of paintings.

  1. For this question use the original data again. Variables aurora_borealis through winter correspond to the binary presence (0 or 1) of that element in the painting. Use pivot_longer() as shown in class to transform the data into a tidier format with new variables element and presence. Save the result in a data frame called bob_ross_elements.
bob_ross_elements <- bob_ross %>% pivot_longer(cols= aurora_borealis:winter, names_to= "element", values_to = "prescence")
  1. What are the most frequent elements in his paintings? Use the data bob_ross_elements as your starting point and for each element calculate the number of times that element was included. Then use this number to reorder the levels of element. Exclude elements that were featured in fewer than 50 paintings and create a bar chart. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarize the chart.
question6_data %>%
  ggplot(aes( )) + 
  geom_bar(fill = "seagreen") + 
  coord_flip() + 
  theme_minimal() +
  labs(y = "Number of paintings",
       x = "", 
       title = "What were most common features in Bob Ross paintings?",
       subtitle = "Paintings by the numbers")
question6_data <- bob_ross_elements %>% group_by(element) %>%  summarise(ElementCount = sum(prescence)) %>% filter(ElementCount>= 50) %>% mutate(element = fct_reorder(element, ElementCount))

question6_data %>%
  ggplot(aes(x = element)) + 
  geom_bar(fill = "seagreen",aes(weight=ElementCount)) + 
  coord_flip() + 
  theme_minimal() +
  labs(y = "Number of paintings",
       x = "", 
       title = "What were most common features in Bob Ross paintings?",
       subtitle = "Paintings by the numbers")

Some form of tree appears in the highest number of paintings with variations of trees making up the top four elements. Other common elements include clouds, mountains, lakes, and grass which show up in about 150 paintings each. Finally, some elements such as cabins, winter, and snow only appear in about 50 paintings a piece.

  1. Did the content of the paintings change over time? We will attempt to answer this question by looking at elements that appeared in more than 90 paintings and their trends over the seasons. Use the data bob_ross_elements as your starting point and for each season and element, calculate the number of times an element was included. Then calculate the total number of times an element. Exclude elements that were included in less than 90 paintings total. Create a line plot showing number of times an element was included for each season with season on the x-axis and facet by element. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )) and add in the faceting. Describe and summarize the chart.
question7_data %>%
  ggplot(aes( )) + 
  geom_line(color = "deepskyblue") + 
  # add faceting here
  labs(y = "Number of paintings with element",
       title = "The content of Bob Ross paintings over time",
       subtitle = "Dashed line is number of episodes in the season") + 
  geom_hline(yintercept = 13, lty = 2, color = "grey70") + 
  theme_minimal() + 
  expand_limits(y = 0) 
question7_data <- bob_ross_elements %>% group_by(season, element) %>% mutate(ElementCount= sum(prescence)) %>% group_by(element) %>% mutate(ElementTotal = sum(prescence)) %>% filter(ElementTotal>=90)


question7_data %>%
  ggplot(aes(x= season, y= ElementCount )) + 
  geom_line(color = "deepskyblue") + 
  facet_wrap(~element) +
  labs(y = "Number of paintings with element",
       title = "The content of Bob Ross paintings over time",
       subtitle = "Dashed line is number of episodes in the season") + 
  geom_hline(yintercept = 13, lty = 2, color = "grey70") + 
  theme_minimal() + 
  expand_limits(y = 0)

Many of the elements flucuate depending on the season. Tree and trees appear in high amounts throughout all seasons while most of the lesser used elements vary widely based on the season (like rivers).