Due date: the homework is due before class on Thursday.
Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit the .Rmd and the .html files separately and do not zip the two files together.
# load necessary libraries
library(dplyr)
library(tidyr)
library(ggplot2)
library(forcats)
library(readr)
# read in the data
bob_ross <- read_csv('https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/homework/data/bob-ross.csv')
The variables alizarin_crimson through burnt_umber correspond to the binary presence (0 or 1) of that color in the painting. Gather all of these variables and create a long form of the data, introducing two new variables called color and presence. Save the result in a data frame called bob_ross_colors.
bob_ross_colors <- bob_ross %>%
pivot_longer(cols = alizarin_crimson:burnt_umber, names_to = "Color", values_to = "presence")
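To see what the reshaping step does, here is a minimal sketch on a made-up table. The tibble `toy` and its values are hypothetical and are not taken from the Bob Ross data; only the column-range selection and the `pivot_longer()` arguments mirror the step above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two paintings, three 0/1 color indicator columns
toy <- tibble(
  title = c("painting_a", "painting_b"),
  alizarin_crimson = c(1, 0),
  bright_red = c(1, 1),
  burnt_umber = c(0, 0)
)

# Each indicator column becomes one row per painting:
# the column name goes to `color`, the 0/1 value goes to `presence`
toy_long <- toy %>%
  pivot_longer(cols = alizarin_crimson:burnt_umber,
               names_to = "color",
               values_to = "presence")

nrow(toy_long)  # 2 paintings x 3 colors = 6 rows
```

Because every painting contributes one row per color, summing `presence` within a color later gives the number of paintings that use it.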
Use bob_ross_colors as your starting point and for each color calculate the number of times that color was used throughout the series. After using this number to reorder the levels of the variable color, create a bar chart using the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarize the chart.
question4_data %>%
ggplot(aes( )) +
geom_bar(show.legend = FALSE) +
coord_flip() +
theme_minimal() +
labs(y = "Number of paintings",
x = "",
title = "Frequency of colors in Bob Ross Paintings") +
scale_fill_manual(values = c("#CD5C5C", "#8A3324", "#2C6436", "#3C67A7", "#643914", "#E7BD2F", "#546F1F", "#C36A4A", "#346BB1", "#B58A30", "#F8ED5F", "#372518", "#973B29"))
question4_data <- bob_ross_colors %>%
group_by(Color) %>%
summarise(ColorCount = sum(presence)) %>%
mutate(Color = fct_reorder(Color, ColorCount))
question4_data %>%
ggplot(aes(x = Color)) +
geom_bar(aes(weight=ColorCount, fill = Color), show.legend = FALSE) +
coord_flip() +
theme_minimal() +
labs(y = "Number of paintings",
x = "",
title = "Frequency of colors in Bob Ross Paintings") +
scale_fill_manual(values = c("#CD5C5C", "#8A3324", "#2C6436", "#3C67A7", "#643914", "#E7BD2F", "#546F1F", "#C36A4A", "#346BB1", "#B58A30", "#F8ED5F", "#372518", "#973B29"))
Ross uses alizarin crimson and van dyke brown the most; each shows up in over 350 of his paintings. Most of the other colors appear in a similar number of paintings, ranging from about 275 to 350. Indian red shows up in the fewest paintings.
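The bar order in the chart comes from the factor levels set by `fct_reorder()`. The sketch below illustrates that step in isolation; `toy_counts` and its numbers are hypothetical and are not results from the homework data.

```r
library(dplyr)
library(forcats)

# Hypothetical counts, not taken from the Bob Ross data
toy_counts <- tibble(
  color = c("red", "green", "blue"),
  n_paintings = c(5, 20, 10)
)

# Reorder the levels of `color` by ascending count
toy_counts <- toy_counts %>%
  mutate(color = fct_reorder(color, n_paintings))

levels(toy_counts$color)  # "red" "blue" "green"
```

With `coord_flip()`, the first level is drawn at the bottom and the last at the top, so ascending order places the most frequent color at the top of the chart.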
The variables aurora_borealis through winter correspond to the binary presence (0 or 1) of that element in the painting. Use pivot_longer() as shown in class to transform the data into a tidier format with new variables element and presence. Save the result in a data frame called bob_ross_elements.
bob_ross_elements <- bob_ross %>%
pivot_longer(cols = aurora_borealis:winter, names_to = "element", values_to = "prescence")
Use bob_ross_elements as your starting point and for each element calculate the number of times that element was included. Then use this number to reorder the levels of element. Exclude elements that were featured in fewer than 50 paintings and create a bar chart. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )). Describe and summarize the chart.
question6_data %>%
ggplot(aes( )) +
geom_bar(fill = "seagreen") +
coord_flip() +
theme_minimal() +
labs(y = "Number of paintings",
x = "",
title = "What were most common features in Bob Ross paintings?",
subtitle = "Paintings by the numbers")
question6_data <- bob_ross_elements %>%
group_by(element) %>%
summarise(ElementCount = sum(prescence)) %>%
filter(ElementCount >= 50) %>%
mutate(element = fct_reorder(element, ElementCount))
question6_data %>%
ggplot(aes(x = element)) +
geom_bar(fill = "seagreen",aes(weight=ElementCount)) +
coord_flip() +
theme_minimal() +
labs(y = "Number of paintings",
x = "",
title = "What were most common features in Bob Ross paintings?",
subtitle = "Paintings by the numbers")
Some form of tree appears in the highest number of paintings, with variations of trees making up the top four elements. Other common elements include clouds, mountains, lakes, and grass, which show up in about 150 paintings each. Finally, some elements, such as cabins, winter, and snow, appear in only about 50 paintings apiece.
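The count-then-filter pattern behind this chart can be checked on a small example. The data in `toy_elements` and the threshold of 2 are hypothetical and chosen only so the result is easy to verify by hand.

```r
library(dplyr)

# Hypothetical long-format data: one row per painting-element pair
toy_elements <- tibble(
  element = c("tree", "tree", "tree", "cabin", "cabin", "lake"),
  presence = c(1, 1, 1, 1, 0, 1)
)

# Summing a 0/1 indicator counts the paintings that contain the element;
# the filter then drops elements below the (hypothetical) threshold of 2
toy_summary <- toy_elements %>%
  group_by(element) %>%
  summarise(n_paintings = sum(presence)) %>%
  filter(n_paintings >= 2)

toy_summary  # only "tree" remains, with n_paintings = 3
```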
Use bob_ross_elements as your starting point and for each season and element, calculate the number of times an element was included. Then calculate the total number of times each element was included. Exclude elements that were included in fewer than 90 paintings total. Create a line plot showing the number of times an element was included for each season, with season on the x-axis, and facet by element. Use the code below as your starting point and add in the necessary aesthetic mappings within ggplot(aes( )) and add in the faceting. Describe and summarize the chart.
question7_data %>%
ggplot(aes( )) +
geom_line(color = "deepskyblue") +
# add faceting here
labs(y = "Number of paintings with element",
title = "The content of Bob Ross paintings over time",
subtitle = "Dashed line is number of episodes in the season") +
geom_hline(yintercept = 13, lty = 2, color = "grey70") +
theme_minimal() +
expand_limits(y = 0)
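# count paintings per season and element (one row per pair),
# then keep elements that appear in at least 90 paintings overall
question7_data <- bob_ross_elements %>%
group_by(season, element) %>%
summarise(ElementCount = sum(prescence)) %>%
group_by(element) %>%
mutate(ElementTotal = sum(ElementCount)) %>%
filter(ElementTotal >= 90) %>%
ungroup()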
question7_data %>%
ggplot(aes(x= season, y= ElementCount )) +
geom_line(color = "deepskyblue") +
facet_wrap(~element) +
labs(y = "Number of paintings with element",
title = "The content of Bob Ross paintings over time",
subtitle = "Dashed line is number of episodes in the season") +
geom_hline(yintercept = 13, lty = 2, color = "grey70") +
theme_minimal() +
expand_limits(y = 0)
Many of the elements fluctuate from season to season. The tree and trees elements appear frequently in every season, while most of the less common elements (rivers, for example) vary widely from one season to the next.
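The two-level aggregation used for this plot (a count per season and element, then a total per element) can be sketched on toy data. The tibble `toy`, the column names `count` and `total`, and the threshold of 2 are all hypothetical and serve only to make the arithmetic easy to follow.

```r
library(dplyr)

# Hypothetical long-format data: two seasons, two elements, two paintings each
toy <- tibble(
  season = c(1, 1, 2, 2, 1, 1, 2, 2),
  element = c("tree", "tree", "tree", "tree",
              "cabin", "cabin", "cabin", "cabin"),
  presence = c(1, 1, 1, 0, 0, 1, 0, 0)
)

toy_by_season <- toy %>%
  group_by(season, element) %>%
  summarise(count = sum(presence)) %>%  # one row per season-element pair
  group_by(element) %>%
  mutate(total = sum(count)) %>%        # overall total, repeated on each row
  filter(total >= 2) %>%                # hypothetical threshold
  ungroup()

toy_by_season  # tree only: count 2 in season 1, count 1 in season 2, total 3
```

Using `summarise()` for the per-season count leaves one row per season and element, which is the shape `geom_line()` expects, while `mutate()` keeps those rows and attaches the overall total used for filtering.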