Submission Details

Due date: the homework is due before class on Thursday.

Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit both the .Rmd and the .html files separately and do not zip the two files together.


Spotify data

  1. Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.

  2. Using the command below, read in the spotify data set into your R session.

spotify <- read.csv("https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/homework/data/spotify.csv")
  1. Use one of our object inspecting functions and interpret the result in the data that you see.
dim(spotify)
## [1] 10000    13

This shows the dimensions of the matrix spotify. It has 13 columns and 1000 rows.

  1. Use the package ggplot2 to draw a barchart of the genres. In addition, map the genre categories to the fill color of the barchart.
ggplot(spotify, aes(x= playlist_genre)) + 
  geom_bar(aes(fill = playlist_genre))

This is a bar plot with playlist genre as the x variable and count of each genre as the y variable. The playlist genre is also mapped to the color of each bar. This plot should show the viewer the amount of times each playlist genre shows up in the data set.

  1. Use the package ggplot2 to draw a histogram of one of the continuous variables in the dataset. Use fill color to show the genre categories and adjust the binwidth if necessary. Use facet_wrap() to create a histogram for each of the genre categories.
ggplot(spotify, aes(x = tempo)) + 
  geom_histogram(binwidth = 5, aes(fill = playlist_genre)) + 
  facet_wrap("playlist_genre")

This is a histogram where tempo is mapped to the x variable and frequency of the binned values of tempo is mapped to the y variable. The playlist genre is again mapped to the color. This plot should show the viewer the distribution of tempo values for the various genres of music.

  1. Use the package ggplot2 to draw a boxplot to compare one of the continuous variables with the genre categories. Again, use fill color to show the genre categories.
ggplot(spotify, aes(x = playlist_genre, y = tempo)) +
  geom_boxplot(aes(fill = playlist_genre))

This is a boxplot where tempo is mapped to the y variable and genre is mapped to the x variable and to the fill color. This plot should show the viewer a where the median and the middle 50% of the values tempo are for each genre. Since edm has a high concentration of tempos around 125, it creates a very thin boxplot with a lot of outliers.

  1. For each of the three figures above, write a two-three sentence summary, describing the
    1. structure of the plot: what type of plot is it? Which variables are mapped to x, to y, and to the (fill) color?
    2. main message of the plot: what is your main finding, i.e. what do you want viewers to learn from the plot?
    3. additional message: point out anomalies or outliers, if there are any.