Due date: the homework is due before class on Thursday.
Submission process: submit both the R Markdown file and the corresponding html file on canvas. Please submit both the .Rmd
and the .html
files separately and do not zip the two files together.
Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.
Using the command below, read in the spotify data set into your R session.
spotify <- read.csv("https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/homework/data/spotify.csv")
dim(spotify)
## [1] 10000 13
This shows the dimensions of the matrix spotify. It has 13 columns and 1000 rows.
ggplot2
to draw a barchart of the genres. In addition, map the genre categories to the fill color of the barchart.ggplot(spotify, aes(x= playlist_genre)) +
geom_bar(aes(fill = playlist_genre))
This is a bar plot with playlist genre as the x variable and count of each genre as the y variable. The playlist genre is also mapped to the color of each bar. This plot should show the viewer the amount of times each playlist genre shows up in the data set.
ggplot2
to draw a histogram of one of the continuous variables in the dataset. Use fill color to show the genre categories and adjust the binwidth if necessary. Use facet_wrap()
to create a histogram for each of the genre categories.ggplot(spotify, aes(x = tempo)) +
geom_histogram(binwidth = 5, aes(fill = playlist_genre)) +
facet_wrap("playlist_genre")
This is a histogram where tempo is mapped to the x variable and frequency of the binned values of tempo is mapped to the y variable. The playlist genre is again mapped to the color. This plot should show the viewer the distribution of tempo values for the various genres of music.
ggplot2
to draw a boxplot to compare one of the continuous variables with the genre categories. Again, use fill color to show the genre categories.ggplot(spotify, aes(x = playlist_genre, y = tempo)) +
geom_boxplot(aes(fill = playlist_genre))
This is a boxplot where tempo is mapped to the y variable and genre is mapped to the x variable and to the fill color. This plot should show the viewer a where the median and the middle 50% of the values tempo are for each genre. Since edm has a high concentration of tempos around 125, it creates a very thin boxplot with a lot of outliers.