Due date: the homework is due before class on Thursday.
Submission process: submit both the R Markdown file and the corresponding HTML file on Canvas. Please submit the .Rmd and the .html files separately and do not zip the two files together.
Download the RMarkdown file with these homework instructions to use as a template for your work. Make sure to replace “Your Name” in the YAML with your name.
Using the command below, read the spotify data set into your R session.
spotify <- read.csv("https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/homework/data/spotify.csv")
str(spotify)
## 'data.frame': 10000 obs. of 13 variables:
## $ track_artist : Factor w/ 5041 levels "!!!","!deladap",..: 3969 387 222 1565 4562 366 3192 3756 3364 2433 ...
## $ track_popularity : int 30 20 2 44 54 67 54 58 13 56 ...
## $ playlist_genre : Factor w/ 6 levels "edm","latin",..: 6 3 3 5 6 5 2 1 5 2 ...
## $ playlist_subgenre: Factor w/ 24 levels "album rock","big room",..: 8 11 11 9 8 9 20 5 7 12 ...
## $ release_date : Factor w/ 2679 levels "1957-01-01","1963-05-27",..: 2587 1815 1724 2507 814 2561 2170 2465 1615 2298 ...
## $ duration_min : num 1.92 5.27 4.44 3.49 3.16 ...
## $ danceability : num 0.297 0.546 0.589 0.74 0.451 0.652 0.701 0.698 0.625 0.863 ...
## $ energy : num 0.974 0.591 0.846 0.721 0.884 0.862 0.772 0.895 0.805 0.627 ...
## $ loudness : num -4.1 -5.65 -5.46 -6.32 -3.45 ...
## $ speechiness : num 0.135 0.0284 0.0377 0.177 0.0335 0.206 0.184 0.101 0.388 0.206 ...
## $ acousticness : num 0.00366 0.0616 0.069 0.762 0.00177 0.161 0.0515 0.00764 0.552 0.0485 ...
## $ liveness : num 0.113 0.275 0.0904 0.0922 0.193 0.203 0.0959 0.217 0.108 0.0968 ...
## $ tempo : num 131.3 120 122 120 97.5 ...
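This data set contains 13 variables and 10,000 observations. Of the 13 variables, 4 are of type factor (track_artist, playlist_genre, playlist_subgenre, and release_date), 1 is of type integer (track_popularity), and the remaining 8 are of type numeric.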
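The type counts can also be tabulated in code instead of being read off the str() output. The following is only a minimal sketch: the helper `count_column_types()` and the small data frame `toy` are made up for illustration and are not part of the homework. Note that since R 4.0.0, `read.csv()` defaults to `stringsAsFactors = FALSE`, so text columns are read as character rather than factor unless `stringsAsFactors = TRUE` is passed.

```r
# Hypothetical helper: tabulate the first class of every column.
count_column_types <- function(df) {
  table(vapply(df, function(col) class(col)[1], character(1)))
}

# Made-up stand-in for `spotify` (illustration only, not real data).
toy <- data.frame(
  genre      = factor(c("pop", "rock", "pop")),
  popularity = c(30L, 20L, 2L),
  energy     = c(0.97, 0.59, 0.85)
)

type_counts <- count_column_types(toy)
print(type_counts)

# With the real data loaded, the same call would be:
# count_column_types(spotify)
```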
Use ggplot2 to draw a barchart of the genres. In addition, map the genre categories to the fill color of the barchart.
library(ggplot2)
ggplot(spotify, aes(x = playlist_genre, fill = playlist_genre)) + geom_bar()
This is a bar chart with the categorical variable playlist_genre mapped to the x axis and to the fill color. The heights of the bars represent the number of songs that fall within that bar’s category. From this plot we gain the impression that there is roughly an equal number of songs for each category in this dataset.
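The bar heights are plain category counts, so the visual impression can be checked numerically. A minimal sketch, assuming a made-up vector `toy_genres` in place of the real column (the values are for illustration only):

```r
# Made-up genre labels standing in for spotify$playlist_genre.
toy_genres <- factor(c("pop", "rock", "pop", "rap"))

# Counts per category: these are the bar heights geom_bar() draws.
genre_counts <- table(toy_genres)
print(genre_counts)

# Share of each category, which makes "roughly equal" easier to judge.
genre_props <- prop.table(genre_counts)
print(round(genre_props, 2))

# With the real data loaded, the same idea would be:
# prop.table(table(spotify$playlist_genre))
```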
Use ggplot2 to draw a histogram of one of the continuous variables in the dataset. Use fill color to show the genre categories and adjust the binwidth if necessary. Use facet_wrap() to create a histogram for each of the genre categories.
ggplot(spotify, aes(x = duration_min, fill = playlist_genre)) + geom_histogram()
ggplot(spotify, aes(x = duration_min, fill = playlist_genre)) + geom_histogram() + facet_wrap(~playlist_genre)
This is a histogram with the quantitative variable duration_min mapped to the x axis and the categorical variable playlist_genre mapped to the fill color. The heights of the bars represent the number of songs that have a duration within that bin. From this plot we gain the impression that most songs are 3-4 minutes long and there are only a few songs less than a minute and only a few songs more than 7 minutes. There is not a large difference in the distributions of the different genres.
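When no bin setting is given, geom_histogram() falls back to 30 bins and prints a message suggesting a better value be picked with `binwidth`. The sketch below shows one way to set the bin width explicitly. The width of 0.25 minutes is an assumed choice for illustration, and the data frame `toy` is made up so the example runs on its own.

```r
library(ggplot2)

# Made-up durations in minutes; with the real data use `spotify` instead.
toy <- data.frame(
  duration_min   = c(2.9, 3.1, 3.4, 3.6, 4.2, 3.0, 3.3, 3.8, 5.1, 2.5),
  playlist_genre = rep(c("pop", "rock"), each = 5)
)

# binwidth is in the units of x, so 0.25 means 15-second bins (assumed value).
p <- ggplot(toy, aes(x = duration_min, fill = playlist_genre)) +
  geom_histogram(binwidth = 0.25) +
  facet_wrap(~playlist_genre)
print(p)
```

A narrower width shows more detail but gives noisier bars, so it is worth trying a few values before settling on one.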
Use ggplot2 to draw a scatterplot to compare the length of the song with the energy measure. Use color to show the genre categories.
ggplot(spotify, aes(y = energy, x = duration_min, color = playlist_genre)) + geom_point()
This is a scatterplot with the quantitative variables duration_min and energy mapped to the x and y axes, respectively. In addition, the categorical variable playlist_genre is mapped to the color of the points. From this plot we again gain the impression that most songs are 3-4 minutes long and there are only a few songs less than a minute and only a few songs more than 7 minutes. There does not appear to be a relationship between duration_min and energy, nor is there a large difference between the genres.
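A visual impression of "no relationship" can be backed up with a correlation coefficient. This is a minimal sketch on made-up numbers, not a result from the spotify data. The vectors `toy_duration` and `toy_energy` are hypothetical stand-ins for the two columns.

```r
# Made-up values standing in for spotify$duration_min and spotify$energy.
toy_duration <- c(2.9, 3.1, 3.4, 3.6, 4.2)
toy_energy   <- c(0.90, 0.50, 0.80, 0.60, 0.70)

# Pearson correlation: values near 0 suggest no linear relationship.
r <- cor(toy_duration, toy_energy)
print(r)

# With the real data loaded, the same call would be:
# cor(spotify$duration_min, spotify$energy)
```

Pearson correlation only measures linear association, so it complements the scatterplot rather than replacing it.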