THIS IS AN INDIVIDUAL ASSESSMENT, THIS DOCUMENT AND YOUR ANSWERS ARE FOR YOUR EYES ONLY. ANY VIOLATION OF THIS POLICY WILL BE IMMEDIATELY REPORTED.
Replace the underscores below with your name acknowledging that you have read and understood your institution’s academic misconduct policy.
I, ____________, hereby state that I have not communicated with or gained information in any way from my classmates or anyone other than the Professor or TA during this exam, and that all work is my own.
The coronavirus pandemic has sickened more than 1.4 million people, according to official counts. Here, we will explore both the global and local growth of COVID-19 using data sourced on April 8th, 2020.
This data set contains information on some of the first fully recovered cases of COVID-19. We will look at the time it took these patients to recover, defined as the number of days between a confirmed test and an official discharge date. The data is available at https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/exams/data/covid19-recovered.csv
recovery_data <- readr::read_csv("https://raw.githubusercontent.com/Stat480-at-ISU/Stat480-at-ISU.github.io/master/exams/data/covid19-recovered.csv")
## your answer here
In order to continue with an analysis of this data, we should make some modifications to it.
Use the `tidyverse` package to make the following modifications:
- Convert `confirmed` and `discharged` into variables of type "date".
- `recovery`.
- Calculate `recovery` as the number of days between `confirmed` and `discharged` and save as `recovery_days`.
- Convert `category` from type `character` to type `factor`.
- Save the result as `recovered` and use this data for the remaining questions in part I.
## your answer here
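The kinds of modifications listed above can be sketched on a small made-up table. This is only an illustration: the toy values, the ISO date format, and the names `toy` and `toy_clean` are assumptions, and the real file may store its dates differently.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; the values are invented for illustration only.
toy <- tibble::tibble(
  confirmed  = c("2020-01-23", "2020-01-25"),
  discharged = c("2020-02-04", "2020-02-10"),
  category   = c("import", "local")
)

toy_clean <- toy %>%
  mutate(
    # Assumes year-month-day strings; use mdy() or dmy() for other layouts.
    confirmed     = ymd(confirmed),
    discharged    = ymd(discharged),
    # Difference between two dates, expressed as a plain number of days.
    recovery_days = as.numeric(difftime(discharged, confirmed, units = "days")),
    # Character column converted to a factor.
    category      = factor(category)
  )
```

Asking `difftime()` for `units = "days"` explicitly avoids relying on the unit R would otherwise choose.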
## your answer here
## your answer here
## your answer here
If indeed infected, how long would it take for you to be free of the novel coronavirus?
- Use `ggplot2` to look at the distribution of the variable `recovery` (you may need to adjust the size of the bins).
## your answer here
- Create a variable `age_blks` from `age` that introduces age categories that group the ages of the patients into intervals: < 10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, and > 80 (see `?cut`).
- Map `age_blks` to the fill aesthetic.
## your answer here
- Map `gender` to the aesthetic `fill`.
## your answer here
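As a rough illustration of how `cut()` builds such intervals and how the result can be mapped to `fill`, here is a sketch on invented ages. The boundary handling (`right = FALSE`, so an age of exactly 10 falls into "10-20") is an assumption, since the intervals above do not say which side is closed. The names `ages`, `toy_patients`, and `p_hist` are hypothetical.

```r
library(ggplot2)

# Hypothetical ages, invented for illustration only.
ages <- c(4, 10, 25, 39, 61, 85)

# Ten break points give nine intervals; -Inf and Inf catch the open ends.
age_breaks <- c(-Inf, seq(10, 80, by = 10), Inf)
age_labels <- c("<10", "10-20", "20-30", "30-40", "40-50",
                "50-60", "60-70", "70-80", ">80")

# right = FALSE makes intervals closed on the left: [10, 20), [20, 30), ...
age_blks <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

# Toy recovery times, again invented, to show the fill mapping.
toy_patients <- data.frame(recovery = c(10, 12, 15, 18, 20, 22),
                           age_blks = age_blks)

p_hist <- ggplot(toy_patients, aes(x = recovery, fill = age_blks)) +
  geom_histogram(binwidth = 5)  # binwidth controls the size of the bins
```

Printing `p_hist` draws the stacked histogram; changing `binwidth` is one way to adjust the bins.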
## your answer here
## your answer here
## your answer here
Rename the variables `Province/State`, `Country/Region`, `Lat`, and `Long` to be `province`, `country`, `lat`, and `long`, respectively.
## your answer here
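Column names containing a slash are not syntactic R names, so they need backticks when referenced. A minimal sketch on an invented two-row table (the names `toy_cases` and `toy_renamed` and all values are hypothetical):

```r
library(dplyr)

# Hypothetical toy table; the values are invented for illustration only.
toy_cases <- tibble::tibble(
  `Province/State` = c(NA, "P1"),
  `Country/Region` = c("A", "B"),
  Lat              = c(10, 20),
  Long             = c(30, 40)
)

# rename() takes new_name = old_name; backticks quote the non-syntactic names.
toy_renamed <- toy_cases %>%
  rename(
    province = `Province/State`,
    country  = `Country/Region`,
    lat      = Lat,
    long     = Long
  )
```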
## your answer here
## your answer here
## your answer here
In order to continue with an analysis of this data, we should reshape it.
Use the `tidyverse` package to modify the shape and form of the data:
- Use `dplyr` to remove the `lat` and `long` variables from the `cases` data.
- Use the `tidyr` package to move from wide format into long format where each row represents the number of confirmed cases on a particular date for each country-province pair.
- Use `lubridate` to convert the variable `date` from a string into an object of type `date`.
- Save as `cases_long`.
## your answer here
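The reshaping steps can be sketched on a tiny wide table. This assumes the date columns are named in month/day/two-digit-year form (for example `4/7/20`), which is not shown in this document. The value column name `confirmed`, the object names, and all counts are hypothetical.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical wide table; one column per date, values invented.
toy_wide <- tibble::tibble(
  province = c(NA, "P1"),
  country  = c("A", "B"),
  lat      = c(10, 20),
  long     = c(30, 40),
  `4/7/20` = c(10, 20),
  `4/8/20` = c(15, 25)
)

toy_long <- toy_wide %>%
  select(-lat, -long) %>%                  # drop the coordinates
  pivot_longer(
    cols      = -c(province, country),     # every remaining column is a date
    names_to  = "date",
    values_to = "confirmed"
  ) %>%
  mutate(date = mdy(date))                 # "4/7/20" becomes a Date
```

Each row of `toy_long` now holds one country-province pair on one date.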
`cases_by_country`. Plan of attack:
- `cases_long`.
- Save as `cases_by_country`.
## your answer here
cases_by_country.
- Use `ggplot2` to plot the number of confirmed cases for each of the nine countries over time.
- Map `country` to color and use the function `fct_reorder2()` from the `forcats` package to align the colors of the lines with the colors in the legend.
- Add `+ scale_y_continuous(labels = scales::comma)`.
## your answer here
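To see what `fct_reorder2()` does in a line plot, consider this sketch with three made-up countries. It orders the factor levels by the `y` value at the largest `x`, so the legend order matches the order of the line ends. The data and the names `toy_ts` and `p_lines` are hypothetical.

```r
library(ggplot2)
library(forcats)

# Hypothetical time series for three invented countries.
toy_ts <- data.frame(
  country   = rep(c("A", "B", "C"), each = 3),
  date      = rep(as.Date("2020-04-06") + 0:2, times = 3),
  confirmed = c(100, 200, 300, 1000, 2000, 3000, 500, 600, 700)
)

p_lines <- ggplot(
  toy_ts,
  aes(x = date, y = confirmed,
      # Levels ordered by the last confirmed value, largest first.
      color = fct_reorder2(country, date, confirmed))
) +
  geom_line() +
  scale_y_continuous(labels = scales::comma) +  # 3,000 instead of 3000
  labs(color = "country")
```

In this toy example the legend lists B, C, A, the same top-to-bottom order as the right-hand ends of the lines.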
Use `ggplot2` to create a bar chart of the number of cases for the top nine countries for the two dates, sorted according to the total number of cases in that country.
## your answer here
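One possible way to sort bars by a per-country total is `fct_reorder()` with `.fun = sum`, sketched here on invented counts. The two dates, the country names, and the objects `toy_two` and `p_bar` are hypothetical and stand in for whatever dates the question refers to.

```r
library(ggplot2)
library(forcats)

# Hypothetical counts for three invented countries on two dates.
toy_two <- data.frame(
  country   = rep(c("A", "B", "C"), each = 2),
  date      = rep(as.Date(c("2020-03-08", "2020-04-08")), times = 3),
  confirmed = c(100, 300, 1000, 3000, 500, 700)
)

# Order countries by their summed counts (ascending).
toy_two$country <- fct_reorder(toy_two$country, toy_two$confirmed, .fun = sum)

p_bar <- ggplot(toy_two, aes(x = country, y = confirmed, fill = factor(date))) +
  geom_col(position = "dodge") +  # one bar per date, side by side
  coord_flip() +                  # largest total ends up at the top
  labs(fill = "date")
```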
## your answer here
- `cases100`.
- Create a data set from `cases100` that contains only the last date and save as `cases100_last`.
- Using `cases100` and `cases100_last`, recreate the visualization below.
## your answer here
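Keeping only the rows for the most recent date can be sketched as a single filter. The table `toy100` below is an invented stand-in, since the definition of `cases100` is not shown here. All names and values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in table; values invented for illustration.
toy100 <- tibble::tibble(
  country   = rep(c("A", "B"), each = 3),
  date      = rep(as.Date("2020-04-06") + 0:2, times = 2),
  confirmed = c(120, 150, 190, 400, 520, 610)
)

# Keep the rows whose date equals the latest date in the whole table.
toy100_last <- toy100 %>%
  filter(date == max(date))
```

A table like this, with one row per country, is convenient for labelling the end of each line in a plot.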