Heike Hofmann
group_by and summariseThe General Social Survey (GSS) has been run by NORC every other year since 1972 to keep track of current opinions across the United States.
An excerpt of the GSS data called happy is available from the classdata package:
devtools::install_github("heike/classdata")You can find a codebook with explanations for each of the variables at https://gssdataexplorer.norc.org/
library(tidyverse)
library(classdata)
data("happy", package="classdata")
happy %>% str()## 'data.frame': 62466 obs. of 11 variables:
## $ happy : Factor w/ 3 levels "not too happy",..: 1 1 2 1 2 2 1 1 2 2 ...
## $ year : int 1972 1972 1972 1972 1972 1972 1972 1972 1972 1972 ...
## $ age : num 23 70 48 27 61 26 28 27 21 30 ...
## $ sex : Factor w/ 2 levels "female","male": 1 2 1 1 1 2 2 2 1 1 ...
## $ marital : Factor w/ 5 levels "never married",..: 1 3 3 3 3 1 4 1 1 3 ...
## $ degree : Factor w/ 5 levels "lt high school",..: 4 1 2 4 2 2 2 4 2 2 ...
## $ finrela : Factor w/ 5 levels "far below average",..: 3 4 3 3 4 4 4 3 3 2 ...
## $ health : Factor w/ 4 levels "poor","fair",..: 3 2 4 3 3 3 4 3 4 2 ...
## $ polviews: Factor w/ 7 levels "extrmly conservative",..: NA NA NA NA NA NA NA NA NA NA ...
## $ partyid : Factor w/ 8 levels "strong republican",..: 5 6 4 6 7 5 5 5 7 7 ...
## $ wtssall : num 0.445 0.889 0.889 0.889 0.889 ...
HAPPYhappy %>%
ggplot(aes(x = happy)) + geom_bar()
agehappy %>% ggplot(aes(x = age)) + geom_histogram(binwidth=1)
degreehappy %>% ggplot(aes(x = degree)) + geom_bar()
Use scores for happy factor to summarise overall happiness level, i.e. not too happy = 1, pretty happy = 2, and very happy = 3
happy %>% summarise(
m.happy = mean(as.numeric(happy), na.rm=TRUE)
)## m.happy
## 1 2.186969
happy %>% group_by(sex) %>% summarise(
m.happy = mean(as.numeric(happy), na.rm=TRUE)
)## # A tibble: 2 x 2
## sex m.happy
## <fct> <dbl>
## 1 female 2.19
## 2 male 2.18

For this your turn use the happy data from the classdata package
partyid.
For this your turn use the happy data from the classdata package
n() provides the number of rows of a subset:happy %>% group_by(sex) %>% summarise(n = n())## # A tibble: 2 x 2
## sex n
## <fct> <int>
## 1 female 34904
## 2 male 27562
tally() is a combination of summarise and nhappy %>% group_by(sex) %>% tally()## # A tibble: 2 x 2
## sex n
## <fct> <int>
## 1 female 34904
## 2 male 27562
count() is a further shortcut of group_by and tally:happy %>% count(sex, degree)## # A tibble: 12 x 3
## sex degree n
## <fct> <fct> <int>
## 1 female lt high school 7500
## 2 female high school 18419
## 3 female junior college 2047
## 4 female bachelor 4731
## 5 female graduate 2112
## 6 female <NA> 95
## 7 male lt high school 5825
## 8 male high school 13598
## 9 male junior college 1425
## 10 male bachelor 4279
## 11 male graduate 2357
## 12 male <NA> 78